Zabbix Log Monitoring (Custom Plugin Development)

Category: Servers · Published: 5 years ago


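I put this post off for a long time before finally committing to writing it. I hope it genuinely helps anyone who needs it.

Some background first. Wherever a service runs there is monitoring, and wherever there is monitoring Zabbix is rarely far away. I will venture a guess that many people who do monitoring also need to watch logs: your boss asks you to monitor a service's "ERROR" log lines, or Java NPEs, Full GCs, and so on.

Zabbix ships with a built-in log monitoring module, but I found it quite awkward to use (perhaps I just don't know how to use it properly).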

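First, a look at the result. The scenario below monitors Full GC events of a Java service. Once everything is configured, the captured log lines show up under Latest data.

Select the host that has the log monitoring configuration linked to it and view its data.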


None is what the item reports when the keyword was not matched. The result is converted to None to keep the output simple.


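The output can also be Noline. Don't panic: it means no new records were written between two collections. The script compares timestamps so that the same records are not read twice and reported as false alarms.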
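The timestamp check can be sketched roughly as follows. This is a minimal illustration rather than the plugin itself: parse_log_time and first_fresh_index are hypothetical names, and the sketch assumes the two timestamp layouts the script recognizes ("2024-01-02 03:04:05,678 ..." and "2024-01-02T03:04:05.678 ...").

```python
import time


def parse_log_time(line):
    """Return the line's timestamp as epoch seconds, or None if it has none."""
    parts = line.strip().split()
    if not parts:
        return None
    first = parts[0]
    if "T" in first:
        # ISO style: date and time are joined by "T" in the first token
        text = first.replace("T", " ")
    elif len(parts) > 1:
        # Space separated: date is the first token, time the second
        text = first + " " + parts[1]
    else:
        return None
    # Drop fractional seconds written as ",678" or ".678"
    text = text.split(",")[0].split(".")[0]
    try:
        return time.mktime(time.strptime(text, "%Y-%m-%d %H:%M:%S"))
    except (ValueError, OverflowError):
        return None


def first_fresh_index(lines, interval, now=None):
    """Index of the first line newer than now - interval, or None.

    None corresponds to the "Noline" case: everything read is old data.
    """
    now = time.time() if now is None else now
    cutoff = int(now) - interval
    for index, line in enumerate(lines):
        stamp = parse_log_time(line)
        if stamp is not None and int(stamp) > cutoff:
            return index
    return None
```

Lines without a parsable timestamp are skipped, so stack traces and other continuation lines do not break the scan.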

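How to configure

1. Configure the template

This is not much different from configuring any other item.

Key: four parameters must be given, in this order:

logpath: path of the log file

50: number of lines to read per run; set it according to how fast your log grows

'stringkey': the string to search for (the script applies it as a regular expression)

60: interval between script runs, in seconds (must equal the item's Update interval)

Type of information: set to Log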
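To make the order and types of the four parameters concrete, here is a small sketch of how they could be validated before use. LogCheck and parse_key_params are hypothetical names, and the path in the usage note is only an example; the cap of 1000 lines mirrors the limit in the script shown later in this post.

```python
from collections import namedtuple

# Hypothetical container; the field names follow the four key parameters.
LogCheck = namedtuple("LogCheck", "logpath lines keyword interval")

MAX_LINES = 1000  # the script in this post caps the tail size at 1000


def parse_key_params(params):
    """Validate the four key parameters and convert the numeric ones."""
    if len(params) != 4:
        raise ValueError("expected 4 parameters, got %d" % len(params))
    logpath, lines, keyword, interval = params
    return LogCheck(logpath, min(int(lines), MAX_LINES), keyword, int(interval))
```

For example, parse_key_params(["/var/log/app/app.log", "50", "Full GC", "60"]) yields a check that reads 50 lines and looks back 60 seconds.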


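Configure the trigger, choosing an alert threshold that suits your needs:

Name: something distinctive that makes the scenario easy to recognize

Severity: the alert level

Expression: configured as in the screenshot. I use count here, meaning that over 8 consecutive checks the trigger fires when the keyword was detected more than 3 times.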
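The rule itself is simple enough to sketch in Python. This is not Zabbix trigger syntax, only an illustration of the logic described above. It assumes a check counts as a hit when its value ends with the " logsmonitor" suffix that the script appends to matched lines; should_alert, window, and threshold are hypothetical names.

```python
from collections import deque

MARKER = " logsmonitor"  # suffix the script appends to a matched line


def should_alert(values, window=8, threshold=3):
    """True if more than `threshold` of the last `window` values are hits."""
    recent = deque(values, maxlen=window)  # keeps only the newest `window` values
    hits = sum(1 for value in recent if value.endswith(MARKER))
    return hits > threshold
```

Values of None and Noline never count as hits, so a quiet log or an idle service cannot fire the trigger.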


2. With the template configured, let's write the plugin

Add the monitorlogs.py script to the bin directory of your agent (remember to make it executable):

#!/usr/bin/env  python
# -*- coding: utf-8 -*-

__author__ = 'xbzy007'

import subprocess, os, sys
import time, datetime
import re
import traceback

class FindOut(object):
    def __init__(self, logfile, linenums, stringkeys, interval):
        self.logfile = logfile
        self.linenums = linenums
        self.stringkeys = stringkeys
        self.interval = interval
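        # Timestamp at line start, ISO style: 2024-01-02T03:04:05
        self.logtype_a = r'^([0-9]+\-){2}[0-9]+T[0-9]*'
        # Timestamp at line start, space separated: 2024-01-02 03:04:05
        self.logtype_b = r'^([0-9]+\-){2}[0-9]+ [0-9]*'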

    def getlogtimetype(self, loglinelist):
        # Detect which timestamp layout the log uses; the last line that
        # starts with a recognizable timestamp decides.
        logtype_tag = ''
        for line in loglinelist:
            if re.match(self.logtype_a, line, flags=0):
                logtype_tag = self.logtype_a
            elif re.match(self.logtype_b, line, flags=0):
                logtype_tag = self.logtype_b

        return logtype_tag

    def filterthelogfile(self):
        # self.logfile is used as a regular expression and matched against the
        # full path of every file in its directory. The most recently modified
        # match wins, which lets one item follow rotated log files.
        match_logname = self.logfile
        logdir = os.path.abspath(os.path.dirname(self.logfile) + os.path.sep + ".")
        logfilenamelist = []
        lastlogfile = ''
        # Arbitrary floor (an epoch value in 1980); older files are ignored
        lastmtime = 338054400.0
        for tfile in os.listdir(logdir):
            filepath = os.path.join(logdir, tfile)
            matchx = re.match(match_logname, filepath, flags=0)
            if matchx:
                if os.path.isfile(filepath):
                    logfilenamelist.append(filepath)
        if len(logfilenamelist):
            for onelogfile in logfilenamelist:
                filemtime = os.path.getmtime(onelogfile)
                if filemtime > lastmtime:
                    lastmtime = filemtime
                    lastlogfile = onelogfile

        if lastlogfile:
            return lastlogfile
        else:
            print("not found newest logfile")
            return None

    def findoutstring(self):
        logfile = self.filterthelogfile()
        if logfile:
            # Pass the arguments as a list (no shell) so that unusual
            # characters in the file name are never interpreted by a shell.
            p = subprocess.Popen(["tail", "-n", str(self.linenums), logfile],
                                 stdout=subprocess.PIPE, universal_newlines=True)
            out, err = p.communicate()
            if p.returncode == 0:
                res = out.strip('\n')
            else:
                print("open  %s Failed" % logfile)
                sys.exit(1)
        else:
            print("%s not found" % self.logfile)
            sys.exit(2)

        # Flag: set when at least one record falls inside the time window
        tags = 0
        curtimestamp = time.time()
        getlineslist = res.split('\n')
        logtype = self.getlogtimetype(getlineslist)
        if logtype == self.logtype_b:
            # Find the first record that falls inside the time window
            for i in range(0, len(getlineslist), 1):
                line = getlineslist[i]
                var = line.strip()
                try:
                    strtime = var.split()[0] + ' ' + var.split()[1]
                    strtime = strtime.split(',')[0]
                    timeArray = time.strptime(strtime, "%Y-%m-%d %H:%M:%S")
                    timestamp = time.mktime(timeArray)
                    contrasttime = int(curtimestamp) - self.interval
                    if int(timestamp) > contrasttime:
                        tags += 1
                        break
                except (IndexError, ValueError, OverflowError):
                    # Line without a parsable timestamp: skip it
                    continue
        elif logtype == self.logtype_a:

            # Find the first record that falls inside the time window
            for i in range(0, len(getlineslist), 1):
                line = getlineslist[i]
                var = line.strip()
                try:
                    strtime = var.split()[0]
                    strtime = strtime.split('.')[0]
                    strtime = strtime.replace('T', ' ')
                    timeArray = time.strptime(strtime, "%Y-%m-%d %H:%M:%S")
                    timestamp = time.mktime(timeArray)
                    contrasttime = int(curtimestamp) - self.interval
                    if int(timestamp) > contrasttime:
                        tags += 1
                        break
                except (IndexError, ValueError, OverflowError):
                    # Line without a parsable timestamp: skip it
                    continue
        else:
            print('logtype can not be analyzed')
            sys.exit(4)
        # Search the in-window records for the keyword, starting at index i,
        # the position found in the previous step
        if tags:
            for j in range(i, len(getlineslist), 1):
                line = getlineslist[j].strip()
                res = re.search(self.stringkeys, getlineslist[j], flags=0)
                if res:
                    line = line + ' logsmonitor'
                    return line
            return None  # no in-window record contains the keyword
        else:
            return 'Noline'  # no record inside the time window (old data only)

if __name__ == '__main__':
    # sys.argv[0] is the script itself, so 4 arguments mean len(sys.argv) == 5
    if len(sys.argv) < 5:
        print("need 4 args: [ logfile ] [ readline nums ] [ stringkeys ] [ interval time ]")
        sys.exit(-1)
    logfile = sys.argv[1]
    getlinenums = int(sys.argv[2])
    # Cap the number of lines read per run
    if getlinenums > 1000:
        getlinenums = 1000
    stringkeys = sys.argv[3]
    interval = int(sys.argv[4])
    x = FindOut(logfile, getlinenums, stringkeys, interval)
    res = x.findoutstring()
    print(res)
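One detail of the script worth spelling out: the log path is treated as a regular expression, and among all files in that directory whose full path matches, the most recently modified one is read. A compact sketch of that selection step, assuming the directory part of the pattern is a literal path (newest_matching_file is a hypothetical name, not part of the plugin):

```python
import os
import re


def newest_matching_file(pattern):
    """Return the most recently modified file whose full path matches pattern.

    Returns None when no file in the pattern's directory matches.
    """
    logdir = os.path.abspath(os.path.dirname(pattern) or ".")
    candidates = []
    for name in os.listdir(logdir):
        path = os.path.join(logdir, name)
        # re.match anchors at the start of the path, like the script does
        if os.path.isfile(path) and re.match(pattern, path):
            candidates.append(path)
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)
```

With a pattern such as a directory followed by gc.*\.log, a single item keeps following the newest file as logs rotate.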

Under etc/zabbix_agentd.conf.d/ of your agent, add the configuration file for the plugin, UserParametermonitorlogs.conf:

UserParameter=mlogs[*],/usr/local/zmonitor/bin/monitorlogs.py "$1" "$2" "$3" "$4"

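Then restart the agent.

Finally, link the template to every host whose logs you want to monitor, and log monitoring is in place. Feel like trying it out for real? I hope it solves the same headache for you, so that no monitoring job is too hard to handle.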
