Collecting access logs with Fluentd + Elasticsearch

Category: Backend · Published: 6 years ago

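This article shows how to:

Collect and process web application logs across servers.
Send the collected logs to an aggregator Fluentd in near real time.
Store the collected logs in Elasticsearch.
Visualize the data with Kibana.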

Prerequisites

A basic understanding of Fluentd, Elasticsearch and Kibana
Fluentd, Elasticsearch and Kibana installed

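What do we want to do?

Imagine you have an application that exchanges data with external providers. Everything works well, but now and then something goes wrong, and you or they need to know exactly what data you sent and what data they requested. You search on Google and realize you need an access log; five minutes later you have added slf4j + logback / log4j2 and are writing to a file on the server. Your application starts getting traffic, and now you have a ten-node cluster with logs scattered across all ten nodes. Every time you need to find a request, you have to look on every node. When you realize you need to centralize your logs, this article is here to help.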

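How do we do it?

There are plenty of tools you can use to centralize application logs: rsyslog, logstash, flume, scribe, fluentd. On the application side, I will use logback for logging and Fluency to send the data to Fluentd. Elasticsearch will hold the log data so that it can be queried later from Kibana.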

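Sending logs to the local Fluentd

First, we need to be able to log requests and responses. This can be done in different ways; I will use the logback-access library, which works like a plugin for logback and integrates well with Jetty.

Include it in your application with your favorite dependency manager: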

<dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-access</artifactId>
    <version>1.2.3</version>
</dependency>

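This library provides several classes. We will use ch.qos.logback.access.servlet.TeeFilter to get access to the request and response payloads (bodies), and ch.qos.logback.access.jetty.RequestLogImpl to hand the request and response data to logback so that we can use it in our log layout. Now we need to plug these classes into Jetty. Two lines are worth highlighting: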

contextHandler.addFilter(new FilterHolder(new TeeFilter()), "/*", java.util.EnumSet.of(DispatcherType.INCLUDE, DispatcherType.REQUEST, DispatcherType.FORWARD))

We use TeeFilter to intercept every request matching the URL pattern "/*" and copy the request and response payloads so that we can log them.

requestLog.setResource("/logback-access.xml")

Logback-access uses its own configuration file, whose location is configurable (the default path is {jetty.home}/etc/logback-access.xml). It should look like this:

<configuration>
    <appender name="FLUENCY" class="ch.qos.logback.more.appenders.FluencyLogbackAppender">
        <!-- Tag for Fluentd. Further information: http://docs.fluentd.org/articles/config-file -->
        <tag>accesslog</tag>
        <!-- Host name/address and port number where Fluentd is running -->
        <remoteHost>localhost</remoteHost>
        <port>20001</port>

        <!-- [Optional] Configurations to customize Fluency's behavior: https://github.com/komamitsu/fluency#usage -->
        <ackResponseMode>false</ackResponseMode>
        <fileBackupDir>/tmp</fileBackupDir>
        <!-- Initial chunk buffer size is 1MB (by default) -->
        <bufferChunkInitialSize>2097152</bufferChunkInitialSize>
        <!-- Threshold chunk buffer size to flush is 4MB (by default) -->
        <bufferChunkRetentionSize>16777216</bufferChunkRetentionSize>
        <!-- Max total buffer size is 512MB (by default) -->
        <maxBufferSize>268435456</maxBufferSize>
        <!-- Max wait until all buffers are flushed is 10 seconds (by default) -->
        <waitUntilBufferFlushed>30</waitUntilBufferFlushed>
        <!-- Max wait until the flusher is terminated is 10 seconds (by default) -->
        <waitUntilFlusherTerminated>40</waitUntilFlusherTerminated>
        <!-- Flush interval is 600ms (by default) -->
        <flushIntervalMillis>200</flushIntervalMillis>
        <!-- Max retry of sending events is 8 (by default) -->
        <senderMaxRetryCount>12</senderMaxRetryCount>
        <!-- [Optional] Enable/Disable use of EventTime to get sub second resolution of log event date-time -->
        <useEventTime>true</useEventTime>

        <encoder>
            <pattern><![CDATA[REQUEST FROM %remoteIP ON %date{yyyy-MM-dd HH:mm:ss,UTC} UTC // %responseHeader{X-UOW} // %responseHeader{X-RequestId}    %n
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
%fullRequest
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
%fullResponse
                ]]>
            </pattern>
        </encoder>
    </appender>

    <appender-ref ref="FLUENCY"/>
</configuration>

We use ch.qos.logback.more.appenders.FluencyLogbackAppender from logback-more-appenders to connect logback-access to Fluency.

<dependency>
    <groupId>org.komamitsu</groupId>
    <artifactId>fluency</artifactId>
    <version>1.8.1</version>
</dependency>
<dependency>
    <groupId>com.sndyuk</groupId>
    <artifactId>logback-more-appenders</artifactId>
    <version>1.5.0</version>
</dependency>

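Fluency has many buffering-related settings that you will need to tune; they are explained in the Fluency README. In this article we will focus on tag, remoteHost and port.

tag labels the events. We will use it in Fluentd to match our events so that we can parse, filter and forward them to Elasticsearch.
remoteHost is where the events are sent. In this case we have a local Fluentd, so we use 'localhost'.
port is the port Fluentd listens on.

encoder.pattern defines the layout of the event. It works like your usual log pattern and you can use placeholders, but MDC data is not available until this commit is released. Here is an example of what our events look like: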

REQUEST FROM 69.28.94.231 ON 2018-10-30 00:00:00 UTC // myapp-node-00-1540857599992 // h5hSUaVHvr
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
POST /my/app/path HTTP/1.1
X-Forwarded-Proto: https
X-Forwarded-For: 69.28.94.231
Host: my.company.com
Content-Length: 30
Content-Type: application/json

{"message": "This is the body of the request" }
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
HTTP/1.1 200 OK
X-RequestId: h5hSUaVHvr
X-UOW: myapp-node-00-1540857599992
Date: Mon, 29 Oct 2018 23:59:59 GMT
Content-Type: application/json; charset=UTF-8

{"message": "This is the body of the response", "status": "Okey!"}
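
Before wiring up Fluentd, it can help to check that the first line of a rendered event has the shape you expect. The following is a minimal sketch that matches that line with java.util.regex, using a pattern modeled on the layout above. The class name AccessLogHeadLine and the short group names are illustrative assumptions, not part of logback or Fluency; Java group names cannot contain dots, so the dotted field names are only used as map keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class AccessLogHeadLine {
    // The trailing \s* tolerates the spaces the layout leaves before the line break.
    private static final Pattern HEAD = Pattern.compile(
        "^REQUEST FROM (?<ip>[^ ]*) ON "
        + "(?<time>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2} [^ ]+)"
        + " // (?<uow>[^ ]*) // (?<id>[^ ]*)\\s*$");

    /** Returns the captured fields, or an empty map if the line does not match. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = HEAD.matcher(line);
        if (m.matches()) {
            fields.put("request.ip", m.group("ip"));
            fields.put("time", m.group("time"));
            fields.put("request.uow", m.group("uow"));
            fields.put("request.id", m.group("id"));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "REQUEST FROM 69.28.94.231 ON 2018-10-30 00:00:00 UTC"
            + " // myapp-node-00-1540857599992 // h5hSUaVHvr";
        System.out.println(parse(line));
    }
}
```

An empty result for a line you expected to match usually means the layout and the expression have drifted apart, which is easier to spot here than inside a running log pipeline.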

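Forwarding logs from the local Fluentd to the remote Fluentd

We configure the local Fluentd to process our events and forward them to the Fluentd aggregator. The configuration file (/etc/td-agent/td-agent.conf by default):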

<source>
    @type forward
    port 20001
</source>

<filter accesslog>
    @type parser
    key_name msg
    reserve_data false
    <parse>
        @type multiline
        format_firstline /^REQUEST FROM/
        format1 /REQUEST FROM (?<request.ip>[^ ]*) ON (?<time>\d{4}-\d{2}-\d{2} \d{2}\:\d{2}\:\d{2} [^ ]+) // (?<request.uow>[^ ]*) // (?<request.id>[^ ]*)\n/
        format2 />{49}\n/
        format3 /(?<request.method>[^ ]*) (?<request.path>[^ ]*) (?<request.protocol>[^ ]*)\n/
        format4 /(?<request.headers>(?:.|\n)*?)\n\n/
        format5 /(?<request.body>(?:.|\n)*?)\n/
        format6 /<{49}\n/
        format7 /(?<response.protocol>[^ ]*) (?<response.status.code>[^ ]*) (?<response.status.description>[^\n]*)\n/
        format8 /(?<response.headers>(?:.|\n)*?)\n\n/
        format9 /(?<response.body>(?:.|\n)*?)\n\Z/
    </parse>
</filter>

# Parse request.headers="Header: Value\n Header: Value\n" into an object request.headers={"Header": "Value", "Header": "Value"}
<filter accesslog>
  @type record_transformer
  enable_ruby true
  renew_record false
  auto_typecast true
  <record>
    hostname "#{Socket.gethostname}"
    request.headers ${Hash[record["request.headers"].each_line.map { |l| l.chomp.split(': ', 2) }]}
    response.headers ${Hash[record["response.headers"].each_line.map { |l| l.chomp.split(': ', 2) }]}
  </record>
</filter>

<match accesslog>
    @type forward
    send_timeout 5s
    recover_wait 10s
    hard_timeout 30s
    flush_interval 5s
    <server>
        name elastic-node-00
        host elastic-node-00
        port 24224
        weight 100
    </server>
</match>

<match **>
    @type file
    path /tmp/fluentd/output/messages
</match>

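Highlights:

source.port is the same port we configured in logback-access.xml for sending the access log events.
The filter and match directives carry the 'accesslog' keyword. This is the tag mentioned earlier. We match it exactly here, but Fluentd also accepts wildcard patterns such as * and **.
The match directive forwards our events to the Fluentd aggregator on host 'elastic-node-00', listening on port 24224.
Filters are applied in order.
filter.parse holds the regular expressions that parse our event. The group names (such as response.body or request.method) become JSON attributes after filtering. For example, our sample event looks like this after each filter: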

After the first filter:
{
 ...
 "time": "2018-10-30 00:00:00 UTC",
 "request.ip": "69.28.94.231",
 "request.uow": "myapp-node-00-1540857599992",
 "request.id": "h5hSUaVHvr",
 "request.method": "POST",
 "request.path": "/my/app/path",
 "request.protocol": "HTTP/1.1",
 "request.headers": "X-Forwarded-Proto: https\nX-Forwarded-For: 69.28.94.231\nHost: my.company.com\nContent-Length: 30\nContent-Type: application/json",
 "request.body": "{\"message\": \"This is the body of the request\" }",
 "response.protocol": "HTTP/1.1",
 "response.status.code": "200",
 "response.status.description": "OK",
 "response.headers": "X-RequestId: h5hSUaVHvr\nX-UOW: myapp-node-00-1540857599992\nDate: Mon, 29 Oct 2018 23:59:59 GMT\nContent-Type: application/json; charset=UTF-8",
 "response.body": "{\"message\": \"This is the body of the response\", \"status\": \"Okey!\"}"
 ...
}
After the second filter:
{
 ...
 "time": "2018-10-30 00:00:00 UTC",
 "request.ip": "69.28.94.231",
 "request.uow": "myapp-node-00-1540857599992",
 "request.id": "h5hSUaVHvr",
 "request.method": "POST",
 "request.path": "/my/app/path",
 "request.protocol": "HTTP/1.1",
 "request.headers": {
       "X-Forwarded-Proto": "https",
       "X-Forwarded-For": "69.28.94.231",
       "Host": "my.company.com",
       "Content-Length": "30",
       "Content-Type": "application/json"
      },
 "request.body": "{\"message\": \"This is the body of the request\" }",
 "response.protocol": "HTTP/1.1",
 "response.status.code": "200",
 "response.status.description": "OK",
 "response.headers": {
       "X-RequestId": "h5hSUaVHvr",
       "X-UOW": "myapp-node-00-1540857599992",
       "Date": "Mon, 29 Oct 2018 23:59:59 GMT",
       "Content-Type": "application/json; charset=UTF-8"
      },
 "response.body": "{\"message\": \"This is the body of the response\", \"status\": \"Okey!\"}"
 ...
}
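
The second filter turns a block of "Name: value" lines into an object by splitting each line once on ": ". For readers more at home in Java than in Ruby, here is a rough equivalent of that transformation. It is a sketch for understanding only and does not run inside Fluentd; the class name HeaderBlock is made up for this example, and skipping lines that contain no ": " is a choice of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class HeaderBlock {
    /** Splits "Name: value" lines into a map; a repeated name keeps its last value. */
    static Map<String, String> toMap(String block) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : block.split("\\r?\\n")) {
            // A limit of 2 keeps any ": " inside the value intact.
            String[] parts = line.split(": ", 2);
            if (parts.length == 2) {
                headers.put(parts[0], parts[1]);
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        String block = "X-Forwarded-Proto: https\nHost: my.company.com\nContent-Length: 30";
        System.out.println(toMap(block));
    }
}
```

Note that a map keeps one value per header name, so headers that legitimately repeat are collapsed in this form.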

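Storing the collected logs in Elasticsearch

This part is very simple: we have to receive the events and forward them to Elasticsearch. The Fluentd configuration file should look like this: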

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<match accesslog>
  @type elasticsearch
  scheme http
  host localhost
  port 9200
  logstash_format true
  validate_client_version true
</match>

Highlights:

source.port is the same port we configured in match.server.port.
match.logstash_format produces Elasticsearch indices named logstash-YYYY.MM.DD.
match.port is the port the Elasticsearch API listens on.
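
With logstash_format enabled, events end up in one index per day, so it is useful to know which index a given day's requests live in. The sketch below computes that name in Java, assuming the output plugin's default prefix (logstash) and default date format (year.month.day); if you override either in your configuration, adjust accordingly. The class name LogstashIndexName is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, for illustration only.
final class LogstashIndexName {
    // Assumed defaults: prefix "logstash" and a dotted year.month.day suffix.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    /** Returns the daily index name for the given day. */
    static String forDay(LocalDate day) {
        return "logstash-" + DAY.format(day);
    }

    public static void main(String[] args) {
        // Prints logstash-2018.10.30
        System.out.println(forDay(LocalDate.of(2018, 10, 30)));
    }
}
```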

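Viewing the data in Kibana

Now you only need to open Kibana, create an index pattern such as logstash-* so that it picks up the daily indices, and explore all of your access logs.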
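
The original goal was to find a single request without visiting every node. As a rough illustration, the sketch below builds an Elasticsearch URI-search URL that looks up one request id across all daily indices. It assumes Elasticsearch is reachable at the host and port used in the Fluentd configuration, that the parsed field is indexed under the name request.id, and Java 11 or later; the class and method names are made up for this example. The same query string (request.id:"...") can typically be pasted into Kibana's search bar.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class AccessLogSearch {
    /** Builds a URI-search URL for one request id across the logstash-* indices. */
    static String urlForRequestId(String esBase, String requestId) {
        // Quote the value so it is treated as a single phrase.
        String query = "request.id:\"" + requestId + "\"";
        return esBase + "/logstash-*/_search?q="
            + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(urlForRequestId("http://localhost:9200", "h5hSUaVHvr"));
    }
}
```

The sketch does not escape quotes inside the id itself, so it is only suitable for ids like the alphanumeric one in the sample event.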
