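Our production services are deployed on k8s, with each service running several replicas and writing its logs to a directory mounted from the host. When a service is spread across three machines, checking the business logs means logging in to each of the three servers in turn, which is clearly inconvenient. The team wanted the logs collected in one place where they could be viewed together, so we started trying out different approaches.
Attempts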
1. elasticsearch + fluentd + in_tail(input) + fluent-plugin-elasticsearch(output) + kibana
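We started by testing the log collection stack most commonly recommended online: elasticsearch + fluentd + kibana. After deploying and using it, we found that searching the logs was not convenient. Our logs come in many different formats, which makes them hard to parse, so what was collected was not structured. Another problem was that elasticsearch used a lot of CPU; we did not determine the cause, though it may have been insufficient resources or misconfiguration. The main reason, however, was that team members preferred to have the logs collected onto a single server where they could be queried with standard Linux tools such as grep, awk, and less. So we dropped this approach.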
2. rsyslog + fluentd + in_tail(input) + fluent-plugin-remote_syslog(output)
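Next we tried using fluentd to collect the logs and send them to rsyslog, with rsyslog naming the output files after the tag sent by fluentd. However, the syslog protocol limits the tag to 32 characters, so we reluctantly gave up on this approach too.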
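3. fluentd-agent(input: in_tail, output: forward) fluentd-server(input: forward, output: fluent-plugin-forest)
In the end we used fluentd on both the agent and the server. The agent uses in_tail as its input and forward as its output; the server uses forward as its input and fluent-plugin-forest as its output. Finding fluent-plugin-forest took some effort. It supports naming output files by tag and is very stable, whereas the other plugins we tried are no longer actively maintained and were too buggy to use.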
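To make the 32-character tag limit from attempt 2 concrete, here is a minimal sketch that checks whether a tag would fit. The helper name `tag_fits_syslog` and the sample tag are hypothetical; the tag is only meant to look like one derived from a nested log path.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the tag fits within the 32-character
# syslog tag limit mentioned in attempt 2, fails otherwise.
MAX_TAG_LEN=32

tag_fits_syslog() {
    # ${#1} expands to the length of the first argument
    [ "${#1}" -le "$MAX_TAG_LEN" ]
}

# Hypothetical tag, shaped like one derived from a nested log file path.
tag="log.dat.log.order-service.order-service-worker.log"
if tag_fits_syslog "$tag"; then
    echo "ok: ${#tag} chars"
else
    echo "too long: ${#tag} chars (limit $MAX_TAG_LEN)"
fi
```

Tags built from directory and file names pass this limit quickly, which is why file naming by tag did not work over syslog for us.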
Deployment
Server side
docker run -d -p 24224:24224 -p 24224:24224/udp -v /var/log/worker:/var/log/worker -v /etc/localtime:/etc/localtime --name fluent-server registry.cn-hangzhou.aliyuncs.com/shengjing/fluent-server
Agent side
On every agent node, create a /home/fluent directory and set its permissions to 777:
mkdir -p /home/fluent
chmod 777 /home/fluent
Here we deploy the agent with a k8s DaemonSet:
kubectl create -f fluentd-daemonset.yaml
fluentd-daemonset.yaml:
# extensions/v1beta1 DaemonSets are no longer served from Kubernetes 1.16;
# on newer clusters use apps/v1 and add a matching spec.selector.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  template:
    metadata:
      labels:
        k8s-app: fluentd-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: fluentd
        image: registry.cn-hangzhou.aliyuncs.com/shengjing/fluent-client
        imagePullPolicy: Always
        env:
        # must match the variable that the agent's entrypoint.sh
        # substitutes into fluent.conf
        - name: FLUENTD_SERVER_HOST
          value: "10.29.112.24"
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 500Mi
        volumeMounts:
        - name: datlog
          mountPath: /dat/log
          readOnly: true
        - name: fluent
          mountPath: /home/fluent
        - name: localtime
          mountPath: /etc/localtime
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: datlog
        hostPath:
          path: /dat/log
      - name: fluent
        hostPath:
          path: /home/fluent
      - name: localtime
        hostPath:
          path: /etc/localtime
- The env value (10.29.112.24 above) must be the IP of the fluentd server; replace it with the address of your own server.
Dockerfile
Server side
Dockerfile:
FROM fluent/fluentd:v0.12-debian
COPY entrypoint.sh /bin/entrypoint.sh
RUN fluent-gem install fluent-plugin-forest \
    && chmod +x /bin/entrypoint.sh
COPY fluent.conf /fluentd/etc/
entrypoint.sh:
#!/usr/bin/dumb-init /bin/sh
uid=${FLUENT_UID:-1000}
# check if an old fluent user exists and delete it
if grep -q '^fluent:' /etc/passwd; then
    deluser fluent
fi
# (re)add the fluent user with $FLUENT_UID
useradd -u "${uid}" -o -c "" -m fluent
export HOME=/home/fluent
# chown home and data folder
chown -R fluent /home/fluent
chown -R fluent /fluentd
# drop privileges and run the requested command as the fluent user
gosu fluent "$@"
fluent.conf:
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>
<match log.**>
  @type forest
  subtype file
  <template>
    time_slice_format %Y%m%d
    path /var/log/worker/${tag_parts[3..-2]}
    format single_value
    flush_interval 2s
    buffer_path /tmp/buffer/${tag_parts[3..-2]}
    append true
    num_threads 1
  </template>
</match>
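The `${tag_parts[3..-2]}` placeholder in the server config is what turns a tag into a file name. The sketch below mimics that selection in shell, assuming the selected parts are joined back with "." (which is how we understand fluent-plugin-forest to behave). The function name and the sample tag are hypothetical.

```shell
#!/bin/sh
# Sketch of the ${tag_parts[3..-2]} selection used in the server config.
# Assumption: the selected parts are joined back together with ".".

# tag_parts_3_to_minus2 TAG -> parts 3 through second-to-last, dot-joined
tag_parts_3_to_minus2() {
    printf '%s\n' "$1" | awk -F. '{
        out = ""
        # awk fields are 1-based, so part index 3 is field 4;
        # index -2 is the second-to-last field, NF - 1.
        for (i = 4; i <= NF - 1; i++) {
            out = out (out == "" ? "" : ".") $i
        }
        print out
    }'
}

# Hypothetical tag for a file at /dat/log/app/foo.log
tag="log.dat.log.app.foo.log"
echo "/var/log/worker/$(tag_parts_3_to_minus2 "$tag")"
```

With `time_slice_format %Y%m%d` and `append true`, the file output should then add the date and a `.log` suffix to that path, so each source file ends up as one daily file under /var/log/worker that can be searched with grep, awk, or less.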
Agent side
Dockerfile:
FROM fluent/fluentd:v0.12-debian
COPY entrypoint.sh /bin/entrypoint.sh
RUN chmod +x /bin/entrypoint.sh
COPY fluent.conf /fluentd/etc/
entrypoint.sh:
#!/usr/bin/dumb-init /bin/sh
uid=${FLUENT_UID:-1000}
# check if an old fluent user exists and delete it
if grep -q '^fluent:' /etc/passwd; then
    deluser fluent
fi
# (re)add the fluent user with $FLUENT_UID
useradd -u "${uid}" -o -c "" -m fluent
export HOME=/home/fluent
# chown home and data folder
chown -R fluent /home/fluent
chown -R fluent /fluentd
# replace the FLUENTD_SERVER_HOST placeholder in fluent.conf with the
# value of the FLUENTD_SERVER_HOST environment variable
sed -i "s/FLUENTD_SERVER_HOST/$FLUENTD_SERVER_HOST/" /fluentd/etc/fluent.conf
# drop privileges and run the requested command as the fluent user
gosu fluent "$@"
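The sed line in the agent's entrypoint.sh silently writes an empty host if the FLUENTD_SERVER_HOST environment variable is not set in the pod. As a rough sketch of the same substitution with a guard added, the helper `render_host` below is hypothetical and not part of the image; it reads a config on stdin instead of editing the file in place.

```shell
#!/bin/sh
# Hypothetical helper: replace the FLUENTD_SERVER_HOST placeholder in the
# config read from stdin, refusing to run when no host is given.
render_host() {
    host=$1
    if [ -z "$host" ]; then
        echo "FLUENTD_SERVER_HOST is empty" >&2
        return 1
    fi
    sed "s/FLUENTD_SERVER_HOST/$host/"
}

# Example with the IP used in the DaemonSet manifest
printf 'host FLUENTD_SERVER_HOST\nport 24224\n' | render_host 10.29.112.24
```

Failing early like this makes a misnamed or missing environment variable visible at container start rather than as a forward output that cannot connect.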
fluent.conf:
<source>
  @type tail
  path /dat/log/**/*.log
  tag log.*
  format none
  refresh_interval 5
  read_from_head true
  limit_recently_modified 86400
  pos_file /home/fluent/dat-log.pos
</source>
<match log.**>
  @type forward
  <server>
    name myserver1
    host FLUENTD_SERVER_HOST
    port 24224
    weight 60
  </server>
  buffer_type file
  buffer_path /tmp/buffer_file
  flush_interval 2s
  buffer_chunk_limit 8m
  buffer_queue_limit 1000
  num_threads 4
</match>
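The `tag log.*` setting in the agent config is what carries the file path to the server. The sketch below approximates that expansion, assuming in_tail replaces `*` with the file path after turning "/" into "." and dropping the leading separator. The function name and the sample path are hypothetical.

```shell
#!/bin/sh
# Sketch of the "*" expansion in `tag log.*`.
# Assumption: "/" becomes ".", repeated dots collapse, the leading dot
# is dropped, and the result replaces "*".
tag_from_path() {
    expanded=$(printf '%s\n' "$1" | tr '/' '.' | sed -e 's/\.\.*/./g' -e 's/^\.//')
    printf 'log.%s\n' "$expanded"
}

# Hypothetical log file under the tailed /dat/log directory
tag_from_path /dat/log/app/foo.log
```

Under that assumption, every tag begins with the three parts log, dat, and log and ends with the file extension, which is why the server config selects parts 3 through -2 when it builds the output file name.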
References
https://github.com/tagomoris/fluent-plugin-forest
http://docs.fluentd.org/v0.12/articles/in_tail
http://docs.fluentd.org/v0.12/articles/out_forward
http://docs.fluentd.org/v0.12/articles/in_forward
http://docs.fluentd.org/v0.12/articles/out_file