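Following on from the previous article, "Deploying Application Services on Kubernetes", this article describes how to collect Kubernetes application logs with ELK.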
I. Logging architecture
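Log collection generally falls into two categories, active and passive. In Kubernetes, passive collection is usually implemented with either a Sidecar or a DaemonSet. Active collection is implemented either by Docker Engine pushing the logs or by the application writing them directly.
- Docker Engine push: Docker Engine has a built-in LogDriver mechanism. By configuring a LogDriver, Docker Engine writes the container's stdout to remote storage, which is how the logs get collected.
- Direct write: the application integrates a log-collection SDK and sends its logs straight to the server through that SDK.
- DaemonSet: exactly one log agent runs on each node and collects all the logs on that node.
- Sidecar: each Pod gets its own log-agent container, which collects only the logs of the business application in that Pod.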
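The approaches compare as follows:
Aspect | Docker Engine push | Direct write | DaemonSet | Sidecar |
---|---|---|---|---|
Deployment effort | Low, natively supported | Low, only the config file needs maintaining | Medium, the DaemonSet must be maintained | Higher, an agent has to be deployed in every Pod |
Isolation | Weak | Weak, direct writes compete with business logic for resources | Medium, isolation only between configurations | Strong, isolated by container, resources can be allocated separately |
Resource usage | Low | Low | Medium | High |
Customizability | Low | High | Low | High |
Coupling | High, tightly bound to Docker Engine; changes require restarting Docker Engine | High, modifying or upgrading the collection module requires redeploying the application | Low, the agent can be upgraded independently | Medium, by default upgrading the agent restarts the Pod (some extensions support hot-upgrading Sidecars) |
Likelihood of bottlenecks | High | Low | High | Low |
Suitable for | Non-production environments | Scenarios with extremely high performance requirements | Small and medium clusters | Large, mixed, and PaaS clusters |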
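This article runs the log agent in a sidecar container. A sidecar container with a log agent is created and run in the same Pod as the business container. The log directory is persisted to the node through a hostPath mount, so the logs can still be viewed when ELK is unavailable. The agent container mounts the node's log directory and ships the logs to an external Kafka cluster. An external Logstash consumes the corresponding topic from Kafka, filters the messages, and writes them to an external Elasticsearch cluster. Finally, Kibana is used to visualize and analyze the data.
Pros: flexible log collection, good extensibility, relatively high performance
Cons: high resource usage, cumbersome deployment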
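Note: it is advisable to disable console (stdout) logging in the business container. Docker writes stdout to a file in JSON format, so leaving it on roughly doubles the disk space used by logs. The trade-off is that once console output is disabled, `kubectl logs` can no longer show the logs.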
II. Deployment process
1. Install external Elasticsearch and Kafka
- Install Elasticsearch
```bash
#!/bin/bash
set -u

data_path="/data/elasticsearch/data"
logs_path="/data/elasticsearch/logs"
repo_path="/data/backup/elasticsearch"
install_path="/usr/local/elasticsearch-7.15.2"
temp_path="/data/elasticsearch/tmp"
pid_path="/var/run/elasticsearch"
USER="elastic"

log() {
    local LOG_LEVEL=$1
    local LOG_MSG=$2
    basename_cmd=`which basename`
    SHELL_NAME=$($basename_cmd $0)
    if [ "$LOG_LEVEL" == "ERROR" ] || [ "$LOG_LEVEL" == "INFO" ]; then
        echo "[$(/bin/date "+%Y-%m-%d %H:%M:%S")] $$ $SHELL_NAME: [$LOG_LEVEL] $LOG_MSG"
    fi
}

download_package() {
    log "INFO" "Downloading package..."
    cd /usr/local/src && wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.15.2-linux-x86_64.tar.gz >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        log "ERROR" "Failed to download package!"
        exit 2
    fi
    log "INFO" "Installing package..."
    tar zxf elasticsearch-7.15.2-linux-x86_64.tar.gz -C /usr/local
    if [ $? -ne 0 ]; then
        log "ERROR" "Failed to extract package!"
        exit 2
    fi
}

create_path_and_create_user() {
    id $USER >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        groupadd $USER
        useradd -g $USER -s /sbin/nologin -M $USER
    fi
    for i in $data_path $logs_path $repo_path $temp_path $pid_path
    do
        mkdir -p $i
        chown $USER. $i
    done
    chown -R $USER. $install_path
}

create_conf() {
    cp -rp $1/elasticsearch.yml{,.bak}
    cat > $1/elasticsearch.yml <<EOF
# … (elasticsearch.yml content lost in the source)
    cat > /etc/systemd/system/elasticsearch.service <<EOF
# … (unit file and the start of main() lost in the source)
    … >/dev/null 2>&1
    log "INFO" "Installed successfully, starting service..."
    systemctl start elasticsearch
    sleep 20
    # Quote the pattern so the shell cannot glob-expand [e]lasticsearch.pid
    PID=$(ps aux | grep '[e]lasticsearch.pid' | awk '{print $2}')
    if [ -z "$PID" ]; then
        log "ERROR" "Service failed to start!"
        exit 4
    else
        log "INFO" "Service started successfully"
    fi
}

main
```
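The installer above only checks that a process exists. A slightly stronger post-install check is to ask the cluster health API whether the node answers. The sketch below assumes the node listens on 10.81.56.218:9200 (the address the Logstash and Kibana scripts point to), parses the JSON with sed so jq is not needed, and uses hypothetical helper names. If the cluster has only one node, `yellow` is expected once the log indices exist, because the index template in this article requests one replica that cannot be allocated on the same node as its primary.

```shell
#!/bin/bash
# Sketch: post-install health check for Elasticsearch.
# Assumptions: compact JSON from GET _cluster/health; sed instead of jq;
# es_health_status / es_is_usable are hypothetical helper names.

# Print the value of the "status" field from a _cluster/health response on stdin.
es_health_status() {
  sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}

# Succeed for green or yellow, fail for red or anything unexpected.
es_is_usable() {
  case "$1" in
    green|yellow) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage against the node installed above:
#   status=$(curl -s "http://10.81.56.218:9200/_cluster/health" | es_health_status)
#   es_is_usable "$status" && echo "cluster status: $status"
```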
- Install Kafka
```bash
#!/bin/bash
source /etc/profile
set -u

install_path="/usr/local/kafka"
data_path="${install_path}/data"
JAVACMD="${JAVA_HOME}/bin/java"
USER="kafka"

log() {
    local LOG_LEVEL=$1
    local LOG_MSG=$2
    basename_cmd=`which basename`
    SHELL_NAME=$($basename_cmd $0)
    if [ "$LOG_LEVEL" == "ERROR" ] || [ "$LOG_LEVEL" == "INFO" ]; then
        echo "[$(/bin/date "+%Y-%m-%d %H:%M:%S")] $$ $SHELL_NAME: [$LOG_LEVEL] $LOG_MSG"
    fi
}

download_package() {
    log "INFO" "Downloading package..."
    cd /usr/local/src && wget https://dlcdn.apache.org/kafka/3.0.1/kafka_2.12-3.0.1.tgz --no-check-certificate >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        log "ERROR" "Failed to download package!"
        exit 2
    fi
    log "INFO" "Installing package..."
    tar zxf kafka_2.12-3.0.1.tgz -C /usr/local
    if [ $? -ne 0 ]; then
        log "ERROR" "Failed to extract package!"
        exit 2
    fi
    cd .. && mv kafka_2.12-3.0.1 kafka
}

create_path_and_create_user() {
    id $USER >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        groupadd $USER
        useradd -g $USER -s /sbin/nologin -M $USER
    fi
    for i in $data_path $install_path
    do
        mkdir -p $i
        chown -R $USER. $i
    done
}

create_conf() {
    cp -rp $1/server.properties{,.bak}
    cat > $1/server.properties <<EOF
# … (server.properties content lost in the source)
    cat > /etc/systemd/system/zookeeper.service <<EOF
# … (unit file lost in the source)
    cat > /etc/systemd/system/kafka.service <<EOF
# … (unit file and the start of main() lost in the source)
    … >/dev/null 2>&1
    systemctl enable kafka >/dev/null 2>&1
    log "INFO" "Installed successfully, starting service..."
    systemctl start zookeeper
    sleep 10
    systemctl start kafka
    sleep 20
    PID=$(ps aux | grep '[k]afka.Kafka' | awk '{print $2}')
    if [ -z "$PID" ]; then
        log "ERROR" "Service failed to start!"
        exit 4
    else
        log "INFO" "Service started successfully"
    fi
}

main
```
2. Install external Logstash and Kibana
- Install Logstash
```bash
#!/bin/bash
set -u

kafka_endpoint="10.81.56.217:12315"
kafka_topic="fbu-fps-test-log"
es_endpoint="10.81.56.218:9200"
install_path="/usr/local/logstash-7.15.2"
USER="elastic"

log() {
    local LOG_LEVEL=$1
    local LOG_MSG=$2
    basename_cmd=`which basename`
    SHELL_NAME=$($basename_cmd $0)
    if [ "$LOG_LEVEL" == "ERROR" ] || [ "$LOG_LEVEL" == "INFO" ]; then
        echo "[$(/bin/date "+%Y-%m-%d %H:%M:%S")] $$ $SHELL_NAME: [$LOG_LEVEL] $LOG_MSG"
    fi
}

download_package() {
    log "INFO" "Downloading package..."
    cd /usr/local/src && wget https://artifacts.elastic.co/downloads/logstash/logstash-7.15.2-linux-x86_64.tar.gz >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        log "ERROR" "Failed to download package!"
        exit 2
    fi
    log "INFO" "Installing package..."
    tar zxf logstash-7.15.2-linux-x86_64.tar.gz -C /usr/local
    if [ $? -ne 0 ]; then
        log "ERROR" "Failed to extract package!"
        exit 2
    fi
}

create_path_and_create_user() {
    id $USER >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        groupadd $USER
        useradd -g $USER -s /sbin/nologin -M $USER
    fi
    mkdir -p ${install_path}/temp
    chown -R $USER. ${install_path}
}

create_conf() {
    mkdir $1/conf.d
    cp -rp $1/pipelines.yml{,.bak}
    cat > $1/pipelines.yml <<EOF
# … (pipelines.yml content lost in the source)
EOF
    cat > $1/conf.d/${kafka_topic}.conf <<EOF
input {
  kafka {
    bootstrap_servers => "$kafka_endpoint"
    topics => ["${kafka_topic}"]
    codec => "json"
    group_id => "logstash-${kafka_topic}"
    client_id => "logstash-${kafka_topic}"
    decorate_events => true
    consumer_threads => 2
  }
}
filter {
  ruby {
    code => "event.set('logsize', event.get('message').bytesize)"
  }
  grok {
    # Capture names other than "datetime" were lost in the source and are shown as …
    match => [ "message", "^(?<datetime>[\d]{4}-[\d]{2}-[\d]{2}\s+[\d]{2}:[\d]{2}:[\d]{2}\.[\d]{3})\s+(?<…>[\w]+)\s+\[(?<…>[\w\d-]*)\]\s+---\s+\[\s*[\s\S]*?\]\s+(?<…>[\w\d\-\.]*)\s+:\s+(?<…>[\s\S]*)"]
    overwrite => ["message"]
  }
  date {
    match => [ "datetime", "YYYY-MM-dd HH:mm:ss.SSS" ]
    timezone => "Asia/Shanghai"
    target => "@timestamp"
  }
  mutate {
    remove_field => ["host","agent","ecs"]
  }
}
output {
  elasticsearch {
    hosts => ["$es_endpoint"]
    index => "${kafka_topic}-%{+YYYY.MM.dd}"
    document_id => "%{[@metadata][kafka][topic]}-%{[@metadata][kafka][partition]}-%{[@metadata][kafka][offset]}"
    manage_template => false
  }
}
EOF
    sed -i -e '19a node.name: logstash01' -e '87a config.reload.automatic: true' $1/logstash.yml
    sed -i -e 's/-Xms1g/-Xms2g/g' -e 's/-Xmx1g/-Xmx2g/g' -e "37a -Djava.io.tmpdir=${install_path}/temp" $1/jvm.options
    return 0
}

create_service() {
    cat > /etc/default/logstash <<EOF
# … (content lost in the source)
    cat > /etc/systemd/system/logstash.service <<EOF
# … (unit file and the start of main() lost in the source)
    … >/dev/null 2>&1
    log "INFO" "Installed successfully, starting service..."
    systemctl start logstash
    sleep 20
    PID=$(ps aux | grep '[l]ogstash.Logstash' | awk '{print $2}')
    if [ -z "$PID" ]; then
        log "ERROR" "Service failed to start!"
        exit 4
    else
        log "INFO" "Service started successfully"
    fi
}

main
```
- Install Kibana
```bash
#!/bin/bash
set -u

logs_path="/usr/local/kibana-7.15.2/logs"
install_path="/usr/local/kibana-7.15.2"
es_endpoint="10.81.56.218:9200"
pid_path="/var/run/kibana"
USER="elastic"

log() {
    local LOG_LEVEL=$1
    local LOG_MSG=$2
    basename_cmd=`which basename`
    SHELL_NAME=$($basename_cmd $0)
    if [ "$LOG_LEVEL" == "ERROR" ] || [ "$LOG_LEVEL" == "INFO" ]; then
        echo "[$(/bin/date "+%Y-%m-%d %H:%M:%S")] $$ $SHELL_NAME: [$LOG_LEVEL] $LOG_MSG"
    fi
}

download_package() {
    log "INFO" "Downloading package..."
    cd /usr/local/src && wget https://artifacts.elastic.co/downloads/kibana/kibana-7.15.2-linux-x86_64.tar.gz >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        log "ERROR" "Failed to download package!"
        exit 2
    fi
    log "INFO" "Installing package..."
    tar zxf kibana-7.15.2-linux-x86_64.tar.gz -C /usr/local
    # Check the extraction result before renaming (the check used to test mv instead)
    if [ $? -ne 0 ]; then
        log "ERROR" "Failed to extract package!"
        exit 2
    fi
    cd .. && mv kibana-7.15.2-linux-x86_64 kibana-7.15.2
}

create_path_and_create_user() {
    id $USER >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        groupadd $USER
        useradd -g $USER -s /sbin/nologin -M $USER
    fi
    for i in $logs_path $pid_path
    do
        mkdir -p $i
        chown -R $USER. $i
    done
    chown -R $USER. $install_path
}

create_conf() {
    cp -rp $1/kibana.yml{,.bak}
    cat > $1/kibana.yml <<EOF
# … (kibana.yml content lost in the source)
    cat > /etc/systemd/system/kibana.service <<EOF
# … (unit file and the start of main() lost in the source)
    … >/dev/null 2>&1
    log "INFO" "Installed successfully, starting service..."
    systemctl start kibana
    sleep 20
    PID=$(ps aux | grep '[k]ibana' | awk '{print $2}')
    if [ -z "$PID" ]; then
        log "ERROR" "Service failed to start!"
        exit 4
    else
        log "INFO" "Service started successfully"
    fi
}

main
```
- Import the index template (the request uses Kibana Dev Tools console syntax). Its `index_patterns` must match the index names Logstash generates, `fbu-fps-test-log-YYYY.MM.dd`.
```
PUT _template/fbu-fps-test-log
{
  "order" : 0,
  "index_patterns" : [ "fbu-fps-test-log*" ],
  "settings" : {
    "index" : {
      "codec" : "best_compression",
      "refresh_interval" : "5s",
      "number_of_shards" : "3",
      "number_of_replicas" : "1"
    }
  },
  "mappings" : {
    "dynamic_templates" : [
      {
        "strings_as_keyword" : {
          "mapping" : { "ignore_above" : 1024, "type" : "keyword" },
          "match_mapping_type" : "string"
        }
      }
    ],
    "properties" : {
      "msg" : { "type" : "keyword" },
      "datetime" : { "ignore_above" : 256, "type" : "keyword" },
      "@timestamp" : { "type" : "date" },
      "appname" : { "ignore_above" : 256, "type" : "keyword" },
      "kafka" : {
        "properties" : {
          "topic" : { "ignore_above" : 256, "type" : "keyword" }
        }
      },
      "loglevel" : { "ignore_above" : 256, "type" : "keyword" },
      "@version" : { "ignore_above" : 256, "type" : "keyword" },
      "message" : { "type" : "text" },
      "sys" : {
        "properties" : {
          "hostname" : { "ignore_above" : 256, "type" : "keyword" },
          "ip" : { "ignore_above" : 256, "type" : "keyword" }
        }
      },
      "class" : { "ignore_above" : 256, "type" : "keyword" },
      "tags" : { "ignore_above" : 256, "type" : "keyword" }
    }
  },
  "aliases" : { }
}
```
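Two details decide which index a log line lands in and whether this template applies to it. Logstash expands `%{+YYYY.MM.dd}` from `@timestamp`, which is stored in UTC, while the date filter reads `datetime` as Asia/Shanghai time, so lines logged between 00:00 and 08:00 local time go into the previous day's index. And a template only applies to new indices whose names match its `index_patterns`; otherwise the mappings above are never used. The sketch below reproduces both rules. It assumes GNU `date`, uses the POSIX zone string `CST-8` (a fixed UTC+8 offset, equivalent to Asia/Shanghai since it has no DST) so no tzdata is needed, and its helper names are hypothetical.

```shell
#!/bin/bash
# Sketch: predict the daily index name Logstash would use for a log line.
# Assumptions: GNU date; TZ=CST-8 means UTC+8 in POSIX TZ syntax;
# index_for / covered_by are hypothetical helper names.

# index_for TOPIC "YYYY-MM-DD HH:MM:SS" (local Asia/Shanghai time)
index_for() {
  local topic=$1 local_time=$2 epoch
  # Interpret the log's datetime as UTC+8, like the date filter's timezone option.
  epoch=$(TZ=CST-8 date -d "$local_time" +%s) || return 1
  # %{+YYYY.MM.dd} is rendered from @timestamp in UTC.
  printf '%s-%s\n' "$topic" "$(date -u -d "@$epoch" +%Y.%m.%d)"
}

# covered_by INDEX PATTERN: succeed if INDEX matches an index_patterns glob.
covered_by() {
  local index=$1 pattern=$2
  [[ $index == $pattern ]]   # unquoted right-hand side is treated as a glob
}

index_for fbu-fps-test-log "2022-03-01 07:30:00"
```

With these inputs the line logged at 07:30 local time on 1 March is routed to `fbu-fps-test-log-2022.02.28`, and `covered_by` shows that a pattern written for another system's index prefix would not match it.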
- Create the topic
```bash
/usr/local/kafka/bin/kafka-topics.sh --bootstrap-server localhost:12315 --create --replication-factor 1 --partitions 2 --topic fbu-fps-test-log
/usr/local/kafka/bin/kafka-topics.sh --bootstrap-server localhost:12315 --list
```
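Before any data flows, the grok expression in the Logstash pipeline can be sanity-checked against a sample line. The sketch below is an approximation, not the pipeline itself. Bash's `=~` uses POSIX extended regexes, so `[0-9]`, `[[:space:]]` and letter/digit bracket classes stand in for `\d`, `\s` and `\w`, and `[^]]*` replaces the lazy `[\s\S]*?` inside the second bracketed segment. The sample line is hypothetical, and the labels printed for groups 2 to 5 (loglevel, appname, class, msg, borrowed from the index template fields) are assumptions; only the `datetime` capture name is confirmed by the date filter.

```shell
#!/bin/bash
# Sketch: ERE approximation of the Logstash grok pattern.
# Group labels other than datetime are assumptions taken from the index template.
LOG_RE='^([0-9]{4}-[0-9]{2}-[0-9]{2}[[:space:]]+[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3})[[:space:]]+([A-Za-z]+)[[:space:]]+\[([A-Za-z0-9_-]*)][[:space:]]+---[[:space:]]+\[[^]]*][[:space:]]+([A-Za-z0-9_.-]*)[[:space:]]+:[[:space:]]+(.*)$'

# parse_line LINE: print the captured fields, or return 1 if the line does not match.
parse_line() {
  if [[ $1 =~ $LOG_RE ]]; then
    printf 'datetime=%s\nloglevel=%s\nappname=%s\nclass=%s\nmsg=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" \
      "${BASH_REMATCH[4]}" "${BASH_REMATCH[5]}"
  else
    return 1
  fi
}

parse_line '2022-03-01 10:15:30.123  INFO [fps-task-worker] --- [main] c.e.fps.TaskWorker : task started'
```

A line that fails here, such as a stack-trace frame, is expected to fail in grok too; Filebeat's multiline setting is what keeps such frames attached to the preceding timestamped line.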
3. Integrate a Filebeat container into the business Pod
3.1 Building the image: option 1
- Write the Dockerfile
```dockerfile
FROM alpine:3.14
ARG Version
ARG System
ARG Author
LABEL Version=${Version:-v1.0.0} System=${System:-public} Author=${Author}
RUN apk add --no-cache curl bash libc6-compat musl-dev && rm -rf /var/cache/apk/*
RUN curl -Lso - https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.1-linux-x86_64.tar.gz | tar zxf - -C /tmp && mv /tmp/filebeat-7.3.1-linux-x86_64 /usr/share/filebeat
COPY docker-entrypoint.sh /usr/bin/
WORKDIR /usr/share/filebeat
RUN mkdir data logs && chown -R root. . && chmod 0750 data logs && chmod 0750 /usr/bin/docker-entrypoint.sh
ENTRYPOINT ["/usr/bin/docker-entrypoint.sh"]
```
The docker-entrypoint.sh file:
```bash
#!/bin/sh
set -eux

Topics=${Topics:-"business-live-logs"}
Host_ip=${Host_ip}
Hostnames=${Hostnames:-localhost}
Workdir=/usr/share/filebeat

import_configfile () {
    cat > ${Workdir}/filebeat.yml
…
```
- Build the image and do a trial run
```bash
docker build --build-arg Version=v1.0.1 --build-arg System=ELK --build-arg Author=zt17879 -t filebeat:v1.0.1 ./
docker run -e Topics=business-test-logs -e Host_ip=10.81.0.101 -e Hostnames=cnsz-fbu-bck8s-master01-uat -d --name=filebeat filebeat:v1.0.1
```
- Push the image to the Harbor registry
```bash
docker tag filebeat:v1.0.1 harbor.eminxing.com/middleware/filebeat:v1.0.1
docker push harbor.eminxing.com/middleware/filebeat:v1.0.1
```
- Modify and apply the Deployment manifest
The fbu-fps-task-worker-test_deployment.yaml file:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    field.cattle.io/description: Export Billing Services
  labels:
    business: fbu
    environment: test
    service: fps-task-worker
    system: fps
    tier: backend
  name: fbu-fps-task-worker-test
  namespace: apps
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      service: fps-task-worker
    matchExpressions:
      - key: environment
        operator: In
        values: [test]
      - key: business
        operator: In
        values: [fbu]
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        business: fbu
        environment: test
        service: fps-task-worker
        system: fps
        tier: backend
    spec:
      containers:
        - name: fbu-fps-task-worker
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: ENVS
              value: test
            - name: Xmx
              value: 2048m
          image: harbor.eminxing.com/fbu-fps-task-worker/fbu-fps-task-worker:v1.0.0
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 5
            periodSeconds: 1
            successThreshold: 1
            tcpSocket:
              port: 9102
            timeoutSeconds: 1
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 9102
              scheme: HTTP
            initialDelaySeconds: 2
            periodSeconds: 2
            successThreshold: 1
            timeoutSeconds: 2
          startupProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 9102
              scheme: HTTP
            initialDelaySeconds: 45
            periodSeconds: 2
            successThreshold: 1
            timeoutSeconds: 2
          ports:
            - containerPort: 9102
              name: web
              protocol: TCP
          resources:
            limits:
              cpu: "2"
              memory: 3Gi
            requests:
              cpu: 500m
              memory: 1Gi
          volumeMounts:
            - mountPath: /opt/logs/fps-task-worker
              name: logs
              subPathExpr: $(POD_NAME)
        - name: filebeat
          env:
            - name: Topics
              value: fbu-fps-test-log
            - name: Host_ip
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: Hostnames
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
          image: harbor.eminxing.com/middleware/filebeat:v1.0.1
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: "1"
              memory: 512Mi
            requests:
              cpu: 256m
              memory: 256Mi
          volumeMounts:
            - mountPath: /opt/logs
              name: logs
      dnsConfig:
        options:
          - name: ndots
            value: "2"
      dnsPolicy: ClusterFirst
      imagePullSecrets:
        - name: harbor-login
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      volumes:
        - hostPath:
            path: /opt/logs/fps-task-worker
            type: DirectoryOrCreate
          name: logs
```
- Create the index pattern in Kibana
- Check that the logs are being collected correctly
3.2 Building the image: option 2
- Write the Dockerfile
```dockerfile
FROM alpine:3.14
ARG Version
ARG System
ARG Author
LABEL Version=${Version:-v1.0.0} System=${System:-public} Author=${Author}
RUN apk add --no-cache curl bash libc6-compat musl-dev && rm -rf /var/cache/apk/*
RUN curl -Lso - https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.1-linux-x86_64.tar.gz | tar zxf - -C /tmp && mv /tmp/filebeat-7.3.1-linux-x86_64 /usr/share/filebeat
WORKDIR /usr/share/filebeat
RUN mkdir data logs && chown -R root. . && chmod 0750 data logs
ENTRYPOINT ["/usr/share/filebeat/filebeat","-c","/usr/share/filebeat/filebeat.yml"]
```
- Build the image and do a trial run
```bash
docker build --build-arg Version=v1.0.2 --build-arg System=ELK --build-arg Author=zt17879 -t filebeat:v1.0.2 ./
docker run -d --name=filebeat filebeat:v1.0.2
```
- Push the image to the Harbor registry
```bash
docker tag filebeat:v1.0.2 harbor.eminxing.com/middleware/filebeat:v1.0.2
docker push harbor.eminxing.com/middleware/filebeat:v1.0.2
```
- Create the ConfigMap
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: apps
  labels:
    logcollect: filebeat
data:
  filebeat.yml: |
    filebeat.inputs:
      - type: log
        enabled: true
        paths:
          - /opt/logs/*/*.log
        ignore_older: 24h
        scan_frequency: 20s
        max_bytes: 10485760
        multiline.pattern: '^[\d]{4}-[\d]{2}-[\d]{2}\s+[\d]{2}:[\d]{2}:[\d]{2}\.[\d]{3}'
        multiline.negate: true
        multiline.match: after
        multiline.max_lines: 500
        tail_files: false
    processors:
      - drop_fields:
          fields: ["host.name", "ecs.version", "agent.version", "agent.type", "agent.id", "agent.ephemeral_id", "agent.hostname", "input.type"]
      - add_fields:
          target: kafka
          fields:
            topic: ${TOPICS}
      - add_fields:
          target: sys
          fields:
            ip: ${PODIP}
            hostname: ${HOSTNAMES}
    output.kafka:
      enabled: true
      hosts: ["10.81.56.217:12315"]
      topic: "%{[kafka.topic]}"
      worker: 2
      partition.round_robin:
        reachable_only: true
      keep_alive: 120
      required_acks: 1
      compression: gzip
      max_message_bytes: 10000000
```
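To see what `multiline.negate: true` combined with `multiline.match: after` does to a Java stack trace, the following sketch replays the same rule over stdin: a line that starts with the timestamp opens a new event, every other line is appended to the current event, and lines beyond `max_lines` are dropped. It is an emulation for illustration, not Filebeat's own code; `[0-9]` and `[[:space:]]` replace `\d` and `\s`, events are separated by a `---` line, and `merge_multiline` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: emulate the ConfigMap's multiline settings.
#   pattern: timestamp at line start; negate: true + match: after;
#   max_lines (default 500): extra lines of an event are discarded.
# Output: one event per block, blocks separated by a line containing "---".
merge_multiline() {
  local start_re='^[0-9]{4}-[0-9]{2}-[0-9]{2}[[:space:]]+[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}'
  local max_lines=${1:-500} event="" n=0 line
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $start_re ]]; then
      # A timestamped line opens a new event; flush the previous one first.
      if (( n > 0 )); then printf '%s\n---\n' "$event"; fi
      event=$line
      n=1
    elif (( n == 0 )); then
      # Lines arriving before any timestamped line start an event of their own.
      event=$line
      n=1
    elif (( n < max_lines )); then
      # Continuation line (e.g. a stack-trace frame): append to the current event.
      event+=$'\n'$line
      n=$((n + 1))
    fi
  done
  if (( n > 0 )); then printf '%s\n' "$event"; fi
}
```

For example, `printf '%s\n' "$line1" "$frame1" "$frame2" "$line2" | merge_multiline` yields two events, the first carrying both stack-trace frames.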
- Modify and apply the Deployment manifest
The fbu-fps-task-worker-test_deployment.yaml file:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    field.cattle.io/description: Export Billing Services
  labels:
    business: fbu
    environment: test
    service: fps-task-worker
    system: fps
    tier: backend
  name: fbu-fps-task-worker-test
  namespace: apps
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      service: fps-task-worker
    matchExpressions:
      - key: environment
        operator: In
        values: [test]
      - key: business
        operator: In
        values: [fbu]
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        business: fbu
        environment: test
        service: fps-task-worker
        system: fps
        tier: backend
    spec:
      containers:
        - name: fbu-fps-task-worker
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: ENVS
              value: test
            - name: Xmx
              value: 2048m
          image: harbor.eminxing.com/fbu-fps-task-worker/fbu-fps-task-worker:v1.0.0
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 5
            periodSeconds: 1
            successThreshold: 1
            tcpSocket:
              port: 9102
            timeoutSeconds: 1
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 9102
              scheme: HTTP
            initialDelaySeconds: 2
            periodSeconds: 2
            successThreshold: 1
            timeoutSeconds: 2
          startupProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 9102
              scheme: HTTP
            initialDelaySeconds: 45
            periodSeconds: 2
            successThreshold: 1
            timeoutSeconds: 2
          ports:
            - containerPort: 9102
              name: web
              protocol: TCP
          resources:
            limits:
              cpu: "2"
              memory: 3Gi
            requests:
              cpu: 500m
              memory: 1Gi
          volumeMounts:
            - mountPath: /opt/logs/fps-task-worker
              name: logs
              subPathExpr: $(POD_NAME)
        - name: filebeat
          env:
            - name: TOPICS
              value: fbu-fps-test-log
            - name: PODIP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: HOSTNAMES
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
          image: harbor.eminxing.com/middleware/filebeat:v1.0.2
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: "1"
              memory: 512Mi
            requests:
              cpu: 256m
              memory: 256Mi
          volumeMounts:
            - mountPath: /opt/logs
              name: logs
            - mountPath: /usr/share/filebeat/filebeat.yml
              subPath: filebeat.yml
              readOnly: true
              name: config
      dnsConfig:
        options:
          - name: ndots
            value: "2"
      dnsPolicy: ClusterFirst
      imagePullSecrets:
        - name: harbor-login
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      volumes:
        - name: logs
          hostPath:
            path: /opt/logs/fps-task-worker
            type: DirectoryOrCreate
        - name: config
          configMap:
            name: filebeat-config
            defaultMode: 0600
            items:
              - key: "filebeat.yml"
                path: "filebeat.yml"
```
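The two mounts interact as follows: the application container sees `<hostPath>/<POD_NAME>` (via `subPathExpr`) as `/opt/logs/fps-task-worker`, while Filebeat sees the whole hostPath as `/opt/logs`, so the ConfigMap glob `/opt/logs/*/*.log` matches one subdirectory per Pod. The sketch below reproduces that layout in a temporary directory standing in for the node's `/opt/logs/fps-task-worker`. The Pod names and the log file name are hypothetical.

```shell
#!/bin/bash
# Sketch: simulate the node-side layout created by the Deployment above.
# Assumptions: a temp dir replaces /opt/logs/fps-task-worker; pod and file names are made up.
set -u
host_dir=$(mktemp -d)
trap 'rm -rf "$host_dir"' EXIT

# Each app container writes into <hostPath>/<POD_NAME> because of subPathExpr: $(POD_NAME).
for pod in fbu-fps-task-worker-test-7d9f-abcde fbu-fps-task-worker-test-7d9f-fghij; do
  mkdir -p "$host_dir/$pod"
  : > "$host_dir/$pod/fps-task-worker.log"
done

# Filebeat mounts the whole hostPath at /opt/logs, so /opt/logs/*/*.log in the
# container corresponds to $host_dir/*/*.log here.
matches=( "$host_dir"/*/*.log )
printf '%s\n' "${matches[@]#"$host_dir"/}"
```

A consequence worth checking: when both replicas are scheduled on the same node, each sidecar's Filebeat sees both Pods' directories, so the same lines would be shipped twice. Keeping replicas on separate nodes (for example with Pod anti-affinity) or mounting only the Pod's own subdirectory into Filebeat are possible ways to avoid that.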
- Create the index pattern in Kibana
- Check that the logs were collected successfully
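Besides checking in Kibana, collection can be verified from the command line: count the documents in the daily indices and, if the count stays at zero, peek at the Kafka topic to see whether Filebeat is producing at all. The sketch below only parses the `_count` response (sed instead of jq, hypothetical function name); the commands in the comments use the addresses and topic from this article.

```shell
#!/bin/bash
# Sketch: extract "count" from an Elasticsearch _count response read on stdin.
# Assumption: compact JSON as returned by default; es_doc_count is a hypothetical name.
es_doc_count() {
  sed -n 's/.*"count" *: *\([0-9][0-9]*\).*/\1/p'
}

# Usage:
#   n=$(curl -s "http://10.81.56.218:9200/fbu-fps-test-log-*/_count" | es_doc_count)
#   [ "${n:-0}" -gt 0 ] && echo "collected $n documents"
# If n is 0, check whether messages reach Kafka:
#   /usr/local/kafka/bin/kafka-console-consumer.sh --bootstrap-server 10.81.56.217:12315 \
#     --topic fbu-fps-test-log --from-beginning --max-messages 5
```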