Monitoring kubelet and Containers with Nightingale v6.7.2

1. Background

The steps in this article were carried out with Nightingale v6.7.2 and Kubernetes v1.23.10 (amd64).

Categraf v0.3.45 (linux-amd64) is used as the collection agent, and container metrics were originally collected with its cadvisor plugin. The categraf documentation for the cadvisor plugin is at https://flashcat.cloud/docs/content/flashcat-monitor/categraf/plugin/cadvisor/. After reading it, however, I still did not know how to fill in the bearer_token_string or bearer_token_file options (the relevant configuration is in fact documented on the official site; I simply had not found it at the time). Since that approach is less convenient, it is not covered here; instead, the simpler method below is used to monitor kubelet and the containers.

2. Configuration Steps

This follows the official Flashcat post https://flashcat.cloud/blog/kubernetes-monitoring-05-kubelet/, rewritten in my own words and condensed; consider it a small plug for Flashcat and Nightingale as well.

2.1 Create the authentication resources

Accessing "https://localhost:10250/metrics" requires authentication, so first create the following account and role in Kubernetes.
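
As a quick sanity check you can run on any node, an unauthenticated request is rejected by the kubelet (on a typical kubeadm cluster it simply returns "Unauthorized"; the exact response depends on the kubelet's authentication settings):

# Without credentials the kubelet rejects the request
curl -sk https://localhost:10250/metrics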

# Save the following as auth.yaml and apply it to the Kubernetes cluster
---
apiVersion: v1
kind: Namespace
metadata:
  name: flashcat
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: categraf-daemonset
rules:
  - apiGroups:
      - ""
    resources:
      - nodes/metrics
      - nodes/stats
      - nodes/proxy
    verbs:
      - get
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: categraf-daemonset
  namespace: flashcat
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: categraf-daemonset
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: categraf-daemonset
subjects:
  - kind: ServiceAccount
    name: categraf-daemonset
    namespace: flashcat
# Apply and check
kubectl apply -f auth.yaml
kubectl get ClusterRole | grep categraf-daemon
kubectl get sa -n flashcat
(screenshot: output of the commands above, showing the ClusterRole and ServiceAccount)

2.2 Test the permissions created above

The ServiceAccount has been created successfully; print it to have a look:

kubectl get sa categraf-daemonset -n flashcat -o yaml

The ServiceAccount is actually associated with a Secret, whose name shows up in the output:

kubectl get sa categraf-daemonset -n flashcat -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"name":"categraf-daemonset","namespace":"flashcat"}}
  creationTimestamp: "2024-05-26T15:17:43Z"
  name: categraf-daemonset
  namespace: flashcat
  resourceVersion: "1265"
  uid: 9fdf7004-22f6-410f-9696-40485e38db50
secrets:
- name: categraf-daemonset-token-h4p94

Grab the token field of that Secret, base64-decode it, and use it as a Bearer Token for a test request:

token=`kubectl -n flashcat get secret categraf-daemonset-token-h4p94 -o jsonpath={.data.token} | base64 -d`

curl -s -k -H "Authorization: Bearer $token" https://localhost:10250/metrics > aaaa
head -n 10 aaaa
# Output like the following means https://localhost:10250/metrics can now be accessed
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0

At this point we have a token that grants access to "https://localhost:10250/metrics", and it could be set as bearer_token_string in categraf's conf/input.cadvisor/cadvisor.toml. To collect container data from the whole cluster this way, though, categraf would have to be installed and configured like this on every node, and any node added later would need the same treatment. That is what motivates the DaemonSet described below.
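
If you do want to go the manual per-node route, the decoded token obtained above can be printed and pasted into bearer_token_string (the Secret name's random suffix will differ in your cluster):

# Print the decoded token so it can be pasted into bearer_token_string in conf/input.cadvisor/cadvisor.toml
kubectl -n flashcat get secret categraf-daemonset-token-h4p94 -o jsonpath={.data.token} | base64 -d; echo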

If you use the DaemonSet below, the cadvisor plugin in any per-node categraf installation should be disabled, otherwise the same metrics are collected twice.
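
If categraf was previously installed directly on the nodes, one way to disable its cadvisor plugin is to move the plugin's config file aside and restart the agent. This is only a sketch: it assumes categraf lives under /opt/categraf and runs as a systemd service named categraf, so adjust the path and restart command to your installation.

# Move the cadvisor plugin config out of the way so it does not duplicate the DaemonSet's collection
# (the path and service name below are assumptions about a typical install)
mv /opt/categraf/conf/input.cadvisor/cadvisor.toml /opt/categraf/conf/input.cadvisor/cadvisor.toml.disabled
systemctl restart categraf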

2.3 Create the ConfigMaps

# Save the following as the file categraf-configmap-v2.yaml
---
kind: ConfigMap
metadata:
  name: categraf-config
apiVersion: v1
data:
  config.toml: |
    [global]
    hostname = "$HOSTNAME"
    interval = 15
    providers = ["local"]
    [writer_opt]
    batch = 2000
    chan_size = 10000
    [[writers]]
    url = "http://172.20.180.50:17000/prometheus/v1/write"
    timeout = 5000
    dial_timeout = 2500
    max_idle_conns_per_host = 100
---
kind: ConfigMap
metadata:
  name: categraf-input-prometheus
apiVersion: v1
data:
  prometheus.toml: |
    [[instances]]
    urls = ["http://127.0.0.1:10249/metrics"]
    labels = { job="kube-proxy" }
    [[instances]]
    urls = ["https://127.0.0.1:10250/metrics"]
    bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
    use_tls = true
    insecure_skip_verify = true
    labels = { job="kubelet" }
    [[instances]]
    urls = ["https://127.0.0.1:10250/metrics/cadvisor"]
    bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
    use_tls = true
    insecure_skip_verify = true
    labels = { job="cadvisor" }

In the content above, "172.20.180.50:17000" is the IP and port of the Nightingale server; change it to match your environment.
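
Before applying the ConfigMaps it is worth confirming that the nodes can actually reach that address. The write endpoint only accepts Prometheus remote-write POST requests, so any HTTP status code in the reply (as opposed to a connection error) already proves the port is reachable; this is just a reachability sketch:

# Any HTTP status code (even an error code) means the Nightingale port is reachable from this node
curl -s -o /dev/null -w '%{http_code}\n' http://172.20.180.50:17000/prometheus/v1/write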

hostname = "$HOSTNAME" 这个配置用了 $ 符号,后面创建 Daemonset 的时候会把 HOSTNAME 这个环境变量注入,让 Categraf 自动拿到。

# Apply the file
kubectl apply -f categraf-configmap-v2.yaml -n flashcat
configmap/categraf-config created
configmap/categraf-input-prometheus created

2.4 Create the DaemonSet

# Save the following as the file categraf-daemonset-v2.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: categraf-daemonset
  name: categraf-daemonset
spec:
  selector:
    matchLabels:
      app: categraf-daemonset
  template:
    metadata:
      labels:
        app: categraf-daemonset
    spec:
      containers:
      - env:
        - name: TZ
          value: Asia/Shanghai
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: HOSTIP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        image: flashcatcloud/categraf:v0.2.18
        imagePullPolicy: IfNotPresent
        name: categraf
        volumeMounts:
        - mountPath: /etc/categraf/conf
          name: categraf-config
        - mountPath: /etc/categraf/conf/input.prometheus
          name: categraf-input-prometheus
      hostNetwork: true
      serviceAccountName: categraf-daemonset
      restartPolicy: Always
      tolerations:
      - effect: NoSchedule
        operator: Exists
      volumes:
      - configMap:
          name: categraf-config
        name: categraf-config
      - configMap:
          name: categraf-input-prometheus
        name: categraf-input-prometheus

Following the official document, the DaemonSet injects the HOSTNAME environment variable. The official version uses spec.nodeName; I changed it to status.hostIP so that the ident attribute of the collected series carries the node's IP instead of its hostname.
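
For comparison, the official variant of the same env entry is the excerpt below; with it, ident would carry the node hostname instead of the IP:

# Official variant: inject the node name rather than the host IP
- name: HOSTNAME
  valueFrom:
    fieldRef:
      apiVersion: v1
      fieldPath: spec.nodeName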

# Apply the file
kubectl apply -f categraf-daemonset-v2.yaml -n flashcat
daemonset.apps/categraf-daemonset created

# Wait a moment for the pods to reach Running
kubectl get pods -n flashcat -o wide
NAME                       READY   STATUS    RESTARTS   AGE   IP              NODE       NOMINATED NODE   READINESS GATES
categraf-daemonset-jpgxs   1/1     Running   0          24s   172.20.180.25   worker02   <none>           <none>
categraf-daemonset-rdlbk   1/1     Running   0          25s   172.20.180.21   master01   <none>           <none>
categraf-daemonset-xnqdj   1/1     Running   0          24s   172.20.180.24   worker01   <none>           <none>

# The DaemonSet is running normally
kubectl get daemonset -n flashcat
NAME                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
categraf-daemonset   3         3         3       3            3           <none>          31s
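
To double-check the injection described in 2.4, print the relevant environment variables of one of the pods; HOSTNAME should be the node's IP, 172.20.180.21 for the pod on master01 above (pod names will differ in your cluster, and this assumes the categraf image ships printenv):

# HOSTNAME and HOSTIP inside the pod should both be the node IP
kubectl -n flashcat exec categraf-daemonset-rdlbk -- printenv HOSTNAME HOSTIP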

2.5 Verify the collected data

First look at kubelet's own metrics, which are scraped from the kubelet /metrics endpoint. Note that the collected series carry an ident attribute whose value is the IP of the monitored node, 172.20.180.21 in this case:

(screenshot: kubelet metrics in the Nightingale UI, with ident = 172.20.180.21)

Then confirm the metrics collected by cAdvisor, which are scraped from the /metrics/cadvisor endpoint:

(screenshot: cAdvisor metrics in the Nightingale UI)
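
The cAdvisor data can also be spot-checked directly on a node, reusing the token obtained in section 2.2 (container_cpu_usage_seconds_total is one of the standard cAdvisor series):

# Preview a few cAdvisor series straight from the kubelet, using the token from section 2.2
curl -s -k -H "Authorization: Bearer $token" https://localhost:10250/metrics/cadvisor | grep -m 5 '^container_cpu_usage_seconds_total'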

2.6 Dashboards

Clone the built-in "Kubernetes / Container" dashboard for your own use. The result looks roughly like this:

(screenshot: the cloned Kubernetes / Container dashboard)
