Prometheus
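Prometheus is a monitoring system. Much like Zabbix, it can collect metric data through agents (Prometheus calls them exporters).
How is the Prometheus architecture composed?
How is Prometheus deployed?
Why deploy it as a Deployment rather than a StatefulSet?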
1. Fetch the resource manifests
Clone the main resource manifests from git
# git clone https://github.com/iKubernetes/k8s-prom.git
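The manifests in that repository are not guaranteed to be complete or correct, though. Learning Kubernetes means hitting a pitfall at every step, and you simply get used to it.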
# cd k8s-prom/k8s-prometheus-adapter
# rm -f custom-metrics-apiserver-deployment.yaml
# wget https://raw.githubusercontent.com/DirectXMan12/k8s-prometheus-adapter/master/deploy/manifests/custom-metrics-apiserver-deployment.yaml
# wget https://raw.githubusercontent.com/DirectXMan12/k8s-prometheus-adapter/master/deploy/manifests/custom-metrics-config-map.yaml
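One of the k8s-prometheus-adapter parameters configured in k8s-prom has been removed in newer versions, which makes the adapter fail to start. By default the deployment also does not reference a configuration file, which makes it fail to start as well.
k8s-prom is also missing the ConfigMap resource for k8s-prometheus-adapter, so the ConfigMap manifest has to be downloaded too.
2. Create the namespace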
# kubectl create ns prom
namespace/prom created
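3. Create node-exporter
The node_exporter directory contains two files, node-exporter-ds.yaml and node-exporter-svc.yaml, which define the DaemonSet and the Service respectively. Run the command from the k8s-prom repository root.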
# kubectl apply -f node_exporter -n prom
daemonset.apps/prometheus-node-exporter created
service/prometheus-node-exporter created
4. Create the Prometheus resources
# kubectl apply -f prometheus -n prom
configmap/prometheus-config created
deployment.apps/prometheus-server created
clusterrole.rbac.authorization.k8s.io/prometheus created
serviceaccount/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
service/prometheus created
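If the test environment is short on memory, the pod fails to schedule (see the error in step 5). In that case, lower the spec.template.spec.containers.resources.limits.memory field in prometheus/prometheus-deploy.yaml. The default is 2Gi; here it was lowered to 200Mi.
5. Check the resources created in the prom namespace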
# kubectl get all -n prom -o wide
NAME                                     READY   STATUS    RESTARTS   AGE   IP               NODE
pod/prometheus-node-exporter-9t6lv       1/1     Running   0          3m    172.31.117.179   node002
pod/prometheus-node-exporter-pdh84       1/1     Running   0          3m    172.31.117.178   node003
pod/prometheus-node-exporter-wd7x9       1/1     Running   0          3m    172.31.117.180   node001
pod/prometheus-server-65f5d59585-ztw7g   0/1     Pending   0          1m    <none>           <none>

NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE   SELECTOR
service/prometheus                 NodePort    10.96.177.220   <none>        9090:30090/TCP   1m    app=prometheus,component=server
service/prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         3m    app=prometheus,component=node-exporter

NAME                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE   CONTAINERS                 IMAGES                       SELECTOR
daemonset.apps/prometheus-node-exporter   3         3         3       3            3           <none>          3m    prometheus-node-exporter   prom/node-exporter:v0.15.2   app=prometheus,component=node-exporter

NAME                                DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                   SELECTOR
deployment.apps/prometheus-server   1         1         1            0           1m    prometheus   prom/prometheus:v2.2.1   app=prometheus,component=server

NAME                                           DESIRED   CURRENT   READY   AGE   CONTAINERS   IMAGES                   SELECTOR
replicaset.apps/prometheus-server-65f5d59585   1         1         0       1m    prometheus   prom/prometheus:v2.2.1   app=prometheus,component=server,pod-template-hash=2191815141
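With a longer listing it can help to filter the output down to pods that are not Running. The following is only a minimal sketch and not part of the original walkthrough: pending_pods is a hypothetical helper name, and it assumes the column layout shown above (NAME in column 1, STATUS in column 3). Pods in any non-Running state, including Completed, would be reported.

```shell
#!/bin/sh
# Hypothetical helper: reads "kubectl get all -o wide" style output on stdin
# and prints the name and status of every pod whose STATUS is not "Running".
# Assumes NAME is column 1 and STATUS is column 3, as in the listing above.
pending_pods() {
  awk '$1 ~ /^pod\// && $3 != "Running" { print $1, $3 }'
}

# Sample rows copied from the listing above.
sample='pod/prometheus-node-exporter-9t6lv 1/1 Running 0 3m 172.31.117.179 node002
pod/prometheus-server-65f5d59585-ztw7g 0/1 Pending 0 1m <none> <none>'

result=$(printf '%s\n' "$sample" | pending_pods)
echo "$result"
```

Against a live cluster the same filter could be fed directly: kubectl get all -n prom -o wide | pending_pods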
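If the test nodes are short on memory, you will hit the problem shown above: prometheus-server stays in the Pending state. Describing the pod shows the following warning event
# kubectl describe pod prometheus-server-65f5d59585-ztw7g -n prom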
...
Warning FailedScheduling 33s (x25 over 1m) default-scheduler 0/3 nodes are available: 3 Insufficient memory.
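To fix this scheduling failure, lower the spec.template.spec.containers.resources.limits.memory field in prometheus/prometheus-deploy.yaml. The default memory limit is 2Gi, and when a container sets a memory limit without an explicit request, Kubernetes uses the limit as the request, so the scheduler needs a node with 2Gi available. Here the limit was lowered to 200Mi; re-applying the manifests afterwards is enough. PS: deleting the limits field also works.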
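The edit itself can be scripted. The sketch below is an illustration under stated assumptions: the YAML fragment is a hypothetical stand-in for the resources section of prometheus/prometheus-deploy.yaml (the real file is not reproduced here), and lower_memory_limit is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: rewrites a "memory: 2Gi" line to "memory: 200Mi",
# leaving the indentation in front of it untouched.
# Reads YAML on stdin and writes the result to stdout.
lower_memory_limit() {
  sed 's/memory:[[:space:]]*2Gi$/memory: 200Mi/'
}

# Assumed fragment standing in for the container's resources section.
sample='resources:
  limits:
    memory: 2Gi'

patched=$(printf '%s\n' "$sample" | lower_memory_limit)
echo "$patched"
```

For the real file, the output would be written to a new file, reviewed, and then re-applied with kubectl apply -f prometheus -n prom. Alternatively, kubectl set resources deployment prometheus-server -n prom --limits=memory=200Mi changes the live Deployment without touching the manifest, at the cost of the file and the cluster drifting apart.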
6. Create kube-state-metrics, which converts cluster state into metrics
Modify the ClusterRole so that kube-state-metrics is also allowed to list configmaps and secrets
# cat kube-state-metrics-rbac.yaml
...
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources: ["nodes","configmaps","secrets", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
  verbs: ["list", "watch", "get"]
...
# kubectl apply -f kube-state-metrics/
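7. Create the k8s-prometheus-adapter resources
custom-metrics-apiserver serves over HTTPS and has to communicate with the Kubernetes API server, so a certificate and private key signed by the Kubernetes CA must be created in advance
Create the private key and a certificate signing request, then sign the certificate with the Kubernetes CA certificate and key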
# cd /etc/kubernetes/pki/
# (umask 077; openssl genrsa -out serving.key 2048)
# openssl req -new -key serving.key -out serving.csr -subj "/CN=serving"
# openssl x509 -req -in serving.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out serving.crt -days 3650
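Before loading the certificate into the cluster, it can be worth confirming that it really chains to the CA. The sketch below is self-contained and rests on assumptions: it builds a throwaway CA in a temporary directory (standing in for ca.crt and ca.key under /etc/kubernetes/pki), signs a serving certificate with the same steps as above, and checks the result with openssl verify. The demo-ca name and the one-day lifetime are illustrative choices only.

```shell
#!/bin/sh
# Work in a scratch directory so the real /etc/kubernetes/pki is never touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Throwaway self-signed CA (assumption: stands in for the cluster's ca.crt/ca.key).
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out ca.crt \
  -subj "/CN=demo-ca" -days 1 2>/dev/null

# Same steps as above: key created under a restrictive umask, CSR, CA-signed cert.
(umask 077; openssl genrsa -out serving.key 2048 2>/dev/null)
openssl req -new -key serving.key -out serving.csr -subj "/CN=serving"
openssl x509 -req -in serving.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out serving.crt -days 1 2>/dev/null

# openssl verify reports "serving.crt: OK" when the certificate chains to the CA.
verify_out=$(openssl verify -CAfile ca.crt serving.crt)
echo "$verify_out"
```

On a control-plane node the equivalent check is the final command alone, run in /etc/kubernetes/pki/: openssl verify -CAfile ca.crt serving.crt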
Create a Secret resource that holds the certificate and private key
# kubectl create secret generic cm-adapter-serving-certs --from-file=serving.crt=./serving.crt --from-file=serving.key=./serving.key -n prom
secret/cm-adapter-serving-certs created
Modify the ClusterRole so that custom-metrics-resource-reader is also allowed to list nodes
# cat ./custom-metrics-resource-reader-cluster-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-resource-reader
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  - pods
  - services
  - nodes
  verbs:
  - get
  - list
  - watch
Create the k8s-prometheus-adapter resources
# kubectl apply -f k8s-prometheus-adapter/
Check that the resources were created
# kubectl get pod -n prom
# kubectl describe pod custom-metrics-apiserver-65f545496-z7vm5 -n prom
Check that the API groups include the custom metrics group
# kubectl api-versions|grep metric
custom.metrics.k8s.io/v1beta1
metrics.k8s.io/v1beta1
List the custom metrics exposed by that API
# curl http://localhost:8080/apis/custom.metrics.k8s.io/v1beta1/
{
"kind": "APIResourceList",
"apiVersion": "v1",
"groupVersion": "custom.metrics.k8s.io/v1beta1",
"resources": [
...
]
}
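The resources array can be long, so pulling out only the metric names makes the response easier to scan. The following sketch assumes a grep that supports -o (GNU, BSD, and BusyBox grep do). list_metric_names is a hypothetical helper, and the sample JSON uses made-up metric names purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints the value of every "name" field found in an
# APIResourceList JSON document read from stdin, one per line.
# This is a text-level extraction, not a real JSON parser.
list_metric_names() {
  grep -o '"name": *"[^"]*"' | sed 's/^"name": *"//; s/"$//'
}

# Made-up sample; a real response lists the metrics discovered by the adapter.
sample='{"kind":"APIResourceList","resources":[{"name":"pods/example_metric_a","namespaced":true},{"name":"namespaces/example_metric_b","namespaced":false}]}'

names=$(printf '%s\n' "$sample" | list_metric_names)
echo "$names"
```

Against the cluster, the helper could be fed from the curl command above, or from kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1, which goes through the authenticated API server endpoint rather than the local insecure port.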
Additional notes
Chinese documentation: https://love2.io/@1046102779/doc/prometheus/introductions/overview.md