Prometheus

Prometheus is a monitoring system. It is quite similar to Zabbix in that it can also collect metric data through agents (exporters).

How is the Prometheus architecture composed?

How is Prometheus deployed?

Why deploy with a Deployment rather than a StatefulSet?

1. Pull the resource manifest files

Pull the main resource manifests from git:

# git clone https://github.com/iKubernetes/k8s-prom.git

Of course, are the manifests in the repository guaranteed to be complete and correct? No doubt about it, the road to learning k8s is one pit after another. Sigh, you get used to it.

# cd k8s-prom/k8s-prometheus-adapter
# rm -f custom-metrics-apiserver-deployment.yaml
# wget https://raw.githubusercontent.com/DirectXMan12/k8s-prometheus-adapter/master/deploy/manifests/custom-metrics-apiserver-deployment.yaml
# wget https://raw.githubusercontent.com/DirectXMan12/k8s-prometheus-adapter/master/deploy/manifests/custom-metrics-config-map.yaml

One of the k8s-prometheus-adapter startup parameters used in k8s-prom has been removed in newer versions, which causes startup to fail; in addition, the default deployment does not reference a configuration file, which also causes startup to fail.

k8s-prom is also missing the ConfigMap resource for k8s-prometheus-adapter, so the ConfigMap manifest needs to be downloaded as well.

2. Create the namespace

# kubectl create ns prom
namespace/prom created

3. Create the node-exporter

The node_exporter directory contains two files, node-exporter-ds.yaml and node-exporter-svc.yaml, which define the DaemonSet and Service resources respectively.

# kubectl apply -f node_exporter -n prom
daemonset.apps/prometheus-node-exporter created
service/prometheus-node-exporter created
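The DaemonSet in node-exporter-ds.yaml runs one exporter pod on every node. A minimal sketch of that kind of definition (field values here are illustrative; the actual manifest in the repo may differ) looks like:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prometheus-node-exporter
  namespace: prom
  labels:
    app: prometheus
    component: node-exporter
spec:
  selector:
    matchLabels:
      app: prometheus
      component: node-exporter
  template:
    metadata:
      labels:
        app: prometheus
        component: node-exporter
    spec:
      # Use the host network so node metrics are exposed on the node's own IP.
      hostNetwork: true
      hostPID: true
      containers:
      - name: prometheus-node-exporter
        image: prom/node-exporter:v0.15.2
        ports:
        - name: prom-node-exp
          containerPort: 9100
          hostPort: 9100
```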

4. Create the Prometheus resources

# kubectl apply -f prometheus -n prom
configmap/prometheus-config created
deployment.apps/prometheus-server created
clusterrole.rbac.authorization.k8s.io/prometheus created
serviceaccount/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
service/prometheus created
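The prometheus-config ConfigMap above holds prometheus.yml. As a rough sketch (job names and relabel rules in the actual repo manifest differ), it drives scraping through Kubernetes service discovery along these lines:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: prom
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
    # Discover every node through the Kubernetes API and scrape its exporter.
    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
    # Discover pods and scrape the ones that expose metrics.
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
```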

If the test environment is short on memory, the error shown below (in step 5) will appear; lower the spec.template.spec.containers.resources.limits.memory field in prometheus/prometheus-deploy.yaml. The default is 2Gi; here it was reduced to 200Mi.

5. Check the resources created in the prom namespace

# kubectl get all -n prom -o wide
NAME                                     READY     STATUS    RESTARTS   AGE       IP               NODE
pod/prometheus-node-exporter-9t6lv       1/1       Running   0          3m        172.31.117.179   node002
pod/prometheus-node-exporter-pdh84       1/1       Running   0          3m        172.31.117.178   node003
pod/prometheus-node-exporter-wd7x9       1/1       Running   0          3m        172.31.117.180   node001
pod/prometheus-server-65f5d59585-ztw7g   0/1       Pending   0          1m        <none>           <none>

NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE       SELECTOR
service/prometheus                 NodePort    10.96.177.220   <none>        9090:30090/TCP   1m        app=prometheus,component=server
service/prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         3m        app=prometheus,component=node-exporter

NAME                                      DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE       CONTAINERS                 IMAGES                       SELECTOR
daemonset.apps/prometheus-node-exporter   3         3         3         3            3           <none>          3m        prometheus-node-exporter   prom/node-exporter:v0.15.2   app=prometheus,component=node-exporter

NAME                                DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE       CONTAINERS   IMAGES                   SELECTOR
deployment.apps/prometheus-server   1         1         1            0           1m        prometheus   prom/prometheus:v2.2.1   app=prometheus,component=server

NAME                                           DESIRED   CURRENT   READY     AGE       CONTAINERS   IMAGES                   SELECTOR
replicaset.apps/prometheus-server-65f5d59585   1         1         0         1m        prometheus   prom/prometheus:v2.2.1   app=prometheus,component=server,pod-template-hash=2191815141

If the test environment nodes are short on memory, the problem I hit above will appear: prometheus-server stays in the Pending state. Describing the pod shows the following warning event:

# kubectl describe pod prometheus-server-65f5d59585-ztw7g -n prom
...
Warning  FailedScheduling  33s (x25 over 1m)  default-scheduler  0/3 nodes are available: 3 Insufficient memory.

To fix this error, lower the spec.template.spec.containers.resources.limits.memory field in prometheus/prometheus-deploy.yaml; the default memory limit is 2Gi, and here it was reduced to 200Mi. Then re-apply the manifest. PS: removing the limits field entirely also works.
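For reference, the adjusted resources section in prometheus/prometheus-deploy.yaml would look roughly like this (only the memory limit value changes; surrounding fields are abbreviated):

```yaml
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus:v2.2.1
        resources:
          limits:
            # Default is 2Gi; lowered for a small test environment.
            memory: 200Mi
```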

6. Create kube-state-metrics to convert cluster state into metric data

Modify the ClusterRole to grant kube-state-metrics list permission on configmaps and secrets:

# cat kube-state-metrics-rbac.yaml
...
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources: ["nodes","configmaps","secrets", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
  verbs: ["list", "watch", "get"]
...
# kubectl apply -f kube-state-metrics/

7. Create the k8s-prometheus-adapter resources

Because custom-metrics-apiserver communicates over HTTPS and needs to talk to the Kubernetes APIServer, a certificate and private key signed by the Kubernetes CA must be created in advance.

Create the private key and certificate signing request, then sign the certificate with the Kubernetes CA certificate and key:

# cd /etc/kubernetes/pki/
# (umask 077; openssl genrsa -out serving.key 2048)
# openssl req -new -key serving.key -out serving.csr -subj "/CN=serving"
# openssl x509 -req -in serving.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out serving.crt -days 3650
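The same signing flow can be sketched end to end with a throwaway CA standing in for the cluster CA, so it can be tried outside the cluster (the CA subject and temp directory here are assumptions; on a real cluster, use the ca.crt/ca.key under /etc/kubernetes/pki/):

```shell
set -e
workdir=$(mktemp -d) && cd "$workdir"

# Throwaway CA standing in for the Kubernetes cluster CA (ca.crt / ca.key).
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out ca.crt \
  -subj "/CN=test-ca" -days 1

# Private key (created under a restrictive umask), CSR with CN=serving,
# then a certificate signed by the CA -- the same three steps as above.
(umask 077; openssl genrsa -out serving.key 2048)
openssl req -new -key serving.key -out serving.csr -subj "/CN=serving"
openssl x509 -req -in serving.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -out serving.crt -days 3650

# Show the subject and issuer of the freshly signed certificate.
openssl x509 -in serving.crt -noout -subject -issuer
```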

Create a Secret resource to import the certificate and private key:

# kubectl create secret generic cm-adapter-serving-certs --from-file=serving.crt=./serving.crt --from-file=serving.key=./serving.key -n prom
secret/cm-adapter-serving-certs created

Modify the ClusterRole to grant custom-metrics-resource-reader list permission on nodes:

# cat ./custom-metrics-resource-reader-cluster-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-resource-reader
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  - pods
  - services
  - nodes
  verbs:
  - get
  - list
  - watch

Create the k8s-prometheus-adapter resources:

# kubectl  apply -f k8s-prometheus-adapter/

Check that the resources were created:

# kubectl get pod -n prom
# kubectl describe  pod custom-metrics-apiserver-65f545496-z7vm5 -n prom

Check whether the API groups now include the custom metrics group:

# kubectl api-versions|grep metric
custom.metrics.k8s.io/v1beta1
metrics.k8s.io/v1beta1

View the custom metrics exposed by that API:

# curl http://localhost:8080/apis/custom.metrics.k8s.io/v1beta1/
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    ...
  ]
}

Additional notes

Chinese documentation: https://love2.io/@1046102779/doc/prometheus/introductions/overview.md
