Changelog
- 2022-06-17 Added the node_exporter section
- 2022-06-15 Added the mysqld_exporter section
node_exporter
I. Binary deployment
1.1 Install node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar -zxf node_exporter-1.3.1.linux-amd64.tar.gz
mv node_exporter-1.3.1.linux-amd64 /data/monitor/node_exporter
1.2 Configure systemd to start node_exporter
cat node_exporter-9400.service
[Unit]
Description=node_exporter service
After=syslog.target network.target remote-fs.target nss-lookup.target
[Service]
LimitNOFILE=1000000
#LimitCORE=infinity
LimitSTACK=10485760
ExecStart=/data/monitor/node_exporter/scripts/run_node_exporter.sh
Restart=always
RestartSec=15s
[Install]
WantedBy=multi-user.target
Startup script
cat run_node_exporter.sh
#!/bin/bash
set -e
# WARNING: This file was auto-generated. Do not edit!
# All your edit might be overwritten!
DEPLOY_DIR=/data/monitor/node_exporter
cd "${DEPLOY_DIR}" || exit 1
exec > >(tee -i -a "/data/monitor/node_exporter/log/node_exporter.log")
exec 2>&1
exec bin/node_exporter \
--web.listen-address=":9400" \
--collector.tcpstat \
--collector.systemd \
--collector.mountstats \
--collector.meminfo_numa \
--collector.interrupts \
--collector.vmstat.fields="^.*" \
--log.level="info"
II. Kubernetes deployment
2.1 Write the node_exporter DaemonSet YAML
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter-9400
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-exporter-9400
  template:
    metadata:
      labels:
        app: node-exporter-9400
    spec:
      containers:
      - args:
        - --web.listen-address=0.0.0.0:9400
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        image: harbor.foxchan.com/prom/node-exporter:v1.3.1
        imagePullPolicy: IfNotPresent
        name: node-exporter-9400
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /host/proc
          name: proc
        - mountPath: /host/sys
          name: sys
        - mountPath: /host/root
          mountPropagation: HostToContainer
          name: root
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      hostPID: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /proc
          type: ""
        name: proc
      - hostPath:
          path: /sys
          type: ""
        name: sys
      - hostPath:
          path: /
          type: ""
        name: root
      tolerations:
      # Add a toleration to the Pod so that node exporter can also be scheduled onto master nodes
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
  updateStrategy:
    type: OnDelete
2.2 Write the Service
Because the node exporter Pods are spread across all nodes, Prometheus needs a single Endpoints object that collects their Pod IPs. Creating a Service generates that Endpoints object automatically.
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9400"
    prometheus.io/path: /metrics
  name: node-exporter-9400
  namespace: kube-system
  labels:
    app: node-exporter-9400
spec:
  ports:
  - name: http
    port: 9400
    protocol: TCP
    targetPort: 9400
  selector:
    app: node-exporter-9400
2.3 Add the exporter to Prometheus service discovery
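On Kubernetes, Prometheus integrates with the Kubernetes API and currently supports five main service discovery modes (roles): Node, Service, Pod, Endpoints, and Ingress.
Add the following to the Prometheus configuration: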
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s
rule_files:
- /prometheus-rules/*.rules.yml
scrape_configs:
- job_name: 'kubernetes-node-exporter'
  kubernetes_sd_configs:
  - role: endpoints
    api_server: https://k8s.foxchan.com:8443
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_endpoints_name]
    regex: 'node-exporter-9400'
    action: keep
mysqld_exporter
1. Install mysqld_exporter
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.14.0/mysqld_exporter-0.14.0.linux-amd64.tar.gz
tar -zxf mysqld_exporter-0.14.0.linux-amd64.tar.gz
mv mysqld_exporter-0.14.0.linux-amd64 /data/monitor/mysqld_exporter
2. Create a user in MySQL and grant privileges
If the password contains special characters, put them in the middle of the password; otherwise the exporter may fail to parse it.
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'exporter!12ASzx' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
3. Write a configuration file for mysqld_exporter
cat /data/monitor/mysqld_exporter/.my.cnf
[client]
host=localhost
user=exporter
password=exporter!12ASzx
socket=/var/lib/mysql/mysql.sock
4. Configure systemd to start mysqld_exporter
cat mysqld_exporter-9401.service
[Unit]
Description=mysqld_exporter
After=network.target
[Service]
ExecStart=/data/monitor/mysqld_exporter/scripts/run_mysqld_exporter.sh
Restart=on-failure
[Install]
WantedBy=multi-user.target
Startup script
cat run_mysqld_exporter.sh
#!/bin/bash
set -e
# WARNING: This file was auto-generated. Do not edit!
# All your edit might be overwritten!
DEPLOY_DIR=/data/monitor/mysqld_exporter
cd "${DEPLOY_DIR}" || exit 1
exec > >(tee -i -a "/data/monitor/mysqld_exporter/log/mysqld_exporter.log")
exec 2>&1
exec bin/mysqld_exporter \
--web.listen-address=":9401" \
--config.my-cnf /data/monitor/mysqld_exporter/.my.cnf \
--log.level="info" \
--collect.info_schema.processlist \
--collect.info_schema.innodb_metrics \
--collect.info_schema.innodb_tablespaces \
--collect.info_schema.innodb_cmp \
--collect.info_schema.innodb_cmpmem
5. Verify that the metrics are exposed
curl http://localhost:9401/metrics|grep mysql_up
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  295k    0  295k    0     0  9.9M      0 --:--:-- --:--:-- --:--:-- 10.3M
# HELP mysql_up Whether the MySQL server is up.
# TYPE mysql_up gauge
mysql_up 1
6. Add a scrape job on the Prometheus server
vim prometheus.yml
scrape_configs:
- file_sd_configs:
  - files:
    - mysql.yml
  job_name: MySQL
  metrics_path: /metrics
  relabel_configs:
  - source_labels: [__address__]
    regex: (.*)
    target_label: __address__
    replacement: $1
mysql.yml
- labels:
    instance: mysql1:3306 # alias of the instance as shown in Grafana
  targets:
  - 172.18.0.23:9401 # port exposed by mysqld_exporter
- labels:
    instance: mysql2:3306 # alias of the instance as shown in Grafana
  targets:
  - 172.18.0.23:9402 # port exposed by mysqld_exporter
7. Import the MySQL dashboard into Grafana
Click Import, enter dashboard ID 7362 in the dialog, and select Prometheus as the data source.
8. Configure mysqld_exporter alert rules
cat mysql_rules.yml
groups:
- name: mysql.rules
  rules:
  - alert: MysqlDown
    expr: mysql_up == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      title: 'MySQL down'
      description: "MySQL instance [{{ $labels.instance }}]: MySQL instance is down"
  - alert: MysqlRestarted
    expr: mysql_global_status_uptime < 60
    for: 0m
    labels:
      severity: info
    annotations:
      title: 'MySQL Restarted'
      description: "MySQL instance [{{ $labels.instance }}]: MySQL has just been restarted, less than one minute ago"
  - alert: MysqlTooManyConnections(>80%)
    # threads_connected is a gauge, so use max_over_time rather than rate
    expr: avg by (instance) (max_over_time(mysql_global_status_threads_connected[1m])) / avg by (instance) (mysql_global_variables_max_connections) * 100 > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      title: 'MySQL too many connections (> 80%)'
      description: "MySQL instance [{{ $labels.instance }}]: More than 80% of MySQL connections are in use, Current Value: {{ $value }}%"
  - alert: MysqlThreadsRunningHigh
    expr: mysql_global_status_threads_running > 40
    for: 2m
    labels:
      severity: warning
    annotations:
      title: 'MySQL Threads_Running High'
      description: "MySQL instance [{{ $labels.instance }}]: Threads_Running above the threshold (40), Current Value: {{ $value }}"
  - alert: MysqlQpsHigh
    expr: sum by (instance) (rate(mysql_global_status_queries[2m])) > 500
    for: 2m
    labels:
      severity: warning
    annotations:
      title: 'MySQL QPS High'
      description: "MySQL instance [{{ $labels.instance }}]: MySQL QPS above 500"
  - alert: MysqlSlowQueries
    expr: increase(mysql_global_status_slow_queries[1m]) > 0
    for: 2m
    labels:
      severity: warning
    annotations:
      title: 'MySQL slow queries'
      description: "MySQL instance [{{ $labels.instance }}]: new slow queries detected"
  - alert: MysqlTooManyAbortedConnections
    expr: round(increase(mysql_global_status_aborted_connects[5m])) > 20
    for: 2m
    labels:
      severity: warning
    annotations:
      title: 'MySQL too many aborted connections in 5 minutes'
      description: "MySQL instance [{{ $labels.instance }}]: {{ $value }} aborted connections within 5 minutes"
  - alert: MysqlTooManyAbortedClients
    expr: round(increase(mysql_global_status_aborted_clients[120m])) > 10
    for: 2m
    labels:
      severity: warning
    annotations:
      title: 'MySQL too many aborted clients in 2 hours'
      description: "MySQL instance [{{ $labels.instance }}]: {{ $value }} aborted clients within 2 hours"
  - alert: MysqlSlaveIoThreadNotRunning
    expr: mysql_slave_status_master_server_id > 0 and ON (instance) mysql_slave_status_slave_io_running == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      title: 'MySQL Slave IO thread not running'
      description: "MySQL instance [{{ $labels.instance }}]: MySQL Slave IO thread not running"
  - alert: MysqlSlaveSqlThreadNotRunning
    expr: mysql_slave_status_master_server_id > 0 and ON (instance) mysql_slave_status_slave_sql_running == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      title: 'MySQL Slave SQL thread not running'
      description: "MySQL instance [{{ $labels.instance }}]: MySQL Slave SQL thread not running"
  - alert: MysqlSlaveReplicationLag
    expr: mysql_slave_status_master_server_id > 0 and ON (instance) (mysql_slave_status_seconds_behind_master - mysql_slave_status_sql_delay) > 30
    for: 1m
    labels:
      severity: critical
    annotations:
      title: 'MySQL Slave replication lag'
      description: "MySQL instance [{{ $labels.instance }}]: MySQL replication lag"
  - alert: MysqlInnodbLogWaits
    expr: rate(mysql_global_status_innodb_log_waits[15m]) > 10
    for: 0m
    labels:
      severity: warning
    annotations:
      title: 'MySQL InnoDB log waits'
      description: "MySQL instance [{{ $labels.instance }}]: InnoDB log writes are stalling"
9. Load the alert rules into Prometheus
prometheus.yml
rule_files:
- "/rules/*.yml"
10. Monitor multiple databases with mysqld_exporter
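Each database can be monitored by its own exporter instance, with a separate configuration file and a separate log.
Make two copies of the mysqld_exporter installation directory:
cp -r mysqld_exporter-0.14.0.linux-amd64 mysqld_exporter_3306
cp -r mysqld_exporter-0.14.0.linux-amd64 mysqld_exporter_3307
Edit each configuration file to match the corresponding database:
cd /data/mysqld_exporter_3307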
cat .my.cnf
[client]
host = 127.0.0.1
user = exporter
password = expoter12ssdc3
socket = /tmp/mysql_3307.sock
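With several .my.cnf files to maintain, it may help to check each one for the password pitfall noted earlier in this document (a password that ends with a special character such as #). The following is only a sketch: check_mycnf_password is a hypothetical helper, not part of mysqld_exporter, and it assumes a simple "password = value" line without quoting.

```shell
#!/bin/bash
# Hypothetical helper (not part of mysqld_exporter): returns
#   0 if the password ends with a letter or digit,
#   1 if it ends with any other character,
#   2 if no password line is found.
check_mycnf_password() {
    local cnf="$1" password
    # Take the first "password = value" line and strip the key and the "=".
    password="$(sed -n 's/^[[:space:]]*password[[:space:]]*=[[:space:]]*//p' "$cnf" | head -n 1)"
    # Trim trailing whitespace.
    password="${password%"${password##*[![:space:]]}"}"
    if [ -z "$password" ]; then
        echo "no password found in $cnf" >&2
        return 2
    fi
    case "$password" in
        *[![:alnum:]])
            echo "password in $cnf ends with a special character" >&2
            return 1
            ;;
    esac
    return 0
}

# Example (commented out; the path follows this document):
# check_mycnf_password /data/mysqld_exporter_3307/.my.cnf
```

Running the check before starting each exporter instance gives an early warning instead of an "Access denied" error in the exporter log.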
Adjust the startup script for each instance as needed.
11. Common errors
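- level=error msg="Error pinging mysqld: Error 1045: Access denied for user 'root'@'localhost' (using password: YES)" source="exporter.go:146"
1. This error is a permissions problem. Make sure the user has been granted privileges in the database, and check the settings in the .my.cnf file that mysqld_exporter uses.
2. If the grants and the password are both correct, make sure the password does not end with a special character, especially #. mysqld_exporter does not parse it correctly, so the password it sends is wrong and the login fails.
- level=error msg="Error pinging mysqld: this user requires old password authentication. If you still want to use it, please add 'allowOldPasswords=1' to your DSN. See also https://github.com/go-sql-driver/mysql/wiki/old_passwords" source="exporter.go:146"
This occurs with old databases and is caused by the old_passwords parameter. Instead of using a configuration file, start the exporter with the DSN set in an environment variable. Format:
DATA_SOURCE_NAME=user:password@unix(socket)/?param=value&param=value
or
DATA_SOURCE_NAME=user:password@(ip:port)/?param=value&param=value
Example:
DATA_SOURCE_NAME=exporter:expoter12Ssdc3@unix\(/tmp/mysql_3306.sock\)/?allowOldPasswords=true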
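The two DSN formats above can be assembled in a startup script rather than typed by hand. This is a minimal sketch under stated assumptions: build_unix_dsn and build_tcp_dsn are hypothetical helper names, and the commented-out launch line reuses the paths, port, and credentials already shown in this document.

```shell
#!/bin/bash
# Hypothetical helpers (names are illustrative, not part of mysqld_exporter).

# Build a DSN that connects through a Unix socket:
#   user:password@unix(socket)/?params
build_unix_dsn() {
    local user="$1" password="$2" socket="$3" params="$4" query=""
    if [ -n "$params" ]; then
        query="?$params"
    fi
    printf '%s:%s@unix(%s)/%s\n' "$user" "$password" "$socket" "$query"
}

# Build a DSN that connects to an ip:port address:
#   user:password@(ip:port)/?params
build_tcp_dsn() {
    local user="$1" password="$2" address="$3" params="$4" query=""
    if [ -n "$params" ]; then
        query="?$params"
    fi
    printf '%s:%s@(%s)/%s\n' "$user" "$password" "$address" "$query"
}

# Example launch (commented out; values follow this document):
# DATA_SOURCE_NAME="$(build_unix_dsn exporter 'expoter12Ssdc3' /tmp/mysql_3306.sock allowOldPasswords=true)" \
#     bin/mysqld_exporter --web.listen-address=":9401"
```

Because the value is passed inside double quotes, the parentheses do not need the backslash escaping used when the DSN is typed unquoted on the command line.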









