Introduction
Docker limits container resources through the Linux cgroup mechanism. In our own systems, we cap the CPU and memory of less important applications so that the critical ones always have enough resources. Below is a record of hands-on experiments with CPU and memory limits on Docker containers; the environment is CentOS 7.2 with Docker 1.12.6.
Introduction to Linux Cgroups
The primary purpose of Linux cgroups is to set upper bounds on the resource usage of a group of processes, covering CPU, memory, disk I/O, network, and so on. On Linux, the user-facing interface to cgroups is a filesystem: it is organized as files and directories under /sys/fs/cgroup. For more background, see https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt, https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt, and https://coolshell.cn/articles/17049.html.
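As a quick orientation (a sketch; the exact list varies by host and kernel), the available subsystems can be listed from the cgroup mount point or from /proc/cgroups:
ls /sys/fs/cgroup/    # one directory per subsystem: cpu, cpuacct, memory, blkio, ...
cat /proc/cgroups     # columns: subsys_name, hierarchy, num_cgroups, enabled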
A Simple Cgroup Exercise
As a simple example, let's limit the CPU usage of a single process:
- Enter /sys/fs/cgroup/cpu and create a directory named hello
cd /sys/fs/cgroup/cpu
mkdir hello
ls hello/
cgroup.clone_children cgroup.procs cpuacct.usage cpu.cfs_period_us cpu.rt_period_us cpu.shares notify_on_release
cgroup.event_control cpuacct.stat cpuacct.usage_percpu cpu.cfs_quota_us cpu.rt_runtime_us cpu.stat tasks
- Check the values of the cpu.cfs_period_us and cpu.cfs_quota_us files under the hello directory
cat /sys/fs/cgroup/cpu/hello/cpu.cfs_period_us
100000    # the default, i.e. 100 ms
cat /sys/fs/cgroup/cpu/hello/cpu.cfs_quota_us
-1    # the default, meaning no limit
cpu.cfs_period_us: the length, in microseconds, of the period over which the cgroup's CPU bandwidth is re-allocated
cpu.cfs_quota_us: the total time, in microseconds, for which all tasks in the cgroup may run during one period (as defined by cpu.cfs_period_us)
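The effective CPU cap is simply cfs_quota_us / cfs_period_us; a couple of illustrative values:
50000 / 100000  = 0.5    # at most half of one core
200000 / 100000 = 2.0    # at most two full cores (requires a multi-core host)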
- Run an infinite while loop in the shell, then watch it with top
while true;do :;done &
[1] 30328    # the PID is 30328
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30328 root 20 0 115516 628 132 R 99.9 0.1 1:10.92 bash
The CPU usage is close to 100%.
- Now use the hello cgroup to cap this process's CPU usage at 50%
echo 50000 > /sys/fs/cgroup/cpu/hello/cpu.cfs_quota_us
echo 30328 >> /sys/fs/cgroup/cpu/hello/tasks
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30328 root 20 0 115516 628 132 R 49.8 0.1 7:44.47 bash
Watching top, the process's CPU usage now hovers around 50%.
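To clean up the experiment (a sketch, reusing the PID from above): a cgroup directory can only be removed once its tasks file is empty, so first move the process back to the root cpu cgroup:
echo 30328 > /sys/fs/cgroup/cpu/tasks    # move the task back to the root cgroup
rmdir /sys/fs/cgroup/cpu/hello           # the now-empty cgroup can be removed
kill %1                                  # stop the background while loop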
Cgroup settings for Docker containers
Again taking CPU as the example, let's explore how Docker applies cgroup limits to a container.
- Before creating any container, look at /sys/fs/cgroup/cpu
ls -l /sys/fs/cgroup/cpu/
total 0
-rw-r--r-- 1 root root 0 Oct 11 10:30 cgroup.clone_children
--w--w--w- 1 root root 0 Oct 11 10:30 cgroup.event_control
-rw-r--r-- 1 root root 0 Oct 11 10:30 cgroup.procs
-r--r--r-- 1 root root 0 Oct 11 10:30 cgroup.sane_behavior
-r--r--r-- 1 root root 0 Oct 11 10:30 cpuacct.stat
-rw-r--r-- 1 root root 0 Oct 11 10:30 cpuacct.usage
-r--r--r-- 1 root root 0 Oct 11 10:30 cpuacct.usage_percpu
-rw-r--r-- 1 root root 0 Oct 11 10:30 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 Oct 11 10:30 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 Oct 11 10:30 cpu.rt_period_us
-rw-r--r-- 1 root root 0 Oct 11 10:30 cpu.rt_runtime_us
-rw-r--r-- 1 root root 0 Oct 11 10:30 cpu.shares
-r--r--r-- 1 root root 0 Oct 11 10:30 cpu.stat
-rw-r--r-- 1 root root 0 Oct 11 10:30 notify_on_release
-rw-r--r-- 1 root root 0 Oct 11 10:30 release_agent
drwxr-xr-x 52 root root 0 Jan 3 14:16 system.slice
-rw-r--r-- 1 root root 0 Oct 11 10:30 tasks
drwxr-xr-x 2 root root 0 Oct 11 11:11 user.slice
There is no docker-related subdirectory yet.
- Create a container
docker run -tid --name stress --entrypoint bash polinux/stress:1.0.4
- Check what has changed under /sys/fs/cgroup/cpu
ll /sys/fs/cgroup/cpu/
total 0
-rw-r--r-- 1 root root 0 Oct 11 10:30 cgroup.clone_children
--w--w--w- 1 root root 0 Oct 11 10:30 cgroup.event_control
-rw-r--r-- 1 root root 0 Oct 11 10:30 cgroup.procs
-r--r--r-- 1 root root 0 Oct 11 10:30 cgroup.sane_behavior
-r--r--r-- 1 root root 0 Oct 11 10:30 cpuacct.stat
-rw-r--r-- 1 root root 0 Oct 11 10:30 cpuacct.usage
-r--r--r-- 1 root root 0 Oct 11 10:30 cpuacct.usage_percpu
-rw-r--r-- 1 root root 0 Oct 11 10:30 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 Oct 11 10:30 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 Oct 11 10:30 cpu.rt_period_us
-rw-r--r-- 1 root root 0 Oct 11 10:30 cpu.rt_runtime_us
-rw-r--r-- 1 root root 0 Oct 11 10:30 cpu.shares
-r--r--r-- 1 root root 0 Oct 11 10:30 cpu.stat
drwxr-xr-x 3 root root 0 Jan 3 16:42 docker    # a new docker directory has appeared
-rw-r--r-- 1 root root 0 Oct 11 10:30 notify_on_release
-rw-r--r-- 1 root root 0 Oct 11 10:30 release_agent
drwxr-xr-x 52 root root 0 Jan 3 14:16 system.slice
-rw-r--r-- 1 root root 0 Oct 11 10:30 tasks
drwxr-xr-x 2 root root 0 Oct 11 11:11 user.slice
ll /sys/fs/cgroup/cpu/docker/
total 0
-rw-r--r-- 1 root root 0 Jan 3 15:23 cgroup.clone_children
--w--w--w- 1 root root 0 Jan 3 15:23 cgroup.event_control
-rw-r--r-- 1 root root 0 Jan 3 15:23 cgroup.procs
-r--r--r-- 1 root root 0 Jan 3 15:23 cpuacct.stat
-rw-r--r-- 1 root root 0 Jan 3 15:23 cpuacct.usage
-r--r--r-- 1 root root 0 Jan 3 15:23 cpuacct.usage_percpu
-rw-r--r-- 1 root root 0 Jan 3 15:23 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 Jan 3 15:23 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 Jan 3 15:23 cpu.rt_period_us
-rw-r--r-- 1 root root 0 Jan 3 15:23 cpu.rt_runtime_us
-rw-r--r-- 1 root root 0 Jan 3 15:23 cpu.shares
-r--r--r-- 1 root root 0 Jan 3 15:23 cpu.stat
drwxr-xr-x 2 root root 0 Jan 3 16:42 e0ca8bf3d85d8617d950315f9b64223a294d56becd608a1c826f44d98b3dae19    # directory named after the container id
-rw-r--r-- 1 root root 0 Jan 3 15:23 notify_on_release
-rw-r--r-- 1 root root 0 Jan 3 15:23 tasks
ll /sys/fs/cgroup/cpu/docker/e0ca8bf3d85d8617d950315f9b64223a294d56becd608a1c826f44d98b3dae19/
total 0
-rw-r--r-- 1 root root 0 Jan 3 16:42 cgroup.clone_children
--w--w--w- 1 root root 0 Jan 3 16:42 cgroup.event_control
-rw-r--r-- 1 root root 0 Jan 3 16:42 cgroup.procs
-r--r--r-- 1 root root 0 Jan 3 16:42 cpuacct.stat
-rw-r--r-- 1 root root 0 Jan 3 16:42 cpuacct.usage
-r--r--r-- 1 root root 0 Jan 3 16:42 cpuacct.usage_percpu
-rw-r--r-- 1 root root 0 Jan 3 16:42 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 Jan 3 16:42 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 Jan 3 16:42 cpu.rt_period_us
-rw-r--r-- 1 root root 0 Jan 3 16:42 cpu.rt_runtime_us
-rw-r--r-- 1 root root 0 Jan 3 16:42 cpu.shares
-r--r--r-- 1 root root 0 Jan 3 16:42 cpu.stat
-rw-r--r-- 1 root root 0 Jan 3 16:42 notify_on_release
-rw-r--r-- 1 root root 0 Jan 3 16:42 tasks
- Run the stress service inside the container via docker exec, then inspect the tasks file
docker exec -it stress stress --cpu 1 --vm-bytes 200M
stress: info: [5] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
docker top stress
UID PID PPID C STIME TTY TIME CMD
root 43699 43688 0 16:42 pts/2 00:00:00 bash
root 44192 44179 0 16:46 pts/4 00:00:00 stress --cpu 1 --vm-bytes 200M
root 44199 44192 99 16:46 pts/4 00:00:53 stress --cpu 1 --vm-bytes 200M
cat /sys/fs/cgroup/cpu/docker/e0ca8bf3d85d8617d950315f9b64223a294d56becd608a1c826f44d98b3dae19/tasks
43699
44192
44199
The PIDs of all processes inside the container have been written into the tasks file.
From this we can conclude that when a container is created:
- Docker first creates a directory named docker under /sys/fs/cgroup/cpu
- It then creates a subdirectory named after the container id inside that docker directory
- The container's CPU limits are applied by writing to the files in this container-id subdirectory
- All processes inside the container are subject to the container's resource settings
- Other resources such as memory and network follow the same structure as CPU (see the sketch below)
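Putting this together, once the full container id is known, a container's cgroup settings can be read directly (a sketch; the path layout matches what we observed above with the cgroupfs driver, and differs under the systemd cgroup driver):
CID=$(docker inspect --format '{{.Id}}' stress)    # full container id
cat /sys/fs/cgroup/cpu/docker/$CID/cpu.cfs_quota_us
cat /sys/fs/cgroup/memory/docker/$CID/memory.limit_in_bytes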
Docker container CPU limits in practice
Here we only exercise the two most commonly used flags, --cpu-period and --cpu-quota, and use the stress tool for verification.
- Without any CPU limit on the container
docker run -itd --name stress polinux/stress:1.0.4 stress --cpu 1 --vm-bytes 200M
1633f77703ac680c6c9ff77ce5072b6c4d239a546151f945c87f57bb7011e17f
cat /sys/fs/cgroup/cpu/docker/1633f77703ac680c6c9ff77ce5072b6c4d239a546151f945c87f57bb7011e17f/cpu.cfs_period_us
100000
cat /sys/fs/cgroup/cpu/docker/1633f77703ac680c6c9ff77ce5072b6c4d239a546151f945c87f57bb7011e17f/cpu.cfs_quota_us
-1
So no CPU limit has been applied to the stress container.
top output:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
47864 root 20 0 736 36 0 R 99.3 0.0 3:00.90 stress
The CPU usage reaches roughly 100%.
- Limit the container's CPU with --cpu-period=100000 and --cpu-quota=60000
docker run -itd --name stress --cpu-period 100000 --cpu-quota 60000 polinux/stress:1.0.4 stress --cpu 1 --vm-bytes 200M
2330d0ea784e1963332bc7fab3d56560a7985f82a520ee17ed65586b2a47b905
cat /sys/fs/cgroup/cpu/docker/2330d0ea784e1963332bc7fab3d56560a7985f82a520ee17ed65586b2a47b905/cpu.cfs_period_us
100000
cat /sys/fs/cgroup/cpu/docker/2330d0ea784e1963332bc7fab3d56560a7985f82a520ee17ed65586b2a47b905/cpu.cfs_quota_us
60000
top output:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
48758 root 20 0 736 40 0 R 59.8 0.0 1:54.94 stress
The CPU usage hovers around 60%.
docker top stress
UID PID PPID C STIME TTY TIME CMD
root 48744 48732 0 17:18 pts/2 00:00:00 stress --cpu 1 --vm-bytes 200M
root 48758 48744 60 17:18 pts/2 00:02:48 stress --cpu 1 --vm-bytes 200M
cat /sys/fs/cgroup/cpu/docker/2330d0ea784e1963332bc7fab3d56560a7985f82a520ee17ed65586b2a47b905/tasks
48744
48758
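The quota can also be changed on a running container without recreating it, via docker update (a sketch; these flags should be available on the Docker version used here, but check docker update --help on your own host):
docker update --cpu-quota 30000 stress
cat /sys/fs/cgroup/cpu/docker/$(docker inspect --format '{{.Id}}' stress)/cpu.cfs_quota_us    # should now print 30000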
Docker container memory limits in practice
Here we experiment with three flags: --memory, --memory-swap, and --memory-swappiness.
Create a container and verify which cgroup file each of --memory, --memory-swap, and --memory-swappiness maps to
docker run -itd --name stress --memory 1G --memory-swap 3G --memory-swappiness 20 --entrypoint bash progrium/stress:1.0.1
7fedd3fa789e135497e7e8df40b9144d8220d24807eeab2c32618d1bd16c315c
ll /sys/fs/cgroup/memory/
total 0
-rw-r--r-- 1 root root 0 Oct 11 10:30 cgroup.clone_children
--w--w--w- 1 root root 0 Oct 11 10:30 cgroup.event_control
-rw-r--r-- 1 root root 0 Oct 11 10:30 cgroup.procs
-r--r--r-- 1 root root 0 Oct 11 10:30 cgroup.sane_behavior
drwxr-xr-x 3 root root 0 Jan 3 19:25 docker    # a new docker directory has appeared
-rw-r--r-- 1 root root 0 Oct 11 10:30 memory.failcnt
--w------- 1 root root 0 Oct 11 10:30 memory.force_empty
-rw-r--r-- 1 root root 0 Oct 11 10:30 memory.kmem.failcnt
-rw-r--r-- 1 root root 0 Oct 11 10:30 memory.kmem.limit_in_bytes
-rw-r--r-- 1 root root 0 Oct 11 10:30 memory.kmem.max_usage_in_bytes
-r--r--r-- 1 root root 0 Oct 11 10:30 memory.kmem.slabinfo
-rw-r--r-- 1 root root 0 Oct 11 10:30 memory.kmem.tcp.failcnt
-rw-r--r-- 1 root root 0 Oct 11 10:30 memory.kmem.tcp.limit_in_bytes
-rw-r--r-- 1 root root 0 Oct 11 10:30 memory.kmem.tcp.max_usage_in_bytes
-r--r--r-- 1 root root 0 Oct 11 10:30 memory.kmem.tcp.usage_in_bytes
-r--r--r-- 1 root root 0 Oct 11 10:30 memory.kmem.usage_in_bytes
-rw-r--r-- 1 root root 0 Oct 11 10:30 memory.limit_in_bytes
-rw-r--r-- 1 root root 0 Oct 11 10:30 memory.max_usage_in_bytes
-rw-r--r-- 1 root root 0 Oct 11 10:30 memory.memsw.failcnt
-rw-r--r-- 1 root root 0 Oct 11 10:30 memory.memsw.limit_in_bytes
-rw-r--r-- 1 root root 0 Oct 11 10:30 memory.memsw.max_usage_in_bytes
-r--r--r-- 1 root root 0 Oct 11 10:30 memory.memsw.usage_in_bytes
-rw-r--r-- 1 root root 0 Oct 11 10:30 memory.move_charge_at_immigrate
-r--r--r-- 1 root root 0 Oct 11 10:30 memory.numa_stat
-rw-r--r-- 1 root root 0 Oct 11 10:30 memory.oom_control
---------- 1 root root 0 Oct 11 10:30 memory.pressure_level
-rw-r--r-- 1 root root 0 Oct 11 10:30 memory.soft_limit_in_bytes
-r--r--r-- 1 root root 0 Oct 11 10:30 memory.stat
-rw-r--r-- 1 root root 0 Oct 11 10:30 memory.swappiness
-r--r--r-- 1 root root 0 Oct 11 10:30 memory.usage_in_bytes
-rw-r--r-- 1 root root 0 Oct 11 10:30 memory.use_hierarchy
-rw-r--r-- 1 root root 0 Oct 11 10:30 notify_on_release
-rw-r--r-- 1 root root 0 Oct 11 10:30 release_agent
drwxr-xr-x 52 root root 0 Jan 3 14:16 system.slice
-rw-r--r-- 1 root root 0 Oct 11 10:30 tasks
drwxr-xr-x 2 root root 0 Oct 11 11:11 user.slice
ll /sys/fs/cgroup/memory/docker/
total 0
drwxr-xr-x 2 root root 0 Jan 3 19:30 7fedd3fa789e135497e7e8df40b9144d8220d24807eeab2c32618d1bd16c315c    # container-id subdirectory
-rw-r--r-- 1 root root 0 Jan 3 15:23 cgroup.clone_children
--w--w--w- 1 root root 0 Jan 3 15:23 cgroup.event_control
-rw-r--r-- 1 root root 0 Jan 3 15:23 cgroup.procs
-rw-r--r-- 1 root root 0 Jan 3 15:23 memory.failcnt
--w------- 1 root root 0 Jan 3 15:23 memory.force_empty
-rw-r--r-- 1 root root 0 Jan 3 15:23 memory.kmem.failcnt
-rw-r--r-- 1 root root 0 Jan 3 15:23 memory.kmem.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan 3 15:23 memory.kmem.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jan 3 15:23 memory.kmem.slabinfo
-rw-r--r-- 1 root root 0 Jan 3 15:23 memory.kmem.tcp.failcnt
-rw-r--r-- 1 root root 0 Jan 3 15:23 memory.kmem.tcp.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan 3 15:23 memory.kmem.tcp.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jan 3 15:23 memory.kmem.tcp.usage_in_bytes
-r--r--r-- 1 root root 0 Jan 3 15:23 memory.kmem.usage_in_bytes
-rw-r--r-- 1 root root 0 Jan 3 15:23 memory.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan 3 15:23 memory.max_usage_in_bytes
-rw-r--r-- 1 root root 0 Jan 3 15:23 memory.memsw.failcnt
-rw-r--r-- 1 root root 0 Jan 3 15:23 memory.memsw.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan 3 15:23 memory.memsw.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jan 3 15:23 memory.memsw.usage_in_bytes
-rw-r--r-- 1 root root 0 Jan 3 15:23 memory.move_charge_at_immigrate
-r--r--r-- 1 root root 0 Jan 3 15:23 memory.numa_stat
-rw-r--r-- 1 root root 0 Jan 3 15:23 memory.oom_control
---------- 1 root root 0 Jan 3 15:23 memory.pressure_level
-rw-r--r-- 1 root root 0 Jan 3 15:23 memory.soft_limit_in_bytes
-r--r--r-- 1 root root 0 Jan 3 15:23 memory.stat
-rw-r--r-- 1 root root 0 Jan 3 15:23 memory.swappiness
-r--r--r-- 1 root root 0 Jan 3 15:23 memory.usage_in_bytes
-rw-r--r-- 1 root root 0 Jan 3 15:23 memory.use_hierarchy
-rw-r--r-- 1 root root 0 Jan 3 15:23 notify_on_release
-rw-r--r-- 1 root root 0 Jan 3 15:23 tasks
ll /sys/fs/cgroup/memory/docker/7fedd3fa789e135497e7e8df40b9144d8220d24807eeab2c32618d1bd16c315c/
total 0
-rw-r--r-- 1 root root 0 Jan 3 19:30 cgroup.clone_children
--w--w--w- 1 root root 0 Jan 3 19:30 cgroup.event_control
-rw-r--r-- 1 root root 0 Jan 3 19:30 cgroup.procs
-rw-r--r-- 1 root root 0 Jan 3 19:30 memory.failcnt
--w------- 1 root root 0 Jan 3 19:30 memory.force_empty
-rw-r--r-- 1 root root 0 Jan 3 19:30 memory.kmem.failcnt
-rw-r--r-- 1 root root 0 Jan 3 19:30 memory.kmem.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan 3 19:30 memory.kmem.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jan 3 19:30 memory.kmem.slabinfo
-rw-r--r-- 1 root root 0 Jan 3 19:30 memory.kmem.tcp.failcnt
-rw-r--r-- 1 root root 0 Jan 3 19:30 memory.kmem.tcp.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan 3 19:30 memory.kmem.tcp.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jan 3 19:30 memory.kmem.tcp.usage_in_bytes
-r--r--r-- 1 root root 0 Jan 3 19:30 memory.kmem.usage_in_bytes
-rw-r--r-- 1 root root 0 Jan 3 19:30 memory.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan 3 19:30 memory.max_usage_in_bytes
-rw-r--r-- 1 root root 0 Jan 3 19:30 memory.memsw.failcnt
-rw-r--r-- 1 root root 0 Jan 3 19:30 memory.memsw.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan 3 19:30 memory.memsw.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jan 3 19:30 memory.memsw.usage_in_bytes
-rw-r--r-- 1 root root 0 Jan 3 19:30 memory.move_charge_at_immigrate
-r--r--r-- 1 root root 0 Jan 3 19:30 memory.numa_stat
-rw-r--r-- 1 root root 0 Jan 3 19:30 memory.oom_control
---------- 1 root root 0 Jan 3 19:30 memory.pressure_level
-rw-r--r-- 1 root root 0 Jan 3 19:30 memory.soft_limit_in_bytes
-r--r--r-- 1 root root 0 Jan 3 19:30 memory.stat
-rw-r--r-- 1 root root 0 Jan 3 19:30 memory.swappiness
-r--r--r-- 1 root root 0 Jan 3 19:30 memory.usage_in_bytes
-rw-r--r-- 1 root root 0 Jan 3 19:30 memory.use_hierarchy
-rw-r--r-- 1 root root 0 Jan 3 19:30 notify_on_release
-rw-r--r-- 1 root root 0 Jan 3 19:30 tasks
cd /sys/fs/cgroup/memory/docker/7fedd3fa789e135497e7e8df40b9144d8220d24807eeab2c32618d1bd16c315c/
cat memory.limit_in_bytes
1073741824    # corresponds to --memory=1G
cat memory.memsw.limit_in_bytes
3221225472    # corresponds to --memory-swap=3G
cat memory.swappiness
20    # corresponds to --memory-swappiness=20
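The same limits can be read back through the Docker API instead of the cgroup filesystem (a sketch; Memory and MemorySwap are byte values in the container's HostConfig):
docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.MemorySwap}}' stress
1073741824 3221225472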
Practice 1 (without --memory or --memory-swap)
- When the container exhausts the host's memory and swap space, after a while the system OOM-kills the container
docker run -itd --name stress progrium/stress:1.0.1 --cpu 1 --vm 2 --vm-bytes 19.9999G
Jun 26 10:27:49 localhost kernel: Out of memory: Kill process 2518 (stress) score 498 or sacrifice child
Jun 26 10:27:49 localhost kernel: Killed process 2518 (stress) total-vm:19930256kB, anon-rss:8106200kB, file-rss:8kB
Jun 26 10:27:50 localhost dockerd: time="2018-06-26T10:27:49.971614112+08:00" level=warning msg="failed to close stdin: rpc error: code = 2 desc = containerd: container not found"
Jun 26 10:27:50 localhost kernel: XFS (dm-3): Unmounting Filesystem
An OOM-killer event occurred.
- With --oom-kill-disable=true already set on the container, check whether it is still OOM-killed when it exhausts the host's memory and swap
docker run -itd --name stress --oom-kill-disable=true progrium/stress:1.0.1 --cpu 1 --vm 2 --vm-bytes 19.9999G
WARNING: Disabling the OOM killer on containers without setting a '-m/--memory' limit may be dangerous.
Jun 26 10:24:46 localhost kernel: Out of memory: Kill process 2399 (stress) score 503 or sacrifice child
Jun 26 10:24:46 localhost kernel: Killed process 2399 (stress) total-vm:19930256kB, anon-rss:8236728kB, file-rss:8kB
Jun 26 10:24:47 localhost dockerd: time="2018-06-26T10:24:47.314803845+08:00" level=warning msg="failed to close stdin: rpc error: code = 2 desc = containerd: container not found"
Jun 26 10:24:47 localhost kernel: XFS (dm-3): Unmounting Filesystem
The OOM-killer event still occurred, presumably because, as the warning suggests, no --memory limit was set.
- Set a memory limit that disallows swap, plus --oom-kill-disable=true, and verify whether the container is OOM-killed when the application inside exceeds the limit
docker run -itd --name stress --memory 5G --memory-swap 5G --oom-kill-disable=true progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 6G
Monitoring with docker stats stress: even after a long while the container was not OOM-killed, and the MEM % column stayed pinned at 100.00%.
- Remove the --oom-kill-disable=true setting from the container in step 3 above and check the result
docker run -itd --name stress --memory 5G --memory-swap 5G progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 6G
Jun 26 10:48:37 localhost kernel: Memory cgroup out of memory: Kill process 6828 (stress) score 1001 or sacrifice child
Jun 26 10:48:37 localhost kernel: Killed process 6828 (stress) total-vm:6298768kB, anon-rss:5241352kB, file-rss:8kB
Jun 26 10:48:37 localhost kernel: XFS (dm-3): Unmounting Filesystem
Jun 26 10:48:37 localhost dockerd: time="2018-06-26T10:48:37.597444881+08:00" level=warning msg="failed to close stdin: rpc error: code = 2 desc = containerd: container not found"
The container hit an OOM-killer event almost immediately.
So --oom-kill-disable only takes effect when a memory limit has also been set on the container.
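Whether the OOM killer is disabled for a container can also be confirmed from its memory cgroup (a sketch; under_oom flips to 1 while a killer-disabled cgroup is pinned at its limit):
CID=$(docker inspect --format '{{.Id}}' stress)
cat /sys/fs/cgroup/memory/docker/$CID/memory.oom_control
oom_kill_disable 1
under_oom 1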
Practice 2 (--memory set, --memory-swap unset):
According to the Docker documentation, the container can then use at most twice the memory limit in total: when --memory-swap is unset, it defaults to twice --memory, i.e. the memory limit plus an equal amount of swap.
docker run -itd --name stress --memory 1G progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 2G
Jun 26 11:15:18 localhost kernel: Memory cgroup out of memory: Kill process 7333 (stress) score 1001 or sacrifice child
Jun 26 11:15:18 localhost kernel: Killed process 7333 (stress) total-vm:2104464kB, anon-rss:998344kB, file-rss:8kB
Jun 26 11:15:19 localhost dockerd: time="2018-06-26T11:15:19.085601671+08:00" level=warning msg="failed to close stdin: rpc error: code = 2 desc = containerd: container not found"
Jun 26 11:15:19 localhost kernel: XFS (dm-3): Unmounting Filesystem
The OOM still occurred, so the container's total usage must stay below twice the memory limit (2G here).
docker run -itd --name stress --memory 1G progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 1.99G
This time, after monitoring for a while, no OOM occurred. docker stats stress:
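To confirm the 2x default at the cgroup level (a sketch; memory.memsw.limit_in_bytes counts memory plus swap, so it should read 2G for a 1G --memory with no --memory-swap):
CID=$(docker inspect --format '{{.Id}}' stress)
cat /sys/fs/cgroup/memory/docker/$CID/memory.memsw.limit_in_bytes
2147483648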
Practice 3 (--memory and --memory-swap both set)
Per the Docker documentation, memory-swap is the sum of memory plus swap, so memory-swap must be at least as large as memory ("Minimum memoryswap limit should be larger than memory limit").
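A quick sketch of violating that constraint (hypothetical container name; the daemon refuses to create the container with the error quoted above):
docker run -itd --name bad --memory 1G --memory-swap 512M progrium/stress:1.0.1 --vm 1 --vm-bytes 256M
# fails: Minimum memoryswap limit should be larger than memory limit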
docker run -itd --name stress --memory 1G --memory-swap 2G progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 1.5G
Since memory=1G, the 1.5G used here is memory (1G) + swap (0.5G).
docker stats stress: the MEM % value again fluctuates within 100.00%, and the LIMIT column reflects only the memory value.
If a container should not use the host's swap at all, simply set memory-swap equal to memory.
docker run -itd --name stress --memory 1G --memory-swap 1G progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 999M
Setting memory-swap to -1 means there is no limit on how much of the host's swap the container may use.
docker run -itd --name stress --memory 1G --memory-swap -1 progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 2G
It used 1G of memory plus 1G of the host's swap.
The machine used here has 5G of swap, so this run is sized to consume all of it (1G memory + 5G swap = 6G):
docker run -itd --name stress --memory 1G --memory-swap -1 progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 6G
Jun 26 14:09:45 localhost kernel: Memory cgroup out of memory: Kill process 11335 (stress) score 991 or sacrifice child
Jun 26 14:09:45 localhost kernel: Killed process 11335 (stress) total-vm:6298768kB, anon-rss:1037064kB, file-rss:4kB
Jun 26 14:09:46 localhost dockerd: time="2018-06-26T14:09:46.073190546+08:00" level=warning msg="failed to close stdin: rpc error: code = 2 desc = containerd: container not found"
Jun 26 14:09:46 localhost kernel: XFS (dm-3): Unmounting Filesystem
After running for a while, an OOM occurred.
Now size the workload to need more swap than the host's 5G:
docker run -itd --name stress --memory 1G --memory-swap -1 progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 8G
Jun 26 14:19:10 localhost kernel: Memory cgroup out of memory: Kill process 11654 (stress) score 992 or sacrifice child
Jun 26 14:19:10 localhost kernel: Killed process 11654 (stress) total-vm:8395920kB, anon-rss:1043232kB, file-rss:8kB
Jun 26 14:19:11 localhost dockerd: time="2018-06-26T14:19:11.196855155+08:00" level=warning msg="failed to close stdin: rpc error: code = 2 desc = containerd: container not found"
Jun 26 14:19:11 localhost kernel: XFS (dm-3): Unmounting Filesystem
Again, an OOM occurred after running for a while.
The memory-swappiness parameter can be set to 0 to stop the container from using the system swap space at all:
docker run -itd --name stress --memory 1G --memory-swappiness 0 --entrypoint bash progrium/stress:1.0.1
root@node75:/# stress --vm 1 --vm-bytes 1G
stress: info: [28] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: FAIL: [28] (416) <-- worker 29 got signal 9
stress: WARN: [28] (418) now reaping child worker processes
stress: FAIL: [28] (452) failed run completed in 1s
Jun 26 17:53:40 localhost kernel: Memory cgroup out of memory: Kill process 27849 (stress) score 998 or sacrifice child
Jun 26 17:53:40 localhost kernel: Killed process 27849 (stress) total-vm:1055888kB, anon-rss:1045144kB, file-rss:0kB
Conclusion
The above records hands-on experiments with the most commonly used limits, CPU and memory, and shows how Docker enforces container resource limits through Linux cgroups. Limits on other resources can be explored and verified in the same step-by-step, experimental way.
References
https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
https://coolshell.cn/articles/17049.html
https://access.redhat.com/documentation/zh-cn/red_hat_enterprise_linux/7/html/resource_management_guide/index