kvm guests get more RAM than they are allocated, making kvm-status misleading
I was debugging why a host was bringing out the OOM killer when kvm-status
indicated that we had allocated well below the total RAM on the server.
Here is the task state table from the OOM killer report:
Jul 19 12:22:16 elandria kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Jul 19 12:22:16 elandria kernel: [ 986] 0 986 864 101 40960 0 0 mdadm
Jul 19 12:22:16 elandria kernel: [ 1072] 0 1072 1410 67 49152 0 0 cron
Jul 19 12:22:16 elandria kernel: [ 1073] 104 1073 2133 282 53248 0 -900 dbus-daemon
Jul 19 12:22:16 elandria kernel: [ 1094] 0 1094 55200 540 77824 0 0 rsyslogd
Jul 19 12:22:16 elandria kernel: [ 1098] 0 1098 2998 465 65536 0 0 smartd
Jul 19 12:22:16 elandria kernel: [ 1132] 0 1132 3637 443 69632 0 0 systemd-logind
Jul 19 12:22:16 elandria kernel: [ 1145] 0 1145 719 28 45056 0 0 agetty
Jul 19 12:22:16 elandria kernel: [ 1278] 0 1278 3293 157 61440 0 0 systemd-tty-ask
Jul 19 12:22:16 elandria kernel: [ 1491] 0 1491 26544 2096 102400 0 0 unattended-upgr
Jul 19 12:22:16 elandria kernel: [ 2004] 1009 2004 1442926 272759 3108864 0 0 kvm
Jul 19 12:22:16 elandria kernel: [ 2027] 1009 2027 1544 209 53248 0 0 screen
Jul 19 12:22:16 elandria kernel: [ 2054] 1012 2054 3286692 1740029 15151104 3229 0 kvm
Jul 19 12:22:16 elandria kernel: [ 2064] 1013 2064 3841467 2630655 22204416 1 0 kvm
Jul 19 12:22:16 elandria kernel: [ 2105] 1012 2105 1478 120 49152 0 0 screen
Jul 19 12:22:16 elandria kernel: [ 2117] 1013 2117 1478 120 49152 0 0 screen
Jul 19 12:22:16 elandria kernel: [ 2518] 1008 2518 1689348 535092 5287936 0 0 kvm
Jul 19 12:22:16 elandria kernel: [ 2523] 1008 2523 1577 209 61440 0 0 screen
Jul 19 12:22:16 elandria kernel: [ 2761] 1008 2761 2199 142 61440 0 0 socat
Jul 19 12:22:16 elandria kernel: [ 2762] 1013 2762 2199 152 57344 0 0 socat
Jul 19 12:22:16 elandria kernel: [ 2763] 1009 2763 2199 141 57344 0 0 socat
Jul 19 12:22:16 elandria kernel: [ 2770] 1012 2770 2199 142 53248 0 0 socat
Jul 19 12:22:16 elandria kernel: [ 6006] 0 6006 1100 31 53248 0 0 agetty
Jul 19 12:22:16 elandria kernel: [ 12514] 1011 12514 2336389 1058161 9728000 96 0 kvm
Jul 19 12:22:16 elandria kernel: [ 12517] 1011 12517 1511 170 49152 0 0 screen
Jul 19 12:22:16 elandria kernel: [ 12547] 1011 12547 2199 141 57344 0 0 socat
Jul 19 12:22:16 elandria kernel: [ 15767] 1006 15767 1697922 536580 5263360 0 0 kvm
Jul 19 12:22:16 elandria kernel: [ 15768] 1006 15768 1478 121 40960 0 0 screen
Jul 19 12:22:16 elandria kernel: [ 15800] 1006 15800 2199 141 57344 0 0 socat
Jul 19 12:22:16 elandria kernel: [2596412] 998 2596412 596 24 45056 0 0 autossh
Jul 19 12:22:16 elandria kernel: [2596638] 998 2596638 33390 3301 143360 0 0 bruce-banner
Jul 19 12:22:16 elandria kernel: [2947793] 1015 2947793 2213770 1057913 9486336 17 0 kvm
Jul 19 12:22:16 elandria kernel: [2947794] 1015 2947794 1511 175 49152 0 0 screen
Jul 19 12:22:16 elandria kernel: [2947826] 1015 2947826 2200 153 49152 0 0 socat
Jul 19 12:22:16 elandria kernel: [2004117] 1017 2004117 2231563 1057424 9449472 0 0 kvm
Jul 19 12:22:16 elandria kernel: [2004118] 1017 2004118 1543 207 45056 0 0 screen
Jul 19 12:22:16 elandria kernel: [2004150] 1017 2004150 2198 142 61440 0 0 socat
Jul 19 12:22:16 elandria kernel: [ 626257] 1010 626257 3163949 1853123 16510976 1092 0 kvm
Jul 19 12:22:16 elandria kernel: [ 626260] 1010 626260 1653 321 53248 0 0 screen
Jul 19 12:22:16 elandria kernel: [ 626290] 1010 626290 2198 152 61440 0 0 socat
Jul 19 12:22:16 elandria kernel: [3589328] 1004 3589328 2271333 901896 8372224 1071 0 kvm
Jul 19 12:22:16 elandria kernel: [3589329] 1004 3589329 1477 122 49152 0 0 screen
Jul 19 12:22:16 elandria kernel: [3589361] 1004 3589361 2198 142 53248 0 0 socat
Jul 19 12:22:16 elandria kernel: [2803271] 1014 2803271 2356549 1060417 9809920 0 0 kvm
Jul 19 12:22:16 elandria kernel: [2803272] 1014 2803272 1543 204 53248 0 0 screen
Jul 19 12:22:16 elandria kernel: [2803304] 1014 2803304 2198 142 53248 0 0 socat
Jul 19 12:22:16 elandria kernel: [2607017] 0 2607017 3338 244 69632 0 -1000 sshd
Jul 19 12:22:16 elandria kernel: [1085896] 107 1085896 18644 200 57344 1 0 ntpd
Jul 19 12:22:16 elandria kernel: [ 411495] 108 411495 15754 11365 163840 0 0 unbound
Jul 19 12:22:16 elandria kernel: [2198242] 0 2198242 12715 7404 147456 0 0 python3
Jul 19 12:22:16 elandria kernel: [1940601] 1002 1940601 1758433 534284 5464064 0 0 kvm
Jul 19 12:22:16 elandria kernel: [1940602] 1002 1940602 1510 175 49152 0 0 screen
Jul 19 12:22:16 elandria kernel: [1940634] 1002 1940634 2198 152 53248 0 0 socat
Jul 19 12:22:16 elandria kernel: [1941391] 998 1941391 2393 222 57344 0 0 ssh
Jul 19 12:22:16 elandria kernel: [ 336366] 997 336366 1000186 9162 1085440 2 0 journalbeat
Jul 19 12:22:16 elandria kernel: [ 344961] 996 344961 966125 15154 933888 0 0 metricbeat
Jul 19 12:22:16 elandria kernel: [1741668] 1018 1741668 1728079 536449 5464064 0 0 kvm
Jul 19 12:22:16 elandria kernel: [1741669] 1018 1741669 1477 122 49152 0 0 screen
Jul 19 12:22:16 elandria kernel: [1741701] 1018 1741701 2198 142 61440 0 0 socat
Jul 19 12:22:16 elandria kernel: [ 182181] 1016 182181 5471360 4205440 34840576 0 0 kvm
Jul 19 12:22:16 elandria kernel: [ 182182] 1016 182182 1510 173 45056 0 0 screen
Jul 19 12:22:16 elandria kernel: [ 182214] 1016 182214 2198 142 61440 0 0 socat
Jul 19 12:22:16 elandria kernel: [ 298944] 102 298944 4174 373 73728 0 0 systemd-network
Jul 19 12:22:16 elandria kernel: [ 299270] 0 299270 5329 210 65536 0 -1000 systemd-udevd
Jul 19 12:22:16 elandria kernel: [ 551955] 1001 551955 1518 151 49152 0 0 screen
Jul 19 12:22:16 elandria kernel: [ 551967] 1001 551967 2198 143 57344 0 0 socat
Jul 19 12:22:16 elandria kernel: [ 552058] 1001 552058 3893167 2631294 22564864 0 0 kvm
Jul 19 12:22:16 elandria kernel: [3118777] 0 3118777 18726 768 114688 0 -250 systemd-journal
Jul 19 12:22:16 elandria kernel: [3159044] 1005 3159044 1510 132 53248 0 0 screen
Jul 19 12:22:16 elandria kernel: [3159056] 1005 3159056 2198 152 57344 0 0 socat
Jul 19 12:22:16 elandria kernel: [3159183] 1005 3159183 13911260 5828211 48193536 0 0 kvm
Jul 19 12:22:16 elandria kernel: [3519612] 1003 3519612 1510 173 53248 0 0 screen
Jul 19 12:22:16 elandria kernel: [3519625] 1003 3519625 2198 141 53248 0 0 socat
Jul 19 12:22:16 elandria kernel: [3519711] 1003 3519711 9741625 5944532 48910336 0 0 kvm
I saved it as a file and ran:
cat page.table | tr -d '[]' | awk '{sum += $9} END {print sum}'
which produced 65278302 for the "total_vm" column.
Since that number is in 4k pages, I did another calculation to determine the total RAM in use, in GB:
echo "(65278302 * 4 * 1024) / 1024 / 1024 / 1024" | bc
That came to 249. The machine technically has 256GB of RAM, so that seems close enough to bring out the OOM killer.
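The page-to-GB conversion above can also be sketched as a small shell function, which keeps the fractional part that integer bc arithmetic drops. The name pages_to_gib and the default page size of 4096 bytes are assumptions for illustration; `getconf PAGESIZE` reports the real page size on the host.

```shell
# Hypothetical helper: convert a count of memory pages to GiB.
# The page size defaults to 4096 bytes (an assumption; check the host
# with `getconf PAGESIZE`). A second argument overrides it.
pages_to_gib() {
    awk -v pages="$1" -v page_size="${2:-4096}" \
        'BEGIN { printf "%.2f\n", pages * page_size / (1024 * 1024 * 1024) }'
}

# Sum of the total_vm column reported above.
pages_to_gib 65278302   # prints 249.02
```

The function uses a BEGIN-only awk program, so it reads no input and can be called once per page count.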
Then I sorted to find the big users:
cat page.table | tr -d '[]' | awk '{print $9 " " $7}' | sort -n | tail
And this got interesting:
0 jamie@liberace:/tmp/cdtemp.UTgAwK$ cat page.table |tr -d '[]' | awk '{print $9 " " $7 }' | sort -n | tail
2271333 1004
2336389 1011
2356549 1014
3163949 1010
3286692 1012
3841467 1013
3893167 1001
5471360 1016
9741625 1003
13911260 1005
0 jamie@liberace:/tmp/cdtemp.UTgAwK$
If I calculate the RAM in GB for the last few records I get:
- UID 1005 using 53GB (allocated 48GB)
- UID 1003 using 37GB (allocated 32GB)
- UID 1016 using 20GB (allocated 16GB)
- UID 1001 using 14GB (allocated 10GB)
- UID 1013 using 14GB (allocated 10GB)
Either my math is wrong, or the kernel is giving each kvm guest an extra 4-5 GB of RAM.
I'm not sure whether this has always been the case or whether something is different on this machine.
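One possible cross-check on the math, as a rough sketch: in the OOM killer table, total_vm counts pages of virtual address space, while rss counts pages actually resident in RAM. Printing both per kvm process may help show which figure sits closer to the allocation. The function name kvm_report is hypothetical, and the field positions and 4 KiB page size are assumptions based on the table saved above.

```shell
# Hypothetical helper: for every kvm row in a saved OOM task table, print
#   uid  total_vm (GiB)  rss (GiB)
# sorted by total_vm. Assumes 4 KiB pages and the syslog layout shown
# above, where (after stripping brackets) field 7 is the uid, field 9 is
# total_vm, field 10 is rss, and the last field is the process name.
kvm_report() {
    tr -d '[]' < "$1" |
        awk -v page=4096 '$NF == "kvm" {
            gib = 1024 * 1024 * 1024
            printf "%s %.2f %.2f\n", $7, $9 * page / gib, $10 * page / gib
        }' |
        sort -k2,2n
}
```

It would be run as `kvm_report page.table`. For the largest guest (UID 1005, total_vm 13911260, rss 5828211) the row works out to about 53.07 GiB of total_vm against about 22.23 GiB of rss.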