kvm guests get more RAM than they are allocated, making kvm-status misleading

I was debugging why a host was bringing out the OOM killer when kvm-status indicated that we had allocated well below the server's total RAM.

Here is the task state table from the OOM killer report:

Jul 19 12:22:16 elandria kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Jul 19 12:22:16 elandria kernel: [    986]     0   986      864      101    40960        0             0 mdadm
Jul 19 12:22:16 elandria kernel: [   1072]     0  1072     1410       67    49152        0             0 cron
Jul 19 12:22:16 elandria kernel: [   1073]   104  1073     2133      282    53248        0          -900 dbus-daemon
Jul 19 12:22:16 elandria kernel: [   1094]     0  1094    55200      540    77824        0             0 rsyslogd
Jul 19 12:22:16 elandria kernel: [   1098]     0  1098     2998      465    65536        0             0 smartd
Jul 19 12:22:16 elandria kernel: [   1132]     0  1132     3637      443    69632        0             0 systemd-logind
Jul 19 12:22:16 elandria kernel: [   1145]     0  1145      719       28    45056        0             0 agetty
Jul 19 12:22:16 elandria kernel: [   1278]     0  1278     3293      157    61440        0             0 systemd-tty-ask
Jul 19 12:22:16 elandria kernel: [   1491]     0  1491    26544     2096   102400        0             0 unattended-upgr
Jul 19 12:22:16 elandria kernel: [   2004]  1009  2004  1442926   272759  3108864        0             0 kvm
Jul 19 12:22:16 elandria kernel: [   2027]  1009  2027     1544      209    53248        0             0 screen
Jul 19 12:22:16 elandria kernel: [   2054]  1012  2054  3286692  1740029 15151104     3229             0 kvm
Jul 19 12:22:16 elandria kernel: [   2064]  1013  2064  3841467  2630655 22204416        1             0 kvm
Jul 19 12:22:16 elandria kernel: [   2105]  1012  2105     1478      120    49152        0             0 screen
Jul 19 12:22:16 elandria kernel: [   2117]  1013  2117     1478      120    49152        0             0 screen
Jul 19 12:22:16 elandria kernel: [   2518]  1008  2518  1689348   535092  5287936        0             0 kvm
Jul 19 12:22:16 elandria kernel: [   2523]  1008  2523     1577      209    61440        0             0 screen
Jul 19 12:22:16 elandria kernel: [   2761]  1008  2761     2199      142    61440        0             0 socat
Jul 19 12:22:16 elandria kernel: [   2762]  1013  2762     2199      152    57344        0             0 socat
Jul 19 12:22:16 elandria kernel: [   2763]  1009  2763     2199      141    57344        0             0 socat
Jul 19 12:22:16 elandria kernel: [   2770]  1012  2770     2199      142    53248        0             0 socat
Jul 19 12:22:16 elandria kernel: [   6006]     0  6006     1100       31    53248        0             0 agetty
Jul 19 12:22:16 elandria kernel: [  12514]  1011 12514  2336389  1058161  9728000       96             0 kvm
Jul 19 12:22:16 elandria kernel: [  12517]  1011 12517     1511      170    49152        0             0 screen
Jul 19 12:22:16 elandria kernel: [  12547]  1011 12547     2199      141    57344        0             0 socat
Jul 19 12:22:16 elandria kernel: [  15767]  1006 15767  1697922   536580  5263360        0             0 kvm
Jul 19 12:22:16 elandria kernel: [  15768]  1006 15768     1478      121    40960        0             0 screen
Jul 19 12:22:16 elandria kernel: [  15800]  1006 15800     2199      141    57344        0             0 socat
Jul 19 12:22:16 elandria kernel: [2596412]   998 2596412      596       24    45056        0             0 autossh
Jul 19 12:22:16 elandria kernel: [2596638]   998 2596638    33390     3301   143360        0             0 bruce-banner
Jul 19 12:22:16 elandria kernel: [2947793]  1015 2947793  2213770  1057913  9486336       17             0 kvm
Jul 19 12:22:16 elandria kernel: [2947794]  1015 2947794     1511      175    49152        0             0 screen
Jul 19 12:22:16 elandria kernel: [2947826]  1015 2947826     2200      153    49152        0             0 socat
Jul 19 12:22:16 elandria kernel: [2004117]  1017 2004117  2231563  1057424  9449472        0             0 kvm
Jul 19 12:22:16 elandria kernel: [2004118]  1017 2004118     1543      207    45056        0             0 screen
Jul 19 12:22:16 elandria kernel: [2004150]  1017 2004150     2198      142    61440        0             0 socat
Jul 19 12:22:16 elandria kernel: [ 626257]  1010 626257  3163949  1853123 16510976     1092             0 kvm
Jul 19 12:22:16 elandria kernel: [ 626260]  1010 626260     1653      321    53248        0             0 screen
Jul 19 12:22:16 elandria kernel: [ 626290]  1010 626290     2198      152    61440        0             0 socat
Jul 19 12:22:16 elandria kernel: [3589328]  1004 3589328  2271333   901896  8372224     1071             0 kvm
Jul 19 12:22:16 elandria kernel: [3589329]  1004 3589329     1477      122    49152        0             0 screen
Jul 19 12:22:16 elandria kernel: [3589361]  1004 3589361     2198      142    53248        0             0 socat
Jul 19 12:22:16 elandria kernel: [2803271]  1014 2803271  2356549  1060417  9809920        0             0 kvm
Jul 19 12:22:16 elandria kernel: [2803272]  1014 2803272     1543      204    53248        0             0 screen
Jul 19 12:22:16 elandria kernel: [2803304]  1014 2803304     2198      142    53248        0             0 socat
Jul 19 12:22:16 elandria kernel: [2607017]     0 2607017     3338      244    69632        0         -1000 sshd
Jul 19 12:22:16 elandria kernel: [1085896]   107 1085896    18644      200    57344        1             0 ntpd
Jul 19 12:22:16 elandria kernel: [ 411495]   108 411495    15754    11365   163840        0             0 unbound
Jul 19 12:22:16 elandria kernel: [2198242]     0 2198242    12715     7404   147456        0             0 python3
Jul 19 12:22:16 elandria kernel: [1940601]  1002 1940601  1758433   534284  5464064        0             0 kvm
Jul 19 12:22:16 elandria kernel: [1940602]  1002 1940602     1510      175    49152        0             0 screen
Jul 19 12:22:16 elandria kernel: [1940634]  1002 1940634     2198      152    53248        0             0 socat
Jul 19 12:22:16 elandria kernel: [1941391]   998 1941391     2393      222    57344        0             0 ssh
Jul 19 12:22:16 elandria kernel: [ 336366]   997 336366  1000186     9162  1085440        2             0 journalbeat
Jul 19 12:22:16 elandria kernel: [ 344961]   996 344961   966125    15154   933888        0             0 metricbeat
Jul 19 12:22:16 elandria kernel: [1741668]  1018 1741668  1728079   536449  5464064        0             0 kvm
Jul 19 12:22:16 elandria kernel: [1741669]  1018 1741669     1477      122    49152        0             0 screen
Jul 19 12:22:16 elandria kernel: [1741701]  1018 1741701     2198      142    61440        0             0 socat
Jul 19 12:22:16 elandria kernel: [ 182181]  1016 182181  5471360  4205440 34840576        0             0 kvm
Jul 19 12:22:16 elandria kernel: [ 182182]  1016 182182     1510      173    45056        0             0 screen
Jul 19 12:22:16 elandria kernel: [ 182214]  1016 182214     2198      142    61440        0             0 socat
Jul 19 12:22:16 elandria kernel: [ 298944]   102 298944     4174      373    73728        0             0 systemd-network
Jul 19 12:22:16 elandria kernel: [ 299270]     0 299270     5329      210    65536        0         -1000 systemd-udevd
Jul 19 12:22:16 elandria kernel: [ 551955]  1001 551955     1518      151    49152        0             0 screen
Jul 19 12:22:16 elandria kernel: [ 551967]  1001 551967     2198      143    57344        0             0 socat
Jul 19 12:22:16 elandria kernel: [ 552058]  1001 552058  3893167  2631294 22564864        0             0 kvm
Jul 19 12:22:16 elandria kernel: [3118777]     0 3118777    18726      768   114688        0          -250 systemd-journal
Jul 19 12:22:16 elandria kernel: [3159044]  1005 3159044     1510      132    53248        0             0 screen
Jul 19 12:22:16 elandria kernel: [3159056]  1005 3159056     2198      152    57344        0             0 socat
Jul 19 12:22:16 elandria kernel: [3159183]  1005 3159183 13911260  5828211 48193536        0             0 kvm
Jul 19 12:22:16 elandria kernel: [3519612]  1003 3519612     1510      173    53248        0             0 screen
Jul 19 12:22:16 elandria kernel: [3519625]  1003 3519625     2198      141    53248        0             0 socat
Jul 19 12:22:16 elandria kernel: [3519711]  1003 3519711  9741625  5944532 48910336        0             0 kvm

I saved it as a file and ran:

cat page.table | tr -d '[]' | awk '{sum += $9} END {print sum}'

which produced: 65278302 for the "total_vm" column.

Since that number is in 4KiB pages, I did another calculation to determine the total RAM in use in GB:

echo "(65278302 * 4 * 1024) / 1024 / 1024 / 1024" | bc

which totaled 249. The machine technically has 256GB of RAM, so that seems close enough to cause the OOM killer to come out.
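
For reference, the whole calculation can be done in one pass over the saved table (just a sketch, assuming the same page.table file and a 4KiB page size):

# sum total_vm (column 9 once the brackets are stripped) and print it in GiB
tr -d '[]' < page.table | awk '{sum += $9} END {printf "%.1f GiB\n", sum * 4096 / 1024^3}'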

Then, I sorted to find the big users:

cat page.table | tr -d '[]' | awk '{print $9 " " $7}' | sort -n | tail

And this got interesting:

0 jamie@liberace:/tmp/cdtemp.UTgAwK$ cat page.table |tr -d '[]' | awk '{print $9 " " $7 }' | sort -n | tail
2271333 1004
2336389 1011
2356549 1014
3163949 1010
3286692 1012
3841467 1013
3893167 1001
5471360 1016
9741625 1003
13911260 1005
0 jamie@liberace:/tmp/cdtemp.UTgAwK$ 

If I calculate the RAM in GB for the last few records (a quick one-liner for this conversion follows the list), I get:

  • UID 1005 using 53GB (allocated 48GB)
  • UID 1003 using 37GB (allocated 32GB)
  • UID 1016 using 20GB (allocated 16GB)
  • UID 1001 using 14GB (allocated 10GB)
  • UID 1013 using 14GB (allocated 10GB)
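
For completeness, here's a one-liner (again just a sketch, same assumptions as above) that does the pages-to-GiB conversion per kvm process straight from the table:

# print total_vm in GiB plus the owning uid for each kvm process, largest last
tr -d '[]' < page.table | awk '$NF == "kvm" {printf "%.1f GiB uid %s\n", $9 * 4096 / 1024^3, $7}' | sort -n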

Either my math is wrong, or the kernel is giving each kvm guest an extra 4 - 5 GB of RAM.

I'm not really sure whether this has always been the case or whether something is different about this machine.
