Good day all.
We have a server with 64G of memory.
We deployed virtual servers with a total allocation of 62.464G. The rest of the OS uses 285MB. This seemed to leave more than enough memory for the setup. It has been working without any problems; however, in the last 2 weeks we have been getting OOM killer events on the server.
Upon investigation I found that the RSS of some of the VMs can be up to 107% of the VM's allocation. For instance:
borin.internal is allocated 1024MB, but the RSS of its process is 1104.5703125MB.
In total we have 62464MB allocated for VMs, but the RSS total for all qemu-system-x86_64 processes is 63003MB, which obviously changes with usage.
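For reference, here is a rough sketch of how figures like these can be collected on a Linux host. It assumes the VM processes can be recognised by the basename of argv[0] in /proc/<pid>/cmdline and that RSS is read from the VmRSS line of /proc/<pid>/status. The function names are made up for illustration, and setups that launch QEMU under a different binary name would need `QEMU_NAME` changed.

```python
import os

QEMU_NAME = "qemu-system-x86_64"  # binary name from the post; adjust if yours differs


def parse_vmrss_mib(status_text):
    """Return VmRSS in MiB from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like "VmRSS:   1131080 kB"; the kernel's kB is KiB.
            return int(line.split()[1]) / 1024
    return None  # e.g. kernel threads have no VmRSS line


def overhead_percent(rss_mib, allocated_mib):
    """RSS expressed as a percentage of the memory allocated to the VM."""
    return 100.0 * rss_mib / allocated_mib


def qemu_rss_by_pid(proc_root="/proc", name=QEMU_NAME):
    """Map pid -> RSS in MiB for processes whose argv[0] basename equals name."""
    result = {}
    if not os.path.isdir(proc_root):
        return result
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "cmdline"), "rb") as f:
                argv0 = f.read().split(b"\0")[0].decode(errors="replace")
            if os.path.basename(argv0) != name:
                continue
            with open(os.path.join(base, "status")) as f:
                rss = parse_vmrss_mib(f.read())
        except OSError:
            continue  # process exited or is not readable
        if rss is not None:
            result[int(entry)] = rss
    return result


if __name__ == "__main__":
    rss = qemu_rss_by_pid()
    for pid, mib in sorted(rss.items()):
        print(f"pid {pid}: {mib:.1f} MiB")
    print(f"total: {sum(rss.values()):.1f} MiB")
```

With the numbers above, `overhead_percent(1104.5703125, 1024)` comes to roughly 107.9 for borin.internal, and `overhead_percent(63003, 62464)` to roughly 100.9 for all VMs together.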
To avoid further OOM kills, what is the recommended way of calculating the real memory allocation per VM?
Regards
Henti