Thank you for finding this and fixing it. This issue has been giving us grief for months, and this patch appears to resolve the problem.
In our case, it seemed to be much more severe with the RHEL / CentOS 7.x Linux 3.10 kernel when paired with SolidFire iSCSI-based storage. Because it was mostly limited to that combination, it escaped notice during our original soak period, which is likely also part of why others didn't encounter the problem. However, this looks like a serious problem that could affect any guest machine that does a large amount of I/O. I believe the SolidFire connection may be that I/O can queue up more easily there than on the local NVMe storage we also use, and there could also be something related to SolidFire QoS re-balancing, where the iSCSI connection may be re-negotiated from time to time. So, I think this is more a case of "happens in some environments more than others", and unfortunately it happened a lot in one of our environments. :-(