Hello
we are having a problem in our host, all the vm running on them
suddenly, and for some seconds, lost network connectivity.
the root cause appears to be the increase of irb-tlb from low values
(less than 20) to more than >100k, that spike only last for some seconds
then everything goes back to normal
i've upload an screenshot of collectd for the hypervisor here
http://zumbi.com.ar/tmp/irq-tlb.png
we have hosts running precise (qemu 1.5, ovs 2.0.2, libvirt 1.2.2 and
kernel 3.13) where the issue is frequent. also we have an small % of our
fleet running trusty (qemu 2.0.0 ovs 2.0.2 libvirt 1.2.2 and kernel
3.16) where the problem seemed to be nonexistent until today :(
issue seems to be isolated to < 10% of our hypervisors, some hypervisors
had this problem every few days, others only once or twice. our vm are a
black box to us we don't know what users run on them, but mostly cpu and
network bound workload.
most of our guests run centos 6.5 (kernel 2.6.32)
vm are bridged to a linuxbridge then veth wired to an ovs switch
(neutron openvswitch agent setup)
as anyone seen this before? as anyone fixed it?
thanks!