qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35


From: Daniel P . Berrangé
Subject: Re: [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35
Date: Thu, 10 Nov 2022 18:21:15 +0000
User-agent: Mutt/2.2.7 (2022-08-07)

On Thu, Nov 10, 2022 at 11:29:21AM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 01, 2022 at 01:03:17PM +0000, Daniel P. Berrangé wrote:
> > On Tue, Nov 01, 2022 at 01:57:24PM +0100, Igor Mammedov wrote:
> > > On Mon, 31 Oct 2022 11:48:58 -0400
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > 
> > > > On Mon, Oct 31, 2022 at 01:50:24PM +0000, Daniel P. Berrangé wrote:
> > > > > On Mon, Oct 31, 2022 at 01:19:30PM +0000, Daniel P. Berrangé wrote:  
> > > > > > The TCO watchdog is unconditionally integrated into the Q35 machine
> > > > > > type by default, but at the same time is unconditionally disabled
> > > > > > from firing by a host config option that overrides guest OS attempts
> > > > > > to enable it. People have to know to set a magic -global to make
> > > > > > it non-broken  
> > > > > 
> > > > > Incidentally I found that originally the TCO watchdog was not
> > > > > unconditionally enabled. Its exposure to the guest could be
> > > > > turned on/off using
> > > > > 
> > > > >   -global ICH9-LPC.enable_tco=bool
> > > > > 
> > > > > This was implemented for machine type compat, but it also gave
> > > > > apps a way to disable the watchdog functionality. Unfortunately
> > > > > that ability was discarded in this series:
> > > > > 
> > > > >   
> > > > > https://lore.kernel.org/all/1453564933-29638-1-git-send-email-ehabkost@redhat.com/
> > > > > 
> > > > > but the 'enable_tco' property still exists in QOM, but silently
> > > > > ignored.
> > > > > 
> > > > > Seems we should either fix the impl of 'enable_tco', or remove the
> > > > > QOM property entirely, so we don't pretend it can be toggled anymore.
> > > > > 
> > > > > With regards,
> > > > > Daniel  
> > > > 
> > > > i am inclined to say you are right and the fix is to fix the impl.
> > > 
> > > Is there need for users to disable whatchdog at all?
> > > It was always present since then and no one complained, 
> > > so perhaps we should ditch property instead fixing it
> > > to keep it simple.
> > 
> > Thinking about it more, I think we should NOT fix the 'enable_tco' property,
> > because there will be no way for a mgmt appp to tell if they're using a
> > fixed or broken QEMU.
> 
> This is always the case for any bug. We don't as a rule add properties
> for this, it is distro's responsibility to pick bugfixes for
> features users care about.
> 
> What makes this bug different?

It is impossible to safely use the enable_tco property because it
will break ABI across live migrations between old and new QEMU.
If it was a recent bug I'd be ok with just fixing it. Since this
property has been broken for 6 years though, IMHO it just needs
to be deleted entirely.

> > So if they use 'enable_tco' on a broken QEMU and then
> > live migrate, they'll get an guest ABI change. If we did want to support
> > disabling it, then we should have a brand new property that apps can probe
> > for.
> > 
> > In the absence of a request to disable watchdog, I'd say we just delete
> > 'enable_tco' right now. If someone wants it in future, we can add it with
> > a new name.
> 
> I am really stressed about watchdog firing and resetting VMs that previously
> were working fine. Adding this without a safety mechanism to quickly
> disable in case of problems in the field is not wise imho.

We would only toggle the ICH9-LPC.noreboot flag for new machine types, so
wouldn't affect existing VMs, and if anyone has a problem they can explicitly
set ICH9-LPC.noreboot back to true, or pick the older machine type again.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




reply via email to

[Prev in Thread] Current Thread [Next in Thread]