
Re: [Qemu-ppc] [Qemu-devel] Migrating decrementer


From: Mark Cave-Ayland
Subject: Re: [Qemu-ppc] [Qemu-devel] Migrating decrementer
Date: Tue, 2 Feb 2016 23:41:40 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.5.0

On 01/02/16 00:52, David Gibson wrote:

>> Thanks for more pointers - I think I'm slowly getting there. My current
>> thoughts are that the basic migration algorithm is doing the right thing,
>> in that it works out the difference in host ticks between the source
>> and destination.
> 
> Sorry, I've taken a while to reply to this.  I realised the tb
> migration didn't work the way I thought it did, so I've had to get my
> head around what's actually going on.

No problem - it's turning out to be a lot more complicated than I
initially expected.

> I had thought that it transferred only meta-information telling the
> destination how to calculate the timebase, without actually working
> out the timebase value at any particular moment.
> 
> In fact, what it sends is basically the tuple of (timebase, realtime)
> at the point of sending the migration stream.  The destination then
> uses that to work out how to compute the timebase from realtime there.
> 
> I'm not convinced this is a great approach, but it should basically
> work.  However, as you've seen there are also some Just Plain Bugs in
> the logic for this.
> 
>> I have a slight query with this section of code though:
>>
>>     migration_duration_tb = muldiv64(migration_duration_ns, freq,
>>                                      NANOSECONDS_PER_SECOND);
>>
>> This is not technically correct on TCG x86 since the timebase is the x86
>> TSC which is running somewhere in the GHz range, compared to freq which
>> is hard-coded to 16MHz.
> 
> Um.. what?  AFAICT that line doesn't have any reference to the TSC
> speed.  Just ns and the (guest) tb).  Also 16MHz is only for the
> oldworld Macs - modern ppc cpus have the TB frequency architected as
> 512MHz.

On TCG the software timebase for the Mac guests is fixed at 16MHz, so how
does KVM handle this? Does it compensate by emulating the 16MHz timebase
for the guest even though the host has a 512MHz timebase?
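
For my own reference, my reading of the current save/load scheme in
hw/ppc/ppc.c is roughly the following - a paraphrase rather than the
exact source, with tb_env/freq standing in for the first vCPU's
timebase state and guest tb frequency:

    static void timebase_save(PPCTimebase *tb, ppc_tb_t *tb_env)
    {
        /* Sample (guest timebase, host realtime) as a pair */
        tb->guest_timebase = cpu_get_host_ticks() + tb_env->tb_offset;
        tb->time_of_the_day_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
    }

    static void timebase_load(PPCTimebase *tb, ppc_tb_t *tb_env,
                              uint64_t freq)
    {
        /* Realtime elapsed across the migration, capped at 1s */
        int64_t migration_duration_ns = MIN(NANOSECONDS_PER_SECOND,
            qemu_clock_get_ns(QEMU_CLOCK_HOST) - tb->time_of_the_day_ns);

        /* Converted into guest tb ticks: the muldiv64 I queried above */
        int64_t migration_duration_tb =
            muldiv64(migration_duration_ns, freq, NANOSECONDS_PER_SECOND);

        /* New offset: where the guest tb should now be, minus the
         * current host tick counter */
        tb_env->tb_offset = tb->guest_timebase + migration_duration_tb
                            - cpu_get_host_ticks();
    }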

>> However this doesn't seem to matter because the
>> timebase adjustment is limited to a maximum of 1s. Why should this be if
>> the timebase is supposed to be free running as you mentioned in a
>> previous email?
> 
> AFAICT, what it's doing here is assuming that if the migration
> duration is >1s (or appears to be >1s) then it's because the host
> clocks are out of sync and so just capping the elapsed tb time at 1s.
> 
> That's just wrong, IMO.  1s is a long downtime for a live migration,
> but it's not impossible, and it will happen nearly always in the
> scenario you've discussed of manually loading the migration stream
> from a file.
> 
> But more to the point, trying to maintain correctness of the timebase
> when the hosts are out of sync is basically futile.  There's no other
> reference we can use, so all we can achieve is getting a different
> wrong value from what we'd get by blindly trusting the host clock.
> 
> We do need to constrain the tb from going backwards, because that will
> cause chaos on the guest, but otherwise we should just trust the host
> clock and ditch that 1s clamp.  If the hosts are out of sync, then
> guest time will jump, but that was always going to happen.
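
If we ditch the clamp as you suggest, I guess the load side ends up
looking something like this (just a sketch of my understanding, using
the same hypothetical names as above):

    /* Trust the host clocks, but clamp negative elapsed time to zero
     * so the guest tb can never move backwards */
    int64_t migration_duration_ns =
        qemu_clock_get_ns(QEMU_CLOCK_HOST) - tb->time_of_the_day_ns;

    if (migration_duration_ns < 0) {
        /* Hosts' realtime clocks are out of sync */
        migration_duration_ns = 0;
    }

    tb_env->tb_offset = tb->guest_timebase
                        + muldiv64(migration_duration_ns, freq,
                                   NANOSECONDS_PER_SECOND)
                        - cpu_get_host_ticks();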

Going back to your earlier email, you suggested that the host timebase is
always running continuously, even when the guest is paused. But then when
the guest is resumed, the timebase must jump forward in the guest
regardless?

If this is the case then this is the big difference between TCG and KVM
guests: the TCG timebase is derived from the virtual clock, which solves
the problem of paused guests during migration. For example, with the
existing migration code, what would happen if you did a migration with the
guest paused on KVM? The offset would surely be wrong, as it was calculated
at the end of migration.

And another thought: should it be possible to migrate guests between TCG
and KVM hosts at will?

>> AFAICT the main problem on TCG x86 is that post-migration the timebase
>> calculated by cpu_ppc_get_tb() is incorrect:
>>
>> uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, int64_t tb_offset)
>> {
>>     /* TB time in tb periods */
>>     return muldiv64(vmclk, tb_env->tb_freq, get_ticks_per_sec()) +
>>                     tb_offset;
>> }
> 
> 
> So the problem here is that get_ticks_per_sec() (which always returns
> 1,000,000,000) is not talking about the same ticks as
> cpu_get_host_ticks().  That may not have been true when this code was
> written.

Yes. That's basically what I was trying to say, but I think you've
expressed it far more eloquently than I did.

>> For a typical savevm/loadvm pair I see something like this:
>>
>> savevm:
>>
>> tb->guest_timebase = 26281306490558
>> qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) = 7040725511
>>
>> loadvm:
>>
>> cpu_get_host_ticks() = 26289847005259
>> tb_off_adj = -8540514701
>> qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) = 7040725511
>> cpu_ppc_get_tb() = -15785159386
>>
>> But as cpu_ppc_get_tb() uses QEMU_CLOCK_VIRTUAL for vmclk, we end up with
>> a negative number for the timebase, since the virtual clock is dwarfed by
>> the number of TSC ticks calculated for tb_off_adj. This will work on a
>> PPC host though, since cpu_get_host_ticks() is also derived from the
>> timebase.
> 
> Yeah, we shouldn't be using cpu_get_host_ticks() at all - or anything
> else which depends on a host frequency.  We should only be using qemu
> interfaces which work in real time units (nanoseconds, usually).

I agree that this is the right way forward. Unfortunately the timebase
behaviour under KVM PPC is quite new to me, so please bear with me while
I ask all these questions.
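
To check my understanding of that direction: presumably the load side
would rebuild tb_offset against the same ns-based clock that
cpu_ppc_get_tb() reads, with no host tick frequency involved at all.
Something like this for the TCG case - hypothetical again, and assuming
the save side sampled guest_timebase via cpu_ppc_get_tb() rather than
cpu_get_host_ticks():

    int64_t elapsed_ns =
        qemu_clock_get_ns(QEMU_CLOCK_HOST) - tb->time_of_the_day_ns;

    if (elapsed_ns < 0) {
        elapsed_ns = 0;     /* never let the guest tb go backwards */
    }

    /* Where the guest tb should now be, in guest tb ticks */
    uint64_t desired_tb = tb->guest_timebase +
        muldiv64(elapsed_ns, tb_env->tb_freq, NANOSECONDS_PER_SECOND);

    /* Express the offset against the virtual clock that
     * cpu_ppc_get_tb() uses, so the TSC never enters into it */
    tb_env->tb_offset = desired_tb -
        muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
                 tb_env->tb_freq, NANOSECONDS_PER_SECOND);

That way cpu_ppc_get_tb() would return desired_tb immediately after
loadvm and then just advance with the virtual clock.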


ATB,

Mark.



