qemu-discuss

Re: microvm machine: interrupts not received after initial virtqueue kick as expected


From: Sergio Lopez
Subject: Re: microvm machine: interrupts not received after initial virtqueue kick as expected
Date: Fri, 15 Nov 2019 08:12:16 +0100
User-agent: mu4e 1.2.0; emacs 26.2

Waldek Kozaczuk <address@hidden> writes:

> Hi,
>
> Thanks for your analysis. It has definitely helped. With the right fixes,
> OSv can now properly boot on microvm with both block and networking devices.
>
> qemu-system-x86_64 -M microvm,x-option-roms=off,pit=off,pic=off,rtc=off \
>    -enable-kvm -cpu host,+x2apic -m 64m -smp 1 \
>    -nodefaults -no-user-config -nographic -no-reboot \
>    -kernel loader.elf -append "--nopci /hello" \
>    -gdb tcp::1234,server,nowait \
>    -serial stdio \
>    -global virtio-mmio.force-legacy=off \
>    -device virtio-blk-device,id=blk0,drive=hd0,scsi=off \
>    -drive file=usr.img,if=none,id=hd0,cache=none
>
> OSv v0.54.0-34-g55403bc8
> Booted up in 3.56 ms
> Cmdline: /hello
> Hello from C code
>
> real 0m0.051s
> user 0m0.039s
> sys 0m0.008s
>
> On average it takes around 4 ms to boot OSv with a single block device, and
> the same example boots about 20% faster than on firecracker, which is
> great!!!
>
> On the other hand, if I boot an OSv image with a ZFS filesystem on the disk,
> it takes 350-550 ms to boot (mostly spent on mounting the filesystem, which
> apparently requires far more interactions/exits than the trivial ROFS). The
> lower number happens when the image is raw and the higher when it is qcow2.
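>
> (For reference: converting between the two formats for this comparison is a
> one-liner with qemu-img, e.g. 'qemu-img convert -f qcow2 -O raw usr.qcow2
> usr.img' - the file names here are just placeholders.)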
>
> To compare with firecracker, the exact same OSv raw image boots there in
> ~60 ms, which is 6-9 times faster than on microvm. What could be the culprit?

If my memory serves me well, Firecracker doesn't open the disk images
with O_DIRECT, nor does it flush the disk after each write. So, for a fair
comparison, you should probably use 'cache=writeback' (or even
'cache=unsafe'?) on your drive definitions.
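
For instance, the drive line from your command above would become something
like this (untested, just to illustrate the option):

   -drive file=usr.img,if=none,id=hd0,cache=writeback

or 'cache=unsafe' if you don't mind skipping flushes entirely for the
benchmark.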

Regards,
Sergio.

> On Wed, Nov 13, 2019 at 5:53 AM Sergio Lopez <address@hidden> wrote:
>
>>
>> Waldek Kozaczuk <address@hidden> writes:
>>
>> > Hi,
>> >
>> > In the last couple of days, I have been trying to run OSv (
>> > https://github.com/cloudius-systems/osv) on new microvm machine and I
>> came
>> > across a problem I need some advice to solve.
>> >
>> > In short, when I try to boot OSv with a single block device, it properly
>> > detects the device and completes the 8-step initialization phase per the
>> > virtio spec. However, it gets stuck after the first virtqueue kick when
>> > trying to read the partition block from disk. It appears that OSv never
>> > receives the relevant interrupt and ends up waiting forever. At least this
>> > is my understanding.
>> >
>> > Here is the QEMU command line with all parameters and some diagnostic
>> > output from OSv (I added some extra debug statements to OSv):
>> >
>> > qemu-system-x86_64 -M microvm,x-option-roms=off,pit=off,pic=off,rtc=off \
>> >    -enable-kvm -cpu host,+x2apic -m 64m -smp 1 \
>> >    -kernel loader.elf -append "--nopci --verbose /hello" \
>> >    -nodefaults -no-user-config -nographic -no-reboot \
>> >    -gdb tcp::1234,server,nowait \
>> >    -serial stdio \
>> >    -device virtio-blk-device,id=blk0,drive=hd0,scsi=off \
>> >    -drive file=usr.img,if=none,id=hd0,cache=none
>> >
>> > OSv v0.54.0-29-g3357ae07
>> > CmdLine: --nopci --verbose /hello virtio_mmio.device=512@0xc0000e00:12
>> > 1 CPUs detected
>> > Firmware vendor: Unknown
>> > bsd: initializing - done
>> > VFS: mounting ramfs at /
>> > VFS: mounting devfs at /dev
>> > net: initializing - done
>> > Detected virtio-mmio device: (2,17745)
>> > -> virtio_driver: device initialized
>> > -> virtio_driver: device reset
>> > -> virtio_driver: device acknowledged
>> > VIRTIO BLK INSTANCE 0
>> > -> virtio_driver: device is modern
>> > blk config: seg_max: 126
>> > blk config: geometry: 1058013192
>> > blk config: block size: 512
>> > blk config: wce: 0
>> > Queue[0] -> size 1024, paddr d23000
>> > -> virtio_driver: detected 1 queues
>> > blk: device setup with interrupt irq: 12!
>> > !! Kicking queue: 0 at 0xffff8000c0000e50
>> >
>> > And also part of the stack trace on OSv side that may help illustrate the
>> > problem:
>> > (gdb) osv thread 26
>> > (gdb) bt
>> > #0  sched::thread::switch_to (this=this@entry=0xffff8000001a7040) at
>> > arch/x64/arch-switch.hh:108
>> > #1  0x00000000403f7164 in sched::cpu::reschedule_from_interrupt
>> > (this=0xffff80000000a040,
>> >     called_from_yield=called_from_yield@entry=false, preempt_after=...,
>> > preempt_after@entry=...) at core/sched.cc:339
>> > #2  0x00000000403f765c in sched::cpu::schedule () at
>> > include/osv/sched.hh:1310
>> > #3  0x00000000403f7d42 in sched::thread::wait
>> > (this=this@entry=0xffff800000d1c040)
>> > at core/sched.cc:1214
>> > #4  0x000000004041fa92 in sched::thread::do_wait_for<lockfree::mutex,
>> > sched::wait_object<waitqueue> > (mtx=...)
>> >     at include/osv/mutex.h:41
>> > #5  sched::thread::wait_for<waitqueue&> (mtx=...) at
>> > include/osv/sched.hh:1220
>> > #6  waitqueue::wait (this=this@entry=0xffffa000008f9ea0, mtx=...) at
>> > core/waitqueue.cc:56
>> > #7  0x0000000040435ca3 in bio_wait (bio=bio@entry=0xffffa000008f9e00) at
>> > fs/vfs/kern_physio.cc:42
>> > #8  0x0000000040436449 in rw_buf (bp=bp@entry=0x408a71c0 <buf_table>,
>> > rw=rw@entry=0) at fs/vfs/vfs_bio.cc:133
>> > #9  0x00000000404368a2 in bread (dev=dev@entry=0xffffa00000cba100,
>> > blkno=blkno@entry=0, bpp=bpp@entry=0x200000100638)
>> >     at fs/vfs/vfs_bio.cc:236
>> > #10 0x000000004043e96f in read_partition_table
>> > (dev=dev@entry=0xffffa00000cba100)
>> > at fs/devfs/device.cc:110
>> > #11 0x000000004037c2db in virtio::blk::blk (this=0xffffa000008e5c00,
>> > virtio_dev=...) at drivers/virtio-blk.cc:174
>> > #12 0x000000004037c439 in virtio::probe<virtio::blk, (unsigned short)2>
>> > (dev=<optimized out>) at ./drivers/device.hh:33
>> > #13 0x00000000403657ef in std::function<hw::hw_driver*
>> > (hw::hw_device*)>::operator()(hw::hw_device*) const (
>> >     __args#0=<optimized out>, this=0x200000100820) at
>> > /usr/include/c++/8/bits/std_function.h:682
>> > #14 hw::driver_manager::<lambda(hw::hw_device*)>::operator()
>> > (__closure=0x2000001008c0, dev=0xffffa000008eb980)
>> >     at drivers/driver.cc:40
>> > #15 std::_Function_handler<void(hw::hw_device*),
>> > hw::driver_manager::load_all()::<lambda(hw::hw_device*)>
>> >::_M_invoke(const
>> > std::_Any_data &, hw::hw_device *&&) (__functor=..., __args#0=<optimized
>> > out>)
>> >     at /usr/include/c++/8/bits/std_function.h:297
>> > #16 0x0000000040362ae3 in std::function<void
>> > (hw::hw_device*)>::operator()(hw::hw_device*) const (
>> >     __args#0=<optimized out>, this=0x2000001008c0) at
>> > /usr/include/c++/8/bits/std_function.h:682
>> > #17 hw::device_manager::for_each_device(std::function<void
>> > (hw::hw_device*)>) (this=this@entry=0xffffa000008eb940,
>> >     func=...) at drivers/device.cc:67
>> > #18 0x00000000403655a3 in hw::driver_manager::load_all
>> > (this=this@entry=0xffffa000008eb900)
>> > at /usr/include/c++/8/new:169
>> > #19 0x000000004038bffd in arch_init_drivers () at
>> arch/x64/arch-setup.cc:293
>> > #20 0x000000004022b06b in do_main_thread (_main_args=0xffffa000008c40b2)
>> at
>> > loader.cc:365
>> > #21 0x000000004045b7c6 in
>> pthread_private::pthread::<lambda()>::operator()
>> > (__closure=0xffffa000008f9d00)
>> >     at libc/pthread.cc:114
>> > #22 std::_Function_handler<void(),
>> pthread_private::pthread::pthread(void*
>> > (*)(void*), void*, sigset_t, const
>> > pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const
>> > std::_Any_data &) (__functor=...)
>> >     at /usr/include/c++/8/bits/std_function.h:297
>> > #23 0x00000000403f8ae7 in sched::thread_main_c (t=0xffff800000d1c040) at
>> > arch/x64/arch-switch.hh:321
>> > #24 0x000000004039e7d3 in thread_main () at arch/x64/entry.S:113
>> > (gdb) osv thread 27
>> > (gdb) bt
>> > #0  sched::thread::switch_to (this=0xffff8000001a7040,
>> > this@entry=0xffff800000d1c040)
>> > at arch/x64/arch-switch.hh:108
>> > #1  0x00000000403f7164 in sched::cpu::reschedule_from_interrupt
>> > (this=0xffff80000000a040,
>> >     called_from_yield=called_from_yield@entry=false, preempt_after=...,
>> > preempt_after@entry=...) at core/sched.cc:339
>> > #2  0x00000000403f765c in sched::cpu::schedule () at
>> > include/osv/sched.hh:1310
>> > #3  0x00000000403f7d42 in sched::thread::wait
>> > (this=this@entry=0xffff800000d2e040)
>> > at core/sched.cc:1214
>> > #4  0x000000004036c1c1 in
>> > sched::thread::do_wait_until<sched::noninterruptible,
>> > sched::thread::dummy_lock,
>> > virtio::virtio_driver::wait_for_queue(virtio::vring*, bool
>> > (virtio::vring::*)() const)::<lambda()> > (mtx=<synthetic pointer>...,
>> >     pred=...) at drivers/virtio.cc:184
>> > #5
>> >
>> sched::thread::wait_until<virtio::virtio_driver::wait_for_queue(virtio::vring*,
>> > bool (virtio::vring::*)() const)::<lambda()> > (pred=...) at
>> > include/osv/sched.hh:1077
>> > #6  virtio::virtio_driver::wait_for_queue (this=this@entry
>> =0xffffa000008e5c00,
>> > queue=queue@entry=0xffffa000008f9c00,
>> >     pred=<optimized out>) at drivers/virtio.cc:168
>> > #7  0x000000004037b7c5 in virtio::blk::req_done (this=<optimized out>) at
>> > drivers/virtio-blk.cc:238
>> > #8  0x000000004037ba6c in virtio::blk::<lambda()>::operator()
>> > (__closure=<optimized out>) at drivers/virtio-blk.cc:132
>> > #9  std::_Function_handler<void(),
>> > virtio::blk::blk(virtio::virtio_device&)::<lambda()> >::_M_invoke(const
>> > std::_Any_data &) (__functor=...) at
>> > /usr/include/c++/8/bits/std_function.h:297
>> > #10 0x00000000403f8ae7 in sched::thread_main_c (t=0xffff800000d2e040) at
>> > arch/x64/arch-switch.hh:321
>> > #11 0x000000004039e7d3 in thread_main () at arch/x64/entry.S:113
>> >
>> > For comparison, when I run the same image with a single block device on
>> > firecracker it works fine and the OSv output looks like this:
>> >
>> > OSv v0.54.0-29-g3357ae07
>> > CmdLine: --verbose --nopci /hello virtio_mmio.device=4K@0xd0000000:5
>> > 1 CPUs detected
>> > Firmware vendor: Unknown
>> > bsd: initializing - done
>> > VFS: mounting ramfs at /
>> > VFS: mounting devfs at /dev
>> > net: initializing - done
>> > Detected virtio-mmio device: (2,0)
>> > -> virtio_driver: device initialized
>> > -> virtio_driver: device reset
>> > -> virtio_driver: device acknowledged
>> > VIRTIO BLK INSTANCE 0
>> > setup_features: found feature intersec of bit 9
>> > setup_features: found feature intersec of bit 41
>> > -> virtio_driver: device is modern
>> > Queue[0] -> size 256, paddr d23000
>> > -> virtio_driver: detected 1 queues
>> > blk: device setup with interrupt irq: 5!
>> > !! Kicking queue: 0 at 0xffff8000d0000050
>> > ------> Acked
>> > virtio-blk: Add blk device instances 0 as vblk0, devsize=4360192
>> > random: intel drng, rdrand registered as a source.
>> > random: <Software, Yarrow> initialized
>> > VFS: unmounting /dev
>> > VFS: mounting rofs at /rofs
>> >     virtio features:  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
>> 0
>> >  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  1
>> 0
>> >  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
>> > !! Kicking queue: 0 at 0xffff8000d0000050
>> > ------> Acked
>> > !! Kicking queue: 0 at 0xffff8000d0000050
>> > ------> Acked
>> > !! Kicking queue: 0 at 0xffff8000d0000050
>> > ------> Acked
>> > VFS: mounting devfs at /dev
>> > VFS: mounting procfs at /proc
>> > VFS: mounting sysfs at /sys
>> > VFS: mounting ramfs at /tmp
>> > Booted up in 27.11 ms
>> > Cmdline: /hello
>> > !! Kicking queue: 0 at 0xffff8000d0000050
>> > ------> Acked
>> > !! Kicking queue: 0 at 0xffff8000d0000050
>> > ------> Acked
>> > Hello from C code
>> > VFS: unmounting /dev
>> > VFS: unmounting /proc
>> > VFS: unmounting /
>> > ROFS: spent 6.52 ms reading from disk
>> > ROFS: read 64 512-byte blocks from disk
>> > ROFS: allocated 61 512-byte blocks of cache memory
>> > ROFS: hit ratio is 93.02%
>> > Powering off.
>> >
>> > In the end, it makes me think there is some issue related to interrupts.
>> > Either wrong ones are used or misconfigured on OSv side or something else
>> > altogether.
>> >
>> > Like on firecracker, OSv sets up GSI edge-triggered interrupts for QEMU
>> > microvm virtio-mmio devices, but maybe it needs to use GSI level-triggered
>> > ones? Also, OSv works just fine with a traditional QEMU machine and PCI
>> > devices.
>> >
>> > Lastly, I tried to enable the microvm pit and pic options - it did not
>> > make any difference.
>> >
>> > Any suggestions on what might be wrong? What would be the best way to
>> > debug it on the QEMU side?
>>
>> I gave OSv a quick try (it's been a while since the last time I played
>> with it), and noticed this:
>>
>> OSv v0.54.0-26-gcccc7d40
>> 1 CPUs detected
>> Firmware vendor: Unknown
>> bsd: initializing - done
>> VFS: mounting ramfs at /
>> VFS: mounting devfs at /dev
>> net: initializing - done
>> Detected virtio-mmio device: (2,17745)
>>
>> Thread 1 "qemu-system-x86" hit Breakpoint 1, virtio_queue_notify_aio_vq
>> (vq=0x555556ea4260)
>>     at /root/qemu/hw/virtio/virtio.c:2232
>> 2232        bool ret = false;
>> (gdb) n
>> 2234        if (vq->vring.desc && vq->handle_aio_output) {
>> (gdb)
>> 2245        return ret;
>> (gdb) p vq->vring.desc
>> $1 = 0
>> (gdb) p vq->vring
>> $2 = {num = 1024, num_default = 128, align = 4096, desc = 0, avail = 0,
>> used = 0, caches = 0x0}
>>
>> As you can see, QEMU receives the kick, but the VQ wasn't properly
>> initialized, so it will bail out instead of attempting to process the
>> request.
>>
>> To get to this point, I've run QEMU with gdb and put a breakpoint on
>> virtio_queue_notify_aio_vq (you may need to rebuild with CFLAGS="-O0 -g"
>> to be able to follow the execution flow easily).
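>>
>> Roughly something like this (the binary path is just an example from my
>> build tree):
>>
>>   $ gdb --args ./x86_64-softmmu/qemu-system-x86_64 -M microvm ...
>>   (gdb) break virtio_queue_notify_aio_vq
>>   (gdb) run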
>>
> As you hinted, the issue was in the configuration/initialization phase of
> the virtqueues. Let me explain what went wrong in the context of this
> virtio spec snippet:
>
> "4.2.3.2 Virtqueue Configuration
>
> The driver will typically initialize the virtual queue in the following way:
>
>    1. Select the queue writing its index (first queue is 0) to QueueSel.
>    2. Check if the queue is not already in use: read QueueReady, and expect
>    a returned value of zero (0x0).
>    3. Read maximum queue size (number of elements) from QueueNumMax. If the
>    returned value is zero (0x0) the queue is not available.
>    4. Allocate and zero the queue pages, making sure the memory is
>    physically contiguous. It is recommended to align the Used Ring to an
>    optimal boundary (usually the page size).
>    5. Notify the device about the queue size by writing the size to QueueNum
>    .
>    6. Write physical addresses of the queue’s Descriptor Table, Available
>    Ring and Used Ring to (respectively) the QueueDescLow/QueueDescHigh,
>    QueueAvailLow/QueueAvailHigh and QueueUsedLow/QueueUsedHigh register
>    pairs.
>    7. Write 0x1 to QueueReady."
>
> So, in essence, OSv would iterate over all virtqueues until step 3 yielded
> 0, BUT in each iteration it would ONLY perform steps 1 to 6. After it had
> discovered all queues, OSv would iterate over the identified queues again
> and execute steps 1, 2 and 7 to activate them. This is apparently tolerated
> by firecracker, and as a matter of fact the spec above says the "driver will
> *typically* initialize" the queues that way, which seems to indicate that it
> is OK to initialize the virtqueues like OSv did. But apparently QEMU does not
> like it. So instead I have changed this logic in OSv to follow the "typical"
> sequence (steps 1..7 in each iteration), and now it all works. Firecracker
> also accepts it.
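>
> To make that concrete, the per-queue sequence OSv now performs looks roughly
> like this (just a sketch, not the actual OSv code; the register names are
> from the spec and mmio_read/mmio_write/alloc_queue_pages/write_ring_addrs
> are made-up helpers):
>
>   for (unsigned idx = 0; ; idx++) {
>       mmio_write(QUEUE_SEL, idx);               // 1. select the queue
>       assert(mmio_read(QUEUE_READY) == 0);      // 2. must not be in use yet
>       uint32_t max = mmio_read(QUEUE_NUM_MAX);  // 3. read max queue size
>       if (max == 0)
>           break;                                //    no more queues
>       void *ring = alloc_queue_pages(max);      // 4. zeroed, contiguous memory
>       mmio_write(QUEUE_NUM, max);               // 5. tell the device the size
>       write_ring_addrs(ring);                   // 6. desc/avail/used addresses
>       mmio_write(QUEUE_READY, 1);               // 7. activate before moving on
>   }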
>
>> In the end, the issue is that OSv doesn't support non-legacy virtio-mmio
>> devices (even though the check at "drivers/virtio-mmio.cc:121" allows
>> for "version == 2" ;-), as described in version 1.0 of the VirtIO spec,
>> and microvm defaults to that model. You can work around the issue by
>> adding "-global virtio-mmio.force-legacy=off" to the QEMU command line,
>> but please consider implementing support for the non-legacy personality
>> (it's quite straightforward).
>>
>
> I think it is the other way around - when I start QEMU without '-global
> virtio-mmio.force-legacy=off' and add a debug statement to print the virtio
> version from the mmio configuration register, it shows "1", which indicates
> the legacy personality.
> On the other hand, if I add '-global virtio-mmio.force-legacy=off' it
> prints "2", which indicates the modern personality.
> OSv's mmio implementation only handles the modern personality, as firecracker
> does. QEMU microvm seems to be capable of presenting both.
> But you are right that OSv should not happily accept both 1 and 2 when in
> fact it only supports 2. Support for the legacy personality could be added
> to OSv fairly easily, but is it worth it given that it is the legacy one?
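>
> For illustration, the probe path could refuse the legacy personality with a
> check along these lines (a sketch only; offset 0x004 is the Version register
> in the virtio-mmio layout, and mmio_base/mmio_read are made-up names):
>
>   uint32_t version = mmio_read(mmio_base + 0x004);  // virtio-mmio Version
>   if (version != 2) {
>       // 1 == legacy personality, which OSv does not implement
>       return false;  // refuse to attach the driver
>   }
>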
> Also, relatedly, I know there are parameters that can be added to PCI
> devices to make them behave as legacy, modern or transitional, for example
> 'disable-legacy=off,disable-modern=on'. Do the same parameters work with
> mmio devices on the microvm machine? Or is '-global
> virtio-mmio.force-legacy=off' the only way to select the legacy or modern
> personality?
>
> Sergio.
>>
>>

Attachment: signature.asc
Description: PGP signature

