From: Igor Mammedov
Subject: Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.
Date: Fri, 21 Dec 2018 15:13:25 +0100

On Thu, 20 Dec 2018 19:18:01 -0200
Eduardo Habkost <address@hidden> wrote:

> On Wed, Dec 19, 2018 at 11:40:37AM +0100, Igor Mammedov wrote:
> > On Wed, 19 Dec 2018 10:57:17 +0800
> > Yu Zhang <address@hidden> wrote:
> >   
> > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote:  
> > > > On Tue, 18 Dec 2018 17:27:23 +0800
> > > > Yu Zhang <address@hidden> wrote:
> > > >     
> > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote:    
> > > > > > On Wed, 12 Dec 2018 21:05:38 +0800
> > > > > > Yu Zhang <address@hidden> wrote:
> > > > > >     
> > > > > > > Currently, vIOMMU is using the value of the IOVA address width,
> > > > > > > instead of the host address width (HAW), to calculate the number
> > > > > > > of reserved bits in data structures such as root entries, context
> > > > > > > entries, and entries of the DMA paging structures etc.
> > > > > > > 
> > > > > > > However, the IOVA address width and the HAW may not be equal. For
> > > > > > > example, a 48-bit IOVA can only be mapped to host addresses no
> > > > > > > wider than 46 bits. Using 48, instead of 46, to calculate the
> > > > > > > reserved bits may result in an invalid IOVA being accepted.
> > > > > > > 
> > > > > > > To fix this, a new field - haw_bits - is introduced in struct
> > > > > > > IntelIOMMUState, whose value is initialized based on the maximum
> > > > > > > physical address set for the guest CPU.
> > > > > >     
> > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > > > > > > for clarity.
> > > > > > > 
> > > > > > > Signed-off-by: Yu Zhang <address@hidden>
> > > > > > > Reviewed-by: Peter Xu <address@hidden>
> > > > > > > ---    
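To make the 48-vs-46 example in the commit message concrete: the reserved-bits
check on a paging-structure entry has to be derived from the HAW, not from the
IOVA width. A minimal, self-contained C illustration (not QEMU code; the
52-bit upper limit and the sample address are only for the example):

    /* Illustration only: bits [haw_bits, 51] of an entry's address field
     * are reserved and must be zero. */
    #include <stdio.h>
    #include <stdint.h>

    static uint64_t rsvd_mask(unsigned haw_bits)
    {
        return ((1ULL << 52) - 1) & ~((1ULL << haw_bits) - 1);
    }

    int main(void)
    {
        uint64_t addr = 1ULL << 46;   /* host address with bit 46 set */

        /* checked against the IOVA width (48): wrongly accepted */
        printf("aw=48 rejects it: %s\n", (addr & rsvd_mask(48)) ? "yes" : "no");
        /* checked against the real HAW (46): correctly rejected */
        printf("haw=46 rejects it: %s\n", (addr & rsvd_mask(46)) ? "yes" : "no");
        return 0;
    }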
> > > > > > [...]
> > > > > >     
> > > > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
> > > > > > >  static void vtd_init(IntelIOMMUState *s)
> > > > > > >  {
> > > > > > >      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
> > > > > > > +    CPUState *cs = first_cpu;
> > > > > > > +    X86CPU *cpu = X86_CPU(cs);
> > > > > > >  
> > > > > > >      memset(s->csr, 0, DMAR_REG_SIZE);
> > > > > > >      memset(s->wmask, 0, DMAR_REG_SIZE);
> > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s)
> > > > > > >      s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
> > > > > > >               VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
> > > > > > >               VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
> > > > > > > -    if (s->aw_bits == VTD_HOST_AW_48BIT) {
> > > > > > > +    if (s->aw_bits == VTD_AW_48BIT) {
> > > > > > >          s->cap |= VTD_CAP_SAGAW_48bit;
> > > > > > >      }
> > > > > > >      s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
> > > > > > > +    s->haw_bits = cpu->phys_bits;    
> > > > > > Is it possible to avoid accessing CPU fields directly, or the cpu
> > > > > > altogether, and set phys_bits when the iommu is created?
> > > > > 
> > > > > Thanks for your comments, Igor.
> > > > > 
> > > > > Well, I guess you prefer not to query the CPU capabilities while
> > > > > deciding the vIOMMU features. But to me, they are not that
> > > > > irrelevant. :)
> > > > > 
> > > > > Here, the hardware address width in VT-d and the one in
> > > > > cpuid.MAXPHYSADDR refer to the same concept. In a VM, both are the
> > > > > maximum guest physical address width. If we do not check the CPU
> > > > > field here, we will still have to check it in other places such as
> > > > > build_dmar_q35(), and reset s->haw_bits again.
> > > > > 
> > > > > Is this explanation convincing enough? :)    
> > > > The current build_dmar_q35() doesn't do it; it's all new code in this
> > > > series that contains an unacceptable direct access from one device
> > > > (iommu) to another (cpu).
> > > > The proper way would be for the owner of the iommu to fish the limits
> > > > from somewhere and set the values during iommu creation.
> > > 
> > > Well, the current build_dmar_q35() doesn't do it because it is using
> > > the incorrect value. :)
> > > According to the spec, the host address width is the maximum physical
> > > address width, yet the current implementation is using the DMA address
> > > width. For me, this is not only wrong, but also insecure. On this
> > > point, I think we all agree it needs to be fixed.
> > > 
> > > As to how to fix it - whether we should query the cpu fields - I still
> > > do not understand why this is not acceptable. :)
> > > 
> > > I had thought of other approaches before, yet I did not choose them:
> > > 
> > > 1> Introduce a new parameter, say, "x-haw-bits", which is used by the
> > > iommu to limit its physical address width (similar to the existing
> > > "x-aw-bits" for the IOVA; a rough sketch follows below). But should we
> > > check this parameter or not? What if it is set to something different
> > > from "phys-bits"?
> > > 
> > > 2> Another choice I had thought of is to query the physical iommu. I
> > > abandoned this idea because my understanding is that the vIOMMU is not
> > > a passed-through device; it is emulated.
> >   
> > > So Igor, may I ask why you think checking against the cpu fields is not
> > > acceptable? :)
> > Because accessing private fields of a device from another, random device
> > is not robust and is subject to breaking in unpredictable ways when the
> > field's meaning or the initialization order changes. (Analogy to
> > baremetal: one does not solder a wire to a CPU die to let some random
> > device access a piece of data.)
> >   
> 
> With either the solution below or the one I proposed, we still
> have an ordering problem: if we want "-cpu ...,phys-bits=..." to
As Michael said, it's questionable whether the iommu should rely on the
guest's phys-bits at all, but that aside, we should use proper interfaces
and hierarchy to initialize devices; see below for why I dislike a
simplistic pc_max_phys_bits().

> affect the IOMMU device, we will need the CPU objects to be
> created before IOMMU realize.
> 
> At least both proposals make the initialization ordering
> explicitly a responsibility of the machine code.  In either case,
> I don't think we will start creating all CPU objects after device
> realize any time soon.
> 
> 
> > I've looked at the intel-iommu code and how it's created, so here is a
> > way to do the thing you need using proper interfaces:
> > 
> > 1. add an x-haw_bits property
> > 2. include in your series the patch '[Qemu-devel] [PATCH] qdev: let
> >    machine hotplug handler to override bus hotplug handler'
> > 3. add your iommu to pc_get_hotplug_handler() to redirect the plug flow
> >    to the machine and let the _pre_plug handler check and set x-haw_bits
> >    at the machine level
> 
> Wow, that's a very complex way to pass a single integer from
> machine code to device code.  If this is the only way to do that,
> we really need to take a step back and rethink our API design.
> 
> What's wrong with having a simple
>   uint32_t pc_max_phys_bits(PCMachineState*)
> function?
As suggested, it would only be an aesthetic change to accessing first_cpu
from a random device at a random time. The IOMMU would still access the cpu
instance directly, no matter how many wrappers one uses, so it's still the
same hack.
If phys_bits changed during the VM's lifecycle and the iommu needed to use
the updated value, then using pc_max_phys_bits() might be justified, as we
don't have interfaces to handle that - but that's not the case here.
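For reference, the helper Eduardo has in mind would presumably be little more
than the following (hypothetical code, not in QEMU; the usual pc/x86 headers
are assumed), which is why it does not remove the ordering problem - it still
just reads first_cpu at whatever point it happens to be called:

    /* Hypothetical helper - merely wraps the direct first_cpu access. */
    uint32_t pc_max_phys_bits(PCMachineState *pcms)
    {
        /* only valid once at least one CPU object has been created */
        return X86_CPU(first_cpu)->phys_bits;
    }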

I suggested a typical way (albeit a bit complex) to handle device
initialization in cases where the bus plug handler is not sufficient.
It follows the proper hierarchy without any layer violations and can fail
gracefully even if we start creating CPUs later using only '-device cpufoo',
without needing to fix the iommu code to handle that (it would fail creating
the iommu with a clear error that the CPU isn't available, and all the user
has to do is fix the CLI to make sure the CPU is created before the iommu).

So I'd prefer that we use the existing pattern for device initialization
instead of hacks whenever possible.
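Put together, steps 1-3 would look roughly like the sketch below: the
machine's pre_plug hook (reached via pc_get_hotplug_handler() once
intel-iommu is redirected to the machine) fills in the hypothetical
"x-haw-bits" property before the device is realized. The function name and
the property are assumptions, not existing QEMU code:

    /* Sketch of the machine-level pre_plug path - names are illustrative.
     * Assumes the hotplug-handler-override patch referenced above. */
    static void pc_machine_pre_plug_iommu(HotplugHandler *hotplug_dev,
                                          DeviceState *dev, Error **errp)
    {
        if (!first_cpu) {
            /* fails gracefully: the CPU simply has to be created first */
            error_setg(errp, "intel-iommu requires a CPU to be created first");
            return;
        }
        /* machine code, not the device, decides the host address width */
        qdev_prop_set_uint8(dev, "x-haw-bits", X86_CPU(first_cpu)->phys_bits);
    }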

> 
> > 4. you can probably use the phys-bits/host-phys-bits properties to get
> >    the data that you need; also see ms->possible_cpus - that's how you
> >    can get access to the CPUs from the machine layer.
> >   
> [...]
> 
PS:
Another thing I'd like to draw your attention to (since you recently looked
at phys-bits) is host vs. guest phys_bits, and whether it's safe, from a
migration point of view, between hosts with different limits.




