Re: [Qemu-stable] [PATCH v4] exec: Fix non-power-of-2 sized accesses
From: Alex Williamson
Subject: Re: [Qemu-stable] [PATCH v4] exec: Fix non-power-of-2 sized accesses
Date: Sat, 17 Aug 2013 09:14:40 -0600
On Sat, 2013-08-17 at 10:23 +0200, Laszlo Ersek wrote:
> On 08/16/13 23:58, Alex Williamson wrote:
> > Since commit 23326164 we align access sizes to match the alignment of
> > the address, but we don't align the access size itself. This means we
> > let illegal access sizes (ex. 3) slip through if the address is
> > sufficiently aligned (ex. 4). This results in an abort which would be
> > easy for a guest to trigger. Account for aligning the access size.
> >
> > Signed-off-by: Alex Williamson <address@hidden>
> > Cc: address@hidden
> > ---
> >
> > v4: KISS
> > v3: Highest power of 2, not lowest
> > v2: Remove unnecessary loop condition
> >
> > exec.c | 18 +++++++++++++-----
> > 1 file changed, 13 insertions(+), 5 deletions(-)
> >
> > diff --git a/exec.c b/exec.c
> > index 3ca9381..67a822c 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -1924,12 +1924,20 @@ static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
> >          }
> >      }
> >
> > -    /* Don't attempt accesses larger than the maximum. */
> > -    if (l > access_size_max) {
> > -        l = access_size_max;
> > +    /* Don't attempt accesses larger than the maximum or unsupported sizes. */
> > +    if (l >= access_size_max) {
> > +        return access_size_max;
> > +    } else {
> > +        if (l >= 8) {
> > +            return 8;
> > +        } else if (l >= 4) {
> > +            return 4;
> > +        } else if (l >= 2) {
> > +            return 2;
> > +        } else {
> > +            return 1;
> > +        }
> >      }
> > -
> > -    return l;
> >  }
> >
> >  bool address_space_rw(AddressSpace *as, hwaddr addr, uint8_t *buf,
> >
>
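To see the effect outside QEMU, the tail of `memory_access_size()` can be lifted into two small functions, one per side of the diff. This is a minimal sketch, assuming the alignment bound from commit 23326164 has already been folded into `access_size_max` (as the context lines of the hunk suggest); the function names and the zero-length precondition are hypothetical additions for illustration.

```c
#include <assert.h>

/* Hypothetical name for the pre-patch tail: only the upper bound is
 * enforced, so an unsupported size such as 3 is returned unchanged. */
static unsigned access_size_before(unsigned l, unsigned access_size_max)
{
    if (l > access_size_max) {
        l = access_size_max;
    }
    return l;
}

/* Hypothetical name for the v4 tail: clamp to the maximum, otherwise
 * round l down to the largest supported size (8, 4, 2 or 1) <= l. */
static unsigned access_size_v4(unsigned l, unsigned access_size_max)
{
    /* Precondition added for this sketch only; the patch has no check. */
    assert(l != 0);

    if (l >= access_size_max) {
        return access_size_max;
    } else {
        if (l >= 8) {
            return 8;
        } else if (l >= 4) {
            return 4;
        } else if (l >= 2) {
            return 2;
        } else {
            return 1;
        }
    }
}
```

With a 4-byte-aligned address and a 4-byte maximum, a 3-byte request comes back as 3 from the old code, the illegal size the commit message says can trigger an abort, while v4 returns 2 and presumably leaves the remaining byte to the caller's loop.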
> Considering that each block contains a return statement, I'd drop the
> else's:
>
>     if (l >= access_size_max) {
>         return access_size_max;
>     }
>     if (l >= 8) {
>         return 8;
>     }
>     if (l >= 4) {
>         return 4;
>     }
>     if (l >= 2) {
>         return 2;
>     }
>     return 1;
>
> Or even
>
>     return l >= access_size_max ? access_size_max :
>            l >= 8 ? 8 :
>            l >= 4 ? 4 :
>            l >= 2 ? 2 :
>            1;
>
> But this is just bikeshedding, so I'm not suggesting it.
>
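Since every branch of the v4 code returns, the three spellings quoted above (the v4 nested `else`, the early-return form and the conditional-expression chain) should behave identically. A small exhaustive check makes that concrete; the function names and the tested ranges (`l` from 1 to 64, maxima 1, 2, 4 and 8) are arbitrary choices for this sketch.

```c
#include <assert.h>

/* v4 as posted: nested if/else, every branch returns. */
static unsigned size_nested(unsigned l, unsigned max)
{
    if (l >= max) {
        return max;
    } else {
        if (l >= 8) {
            return 8;
        } else if (l >= 4) {
            return 4;
        } else if (l >= 2) {
            return 2;
        } else {
            return 1;
        }
    }
}

/* First suggestion: early returns, no else. */
static unsigned size_early_return(unsigned l, unsigned max)
{
    if (l >= max) {
        return max;
    }
    if (l >= 8) {
        return 8;
    }
    if (l >= 4) {
        return 4;
    }
    if (l >= 2) {
        return 2;
    }
    return 1;
}

/* Second suggestion: one chained conditional expression. */
static unsigned size_ternary(unsigned l, unsigned max)
{
    return l >= max ? max :
           l >= 8   ? 8   :
           l >= 4   ? 4   :
           l >= 2   ? 2   :
           1;
}

/* Returns 1 when all three agree for every l in [1, 64] and every
 * maximum in {1, 2, 4, 8}; asserts on the first mismatch. */
static int forms_agree(void)
{
    for (unsigned max = 1; max <= 8; max *= 2) {
        for (unsigned l = 1; l <= 64; l++) {
            unsigned a = size_nested(l, max);
            assert(a == size_early_return(l, max));
            assert(a == size_ternary(l, max));
        }
    }
    return 1;
}
```

The choice between them is therefore purely stylistic, which is why the suggestion is framed as bikeshedding.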
> Regarding function... I can at least understand this code. So, you want
> to find the most significant bit set in "l", and clear everything else.
> If said leftmost bit is to the left of bit#3, then use bit#3 instead.
>
> This idea should work if "l" is already a whole power of two.
>
>     if (l >= access_size_max) {
>         return access_size_max;
>     }
>     return 1 << min(3, lmb(l));
>
> What Paolo posted seems almost identical.
>
> clz32(l): leading zeros in "l"
> qemu_fls(l) == 32 - clz32(l): position of leftmost bit set, 1-based
> qemu_fls(l) - 1: position of leftmost bit set, 0-based
>
> Not sure if the (l & (l - 1)) check is needed in Paolo's patch. clz32()
> is not generally usable when l==0, so maybe that's (too) what the check
> is for. OTOH maybe l==0 is not even possible when entering
> memory_access_size().
>
> Second, Paolo's patch might lack the "min(3, ...)" part. Since you
> didn't dismiss my previous example with l==9 as nonsense, I guess capping
> (qemu_fls(l) - 1) at 3 would be necessary.
Whether we need to clamp at 3 really depends on the caller. I'm
actually doubtful that this function ever gets called with l > 8. So I
think Paolo's code works ok. It's possible your example of l == 9 was a
red herring for my code, but I didn't have enough faith in it anyway.
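The `qemu_fls()` approach can be sketched the same way. Here GCC/Clang's `__builtin_clz()` stands in for QEMU's `clz32()`, which assumes a 32-bit `unsigned int`; the cap at bit 3 implements the "if the leftmost bit is to the left of bit#3, use bit#3" rule and can be switched off to see when it matters. All names are hypothetical.

```c
#include <assert.h>

/* 1-based position of the most significant set bit, 0 for x == 0,
 * following the identity fls(l) == 32 - clz32(l). __builtin_clz()
 * is undefined for 0, hence the explicit check. */
static int fls32(unsigned int x)
{
    return x ? 32 - __builtin_clz(x) : 0;
}

/* Hypothetical fls-based variant: keep only the leftmost set bit of l,
 * optionally capping its 0-based position at 3 (8 bytes). */
static unsigned size_fls(unsigned l, unsigned access_size_max, int clamp)
{
    assert(l != 0); /* precondition assumed for this sketch */

    if (l >= access_size_max) {
        return access_size_max;
    }
    int bit = fls32(l) - 1;     /* 0-based position of leftmost bit */
    if (clamp && bit > 3) {
        bit = 3;                /* i.e. min(3, bit) */
    }
    return 1u << bit;
}
```

If `access_size_max` is 16 or less, every `l >= 16` is caught by the first comparison and every smaller `l` has its leftmost bit at position 3 or below, so the cap can only change the result for a region that allows accesses wider than 16 bytes. Likewise, `l == 9` rounds down to 8 with or without the cap.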
> Third, clz32() is probably very fast when gcc has a builtin for it, and
> probably slower than your open-coded version otherwise.
Nope, the open-coded version in v4 is significantly faster. See the
attached test programs. On my laptop I get these results (compiled with
-O):
$ time ./test-open
real 0m7.442s
user 0m7.412s
sys 0m0.005s

$ time ./test-fls
real 0m9.202s
user 0m9.117s
sys 0m0.024s

$ time ./test-pow2floor
real 0m13.884s
user 0m13.796s
sys 0m0.013s
At higher optimization levels the race gets a lot closer, but the open-coded
version still seems to have an advantage (assuming the test code
even remains relevant at higher levels). So, I conclude that it's
faster to open code for the very limited range of a power-of-2 function
we need here.
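The attached programs are not preserved in this archive, so their exact contents are unknown. A hypothetical harness in the same spirit might time the open-coded rounding against a `__builtin_clz()`-based one over the inputs 1 to 8; the iteration count, the use of `clock()` and all names are assumptions, and any timings will depend heavily on compiler, flags and machine.

```c
#include <assert.h>
#include <time.h>

/* Open-coded rounding as in v4 (inputs assumed to be 1..8). */
static unsigned round_open(unsigned l)
{
    if (l >= 8) {
        return 8;
    } else if (l >= 4) {
        return 4;
    } else if (l >= 2) {
        return 2;
    }
    return 1;
}

/* Same result via count-leading-zeros: keep only the top set bit.
 * Assumes a 32-bit unsigned int and l != 0 (the harness never passes 0,
 * for which __builtin_clz() is undefined). */
static unsigned round_clz(unsigned l)
{
    return 1u << (31 - __builtin_clz(l));
}

/* Calls fn for l = 1..8 in a repeating pattern, stores the elapsed CPU
 * time in *secs and returns a checksum so the work is not discarded.
 * Passing fn by pointer keeps the harness short; a compiler may or may
 * not inline through it, so the numbers are only indicative. */
static unsigned long bench(unsigned (*fn)(unsigned), unsigned long iters,
                           double *secs)
{
    unsigned long sum = 0;
    clock_t start;

    assert(secs != 0);
    start = clock();
    for (unsigned long i = 0; i < iters; i++) {
        sum += fn((unsigned)(i % 8) + 1);
    }
    *secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    return sum;
}
```

Building the same harness at `-O` and again at higher optimization levels is the kind of comparison described above; matching checksums confirm both candidates computed the same sizes.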
> I still don't know enough about this topic, but I like this patch
> because I can understand the intent at least :)
>
> Reviewed-by: Laszlo Ersek <address@hidden>
Thanks!
Alex
Attachments: test-open.c, test-fls.c, test-pow2floor.c