openexr-devel


From: Gyula Gubacsi
Subject: Re: [Openexr-devel] OpenExr 2.3 - slower write speeds for Uncompressed and Zip1
Date: Tue, 23 Apr 2019 10:55:36 +0100

Hi,

I believe I found the issue, but I still need to take proper measurements. After a few measurements I found that most of the time in this function was spent on two load operations, which load the read and write pointers back into registers. This was very suspicious: the function should not need many registers, so it shouldn't have to spill the pointers back to memory and reload them. The function takes references to the read and write pointers, which seemingly throws the compiler off. Copying them into local pointer variables and writing them back before returning seems to eliminate the problem. But I need to take more measurements before I submit a patch.
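A minimal sketch of the pattern described above, assuming a simple byte-copy loop. The names `copyBytesByRef` and `copyBytesLocal` are hypothetical, and this is not the actual OpenEXR function. When data is stored through a `char*`, the store may legally alias any object, including the pointer variables that the references refer to, so in the first version the compiler may have to reload `writePtr` and `readPtr` from memory after every store. The second version works on local copies, which can stay in registers, and writes them back once before returning.

```cpp
#include <cassert>

// "Before": the pointers are passed by reference. Each store through a
// char* may alias any object, including the pointer variables that
// writePtr and readPtr refer to, so the compiler may have to reload both
// pointers from memory after every store.
void copyBytesByRef(char*& writePtr, const char*& readPtr, const char* endPtr)
{
    assert(readPtr <= endPtr);
    while (readPtr < endPtr)
        *writePtr++ = *readPtr++;
}

// "After": copy the pointers into locals. Their addresses are never taken,
// so no store can alias them and the compiler can keep them in registers.
void copyBytesLocal(char*& writePtr, const char*& readPtr, const char* endPtr)
{
    assert(readPtr <= endPtr);
    char* w = writePtr;
    const char* r = readPtr;
    while (r < endPtr)
        *w++ = *r++;
    writePtr = w;  // hand the advanced pointers back to the caller
    readPtr = r;
}
```

Both versions leave the caller's pointers and the destination buffer in the same state; the difference is only in what the optimizer may assume, so any speed-up still has to be confirmed by measurement.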

On Tue, 23 Apr 2019 at 10:26, Kevin Wheatley <address@hidden> wrote:
Another data point.

When I first experimented with adding DWA to Nuke using OpenEXR 2.2.0, I had to patch the configure script so that I could enable F16C instructions for gcc 4.1.2. After doing so, VTune attributed ~30+% of the CPU time to the copyFromFrameBuffer function (converting half to float) when reading files from a local SSD. (As an aside, a number of other namespace-related fixes were needed too; all of these are in the latest OpenEXR versions.) I concluded that making the performance any better would require an F16C-based half-to-float conversion function rather than going via the LUT, at least on CPUs that support those instructions. I also have some notes about testing memory-mapped reading, but no conclusions.

This was not the case with F16C disabled, because other functions appeared higher in the profile. Total performance was lower without F16C (no surprise); copyFromFrameBuffer only bubbled to the top because F16C reduced the cost of those other functions.
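A minimal sketch of the two half-to-float paths contrasted above, assuming the halves are raw 16-bit bit patterns in a `std::uint16_t` buffer. The names `halfBitsToFloat`, `halfLut` and `halfToFloat` are hypothetical and not OpenEXR's API. The portable path looks each value up in a 65536-entry table; when the compiler targets F16C (GCC and Clang define `__F16C__` under `-mf16c`), blocks of eight halves are converted in hardware with `_mm256_cvtph_ps` and the table only handles the remainder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
#if defined(__F16C__)
#include <immintrin.h>  // _mm256_cvtph_ps (F16C), _mm256_storeu_ps (AVX)
#endif

// Portable bit-level conversion of one IEEE 754 binary16 value to float.
static float halfBitsToFloat(std::uint16_t h)
{
    std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    std::uint32_t exp = (h >> 10) & 0x1Fu;
    std::uint32_t mant = h & 0x3FFu;
    std::uint32_t bits;
    if (exp == 0 && mant == 0) {
        bits = sign;                                  // signed zero
    } else if (exp == 0) {
        exp = 127 - 15 + 1;                           // subnormal: renormalize
        while ((mant & 0x400u) == 0) { mant <<= 1; --exp; }
        bits = sign | (exp << 23) | ((mant & 0x3FFu) << 13);
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13);     // infinity or NaN
    } else {
        bits = sign | ((exp + 127 - 15) << 23) | (mant << 13);  // rebias exponent
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// 65536-entry lookup table, built once on first use.
static const std::vector<float>& halfLut()
{
    static const std::vector<float> lut = [] {
        std::vector<float> t(65536);
        for (std::uint32_t i = 0; i < 65536; ++i)
            t[i] = halfBitsToFloat(static_cast<std::uint16_t>(i));
        return t;
    }();
    return lut;
}

// Convert n halves to floats: F16C for blocks of 8 when compiled for it,
// the lookup table for everything else.
void halfToFloat(const std::uint16_t* src, float* dst, std::size_t n)
{
    assert(n == 0 || (src != nullptr && dst != nullptr));
    std::size_t i = 0;
#if defined(__F16C__)
    for (; i + 8 <= n; i += 8) {
        __m128i h = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm256_storeu_ps(dst + i, _mm256_cvtph_ps(h));
    }
#endif
    const std::vector<float>& lut = halfLut();
    for (; i < n; ++i)
        dst[i] = lut[src[i]];
}
```

Building with `-mf16c` only works on CPUs that support the instructions, so a production version would probably want a runtime CPU check rather than this compile-time switch.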

I didn't try RLE compression.
Kevin
