Re: Octave 3.6.0 on Windows XP plot fails.

help-octave

From:	Przemek Klosowski
Subject:	Re: Octave 3.6.0 on Windows XP plot fails.
Date:	Wed, 29 Feb 2012 11:54:57 -0500
User-agent:	Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20120131 Thunderbird/10.0

On 02/29/2012 10:01 AM, Michael Goffioul wrote:

On Wed, Feb 29, 2012 at 2:28 PM, Martin Helm <address@hidden> wrote:

On 29.02.2012 14:45, Xianyi Zhang wrote:

The matrix multiplication cannot obtain the performance from
hyperthreading.

Why not? Is this a limitation of the mingw compiler, the windows
environment or the BLAS library in question?


No, I think it's because of the principle of hyperthreading. HT does
not mean you magically have 4 independent cores out of 2. You still
have only 2 physical cores, but some parts of each core are duplicated
such that they can appear as 4 instead of 2 at the OS level. However,
the processing unit is not duplicated: so within a single physical
core, each logical CPU will have to wait its turn on the processing
unit.

Hyperthreading, aka SMT, provides two sets of registers but only one ALU and one memory interface unit for loading and storing data to the main memory. The main benefit of SMT is masking the memory latency: run the thread whose data is already loaded into CPU registers, while the other thread is stalled waiting for the DRAM data.

Given that the register-based instructions run roughly at 1 clock per instruction, and the memory latency (time for the load/store unit in the CPU to send the address to the interface, the virtual-physical translation via the TLBs, cache lookups, and finally the DRAM access and the data's trip back) is measured in tens of nanoseconds, if one thread is waiting on a main memory data request, the other thread can run 50 or so instructions if it has the data loaded up into registers. In the best possible world, by the time the second thread needs some DRAM data the first thread would have finished loading its data, and they might end up alternating, covering up each other's DRAM latency.
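To put rough numbers on that estimate, here is a back-of-the-envelope sketch in Python. The 20 ns latency and 2.5 GHz clock are illustrative assumptions, not measurements from any particular CPU, and the function name is made up for this example.

```python
def instructions_hidden(latency_ns, clock_ghz, cycles_per_instruction=1.0):
    """Estimate how many register-only instructions one thread can run
    while the other thread waits on a main-memory access.

    All inputs are assumptions chosen for illustration.
    """
    # Nanoseconds times GHz gives the number of clock cycles in the stall.
    stall_cycles = latency_ns * clock_ghz
    return stall_cycles / cycles_per_instruction


# Assumed values: 20 ns DRAM latency, 2.5 GHz clock, 1 clock per instruction.
print(instructions_hidden(20, 2.5))  # 50.0
```

With those assumed values the stall is about 50 cycles, which is where a figure like "50 or so instructions" comes from.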

Unfortunately, matrix multiplication tends to be memory-intensive (load two numbers, multiply, accumulate); there's not much opportunity for long register-based calculations. It turns out that hyperthreading does show some limited benefit but the general recommendation is that it's not very useful.
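The "load two numbers, multiply, accumulate" pattern can be made concrete by counting operations in the textbook triple loop. This is only a counting sketch in Python (the function name is hypothetical); it ignores caches and registers entirely and just tallies what the naive loop asks for.

```python
def naive_matmul_counts(n):
    """Count operand loads and arithmetic operations for the textbook
    triple-loop product of two n x n matrices (no caching modeled)."""
    loads = 0
    flops = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                loads += 2  # fetch a[i][k] and b[k][j]
                flops += 2  # one multiply, one accumulate
    return loads, flops


print(naive_matmul_counts(4))  # (128, 128)
```

One arithmetic operation per operand load leaves very little register-only work for a second hardware thread to overlap with.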

There's a silver lining though: caching and pre-loading optimal consecutive chunks of arrays, and using vector operations such as SSE does work, which is what ATLAS and GotoBLAS are doing.
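The idea of working on consecutive chunks can be sketched as a blocked (tiled) multiplication. The Python below only illustrates the loop structure; it is not how ATLAS or GotoBLAS are actually implemented (those rely on tuned low-level kernels), and the function names and the block size of 2 are assumptions for this example.

```python
def matmul_naive(a, b):
    """Reference triple-loop product of two list-of-lists matrices."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            total = 0.0
            for k in range(m):
                total += a[i][k] * b[k][j]
            c[i][j] = total
    return c


def matmul_blocked(a, b, block=2):
    """Same product, computed tile by tile so each small chunk of a and b
    is reused several times before moving on."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, block):
        for k0 in range(0, m, block):
            for j0 in range(0, p, block):
                # Multiply one tile of a by one tile of b.
                for i in range(i0, min(i0 + block, n)):
                    row_c = c[i]
                    for k in range(k0, min(k0 + block, m)):
                        a_ik = a[i][k]
                        row_b = b[k]
                        # Innermost loop walks consecutive elements of a row.
                        for j in range(j0, min(j0 + block, p)):
                            row_c[j] += a_ik * row_b[j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_blocked(a, b) == matmul_naive(a, b))  # True
```

In pure Python the tiling gives no speedup; the point is only that the innermost loop touches consecutive elements of a small tile, which is the access pattern that caches and vector units reward in compiled code.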
