Message-ID: <20100127015018.42793269@werewolf.home>
Date:	Wed, 27 Jan 2010 01:50:18 +0100
From:	"J.A. Magallón" <jamagallon@....com>
To:	LKML <linux-kernel@...r.kernel.org>
Subject: Re: Hyperthreading on Core i7s: To use or not to use?

On Tue, 26 Jan 2010 10:56:57 +0000, Daniel J Blueman <daniel.blueman@...il.com> wrote:

> On Jan 26, 10:10 am, Justin Piszcz <jpiszcz@...idpixels.com> wrote:
> > Hello,
> >
> > Should the 'correct' kernel [CPU] configuration for a core i7 860/870..?
> >
> > - Multi-core support
> > - Cores: 8
> > - SMT: Enabled/ON
> >
> > From CONFIG_SCHED_SMT:
> >
> >    . SMT scheduler support improves the CPU scheduler's decision making      .
> >    . when dealing with Intel Pentium 4 chips with HyperThreading at a        .
> >    . cost of slightly increased overhead in some places. If unsure say       .
> >    . N here.                                                                 .
> >
> > Does this also 'help' and/or 'apply' as much when dealing with Core i7s?
> >
> > --
> >
> > Quick little benchmark (pbzip2 -9 linux kernel source), the benchmark is
> > really within the noise (8 on/off)
> > - Multicore(8)/HT(Off) = 73.72user 0.33system 0:09.50elapsed 779%CPU (0avgtext+0avgdata 458528maxresident
> > - Multicore(8)/HT(On) = 74.28user 0.40system 0:09.67elapsed 772%CPU (0avgtext+0avgdata 428304maxresident
> > - Multicore(4)/HT(On) = 68.76user 0.30system 0:17.44elapsed 396%CPU (0avgtext+0avgdata 213616maxresident)k

What do 'Multicore(n)' and 'HT' mean here?
Are they BIOS, kernel or pbzip2 options?

Well, with that i7 you have:
- 1 processor
- 4 cores
- 8 threads (4x2)

(BTW, I think the 'CPUs' nomenclature in the kernel is misleading...
 with today's processors, what does 'CPU' mean? Processor, core, thread...?)

With that processor, you should:
- Configure the kernel for 8 'kernel CPUs', which means it can run
  8 hardware threads
- Configure SCHED_MC on, as you have several cores inside the same processor
  that share the L3 cache
- Configure SCHED_SMT on, as you have several threads per core
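
To check the processor/core/thread counts on a running system, something
like the following sketch could help. It groups the kernel's 'CPUs' by
package and core, using the physical_package_id and core_id files under
/sys/devices/system/cpu/cpuN/topology/. The function names and the sample
enumeration (siblings numbered after all the cores) are my assumptions for
illustration, not output from a real i7:

```python
import glob
import os
import re
from collections import defaultdict


def read_sysfs_topology(root="/sys/devices/system/cpu"):
    """Return (cpu, package_id, core_id) tuples read from Linux sysfs."""
    entries = []
    for path in glob.glob(os.path.join(root, "cpu[0-9]*")):
        cpu = int(re.sub(r"\D", "", os.path.basename(path)))
        topo = os.path.join(path, "topology")
        try:
            with open(os.path.join(topo, "physical_package_id")) as f:
                package_id = int(f.read())
            with open(os.path.join(topo, "core_id")) as f:
                core_id = int(f.read())
        except OSError:
            continue  # offline CPU or no topology information exposed
        entries.append((cpu, package_id, core_id))
    return sorted(entries)


def group_by_core(topology):
    """Map (package_id, core_id) -> sorted list of kernel CPU numbers."""
    cores = defaultdict(list)
    for cpu, package_id, core_id in topology:
        cores[(package_id, core_id)].append(cpu)
    return {key: sorted(cpus) for key, cpus in cores.items()}


def summarize(topology):
    """Return (processors, cores, threads) for the given topology."""
    cores = group_by_core(topology)
    packages = {package_id for package_id, _ in cores}
    threads = sum(len(cpus) for cpus in cores.values())
    return len(packages), len(cores), threads


# Hypothetical 1-package, 4-core, 8-thread layout: CPUs 0-3 are the four
# cores, CPUs 4-7 are their HT siblings.
sample = [(cpu, 0, cpu % 4) for cpu in range(8)]
print(summarize(sample))  # (1, 4, 8)
```

On a real box, summarize(read_sysfs_topology()) should give the same kind
of triple.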

I don't know whether SCHED_SMT helps much for apps that are not hand-tuned,
but a few extra scheduling calculations can't hurt too much...

What does 'within the noise' mean for you?
In the 4-thread mode, it just takes about 2x the time to do the work...
(look at elapsed!)
So, assuming the kernel is 'intelligent':
- Using 4 threads, it takes 17.44 s, and the scheduler places the 4 threads
  on different cores.
- Using 8 threads, it takes 9.50 s. So efficiency is 17.44/(2*9.50) = 92%
  Very good! So this HyperThreading is not like the old one in P4s, it works
  much better. Even with each pair of threads competing for registers and
  L1 (or L2?) cache.

So HyperThreading is good; why would you disable it?
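
The arithmetic above, as a small sketch (the function name is mine; the
elapsed times are the ones from the quoted pbzip2 runs):

```python
def smt_efficiency(elapsed_one_per_core, elapsed_all_threads, threads_per_core=2):
    """Speedup from using every hardware thread, and that speedup as a
    fraction of the ideal threads_per_core-fold gain."""
    speedup = elapsed_one_per_core / elapsed_all_threads
    return speedup, speedup / threads_per_core


# 17.44 s with 4 threads versus 9.50 s with 8 threads.
speedup, efficiency = smt_efficiency(17.44, 9.50)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
# speedup 1.84x, efficiency 92%
```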

> > --
> >
> > Has anyone done any in-depth benchmarking for the core i7s that have multiple
> > cores and HT disabled/enabled?
> 
> With my Dell Studio 15 (model 1557) laptop, there is no option to
> disable HT in the current BIOS, so booting with maxcpus=4 (since the
> kernel enumerates non-sibling cores first) gave me a 5-15% speedup on
> some large image processing (convolution, FFTs, conversion) on all
> available cores, presumably due to better cache efficiency.
> 

Sure you don't have a full extra processor, but it can do some work.

An example: a ray tracing engine. If you don't know ray tracing, let's say
in short that it means traversing a tree structure and doing some
floating point calculations on each node, repeated millions of times.
On an old P4@2.8 GHz with HT, these are some simple benchmarks:

- One thread can do about 540 kilo-rays per second
- Both threads do about 800 kR/s. Efficiency: 800/(2*540) = 74%
  In other words, the 'second' thread counts as about 'half' a CPU more,
  as if you had 1.5 CPUs instead of 2.0. Anyway, it's more than 1.0.
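
The same numbers as a quick sketch (the function name is mine, the rates
are the ones measured above):

```python
def sibling_gain(one_thread_rate, both_threads_rate):
    """Return (equivalent CPUs, efficiency) for a 2-way HT core, given the
    throughput with one thread and with both siblings busy."""
    equivalent_cpus = both_threads_rate / one_thread_rate
    return equivalent_cpus, equivalent_cpus / 2


# 540 kR/s with one thread, 800 kR/s with both.
cpus, efficiency = sibling_gain(540, 800)
print(f"{cpus:.2f} equivalent CPUs, efficiency {efficiency:.0%}")
# 1.48 equivalent CPUs, efficiency 74%
```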

And this is not seriously hand-crafted, just POSIX threads code.

If your application is any more complex than the pathological case of
summing two vectors, you can use the HT-sibling for something useful.
And even in that case you can code your program in a cache-aware fashion,
perhaps doing an interleaved sum instead of one chunk on each thread (?)...
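
To make the two layouts concrete, here is a minimal sketch of how the
indices could be split between the two sibling threads. It only shows the
partitioning, not a threaded implementation, and I have not measured
whether the interleaved form is actually faster. The function names and
the block size of 16 elements (one 64-byte cache line of 4-byte floats)
are assumptions:

```python
def chunked(n, threads):
    """One contiguous chunk of the vector per thread."""
    size = -(-n // threads)  # ceiling division
    return [list(range(t * size, min((t + 1) * size, n)))
            for t in range(threads)]


def interleaved(n, threads, block=16):
    """Thread t takes blocks t, t + threads, t + 2*threads, ... so that
    sibling threads work on neighbouring blocks at the same time."""
    parts = []
    for t in range(threads):
        indices = []
        for start in range(t * block, n, threads * block):
            indices.extend(range(start, min(start + block, n)))
        parts.append(indices)
    return parts


def partial_sums(a, b, parts):
    """Sum of a[i] + b[i] over each thread's share of the indices."""
    return [sum(a[i] + b[i] for i in part) for part in parts]


a = list(range(100))
b = [2 * x for x in a]
print(sum(partial_sums(a, b, chunked(100, 2))))      # 14850
print(sum(partial_sums(a, b, interleaved(100, 2))))  # 14850
```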

-- 
J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
                                         \         It's better when it's free
Mandriva Linux release 2010.1 (Cooker) for x86_64
Linux 2.6.32.3-desktop-0.rc2.1mnb (gcc 4.4.2 ) SMP
