Message-ID: <1306442327.2497.108.camel@laptop>
Date:	Thu, 26 May 2011 22:38:47 +0200
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mike Galbraith <efault@....de>, Ingo Molnar <mingo@...e.hu>
Subject: Re: [PATCH] cpumask: convert cpumask_of_cpu() with cpumask_of()

On Wed, 2011-04-27 at 19:32 +0900, KOSAKI Motohiro wrote:
> 
> I've made concept proof patch today. The result is better than I expected.
> 
> <before>
>  Performance counter stats for 'hackbench 10 thread 1000' (10 runs):
> 
>          1603777813  cache-references         #     56.987 M/sec   ( +-   1.824% )  (scaled from 25.36%)
>            13780381  cache-misses             #      0.490 M/sec   ( +-   1.360% )  (scaled from 25.55%)
>         24872032348  L1-dcache-loads          #    883.770 M/sec   ( +-   0.666% )  (scaled from 25.51%)
>           640394580  L1-dcache-load-misses    #     22.755 M/sec   ( +-   0.796% )  (scaled from 25.47%)
> 
>        14.162411769  seconds time elapsed   ( +-   0.675% )
> 
> <after>
>  Performance counter stats for 'hackbench 10 thread 1000' (10 runs):
> 
>          1416147603  cache-references         #     51.566 M/sec   ( +-   4.407% )  (scaled from 25.40%)
>            10920284  cache-misses             #      0.398 M/sec   ( +-   5.454% )  (scaled from 25.56%)
>         24666962632  L1-dcache-loads          #    898.196 M/sec   ( +-   1.747% )  (scaled from 25.54%)
>           598640329  L1-dcache-load-misses    #     21.798 M/sec   ( +-   2.504% )  (scaled from 25.50%)
> 
>        13.812193312  seconds time elapsed   ( +-   1.696% )
> 
>  * Detailed data is in result.txt
> 
> 
> The trick is,
>  - Typical Linux userland applications don't use the mempolicy and/or cpuset
>    APIs at all.
>  - Therefore, for 99.99% of threads, tsk->cpus_allowed is cpu_all_mask.
>  - In the cpu_all_mask case, every thread can share the same bitmap, which may
>    help reduce L1 cache misses in the scheduler.
> 
> What do you think? 
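
The trick in the quoted list above can be modelled outside the kernel. Below is a
minimal userspace C sketch, not kernel code: the names (struct task,
task_allowed_cpus(), model_set_affinity()) are illustrative only. The idea it
shows is that tasks which have never touched an affinity API stay flagged as
unbound and all read one shared all-CPUs bitmap, so the per-task mask is never
pulled into the cache on the hot path.

	#include <stdbool.h>
	#include <stdio.h>

	/* One word of bits is enough for this 64-CPU model. */
	struct cpumask { unsigned long long bits; };

	/* The single shared "all CPUs allowed" bitmap, analogous to cpu_all_mask. */
	static const struct cpumask model_cpu_all_mask = { .bits = ~0ULL };

	struct task {
		bool unbound;                 /* analogue of a PF_THREAD_UNBOUND flag */
		struct cpumask cpus_allowed;  /* private copy, only used when bound */
	};

	/* Hot-path accessor: every unbound task reads the same shared bitmap,
	 * so only one cache line is touched no matter how many tasks exist. */
	static const struct cpumask *task_allowed_cpus(const struct task *t)
	{
		return t->unbound ? &model_cpu_all_mask : &t->cpus_allowed;
	}

	/* Model of an affinity call: the first restriction makes the task bound
	 * and switches readers over to its private mask. */
	static void model_set_affinity(struct task *t, unsigned long long bits)
	{
		t->cpus_allowed.bits = bits;
		t->unbound = false;
	}

	int main(void)
	{
		struct task a = { .unbound = true };  /* a typical thread: never bound */
		struct task b = { .unbound = true };

		model_set_affinity(&b, 0x3ULL);       /* restrict b to CPUs 0 and 1 */

		printf("a: %llx (shared mask)\n",  task_allowed_cpus(&a)->bits);
		printf("b: %llx (private mask)\n", task_allowed_cpus(&b)->bits);
		return 0;
	}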

Nice!

If you finish the first patch (sort the TODOs) I'll take it.

I'm unsure about the PF_THREAD_UNBOUND thing though; then again, the
alternative is adding another struct cpumask * and having it point to
either the shared mask or the private mask.
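
That pointer-based alternative can be sketched the same way. Again this is a
standalone model, and cpus_mask / cpus_ptr are made-up field names rather than
the task_struct layout: the task carries an extra pointer that refers either to
the one shared mask or to its own private copy, and an affinity call simply
repoints it.

	#include <stdio.h>

	struct cpumask { unsigned long long bits; };

	/* The one shared "all CPUs" mask that unbound tasks point at. */
	static const struct cpumask shared_all_mask = { .bits = ~0ULL };

	struct task {
		struct cpumask cpus_mask;        /* private storage for this task */
		const struct cpumask *cpus_ptr;  /* &shared_all_mask or &cpus_mask */
	};

	static void task_init(struct task *t)
	{
		t->cpus_mask.bits = ~0ULL;
		t->cpus_ptr = &shared_all_mask;  /* unbound by default */
	}

	/* An affinity call writes the private mask and repoints the task at it. */
	static void task_set_affinity(struct task *t, unsigned long long bits)
	{
		t->cpus_mask.bits = bits;
		t->cpus_ptr = &t->cpus_mask;
	}

	int main(void)
	{
		struct task t;

		task_init(&t);
		printf("before: %llx\n", t.cpus_ptr->bits);   /* shared mask */

		task_set_affinity(&t, 0xfULL);                /* CPUs 0-3 only */
		printf("after:  %llx\n", t.cpus_ptr->bits);   /* private mask */
		return 0;
	}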

But yeah, looks quite feasible.

