lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5126FC92.9010606@linux.vnet.ibm.com>
Date:	Fri, 22 Feb 2013 13:05:22 +0800
From:	Michael Wang <wangyun@...ux.vnet.ibm.com>
To:	Alex Shi <alex.shi@...el.com>
CC:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	LKML <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>, Paul Turner <pjt@...gle.com>,
	Mike Galbraith <efault@....de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ram Pai <linuxram@...ibm.com>,
	"Nikunj A. Dadhania" <nikunj@...ux.vnet.ibm.com>,
	Namhyung Kim <namhyung@...nel.org>
Subject: Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation

On 02/22/2013 12:46 PM, Alex Shi wrote:
> On 02/22/2013 12:19 PM, Michael Wang wrote:
>>
>>>> Why not seek other way to change O(n^2) to O(n)?
>>>>
>>>> Access 2G memory is unbelievable performance cost.
>> Not access 2G memory, but (2G / 16K) memory, the sbm size is O(N).
>>
>> And please notice that on 16k cpus system, topology will be deep if NUMA
>> enabled (O(log N) as Peter said), and that's really a good stage for
>> this idea to perform on, we could save lot's of recursed 'for' cycles.
>>
> 
> CPU execute part is very very fast compare to the memory access, the
> 'for' cycles cost is most on the memory access for many domain/groups
> data, not instruction execution.
> 
> In a hot patch, several KB memory access will cause clear cpu cache
> pollution then make kernel slowly.

Hmm...that's a good catch.

Comparison between memory access and cpu execution, no doubt the latter
will win, you are right.

But that was same in the old world when access the struct sched_domain,
isn't it?

                for_each_domain(cpu, tmp) {
                        if (weight <= tmp->span_weight)
                                break;
                        if (tmp->flags & sd_flag)
                                sd = tmp;
                }

Both old and new may access data across nodes, but the old one will
access several times more, isn't it?

Regards,
Michael Wang

> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ