linux-kernel - Re: [regression] 3.0-rc boot failure -- bisected to cd4ea6ae3982

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1310649379.2586.273.camel@twins>
Date:	Thu, 14 Jul 2011 15:16:19 +0200
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Anton Blanchard <anton@...ba.org>
Cc:	mahesh@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org,
	linuxppc-dev@...ts.ozlabs.org, mingo@...e.hu,
	benh@...nel.crashing.org, torvalds@...ux-foundation.org
Subject: Re: [regression] 3.0-rc boot failure -- bisected to cd4ea6ae3982

On Thu, 2011-07-14 at 14:35 +1000, Anton Blanchard wrote:

> I also printed out the cpu spans as we walk through build_sched_groups:

> 0 32 64 96 128 160 192 224 256 288 320 352 384 416 448 480

> Duplicates start appearing in this span:
> 128 160 192 224 256 288 320 352 384 416 448 480 512 544 576 608
> 
> So it looks like the overlap of the 16 entry spans
> (SD_NODES_PER_DOMAIN) is causing our problem.

Urgh.. so those spans are generated by sched_domain_node_span(), and it
looks like that simply picks the 15 nearest nodes to the one we've got
without consideration for overlap with previously generated spans.

Now that used to work because it used to simply allocate a new group
instead of using the existing one.

The thing is, we want to track state unique to a group of cpus, so
duplicating that is iffy.

Otoh, making these masks non-overlapping is probably sub-optimal from a
NUMA pov.

Looking at a slightly simpler set-up (4 socket AMD magny-cours):

$ cat /sys/devices/system/node/node*/distance
10 16 16 22 16 22 16 22
16 10 22 16 22 16 22 16
16 22 10 16 16 22 16 22
22 16 16 10 22 16 22 16
16 22 16 22 10 16 16 22
22 16 22 16 16 10 22 16
16 22 16 22 16 22 10 16
22 16 22 16 22 16 16 10

We can translate that into groups like

{0} {0,1,2,4,6} {0-7}
{1} {1,0,3,5,7} {0-7}
...

and we can easily see there's overlap there as well in the NUMA layout
itself.

This seems to suggest we need to separate the unique state from the
sched_group.

Now all I need is a way to not consume gobs of memory.. /me goes prod
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/