Date:	Fri, 18 Jul 2014 11:28:14 +0200
From:	Dietmar Eggemann <dietmar.eggemann@....com>
To:	Bruno Wolff III <bruno@...ff.to>,
	Peter Zijlstra <peterz@...radead.org>
CC:	Josh Boyer <jwboyer@...hat.com>,
	"mingo@...hat.com" <mingo@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Scheduler regression from caffcdd8d27ba78730d5540396ce72ad022aff2c

On 18/07/14 07:34, Bruno Wolff III wrote:
> On Thu, Jul 17, 2014 at 14:35:02 +0200,
>    Peter Zijlstra <peterz@...radead.org> wrote:
>>
>> In any case, can someone who can trigger this run with the below; its
>> 'clean' for me, but supposedly you'll trigger a FAIL somewhere.
>
> I got a couple of fail messages.
>
> dmesg output is available in the bug as the following attachment:
> https://bugzilla.kernel.org/attachment.cgi?id=143361
>
> The part of interest is probably:
>
> [    0.253354] build_sched_groups: got group f255b020 with cpus:
> [    0.253436] build_sched_groups: got group f255b120 with cpus:
> [    0.253519] build_sched_groups: got group f255b1a0 with cpus:
> [    0.253600] build_sched_groups: got group f255b2a0 with cpus:
> [    0.253681] build_sched_groups: got group f255b2e0 with cpus:
> [    0.253762] build_sched_groups: got group f255b320 with cpus:
> [    0.253843] build_sched_groups: got group f255b360 with cpus:
> [    0.254004] build_sched_groups: got group f255b0e0 with cpus:
> [    0.254087] build_sched_groups: got group f255b160 with cpus:
> [    0.254170] build_sched_groups: got group f255b1e0 with cpus:
> [    0.254252] build_sched_groups: FAIL
> [    0.254331] build_sched_groups: got group f255b1a0 with cpus: 0
> [    0.255004] build_sched_groups: FAIL
> [    0.255084] build_sched_groups: got group f255b1e0 with cpus: 1

That (partly) explains it. f255b1a0 (5) and f255b1e0 (6) are reused 
here! This reuse doesn't happen on my machines.

But if they are reused for a different cpu mask (one not including cpu0
or cpu1, respectively), wouldn't this mess up their first usage?

I guess that the second time, cpu3 will be added to the cpumask of 
f255b1a0 and cpu4 to f255b1e0?

Maybe we can extend PeterZ's patch to print out the cpu and span as
well, and also use this printk in free_sched_domain(), to debug further
if this is not enough evidence?

[    0.252059] __sdt_alloc: allocated f255b020 with cpus: (1)
[    0.252147] __sdt_alloc: allocated f255b0e0 with cpus: (2)
[    0.252229] __sdt_alloc: allocated f255b120 with cpus: (3)
[    0.252311] __sdt_alloc: allocated f255b160 with cpus: (4)
[    0.252395] __sdt_alloc: allocated f255b1a0 with cpus: (5)
[    0.252477] __sdt_alloc: allocated f255b1e0 with cpus: (6)
[    0.252559] __sdt_alloc: allocated f255b220 with cpus: (7) (not used)
[    0.252641] __sdt_alloc: allocated f255b260 with cpus: (8) (not used)
[    0.253013] __sdt_alloc: allocated f255b2a0 with cpus: (9)
[    0.253097] __sdt_alloc: allocated f255b2e0 with cpus: (10)
[    0.253184] __sdt_alloc: allocated f255b320 with cpus: (11)
[    0.253265] __sdt_alloc: allocated f255b360 with cpus: (12)

[    0.253354] build_sched_groups: got group f255b020 with cpus: (1)
[    0.253436] build_sched_groups: got group f255b120 with cpus: (3)
[    0.253519] build_sched_groups: got group f255b1a0 with cpus: (5)
[    0.253600] build_sched_groups: got group f255b2a0 with cpus: (9)
[    0.253681] build_sched_groups: got group f255b2e0 with cpus: (10)
[    0.253762] build_sched_groups: got group f255b320 with cpus: (11)
[    0.253843] build_sched_groups: got group f255b360 with cpus: (12)
[    0.254004] build_sched_groups: got group f255b0e0 with cpus: (2)
[    0.254087] build_sched_groups: got group f255b160 with cpus: (4)
[    0.254170] build_sched_groups: got group f255b1e0 with cpus: (6)
[    0.254252] build_sched_groups: FAIL
[    0.254331] build_sched_groups: got group f255b1a0 with cpus: 0 (5)
[    0.255004] build_sched_groups: FAIL
[    0.255084] build_sched_groups: got group f255b1e0 with cpus: 1 (6)
[    0.255365] devtmpfs: initialized

>
> I also booted with early printk=keepsched_debug as requested by
> Dietmar.
>

I didn't see what I was looking for in your dmesg output. Did you use
'earlyprintk=keep sched_debug'?







