Date:	Thu, 17 Jul 2014 20:43:16 +0200
From:	Dietmar Eggemann <dietmar.eggemann@....com>
To:	Bruno Wolff III <bruno@...ff.to>
CC:	Josh Boyer <jwboyer@...hat.com>,
	"mingo@...hat.com" <mingo@...hat.com>,
	"peterz@...radead.org" <peterz@...radead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Scheduler regression from caffcdd8d27ba78730d5540396ce72ad022aff2c

On 17/07/14 18:36, Bruno Wolff III wrote:
> I did a few quick boots this morning while taking a bunch of pictures. I have
> gone through some of them this morning and found one that shows bug on
> was triggered at 5850 which is from:
> BUG_ON(!cpumask_empty(sched_group_cpus(sg)));
>
> You can see the JPEG at:
> https://bugzilla.kernel.org/attachment.cgi?id=143331
>

Many thanks for testing this, Bruno!

So on your system, the cpumask memory of some sched_group(s) is being 
altered between __visit_domain_allocation_hell()->__sdt_alloc() and 
build_sched_groups().

In the meantime, PeterZ has posted a patch which warns when this happens 
and prints out the affected sched_groups with their related CPUs; it 
also includes the cpumask_clear(), so your machine should still boot fine.

If you could apply the patch:

https://lkml.org/lkml/2014/7/17/288

and run the resulting kernel on your machine, that would give us more 
detail, i.e. which sched_group(s) are affected and in which sched-domain 
level (SMT and/or DIE) the issue occurs.


Another thing you could do is boot with 'earlyprintk=keep sched_debug' 
added to your kernel command line, using a build that contains the 
cpumask_clear() in build_sched_groups(), and extract the 
scheduler-setup portion of the dmesg output:

Example:

[    0.119737] CPU0 attaching sched-domain:
[    0.119740]  domain 0: span 0-1 level SIBLING
[    0.119742]   groups: 0 (cpu_power = 588) 1 (cpu_power = 588)
[    0.119745]   domain 1: span 0-3 level MC
[    0.119747]    groups: 0-1 (cpu_power = 1176) 2-3 (cpu_power = 1176)
[    0.119751] CPU1 attaching sched-domain:
[    0.119752]  domain 0: span 0-1 level SIBLING
[    0.119753]   groups: 1 (cpu_power = 588) 0 (cpu_power = 588)
[    0.119756]   domain 1: span 0-3 level MC
[    0.119757]    groups: 0-1 (cpu_power = 1176) 2-3 (cpu_power = 1176)
[    0.119759] CPU2 attaching sched-domain:
[    0.119760]  domain 0: span 2-3 level SIBLING
[    0.119761]   groups: 2 (cpu_power = 588) 3 (cpu_power = 588)
[    0.119764]   domain 1: span 0-3 level MC
[    0.119765]    groups: 2-3 (cpu_power = 1176) 0-1 (cpu_power = 1176)
[    0.119767] CPU3 attaching sched-domain:
[    0.119768]  domain 0: span 2-3 level SIBLING
[    0.119769]   groups: 3 (cpu_power = 588) 2 (cpu_power = 588)
[    0.119772]   domain 1: span 0-3 level MC
[    0.119773]    groups: 2-3 (cpu_power = 1176) 0-1 (cpu_power = 1176)


