linux-kernel - Re: scheduler crash on Power


Message-ID: <1407122432.2286.0.camel@concordia>
Date:	Mon, 04 Aug 2014 13:20:32 +1000
From:	Michael Ellerman <mpe@...erman.id.au>
To:	Sukadev Bhattiprolu <sukadev@...ux.vnet.ibm.com>
Cc:	Dietmar Eggemann <dietmar.eggemann@....com>,
	"bruno@...ff.to" <bruno@...ff.to>,
	Michael Ellerman <michaele@....ibm.com>,
	"jwboyer@...hat.com" <jwboyer@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"peterz@...rdead.org" <peterz@...rdead.org>,
	"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>
Subject: Re: scheduler crash on Power

On Fri, 2014-08-01 at 14:24 -0700, Sukadev Bhattiprolu wrote:
> Dietmar Eggemann [dietmar.eggemann@....com] wrote:
> | > ltcbrazos2-lp07 login: [  181.915974] ------------[ cut here ]------------
> | > [  181.915991] WARNING: at ../kernel/sched/core.c:5881
> | 
> | This warning indicates the problem. One of the struct sched_domains does
> | not have its groups member set.
> | 
> | And it's happening during a rebuild of the sched domain hierarchy, not
> | during the initial build.
> | 
> | You could run your system with the following patch-let (on top of
> | https://lkml.org/lkml/2014/7/17/288)  w/ and w/o the perf related
> | patches (w/ CONFIG_SCHED_DEBUG enabled).
> | 
> | @@ -5882,6 +5882,9 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
> |  {
> |         struct sched_group *sg = sd->groups;
> | 
> | +#ifdef CONFIG_SCHED_DEBUG
> | +       printk("sd name: %s span: %pc\n", sd->name, sd->span);
> | +#endif
> |         WARN_ON(!sg);
> | 
> |         do {
> | 
> | This will show if the rebuild of the sched domain hierarchy happens on
> | both systems and hopefully indicate for which sched_domain the
> | sd->groups is not set.
> 
> Thanks for the patch. It appears that the NUMA sched domain does not
> have the sd->groups set - snippet of the error (with your patch and
> Peter's patch)
> 
> [  181.914494] build_sched_groups: got group c000000006da0000 with cpus: 
> [  181.914498] build_sched_groups: got group c0000000dd830000 with cpus: 
> [  181.915234] sd name: SMT span: 8-15
> [  181.915239] sd name: DIE span: 0-7
> [  181.915242] sd name: NUMA span: 0-15
> [  181.915250] ------------[ cut here ]------------
> [  181.915253] WARNING: at ../kernel/sched/core.c:5891
> 
> Patched code:
> 
> 	5884 static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
> 	5885 {
> 	5886         struct sched_group *sg = sd->groups;
> 	5887 
> 	5888 #ifdef CONFIG_SCHED_DEBUG
> 	5889         printk("sd name: %s span: %pc\n", sd->name, sd->span);
> 	5890 #endif
> 	5891         WARN_ON(!sg);
> 
> Complete log below.
> 
> I was able to bisect it down to this patch in the 24x7 patchset
> 
> 	https://lkml.org/lkml/2014/5/27/804
> 
> I replaced the kfree(page) calls in the patch with
> kmem_cache_free(hv_page_cache, page).
> 
> The problem seems to disappear if the call to create_events_from_catalog()
> in hv_24x7_init() is skipped. I am continuing to debug the 24x7 patch.

Is that patch just clobbering memory it doesn't own and corrupting the
scheduler data structures?
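
For anyone following along, the check that Dietmar's patch-let makes visible
can be modelled outside the kernel. The following is only a minimal userspace
sketch, not kernel code: the struct and function names
(sched_domain_model, sched_group_model, find_domain_without_groups) are
hypothetical stand-ins, and the only behaviour taken from the thread is that a
domain whose groups pointer is unset trips the WARN_ON(!sg).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for struct sched_group. */
struct sched_group_model {
	int id;
};

/* Hypothetical, simplified stand-in for struct sched_domain. */
struct sched_domain_model {
	const char *name;                  /* e.g. "SMT", "DIE", "NUMA" */
	struct sched_domain_model *parent; /* next level up, NULL at the top */
	struct sched_group_model *groups;  /* NULL models the condition WARN_ON(!sg) catches */
};

/*
 * Walk the hierarchy from the lowest level upwards, printing each level.
 * Returns the first domain whose groups pointer is NULL, or NULL if every
 * level has its groups set.
 */
static const struct sched_domain_model *
find_domain_without_groups(const struct sched_domain_model *sd)
{
	for (; sd; sd = sd->parent) {
		assert(sd->name != NULL); /* every level needs a name for the report */
		printf("sd name: %s groups: %s\n",
		       sd->name, sd->groups ? "set" : "NULL");
		if (!sd->groups)
			return sd;
	}
	return NULL;
}
```

Fed a three-level chain named SMT, DIE and NUMA in which only the NUMA level
has groups == NULL, the walk returns the NUMA level, which matches what the
log above points at. It says nothing about why the pointer is unset, so it
cannot distinguish a build bug from memory being overwritten.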

cheers

