lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080308015045.GB15909@linux-os.sc.intel.com>
Date:	Fri, 7 Mar 2008 17:50:45 -0800
From:	Suresh Siddha <suresh.b.siddha@...el.com>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	dmitry.adamushko@...il.com, ego@...ibm.com, mingo@...e.hu,
	oleg@...sign.ru, yi.y.yang@...el.com, linux-kernel@...r.kernel.org,
	tglx@...utronix.de
Subject: Re: [BUG 2.6.25-rc3] scheduler/hotplug: some processes are dealocked when cpu is set to offline

On Sat, Mar 08, 2008 at 12:43:15AM +0100, Rafael J. Wysocki wrote:
> On Saturday, 8 of March 2008, Andrew Morton wrote:
> > On Fri, 7 Mar 2008 15:01:26 -0800
> > Suresh Siddha <suresh.b.siddha@...el.com> wrote:
> > 
> > > 
> > > Andrew, Please check if the appended patch fixes your power-off problem aswell.
> > > ...
> > >
> > > --- a/kernel/sched.c
> > > +++ b/kernel/sched.c
> > > @@ -5882,6 +5882,7 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
> > >  		break;
> > >  
> > >  	case CPU_DOWN_PREPARE:
> > > +	case CPU_DOWN_PREPARE_FROZEN:
> > >  		/* Update our root-domain */
> > >  		rq = cpu_rq(cpu);
> > >  		spin_lock_irqsave(&rq->lock, flags);
> > 
> > No, it does not.
> 
> Well, this is a bug nevertheless.
> 

Well, my previous root cause needs some small changes.

During the notifier call chain for CPU_DOWN(till 'update_sched_domains'
is called atleast), all the cpu's are attached to 'def_root_domain', for whom
online mask still has the offline cpu.

This is because, during CPU_DOWN_PREPARE, migration_call() first clears
the root_domain->online, and later during the DOWN_PREPARE call chain
detach_destroy_domains() attach to def_root_domain with cpu_online_map(which
still has the just about to die 'cpu' set).

So essentially, during the notifier call chain of CPU_DOWN (before
'update_sched_domains' is called atleast), any one doing RT process
wakeup's (for example: kthread_stop()) can still end up on the dead cpu.

Andrew, Can you please try one more patch(appended) to see if it helps?

Signed-off-by: Suresh Siddha <suresh.b.siddha@...el.com>
---

diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 0a6d2e5..8cb707c 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -597,7 +597,7 @@ static int find_lowest_cpus(struct task_struct *task, cpumask_t *lowest_mask)
 	int       count       = 0;
 	int       cpu;
 
-	cpus_and(*lowest_mask, task_rq(task)->rd->online, task->cpus_allowed);
+	cpus_and(*lowest_mask, task->cpus_allowed, cpu_online_map);
 
 	/*
 	 * Scan each rq for the lowest prio.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ