lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070807131216.GA20424@in.ibm.com>
Date:	Tue, 7 Aug 2007 18:42:16 +0530
From:	Gautham R Shenoy <ego@...ibm.com>
To:	Ingo Molnar <mingo@...e.hu>, Oleg Nesterov <oleg@...sign.ru>,
	Srivatsa Vaddagiri <vatsa@...ibm.com>
Cc:	linux-kernel@...r.kernel.org, Dipankar Sarma <dipankar@...ibm.com>,
	Paul E McKenney <paulmck@...ibm.com>
Subject: Cpu-Hotplug and Real-Time

Hi, 

While running a cpu-hotplug test involving a high priority
process (SCHED_RR, prio=94) trying to periodically offline and
online cpu1 on a 2-processor machine, I noticed that the system was
becoming unresponsive after a few iterations.

However, when the same test was repeated with processors
greater than 2, it worked fine. 
Also, if the hotplugging process, was not of rt-prio, it
worked fine on a 2-processor machine.

After some debugging, I saw that the hang occured because
the high prio process was stuck in a loop doing yield() inside
wait_task_inactive(). Description follows:

Say a high-prio task (A) does a kthread_create(B),
followed by a kthread_bind(B, cpu1). At this moment, 
only cpu0 is online.

Now, immediately after being created, B would
do a 
complete(&create->started) [kernel/kthread.c: kthread()], 
before scheduling itself out.

This complete() will wake up kthreadd, which had spawned B.
It is possible that during the wakeup, kthreadd might preempt B.
Thus, B is still on the runqueue, and not yet called schedule().

kthreadd, will inturn do a 
complete(&create->done); [kernel/kthread.c: create_kthread()]
which will wake up the thread which had called kthread_create().
In our case it's task A, which will run immediately, since its priority
is higher.

A will now call kthread_bind(B, cpu1).
kthread_bind(), calls wait_task_inactive(B), to ensures that 
B has scheduled itself out.

B is still on the runqueue, so A calls yield() in wait_task_inactive().
But since A is the task with the highest prio, scheduler schedules it
back again.

Thus B never gets to run to schedule itself out.
A loops waiting for B to schedule out leading  to system hang.

In my case,
A was the high priority process trying to bring up cpu1, and
thus doing a kthread_create/kthread_bind in 
migration_call(): CPU_UP_PREPARE.

B was the migration thread for cpu1.

And the above problem occurs when only one cpu is online.

Possible solutions to this problem:
a) Let the newly spawned kernel threads inherit
   their parent's prio and policy. 

b) Instead of using yield() in wait_task_inactive(), we could use
   something like a yield_to(p):

    yield_to(struct task_struct p)
    {
    	int old_prio = p->prio;
	/* Temporarily boost p's priority atleast to that of current task */
    	if (current->prio > old_prio)
		set_prio(p, current->prio);
	yield();
	/* Reset priority back to the original value */
	set_prio(p, old_prio);
    }


Thoughts?

Thanks and Regards
gautham.

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ