Message-ID: <52D7E343.40909@linaro.org>
Date:	Thu, 16 Jan 2014 14:48:51 +0100
From:	Daniel Lezcano <daniel.lezcano@...aro.org>
To:	Peter Zijlstra <peterz@...radead.org>
CC:	raistlin@...ux.it, juri.lelli@...il.com,
	Ingo Molnar <mingo@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On 01/15/2014 01:04 PM, Peter Zijlstra wrote:
> On Wed, Jan 15, 2014 at 09:27:34AM +0100, Daniel Lezcano wrote:
>>
>> Hi all,
>>
>> I use the tip/sched/core branch.
>>
>> After git pulling yesterday, my host is unresponsive after booting the OS.
>>
>>   * It boots normally
>>   * It sends info to the console
>>   * The graphics does not work
>>   * The terminals show the prompt, I can enter the username but after
>> pressing enter, it does not give the password prompt
>>   * sysrq works more or less, I can't get the process stack but it receives
>> the command
>>
>> It is like no new process can be created.
>>
>> I have a dual Xeon processor E5325 (2 x 4 cores).
>>
>> After git bisecting, the following patch seems to introduce the bug.
>>
>> commit d50dde5a10f305253cbc3855307f608f8a3c5f73
>
> OK, so my headless WSM-EP boots just fine. Obviously it cannot confirm
> if graphics works, but I can ssh in and work on it without bother.
>
> I can even log in on the serial console without problems.
>
> I tried both tip/master and tip/sched/core.
>
> Would you happen to have a .config for me to try?

I was able to reduce the scope and reproduce the issue.

AFAICT, this happens with rsyslogd. When logging in on a tty, the login 
command sends a message through /dev/log. But rsyslogd is never woken up 
and stays blocked in poll_schedule_timeout. The login process is blocked in 
unix_wait_for_peer.
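
To illustrate how a sender can get stuck when the reader never drains its socket, here is a minimal userspace sketch. It assumes the login side behaves like a connected AF_UNIX datagram client and that a full receive queue is what makes a blocking send sleep (my reading of unix_wait_for_peer, not something verified here). The abstract socket name and the function name are hypothetical, and MSG_DONTWAIT is used so the demo reports EAGAIN instead of hanging.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Returns the number of datagrams accepted before a send would block,
 * or -1 on setup failure or if the cap is reached without blocking.
 */
int count_sends_until_full(void)
{
	struct sockaddr_un addr;
	socklen_t len;
	int srv = -1, cli = -1, n = 0, ret = -1, plen;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	/* Abstract address (leading NUL byte); the name is made up. */
	plen = snprintf(addr.sun_path + 1, sizeof(addr.sun_path) - 1,
			"devlog-demo-%ld", (long)getpid());
	len = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + plen);

	srv = socket(AF_UNIX, SOCK_DGRAM, 0);
	cli = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (srv < 0 || cli < 0)
		goto out;
	if (bind(srv, (struct sockaddr *)&addr, len) < 0)
		goto out;
	if (connect(cli, (struct sockaddr *)&addr, len) < 0)
		goto out;

	/* The reader never calls recv(), like a daemon that is never woken. */
	while (n < 100000) {
		if (send(cli, "x", 1, MSG_DONTWAIT) < 0) {
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				ret = n;
			break;
		}
		n++;
	}
out:
	if (srv >= 0)
		close(srv);
	if (cli >= 0)
		close(cli);
	return ret;
}
```

Without MSG_DONTWAIT the same send would sleep until the reader consumes something, which matches what the login process appears to be doing.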

I can strace rsyslogd at startup. The last two sched_setscheduler calls 
fail.

 > grep sched trace.out

3570  sched_getparam(3570, { 0 })       = 0
3570  sched_getscheduler(3570)          = 0 (SCHED_OTHER)
3570  sched_get_priority_min(SCHED_OTHER) = 0
3570  sched_get_priority_max(SCHED_OTHER) = 0
3571  sched_get_priority_min(SCHED_OTHER) = 0
3571  sched_get_priority_max(SCHED_OTHER) = 0
3571  sched_get_priority_min(SCHED_OTHER) = 0
3571  sched_get_priority_max(SCHED_OTHER) = 0
3571  sched_setscheduler(3572, SCHED_OTHER, { 0 } <unfinished ...>
3571  <... sched_setscheduler resumed> ) = 0
3571  sched_get_priority_min(SCHED_OTHER <unfinished ...>
3571  <... sched_get_priority_min resumed> ) = 0
3571  sched_get_priority_max(SCHED_OTHER <unfinished ...>
3571  <... sched_get_priority_max resumed> ) = 0
3571  sched_setscheduler(3573, SCHED_OTHER, { 0 } <unfinished ...>
3571  <... sched_setscheduler resumed> ) = -1 EPERM (Operation not permitted)
3571  sched_get_priority_min(SCHED_OTHER <unfinished ...>
3571  <... sched_get_priority_min resumed> ) = 0
3571  sched_get_priority_max(SCHED_OTHER <unfinished ...>
3571  <... sched_get_priority_max resumed> ) = 0
3571  sched_setscheduler(3574, SCHED_OTHER, { 0 } <unfinished ...>
3571  <... sched_setscheduler resumed> ) = -1 EPERM (Operation not permitted)

Here is the same strace on a kernel which does not hang. The calls to 
sched_setscheduler do not fail.

3292  sched_getparam(3292, { 0 })       = 0
3292  sched_getscheduler(3292)          = 0 (SCHED_OTHER)
3292  sched_get_priority_min(SCHED_OTHER) = 0
3292  sched_get_priority_max(SCHED_OTHER) = 0
3293  sched_get_priority_min(SCHED_OTHER) = 0
3293  sched_get_priority_max(SCHED_OTHER) = 0
3293  sched_get_priority_min(SCHED_OTHER) = 0
3293  sched_get_priority_max(SCHED_OTHER) = 0
3293  sched_setscheduler(3294, SCHED_OTHER, { 0 } <unfinished ...>
3293  <... sched_setscheduler resumed> ) = 0
3293  sched_get_priority_min(SCHED_OTHER <unfinished ...>
3293  <... sched_get_priority_min resumed> ) = 0
3293  sched_get_priority_max(SCHED_OTHER <unfinished ...>
3293  <... sched_get_priority_max resumed> ) = 0
3293  sched_setscheduler(3295, SCHED_OTHER, { 0 } <unfinished ...>
3293  <... sched_setscheduler resumed> ) = 0
3293  sched_get_priority_min(SCHED_OTHER <unfinished ...>
3293  <... sched_get_priority_min resumed> ) = 0
3293  sched_get_priority_max(SCHED_OTHER <unfinished ...>
3293  <... sched_get_priority_max resumed> ) = 0
3293  sched_setscheduler(3296, SCHED_OTHER, { 0 } <unfinished ...>
3293  <... sched_setscheduler resumed> ) = 0
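
For anyone who wants to test without rsyslogd, the call seen in both traces can be issued directly. This is a minimal sketch, assuming the traced calls go through the plain sched_setscheduler() syscall with SCHED_OTHER and a priority of 0; the helper name is hypothetical, and the sketch does not reproduce how rsyslogd sets up its threads.

```c
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Ask for SCHED_OTHER with priority 0 on the given pid/tid (0 means the
 * caller). Returns 0 on success, or the errno value set by the call.
 */
int set_sched_other(pid_t pid)
{
	struct sched_param param = { .sched_priority = 0 };

	if (sched_setscheduler(pid, SCHED_OTHER, &param) < 0)
		return errno;
	return 0;
}
```

Calling set_sched_other(0) from a SCHED_OTHER task should return 0 on a kernel that does not show the problem; a return of EPERM would correspond to the failing lines in the first trace.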

The EPERM error comes from kernel/sched/core.c:3303

...
		if (fair_policy(policy)) {
			if (!can_nice(p, attr->sched_nice))
				return -EPERM;
		}
...
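
As far as I understand it, can_nice() converts the nice value to an rlimit-style value (20 - nice) and allows the request if that value is within RLIMIT_NICE or the caller has CAP_SYS_NICE. The sketch below is a userspace paraphrase of the rlimit half only, under that assumption; the function name is hypothetical and the capability check is left out.

```c
#include <stdbool.h>
#include <sys/resource.h>

/*
 * Userspace paraphrase of the rlimit comparison: nice values in
 * [19, -20] map to rlimit-style values in [1, 40].
 * The CAP_SYS_NICE override is deliberately not modelled.
 */
bool nice_within_rlimit(int nice, rlim_t limit)
{
	rlim_t needed = (rlim_t)(20 - nice);

	return needed <= limit;
}
```

Under this reading, a request for nice 0 needs RLIMIT_NICE to be at least 20, so a task with a limit of 0 and no CAP_SYS_NICE would get EPERM even when it is not asking for a higher priority than it already has.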


But I don't know why this leads to a process blocking, or to rsyslogd 
not being woken up by a packet coming in on the AF_UNIX socket.

I hope that helps

   -- Daniel


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

