lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080315120324.GA76@tv-sign.ru>
Date:	Sat, 15 Mar 2008 15:03:24 +0300
From:	Oleg Nesterov <oleg@...sign.ru>
To:	Laurent Riffard <laurent.riffard@...e.fr>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, roland@...hat.com, mingo@...e.hu,
	xemul@...nvz.org
Subject: Re: 2.6.25-rc5-mm1: "consolechars" hangs on boot

On 03/14, Laurent Riffard wrote:
>
> >>>With 2.6.25-rc5-mm1, my system (Ubuntu 7.10/Gutsy) reliably hangs on
> >>>boot. Sysrq-T shows 12 "consolechars" processes stuck in do_exit call.
> >>>
> >>>The bisection said "Sucker is
> >>>patches/signals-send_signal-factor-out-signal_group_exit-checks.patch"
> >>>
> consolechars ? de8925bc  3432 2795 1
> .
> .
> .
> Call Trace:
> 	do_exit+0x5dd/0x5e1
> 	do_group_exit+0x5e/0x86
> 	sys_exit_group+0xf/0x11
> 	sysenter_past_esp+0x5f/0xa5

Aha, the task doesn't hang, it has exited (as expected), please see below...

> >And. While doing this patch I forgot we should fix the bugs with init 
> >first!
> >(will try to make the patch soon).
> >
> >Laurent, any chance you can try 2.6.25-rc5-mm1 + the patch below?
> >Unlikely it can help, but would be great to be sure.
> 
> Yes it does help ! Thanks.
> 
> Despite a big ERR in dmesg, the system now runs fine.
> 
> [   26.780261] ERR!! init is killed by 10

Great. Thanks a lot Laurent!

So what happens is:

We have the very old bug (bugs, actually) with the global init && signals
which I tried to fix many times but can't find a simple solution. The fatal
signal sent to init doesn't really kill it (we have the check in
get_signal_to_deliver) but it sets SIGNAL_GROUP_EXIT. This is wrong, now
init can't exec, this has other bad implications, and this is just insane.

With the signals-send_signal-factor-out-signal_group_exit-checks.patch the
task with SIGNAL_GROUP_EXIT doesn't recieve the signals. While this change
itself is (I hope) correct, the "killed" /sbin/init now can't see SIGCHLD
and the system hangs on boot.

> [   26.781709]  [<c0120fa9>] complete_signal+0x163/0x1eb
> [   26.781719]  [<c01211d4>] send_signal+0x1a3/0x1cf
> [   26.781729]  [<c0121216>] __group_send_sig_info+0xa/0xc
> [   26.781737]  [<c01217cc>] group_send_sig_info+0x44/0x62
> [   26.781747]  [<c0121de4>] kill_pid_info+0x33/0x47
> [   26.781757]  [<c0122443>] sys_kill+0x73/0x145
> [   26.781767]  [<c014c655>] ? handle_mm_fault+0x21d/0x4f6
> [   26.781791]  [<c012af3c>] ? up_read+0x16/0x2a
> [   26.781803]  [<c011214c>] ? do_page_fault+0x25a/0x4da
> [   26.781815]  [<c0103906>] sysenter_past_esp+0x5f/0xa5

Not a kernel problem, but this looks a bit strange to me.

init has SIG_DFL for SIGUSR1, and someone does kill(1, SIGUSR1).
Note that init was explicitly targeted, the signal was not sent
to prgp or -1.

Most likely Ubuntu knows what it does, and I can't find any email
at ubuntu.com to cc...

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ