Message-ID: <47DAE8DB.4040606@free.fr>
Date:	Fri, 14 Mar 2008 22:06:35 +0100
From:	Laurent Riffard <laurent.riffard@...e.fr>
To:	Oleg Nesterov <oleg@...sign.ru>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, roland@...hat.com, mingo@...e.hu,
	xemul@...nvz.org
Subject: Re: 2.6.25-rc5-mm1: "consolechars" hangs on boot

On 14.03.2008 06:26, Oleg Nesterov wrote:
> On 03/13, Andrew Morton wrote:
>> On Thu, 13 Mar 2008 23:07:30 +0100
>> Laurent Riffard <laurent.riffard@...e.fr> wrote:
>>
>>> On 11.03.2008 09:14, Andrew Morton wrote:
>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc5/2.6.25-rc5-mm1/
>>>>
>>> With 2.6.25-rc5-mm1, my system (Ubuntu 7.10/Gutsy) reliably hangs on
>>> boot. Sysrq-T shows 12 "consolechars" processes stuck in do_exit call.
>>>
>>> The bisection said "Sucker is
>>> patches/signals-send_signal-factor-out-signal_group_exit-checks.patch"
>>>
>>> Actually, it's 2.6.25-rc5-mm1 + add-warn_on_secs-macro-fix-fix.patch,
>>> I guess this patch is innocent.
> 
> Laurent, thanks a lot!
> 
> What was the state of consolechars? Where exactly did it hang? do_exit+??

(hand-copied data)
=================
consolechars ? de8925bc  3432 2795 1
 .
 .
 .
Call Trace:
	do_exit+0x5dd/0x5e1
	do_group_exit+0x5e/0x86
	sys_exit_group+0xf/0x11
	sysenter_past_esp+0x5f/0xa5
=================

On the first line, the last number is always "1" for each of the 12
consolechars processes. The call trace is identical for all 12 of them.

>> Actually I later dropped
>> signals-send_signal-factor-out-signal_group_exit-checks.patch at Oleg's
>> request.
>>
>> But I don't think we did that because it was known to be buggy, so perhaps
>> the same bug crept back in in another form..
> 
> Yes, currently I suspect we have another bug.
> 
> And. While doing this patch I forgot we should fix the bugs with init first!
> (will try to make the patch soon).
> 
> Laurent, any chance you can try 2.6.25-rc5-mm1 + the patch below?
> Unlikely it can help, but would be great to be sure.

Yes, it does help! Thanks.

Despite a big ERR in dmesg, the system now runs fine.

[   26.536458] ReiserFS: sda7: Using r5 hash to sort names
[   26.780261] ERR!! init is killed by 10
[   26.781486] ------------[ cut here ]------------
[   26.781492] WARNING: at kernel/signal.c:724 complete_signal+0x163/0x1eb()
[   26.781497] Modules linked in: nls_iso8859_1 nls_cp850 vfat fat reiserfs eeprom w83781d hwmon_vid ipv6 snd_ens1371 firewire_ohci firewire_core gameport crc_itu_t snd_ac97_codec 8250_pnp ac97_bus snd_pcm_oss snd_mixer_oss 8250 serial_core snd_pcm snd_seq_oss floppy snd_seq_midi snd_rawmidi rtc snd_seq_midi_event snd_seq snd_timer snd_seq_device pcspkr snd uhci_hcd sr_mod cdrom soundcore snd_page_alloc ohci1394 sg via686a ne2k_pci 8390 ieee1394 i2c_viapro usbcore ata_generic parport_pc parport via_agp agpgart evdev dm_snapshot reiser4 lzo_decompress lzo_compress sd_mod pata_via libata scsi_mod dm_mirror dm_log dm_mod
[   26.781609] Pid: 2590, comm: sh Not tainted 2.6.25-rc5-mm1 #18
[   26.781619]  [<c01188bd>] warn_on_slowpath+0x41/0x6d
[   26.781640]  [<c0119200>] ? vprintk+0x289/0x3b6
[   26.781650]  [<c01cc3a8>] ? number+0x10d/0x1cd
[   26.781671]  [<c0158db9>] ? cache_free_debugcheck+0x1e1/0x1ec
[   26.781699]  [<c0119342>] ? printk+0x15/0x17
[   26.781709]  [<c0120fa9>] complete_signal+0x163/0x1eb
[   26.781719]  [<c01211d4>] send_signal+0x1a3/0x1cf
[   26.781729]  [<c0121216>] __group_send_sig_info+0xa/0xc
[   26.781737]  [<c01217cc>] group_send_sig_info+0x44/0x62
[   26.781747]  [<c0121de4>] kill_pid_info+0x33/0x47
[   26.781757]  [<c0122443>] sys_kill+0x73/0x145
[   26.781767]  [<c014c655>] ? handle_mm_fault+0x21d/0x4f6
[   26.781791]  [<c012af3c>] ? up_read+0x16/0x2a
[   26.781803]  [<c011214c>] ? do_page_fault+0x25a/0x4da
[   26.781815]  [<c0103906>] sysenter_past_esp+0x5f/0xa5
[   26.781834]  =======================
[   26.781838] ---[ end trace c053f6e3c5b0fb23 ]---
[   26.827206] Adding 1048568k swap on /dev/mapper/vglinux1-lvswap.  Priority:-1 extents:1 across:1048568k

(full dmesg attached)

> Oleg.
> 
> --- MM/kernel/signal.c~	2008-03-14 08:08:07.000000000 +0300
> +++ MM/kernel/signal.c	2008-03-14 08:08:17.000000000 +0300
> @@ -719,6 +719,10 @@ static void complete_signal(int sig, str
>  		/*
>  		 * This signal will be fatal to the whole group.
>  		 */
> +if (is_global_init(p)) {
> +	printk(KERN_CRIT "ERR!! init is killed by %d\n", sig);
> +	WARN_ON_ONCE(1);
> +} else
>  		if (!sig_kernel_coredump(sig)) {
>  			/*
>  			 * Start a group exit and wake everybody up.
> 

View attachment "dmesg-2.6.25-rc5-mm1" of type "text/plain" (32181 bytes)
