Message-ID: <47DAE8DB.4040606@free.fr>
Date: Fri, 14 Mar 2008 22:06:35 +0100
From: Laurent Riffard <laurent.riffard@...e.fr>
To: Oleg Nesterov <oleg@...sign.ru>
CC: Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, roland@...hat.com, mingo@...e.hu,
xemul@...nvz.org
Subject: Re: 2.6.25-rc5-mm1: "consolechars" hangs on boot
On 14.03.2008 06:26, Oleg Nesterov wrote:
> On 03/13, Andrew Morton wrote:
>> On Thu, 13 Mar 2008 23:07:30 +0100
>> Laurent Riffard <laurent.riffard@...e.fr> wrote:
>>
>>> On 11.03.2008 09:14, Andrew Morton wrote:
>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc5/2.6.25-rc5-mm1/
>>>>
>>> With 2.6.25-rc5-mm1, my system (Ubuntu 7.10/Gutsy) reliably hangs on
>>> boot. Sysrq-T shows 12 "consolechars" processes stuck in do_exit call.
>>>
>>> The bisection said "Sucker is
>>> patches/signals-send_signal-factor-out-signal_group_exit-checks.patch"
>>>
>>> Actually, it's 2.6.25-rc5-mm1 + add-warn_on_secs-macro-fix-fix.patch,
>>> I guess this patch is innocent.
>
> Laurent, thanks a lot!
>
> What was the state of consolechars? Where exactly did it hang? do_exit+??
(hand-copied data)
=================
consolechars ? de8925bc 3432 2795 1
.
.
.
Call Trace:
do_exit+0x5dd/0x5e1
do_group_exit+0x5e/0x86
sys_exit_group+0xf/0x11
sysenter_past_esp+0x5f/0xa5
=================
On the first line, the last number is always "1" for each of the 12
consolechars processes. The call trace is also identical for all 12 of them.
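A note on reading the trace entries above: each one has the form
symbol+offset/size, so do_exit+0x5dd/0x5e1 is 0x5dd bytes into a function
that is 0x5e1 bytes long, i.e. 4 bytes before its end. A minimal C sketch of
that arithmetic follows. The names struct trace_entry, parse_trace_entry and
bytes_before_end are hypothetical and exist only for this illustration; they
are not kernel interfaces, and the parser only handles the bare form copied
above (no "[<address>]" prefix and no "?" marker).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one "symbol+0xOFF/0xSIZE" trace entry. */
struct trace_entry {
    char symbol[64];
    unsigned long offset; /* bytes from the start of the function */
    unsigned long size;   /* total size of the function in bytes */
};

/*
 * Split an entry such as "do_exit+0x5dd/0x5e1" into its three parts.
 * Returns 0 on success, -1 if the text does not have that shape.
 */
static int parse_trace_entry(const char *text, struct trace_entry *out)
{
    assert(text != NULL && out != NULL);
    /* %lx accepts an optional "0x" prefix, so the hex fields parse as-is. */
    if (sscanf(text, " %63[^+]+%lx/%lx",
               out->symbol, &out->offset, &out->size) != 3)
        return -1;
    return 0;
}

/* Distance from the recorded address to the end of the function. */
static unsigned long bytes_before_end(const struct trace_entry *e)
{
    return e->size - e->offset;
}
```

Applied to the top entry of the trace, bytes_before_end() gives 4, so all 12
processes were recorded at the very tail of do_exit.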
>> Actually I later dropped
>> signals-send_signal-factor-out-signal_group_exit-checks.patch at Oleg's
>> request.
>>
>> But I don't think we did that because it was known to be buggy, so perhaps
>> the same bug crept back in in another form..
>
> Yes, currently I suspect we have another bug.
>
> And. While doing this patch I forgot we should fix the bugs with init first!
> (will try to make the patch soon).
>
> Laurent, any chance you can try 2.6.25-rc5-mm1 + the patch below?
> Unlikely it can help, but would be great to be sure.
Yes, it does help! Thanks.
Despite a big ERR in dmesg, the system now runs fine.
[ 26.536458] ReiserFS: sda7: Using r5 hash to sort names
[ 26.780261] ERR!! init is killed by 10
[ 26.781486] ------------[ cut here ]------------
[ 26.781492] WARNING: at kernel/signal.c:724 complete_signal+0x163/0x1eb()
[ 26.781497] Modules linked in: nls_iso8859_1 nls_cp850 vfat fat reiserfs eeprom w83781d hwmon_vid ipv6 snd_ens1371 firewire_ohci firewire_core gameport crc_itu_t snd_ac97_codec 8250_pnp ac97_bus snd_pcm_oss snd_mixer_oss 8250 serial_core snd_pcm snd_seq_oss floppy snd_seq_midi snd_rawmidi rtc snd_seq_midi_event snd_seq snd_timer snd_seq_device pcspkr snd uhci_hcd sr_mod cdrom soundcore snd_page_alloc ohci1394 sg via686a ne2k_pci 8390 ieee1394 i2c_viapro usbcore ata_generic parport_pc parport via_agp agpgart evdev dm_snapshot reiser4 lzo_decompress lzo_compress sd_mod pata_via libata scsi_mod dm_mirror dm_log dm_mod
[ 26.781609] Pid: 2590, comm: sh Not tainted 2.6.25-rc5-mm1 #18
[ 26.781619] [<c01188bd>] warn_on_slowpath+0x41/0x6d
[ 26.781640] [<c0119200>] ? vprintk+0x289/0x3b6
[ 26.781650] [<c01cc3a8>] ? number+0x10d/0x1cd
[ 26.781671] [<c0158db9>] ? cache_free_debugcheck+0x1e1/0x1ec
[ 26.781699] [<c0119342>] ? printk+0x15/0x17
[ 26.781709] [<c0120fa9>] complete_signal+0x163/0x1eb
[ 26.781719] [<c01211d4>] send_signal+0x1a3/0x1cf
[ 26.781729] [<c0121216>] __group_send_sig_info+0xa/0xc
[ 26.781737] [<c01217cc>] group_send_sig_info+0x44/0x62
[ 26.781747] [<c0121de4>] kill_pid_info+0x33/0x47
[ 26.781757] [<c0122443>] sys_kill+0x73/0x145
[ 26.781767] [<c014c655>] ? handle_mm_fault+0x21d/0x4f6
[ 26.781791] [<c012af3c>] ? up_read+0x16/0x2a
[ 26.781803] [<c011214c>] ? do_page_fault+0x25a/0x4da
[ 26.781815] [<c0103906>] sysenter_past_esp+0x5f/0xa5
[ 26.781834] =======================
[ 26.781838] ---[ end trace c053f6e3c5b0fb23 ]---
[ 26.827206] Adding 1048568k swap on /dev/mapper/vglinux1-lvswap. Priority:-1 extents:1 across:1048568k
(full dmesg attached)
> Oleg.
>
> --- MM/kernel/signal.c~ 2008-03-14 08:08:07.000000000 +0300
> +++ MM/kernel/signal.c 2008-03-14 08:08:17.000000000 +0300
> @@ -719,6 +719,10 @@ static void complete_signal(int sig, str
> /*
> * This signal will be fatal to the whole group.
> */
> +if (is_global_init(p)) {
> + printk(KERN_CRIT "ERR!! init is killed by %d\n", sig);
> + WARN_ON_ONCE(1);
> +} else
> if (!sig_kernel_coredump(sig)) {
> /*
> * Start a group exit and wake everybody up.
>
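For anyone following along, the branch that the debug patch adds can be
modeled in user space roughly as below. This is only a sketch under stated
assumptions: struct task_model, enum fatal_action and fatal_signal_action are
hypothetical stand-ins invented for this illustration, the coredump check is
reduced to a flag passed in by the caller, and whatever the kernel does on
the coredump path is outside the quoted hunk, so it is left as ACT_OTHER.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical outcomes of the "signal is fatal to the group" branch. */
enum fatal_action {
    ACT_WARN_ONLY,  /* patched path: log the event, do not kill init */
    ACT_GROUP_EXIT, /* original path: start a group exit */
    ACT_OTHER       /* coredump signals: handling not shown in the hunk */
};

/* Hypothetical stand-in for the one task property the patch tests. */
struct task_model {
    int is_global_init;
};

/*
 * Mirrors the shape of the patched code:
 *   if (is_global_init(p)) { printk(...); WARN_ON_ONCE(1); }
 *   else if (!sig_kernel_coredump(sig)) { ...group exit... }
 */
static enum fatal_action fatal_signal_action(const struct task_model *p,
                                             int sig, int sig_is_coredump)
{
    assert(p != NULL);
    if (p->is_global_init) {
        /* Same message text as the printk in the patch. */
        fprintf(stderr, "ERR!! init is killed by %d\n", sig);
        return ACT_WARN_ONLY;
    }
    if (!sig_is_coredump)
        return ACT_GROUP_EXIT;
    return ACT_OTHER;
}
```

With is_global_init set and sig == 10 this returns ACT_WARN_ONLY and prints
the same "ERR!! init is killed by 10" line as in the dmesg excerpt; any other
task receiving a non-coredump signal still takes the group-exit path.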
View attachment "dmesg-2.6.25-rc5-mm1" of type "text/plain" (32181 bytes)