Message-ID: <20090527200419.GA1655@redhat.com>
Date: Wed, 27 May 2009 22:04:19 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Andi Kleen <andi@...stfloor.org>
Cc: paul@...-scientist.net, linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Roland McGrath <roland@...hat.com>
Subject: Re: [2.6.27.24] Kernel coredump to a pipe is failing
On 05/27, Oleg Nesterov wrote:
>
> On 05/26, Andi Kleen wrote:
> >
> > When a signal happens during core dump the core dump to a pipe
> > can fail, because the write returns short, but the ELF core dumpers
> > cannot handle that.
> >
> > There's no reason to handle signals during core dumping, so just
> > block them all.
>
> Actually, I think there is a strong reason to handle signals during
> core dumping. The coredump can take a lot of time/resources, and it is not
> good if it looks like an unkillable process to users.
>
> Please look at
>
> killable/interruptible coredumps
> http://marc.info/?l=linux-kernel&m=121665710711931
>
> at least, I think SIGKILL should terminate core dumping.
Forgot to mention: we also have problems with OOM. The coredumping task
can't be killed, and it can populate memory via get_user_pages().
The coredump also effectively disables the OOM killer: if select_bad_process()
sees a PF_EXITING task with ->mm != NULL, it returns -1.
> This all needs more discussion, but imho for now something like
> Paul's patch http://marc.info/?l=linux-kernel&m=124340506200729
> is the best workaround. Note that we have the same dump_write()
> in binfmt_elf.c and binfmt_aout.c, perhaps it makes sense to
> create coredump_file_write() helper in fs/exec.c.
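To illustrate what "handling a short write" means here, a rough userspace
sketch follows. This is not kernel code and not Paul's patch: write_all() is
a made-up name, and retrying on EINTR is the usual userspace convention. An
in-kernel helper would still have to decide what to do about fatal signals
such as SIGKILL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Userspace illustration only. write_all() is a hypothetical helper:
 * it keeps calling write() until the whole buffer has been written,
 * so a short write is not mistaken for a failure.
 * Returns 0 on success, -1 on error (errno is set).
 */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any data went out: retry */
			return -1;		/* real error */
		}
		if (n == 0) {
			errno = EIO;		/* no progress: bail out rather than spin */
			return -1;
		}
		p += n;				/* short write: advance and try again */
		len -= (size_t)n;
	}
	return 0;
}
```

A caller would use write_all(fd, buf, len) where it used a single write()
before, and treat only a -1 return as a failed dump.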
But I didn't notice that Paul also reports a kernel panic:
page:ffffe20010d63d00 flags:0x8000000000000001 mapping:0000000000000000 mapcount:0 \
count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Pid: 3346, comm: worker Tainted: P 2.6.27.24-worker #4
Call Trace:
[<ffffffff80284fd4>] bad_page+0x74/0xc0
[<ffffffff80286168>] free_hot_cold_page+0x248/0x2f0
[<ffffffff802f4096>] free_wr_note_data+0x56/0x70
[<ffffffff802a95c6>] kfree+0x86/0x100
[<ffffffff802f4096>] free_wr_note_data+0x56/0x70
[<ffffffff802f0991>] elf_core_dump+0x611/0x1160
At first glance, this looks like a bug outside of coredump.c:
are we trying to free a PG_locked page?
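For reference, the PG_locked guess comes from the low bit of the flags value
in the report. A small sketch of that check, assuming PG_locked is bit 0 of
page->flags (as in enum pageflags); PG_LOCKED_BIT and page_flag_is_set() are
names invented for this example, and the high bits of the value are not
interpreted here:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: PG_locked is bit 0 of page->flags. */
#define PG_LOCKED_BIT 0

/* Hypothetical helper: return 1 if the given bit is set in flags, else 0. */
static int page_flag_is_set(uint64_t flags, unsigned int bit)
{
	assert(bit < 64);
	return (int)((flags >> bit) & 1u);
}
```

With flags:0x8000000000000001 from the report, testing bit 0 this way
reports the bit as set.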
Oleg.