Message-Id: <1243788982.7369.370.camel@homebase.localnet>
Date:	Sun, 31 May 2009 12:56:22 -0400
From:	Paul Smith <paul@...-scientist.net>
To:	Olivier Galibert <galibert@...ox.com>
Cc:	Alan Cox <alan@...rguk.ukuu.org.uk>, linux-kernel@...r.kernel.org,
	stable@...nel.org, Andrew Morton <akpm@...ux-foundation.org>,
	Andi Kleen <andi@...stfloor.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Roland McGrath <roland@...hat.com>
Subject: Re: [PATCH] coredump: Retry writes where appropriate

On Sun, 2009-05-31 at 16:03 +0200, Olivier Galibert wrote:
> On Sun, May 31, 2009 at 11:18:51AM +0100, Alan Cox wrote:
> > On Sun, 31 May 2009 01:33:39 -0400
> > Paul Smith <paul@...-scientist.net> wrote:
> > 
> > > coredump: Retry writes where appropriate
> > > 
> > > Core dump write operations (especially to a pipe) can be incomplete due
> > > to signal reception or possibly recoverable partial writes.
> > 
> > NAK this
> > 
> > > Previously any incomplete write in the ELF core dumper caused the core
> > > dump to stop, giving short cores in these cases.  Modify the core dumper
> > > to retry the write where appropriate.
> > 
> > The existing behaviour is an absolute godsend when you've something like
> > a core dump stuck on an NFS mount or something trying to core dump to
> > very slow media.
> > 
> > In fact the signals checks were *purposefully added* some time ago.

This is what Olivier mentioned as well, and I do see the benefit in
being able to get rid of hung coredumps.  But to me it's more
important to have reliable and robust coredumping, and I'm getting
reports of short cores on my systems at least once a week due to this
problem (the userspace applications I'm working with use signals for
certain well-defined situations, which tend to happen at around the same
time as you might expect core dumps).

> Perhaps removing the "|| r == -EINTR" part would make both of you
> happy?  He gets the reliability on pipes, you keep the interrupt on
> signals.

I'm getting back ERESTARTSYS in my environment, and it's happening
because pipe_write() detects a signal pending.  I don't think this is
due to SIGPIPE, and I'm not sure that removing EINTR will give Alan the
behavior he is looking for.
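
For reference, the two cases under discussion (a short write, and a
write interrupted by a signal before any data moved) look roughly like
this from userspace.  This is only an analogy, not the patch: in the
kernel the dumper sees -ERESTARTSYS from pipe_write() rather than
errno == EINTR, and write_all() is a hypothetical helper name.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all of buf to fd, retrying short writes
 * and writes interrupted by a signal.
 * Returns 0 on success, -1 on an unrecoverable error (errno is set).
 */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, nothing written: retry */
            return -1;          /* real error: give up */
        }
        if (n == 0) {
            errno = EIO;        /* no progress: avoid spinning forever */
            return -1;
        }
        p += n;                 /* short write: advance and go again */
        len -= (size_t)n;
    }
    return 0;
}
```

Note that a loop like this retries on every signal, which is exactly
the behavior Alan objects to for a dump stuck on NFS or slow media.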

Another possibility would be to examine the signal itself and not
retry if it's SIGKILL.  I'm too much of a kernel hacking noob to know
offhand how to find the pending signal, but I can certainly figure it
out.  If it's possible, Alan, would that be an acceptable alternative?
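
A userspace sketch of that idea, under stated assumptions: SIGKILL
cannot be caught in userspace, so SIGTERM stands in for the "fatal"
signal here, and the names abort_requested, install_abort_handler()
and write_unless_aborted() are all hypothetical.  On the kernel side
there is a helper, fatal_signal_pending(), that tests for a pending
SIGKILL; whether it fits this code path is not established in this
thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Set by the handler when the stand-in "fatal" signal arrives. */
static volatile sig_atomic_t abort_requested;

static void on_fatal(int sig)
{
    (void)sig;
    abort_requested = 1;
}

/* Install the handler without SA_RESTART so write() can return EINTR. */
static int install_abort_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_fatal;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/*
 * Retry short and interrupted writes, but stop as soon as the "fatal"
 * signal has been seen.  Returns 0 on success, -1 on error or abort
 * (errno is EINTR when the write was abandoned because of the signal).
 */
static int write_unless_aborted(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n;

        if (abort_requested) {
            errno = EINTR;      /* fatal signal: do not retry */
            return -1;
        }
        n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* loop back and re-check the flag */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Signals other than the stand-in one interrupt the write and are simply
retried, while the "fatal" one ends the dump at the next check.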

I'm not entirely happy with this because, as was discussed in an earlier
thread, there are plenty of common idioms where you can expect to
receive a SIGKILL while you're in the middle of dumping core (lots of
userspace setups will send a few HUPs, followed by a few INTs, and if
the process is still there they send KILLs).  However, for my specific
purposes this would be sufficient.

Other ideas?