linux-kernel - Re: sys_write() racy for multi-threaded append?

Message-ID: <20070309145920.GJ6209@kvack.org>
Date:	Fri, 9 Mar 2007 09:59:20 -0500
From:	Benjamin LaHaise <bcrl@...ck.org>
To:	"Michael K. Edwards" <medwards.linux@...il.com>
Cc:	Eric Dumazet <dada1@...mosbay.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: sys_write() racy for multi-threaded append?

On Fri, Mar 09, 2007 at 04:19:55AM -0800, Michael K. Edwards wrote:
> On 3/8/07, Benjamin LaHaise <bcrl@...ck.org> wrote:
> >Any number of things can cause a short write to occur, and rewinding the
> >file position after the fact is just as bad.  A sane app has to either
> >serialise the writes itself or use a thread safe API like pwrite().
> 
> Not on a pipe/FIFO.  Short writes there are flat out verboten by
> 1003.1 unless O_NONBLOCK is set.  (Not that f_pos is interesting on a
> pipe except as a "bytes sent" indicator  -- and in the multi-threaded
> scenario, if you do the speculative update that I'm suggesting, you
> can't 100% trust it unless you ensure that you are not in
> mid-read/write in some other thread at the moment you sample f_pos.
> But that doesn't make it useless.)

Writes to a pipe/FIFO are atomic, so long as they are no larger than 
PIPE_BUF, while f_pos on a pipe is undefined -- what exactly is the issue here?  
The semantics you're assuming are not defined by POSIX.  Heck, even a man 
page for one of the *BSDs states "Some devices are incapable of seeking.  
The value of the pointer associated with such a device is undefined."  
What part of undefined is problematic?
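
A minimal sketch of the two points above, assuming a POSIX system: lseek() 
on a pipe is expected to fail with ESPIPE, so there is no usable file 
position to sample, and a single write() of at most PIPE_BUF bytes to an 
empty blocking pipe completes in full.  The helper names pipe_seek_errno() 
and pipe_write_small() are hypothetical, chosen only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>   /* PIPE_BUF */
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno produced by lseek() on the write
 * end of a fresh pipe, 0 if lseek() unexpectedly succeeded, or -1 if the
 * pipe could not be created. */
int pipe_seek_errno(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    errno = 0;
    off_t pos = lseek(fds[1], 0, SEEK_CUR);
    int err = (pos == (off_t)-1) ? errno : 0;
    close(fds[0]);
    close(fds[1]);
    return err;
}

/* Hypothetical helper: writes len bytes (len <= PIPE_BUF) to a fresh, empty,
 * blocking pipe in one write() call and returns what write() returned. */
ssize_t pipe_write_small(size_t len)
{
    char buf[PIPE_BUF];
    int fds[2];
    assert(len <= sizeof buf);   /* atomicity is only promised up to PIPE_BUF */
    if (pipe(fds) != 0)
        return -1;
    memset(buf, 'x', len);
    ssize_t n = write(fds[1], buf, len);
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

This only exercises the single-writer case; it does not by itself 
demonstrate atomicity between concurrent writers.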

> As to what a "sane app" has to do: it's just not that unusual to write
> application code that treats a short read/write as a catastrophic
> error, especially when the fd is of a type that is known never to
> produce a short read/write unless something is drastically wrong.  For
> instance, I bomb on short write in audio applications where the driver
> is known to block until enough bytes have been read/written, period.
> When switching from reading a stream of audio frames from thread A to
> reading them from thread B, I may be willing to omit app
> serialization, because I can tolerate an imperfect hand-off in which
> thread A steals one last frame after thread B has started reading --
> as long as the fd doesn't get screwed up.  There is no reason for the
> generic sys_read code to leave a race open in which the same frame is
> read by both threads and a hardware buffer overrun results later.

I hope I don't have to run any of your software.  Short writes can and do 
happen for a variety of reasons: signals, memory allocation failures, 
quota being exceeded....  These are all error conditions the kernel has to 
provide well-defined semantics for, as well-behaved applications will try 
to handle them gracefully.
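
One common way an application handles short writes is a retry loop around 
write().  The following is a sketch only, not a prescription: write_all() is 
a hypothetical helper name, and treating a zero-byte return as EIO is an 
assumption made here to keep the loop from spinning.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling write() until all len bytes are
 * written.  Returns 0 on success, -1 with errno set on failure. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted before any data: retry */
            return -1;           /* real error (ENOSPC, EDQUOT, EBADF, ...) */
        }
        if (n == 0) {
            errno = EIO;         /* assumption: treat "no progress" as an error */
            return -1;
        }
        p += n;                  /* short write: advance and try the rest */
        len -= (size_t)n;
    }
    return 0;
}
```

Note that when several threads share one fd, a loop like this gives no 
ordering guarantee between their pieces; the caller still has to serialise 
or use explicit offsets.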

> In short, I'm not proposing that the kernel perfectly serialize
> concurrent reads and writes to arbitrary fd types.  I'm proposing that
> it not do something blatantly stupid and easily avoided in generic
> code that makes it impossible for any fd type to guarantee that, after
> 10 successful pipelined 100-byte reads or writes, f_pos will have
> advanced by 1000.

The semantics you're looking for are defined for regular files with 
O_APPEND.  Anything else is asking for synchronization that other 
applications do not require and do not desire.
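
To illustrate the O_APPEND behaviour on a regular file, here is a small 
sketch that opens the same file twice and alternates 100-byte writes between 
the two descriptors.  It uses two separate open file descriptions, so it 
shows where O_APPEND places each write rather than the shared-f_pos race 
discussed in this thread.  interleaved_write_size() is a hypothetical 
helper, and the path passed to it is up to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: opens `path` twice with `extra_flags` added, then
 * alternates `rounds` 100-byte writes between the two descriptors.
 * Returns the final file size, or -1 on any failure. */
off_t interleaved_write_size(const char *path, int extra_flags, int rounds)
{
    char rec[100];
    struct stat st;
    off_t size = -1;

    assert(path != NULL && rounds >= 0);
    for (size_t i = 0; i < sizeof rec; i++)
        rec[i] = 'r';

    int a = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0600);
    int b = open(path, O_WRONLY | extra_flags);
    if (a < 0 || b < 0)
        goto out;

    for (int i = 0; i < rounds; i++) {
        int fd = (i % 2 == 0) ? a : b;
        if (write(fd, rec, sizeof rec) != (ssize_t)sizeof rec)
            goto out;
    }
    if (fstat(a, &st) == 0)
        size = st.st_size;
out:
    if (a >= 0)
        close(a);
    if (b >= 0)
        close(b);
    return size;
}
```

With O_APPEND, each write lands at the current end of file, so ten writes 
should leave a 1000-byte file.  Without it, each descriptor keeps its own 
offset starting at 0 and the two overwrite each other.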

		-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <zyntrop@...ck.org>.