Message-ID: <alpine.LFD.0.98.0706010904110.3957@woody.linux-foundation.org>
Date:	Fri, 1 Jun 2007 09:18:58 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	"H. Peter Anvin" <hpa@...or.com>
cc:	Eric Dumazet <dada1@...mosbay.com>,
	Jens Axboe <jens.axboe@...cle.com>,
	linux-kernel@...r.kernel.org, cotte@...ibm.com, hugh@...itas.com,
	neilb@...e.de, zanussi@...ibm.com, hch@...radead.org
Subject: Re: [PATCH] sendfile removal

On Fri, 1 Jun 2007, H. Peter Anvin wrote:
> 
> Fair enough.  Unix has traditionally not acknowledged the possibility of
> nonblocking I/O on conventional files, for some odd reason.

It's not odd at all.

If you return EAGAIN, you had better have a way to _wait_ for that EAGAIN 
to go away, otherwise the EAGAIN is just a total waste of time.

So the rule about EAGAIN is very simple:
 (a) the file descriptor must be O_NONBLOCK
 (b) the access must otherwise block
AND
 (c) the condition must be something we can wait for with poll/select

I don't know why people continually ignore that (c) point, even though 
it's obvious and very very important!

If you cannot wait for it, tell me why the kernel should _ever_ return 
EAGAIN? The only option for the user is to just do the operation again 
immediately.
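
As an illustration of rules (a) through (c) together, here is a minimal userspace sketch of what a caller does with EAGAIN on a descriptor that poll() can actually wait on, such as a pipe or socket. The helper names set_nonblocking() and read_when_ready() are made up for this example and the timeout handling is an assumption, not something taken from the kernel or from this thread.

```c
#define _XOPEN_SOURCE 700   /* for poll() under strict compiler modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Rule (a): put the descriptor into O_NONBLOCK mode. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Read from a non-blocking descriptor. When read() reports EAGAIN
 * (rule (b): it would otherwise block), sleep in poll() until the
 * descriptor is readable (rule (c)) instead of retrying immediately.
 * Returns bytes read, 0 on EOF, or -1 with errno set
 * (ETIMEDOUT if timeout_ms elapsed without data).
 */
ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    assert(buf != NULL);
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (ready < 0 && errno != EINTR)
            return -1;
        /* Readable (or interrupted): go around and try read() again. */
    }
}
```

Without the poll() call in the middle, the loop would spin on read() until data showed up, which is the busy-looping case.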

And the thing is, neither poll nor select work on regular files. And no, 
that is _not_ just an implementation issue. It's very fundamental: neither 
poll nor select get the file offset to wait for!

And that file offset is _critical_ for a regular file, in a way it 
obviously is _not_ for a socket, pipe, or other special file. Because 
without knowing the file offset, you cannot know which page you should be 
waiting for!
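
To make the interface point concrete, here is a small sketch, assuming a POSIX system with a writable /tmp. poll() is handed a descriptor and an event mask and nothing else, and POSIX specifies that regular files always poll as ready for reading and writing. The helper names poll_says_readable() and open_scratch_file() are hypothetical.

```c
#define _XOPEN_SOURCE 700   /* for poll() and mkstemp() under strict modes */
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Ask poll() whether fd is readable right now (zero timeout).
 * Note what the interface carries: a descriptor and an event mask.
 * There is no field for a file offset.
 * Returns 1 if POLLIN was reported, 0 otherwise.
 */
int poll_says_readable(int fd)
{
    assert(fd >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, 0);
    return n == 1 && (pfd.revents & POLLIN) != 0;
}

/* Hypothetical helper: an empty regular file, already unlinked. */
int open_scratch_file(void)
{
    char path[] = "/tmp/pollfile-XXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0)
        unlink(path);
    return fd;
}
```

An empty pipe is reported as not readable until something is written to it, while the regular file is reported as readable straight away, whatever state its pages are in.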

And no, the file offset is not "f_pos". sendfile(), along with 
pread/pwrite, uses a totally separate file offset, so if select/poll were 
to base their decision on f_pos, they'd be _wrong_.
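
A quick way to see the separate offset from userspace, sketched under the assumption of a POSIX system with a writable /tmp: pread() reads at the offset it is given and leaves the descriptor's own position alone. The helper names below are made up for illustration.

```c
#define _XOPEN_SOURCE 700   /* for pread() and mkstemp() under strict modes */
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: unlinked scratch file pre-filled with data. */
int open_scratch_file(const char *data, size_t len)
{
    char path[] = "/tmp/fpos-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Read one byte at an explicit offset with pread(), then report the
 * descriptor's own position (what the kernel calls f_pos).
 * Returns that position, or (off_t)-1 on error.
 */
off_t fpos_after_pread(int fd, off_t offset, char *out)
{
    assert(out != NULL);
    if (pread(fd, out, 1, offset) != 1)
        return (off_t)-1;
    return lseek(fd, 0, SEEK_CUR);
}
```

If the descriptor is positioned at 2 and the pread() is done at offset 5, the byte comes from offset 5 and the position is still 2 afterwards. Anything that guessed the "interesting" page from f_pos would be looking at the wrong place.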

This really is very fundamental. 

Now, you can argue that you can always just return -EAGAIN anyway, but 
then the calling process will basically be busy-looping, calling 
sendfile() (or splice()) over and over again. That's _horrible_. It's much 
better to just not return EAGAIN, and sleep like a good process should!

So there's a few things to take away from this:

 - regular file access MUST NOT return EAGAIN just because a page isn't 
   in the cache. Doing so is simply a bug. No ifs, buts or maybe's about 
   it!

   Busy-looping is NOT ACCEPTABLE!

 - you *could* make some alternative conventions:

	(a) you could make O_NONBLOCK mean that you'll at least 
	    guarantee that you *start* the IO, and while you never return 
	    EAGAIN, you might validly return a _partial_ result!

	(b) variation on (a): it's ok to return EAGAIN if _you_ were the 
	    one who started the IO during this particular time around the 
	    loop. But if you find a page that isn't up-to-date yet, and 
	    you didn't start the IO, you *must* wait for it, so that you 
	    end up returning EAGAIN at most once! Exactly because 
	    busy-looping is simply not acceptable behaviour!
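
Convention (b) can be sketched as a toy model. This is not kernel code: struct toy_page and the toy_* functions are hypothetical stand-ins, and "waiting" is simulated by marking the page up to date. The only point is the decision logic: EAGAIN is returned only by the call that started the IO, and every later call waits.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page cache page (hypothetical, not the kernel's). */
struct toy_page {
    bool uptodate;      /* data is present and valid */
    bool io_in_flight;  /* somebody already started the read */
};

/* Stand-in for kicking off the disk read. */
void toy_start_io(struct toy_page *pg)
{
    pg->io_in_flight = true;
}

/* Stand-in for sleeping until the read completes. */
void toy_wait_for_io(struct toy_page *pg)
{
    pg->io_in_flight = false;
    pg->uptodate = true;
}

/*
 * Convention (b): return -EAGAIN only if this call started the IO.
 * If the IO was already started, wait for it rather than bouncing
 * the caller again. Returns 0 once the page is usable.
 */
int toy_nonblocking_read(struct toy_page *pg)
{
    assert(pg != NULL);
    if (pg->uptodate)
        return 0;
    if (!pg->io_in_flight) {
        toy_start_io(pg);
        return -EAGAIN;     /* we started it: at most once per page */
    }
    toy_wait_for_io(pg);    /* not ours to bounce: sleep instead */
    return 0;
}
```

A caller that retries in a loop sees -EAGAIN once for a cold page and then gets the data on the next call, so the loop cannot spin.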

I have to admit that I didn't look at what raw splice() itself does these 
days. I would not be surprised if Jens also didn't realize this very 
fundamental issue. It seems too easy to miss, because people think 
that EAGAIN stands on its own, and don't realize that EAGAIN must be 
paired with select/poll to make sense.

Jens?

		Linus