Message-ID: <20091220161422.GH32739@1wt.eu>
Date: Sun, 20 Dec 2009 17:14:22 +0100
From: Willy Tarreau <w@....eu>
To: Davide Libenzi <davidel@...ilserver.org>
Cc: Nikolai ZHUBR <zhubr@...l.ru>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: epoll'ing tcp sockets for reading
Hi Davide,
On Sun, Dec 20, 2009 at 07:54:09AM -0800, Davide Libenzi wrote:
> On Sun, 20 Dec 2009, Nikolai ZHUBR wrote:
>
> > Sunday, December 20, 2009, 1:56:22 AM, Davide Libenzi wrote:
> > [trim]
> > > The kernel cannot make decisions based on something whose knowledge is
> > > userspace bound.
> > I didn't mean that. I just meant it would be useful to let the caller
> > of epoll know also the size of data related to specific EPOLLIN event in
> > some "atomic" manner immediately, because the kernel probably knows this
> > size already.
> > The same thing can approximately be "emulated" by requesting FIONREAD for
> > all EPOLLIN-ready sockets just after epoll returns, before any other work.
> > It just would look not very elegant IMHO.
>
> No such a thing of "atomic matter", since by the time you read the event,
> more data might have come. It's just flawed, you see that?
I think what Nikolai meant was the ability to wake up only once there
are *at least* XXX bytes ready. While I can understand that this would
in theory save some code, in practice he would still have to handle the
corner cases properly, which would defeat the original purpose of his
modification:
- if he waits for more data than the socket buffer can hold, he
  will never wake up;
- if my memory serves me right, the copy_and_cksum() code only knows
  whether a segment is correct during its transfer to userland, which
  means that epoll() could very well wake up with XXX apparent bytes
  ready, but the read would fail before XXX due to an invalid checksum
  on an intermediate segment. So the code would still have to take
  care of that situation anyway.
The last point means he would have to fully implement the very code he
wants to avoid, and the first one means it would be hard to know when
this works and when it does not. So while this behaviour looks useful
at first glance, it would in practice be useless.
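To make this concrete, here is a minimal sketch of the accumulate-on-wake-up
code that is needed either way. It is only an illustration under stated
assumptions: rx_state, on_readable() and demo_two_wakeups() are hypothetical
names, an AF_UNIX socketpair stands in for a TCP connection so the example is
self-contained, and the FIONREAD value is recorded purely as a hint, since
more data may arrive right after it is taken.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-connection state: we want `want` bytes in `buf`. */
struct rx_state {
    char   buf[64];
    size_t want;
    size_t got;
    int    last_hint;   /* last FIONREAD snapshot, informational only */
};

/* Called on each EPOLLIN wake-up. Returns 1 once `want` bytes have been
 * gathered, 0 if more data is still needed, -1 on error or EOF. */
static int on_readable(int fd, struct rx_state *st)
{
    int pending = 0;
    ssize_t r;

    assert(st->got < st->want);

    /* Snapshot of queued bytes; it may already be stale when we read. */
    if (ioctl(fd, FIONREAD, &pending) < 0)
        return -1;
    st->last_hint = pending;

    /* Whatever the hint said, a short read must still be handled. */
    r = recv(fd, st->buf + st->got, st->want - st->got, MSG_DONTWAIT);
    if (r < 0)
        return (errno == EAGAIN || errno == EWOULDBLOCK ||
                errno == EINTR) ? 0 : -1;
    if (r == 0)
        return -1;      /* peer closed before the message was complete */

    st->got += (size_t)r;
    return st->got >= st->want;
}

/* Demo: the 10 wanted bytes arrive in two chunks, so two wake-ups are
 * needed. Returns 0 if everything behaved as described, -1 otherwise. */
static int demo_two_wakeups(void)
{
    int sv[2], epfd, rc = -1;
    struct epoll_event ev = { .events = EPOLLIN }, out;
    struct rx_state st = { .want = 10 };

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0)
        goto close_socks;
    ev.data.fd = sv[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0)
        goto close_all;

    /* First wake-up: only 4 of the 10 wanted bytes have arrived. */
    if (write(sv[1], "0123", 4) != 4 ||
        epoll_wait(epfd, &out, 1, 1000) != 1 ||
        on_readable(sv[0], &st) != 0 || st.got != 4 || st.last_hint != 4)
        goto close_all;

    /* Second wake-up: the remaining 6 bytes complete the message. */
    if (write(sv[1], "456789", 6) != 6 ||
        epoll_wait(epfd, &out, 1, 1000) != 1 ||
        on_readable(sv[0], &st) != 1 || st.got != 10)
        goto close_all;

    rc = 0;
close_all:
    close(epfd);
close_socks:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

The state structure and the partial-read path in on_readable() are exactly
the part a "wake me at XXX bytes" feature would be meant to remove, yet they
remain necessary whenever the read returns less than the hint suggested.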
Regards,
Willy