Message-ID: <20091220161422.GH32739@1wt.eu>
Date:	Sun, 20 Dec 2009 17:14:22 +0100
From:	Willy Tarreau <w@....eu>
To:	Davide Libenzi <davidel@...ilserver.org>
Cc:	Nikolai ZHUBR <zhubr@...l.ru>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: epoll'ing tcp sockets for reading

Hi Davide,

On Sun, Dec 20, 2009 at 07:54:09AM -0800, Davide Libenzi wrote:
> On Sun, 20 Dec 2009, Nikolai ZHUBR wrote:
> 
> > Sunday, December 20, 2009, 1:56:22 AM, Davide Libenzi wrote:
> > [trim]
> > > The kernel cannot make decisions based on something whose knowledge is 
> > > userspace bound.
> > I didn't mean that. I just meant it would be useful to let the caller
> > of epoll know also the size of data related to specific EPOLLIN event in
> > some "atomic" manner immediately, because the kernel probably knows this
> > size already.
> > The same thing can approximately be "emulated" by requesting FIONREAD for
> > all EPOLLIN-ready sockets just after epoll returns, before any other work.
> > It just would look not very elegant IMHO.
> 
> There is no such thing as an "atomic manner", since by the time you read the
> event, more data might have come. It's just flawed, you see that?
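
For illustration, the emulation discussed above could look roughly like
the sketch below. It assumes a Linux system and the FIONREAD ioctl; the
helper names pending_bytes() and wait_readable() are made up for this
example and are not part of any existing API. The count it reports is
only a snapshot: more data may have arrived by the time the caller acts
on it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: number of bytes queued for reading on fd right
 * now, or -1 on error. The value is a snapshot and can grow at any time. */
static int pending_bytes(int fd)
{
    int n = 0;

    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}

/* Hypothetical helper: wait for one event on epfd, then take a FIONREAD
 * snapshot for the fd stored in the event's data.fd field.
 * Returns 1 if an event was reported, 0 on timeout, -1 on error. */
static int wait_readable(int epfd, int timeout_ms, int *ready_fd, int *snapshot)
{
    struct epoll_event ev;
    int rc;

    assert(ready_fd != NULL && snapshot != NULL);

    do {
        rc = epoll_wait(epfd, &ev, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);   /* retry if interrupted by a signal */

    if (rc <= 0)
        return rc;

    *ready_fd = ev.data.fd;
    *snapshot = pending_bytes(ev.data.fd); /* lower bound at best, never a promise */
    return 1;
}
```

This can be tried on a socketpair() registered with EPOLLIN: write a few
bytes on one end, call wait_readable() on the other, then write some more
and call pending_bytes() again to see the count move.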

I think what Nikolai meant was the ability to wake up only once there
are *at least* XXX bytes ready. While I can understand why that would in
theory save some code, in practice he would still have to handle the
corner cases properly, which would defeat the original purpose of his
modification:

  - if he waits for more data than the socket buffer can hold, he
    will never wake up;

  - if my memory serves me right, the copy_and_cksum() code only knows
    whether a segment is correct during its transfer to userland, which
    means that epoll_wait() could very well wake up with XXX apparent
    bytes ready, but the read would stop short of XXX due to an invalid
    checksum on an intermediate segment. So the code would still have
    to take care of that situation anyway.

The last point means he would have to implement all of the code he wants
to avoid anyway, and the first one means it would be hard to know when
this works and when it does not. So while this behaviour looks useful at
first glance, it would in practice be useless.
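
To make the two corner cases concrete, here is a minimal sketch of the
code the application would need in any case, assuming a non-blocking
stream socket. The names read_upto() and threshold_may_fit() are
hypothetical. The SO_RCVBUF check is only a rough upper bound, since the
value reported by the kernel includes bookkeeping overhead and may change
over time.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `want` bytes from a non-blocking fd.
 * Returns the number of bytes actually read, which may be less than
 * `want` whatever the reason for the short read, or -1 on a hard error.
 * *eof is set to 1 if the peer closed the connection. */
static ssize_t read_upto(int fd, char *buf, size_t want, int *eof)
{
    size_t got = 0;

    assert(buf != NULL && eof != NULL);
    *eof = 0;

    while (got < want) {
        ssize_t r = read(fd, buf + got, want - got);

        if (r > 0) {
            got += (size_t)r;
            continue;
        }
        if (r == 0) {                 /* orderly shutdown by the peer */
            *eof = 1;
            break;
        }
        if (errno == EINTR)           /* interrupted, just retry */
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                    /* nothing more for now, go back to epoll */
        return -1;                    /* real error */
    }
    return (ssize_t)got;
}

/* Hypothetical helper: returns 0 if `want` exceeds the current SO_RCVBUF
 * value (so waiting for that many queued bytes cannot succeed as things
 * stand), 1 if it might fit, -1 on error. */
static int threshold_may_fit(int fd, size_t want)
{
    int size = 0;
    socklen_t len = sizeof(size);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len) < 0)
        return -1;
    return size > 0 && want <= (size_t)size;
}
```

The caller keeps its own count of bytes received so far and calls
read_upto() again after the next EPOLLIN event until the message is
complete, which is exactly the logic a minimum-size wakeup was meant to
spare.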

Regards,
Willy
