Date:	Sun, 21 Sep 2008 04:24:42 -0500
From:	lkml@...garu.com
To:	linux-kernel@...r.kernel.org
Subject: Re: Honoring SO_RCVLOWAT in proto_ops.poll methods

On Sat, Sep 20, 2008 at 06:00:46PM -0500, lkml@...garu.com wrote:
> On Sat, Sep 20, 2008 at 03:21:40PM -0700, David Miller wrote:
> > From: lkml@...garu.com
> > Date: Sat, 20 Sep 2008 16:42:29 -0500
> > 
> > > I have a need for select/poll/epoll_wait to block on sockets which have
> > > unread data sitting in the receive buffer with a quantity less than
> > > specified via setsockopt() w/SO_RCVLOWAT, not less than one like the
> > > current implementation.
> > 
> > If BSD never provided this behavior, such a change is likely
> > to break applications.
> 
> I did a quick look through FreeBSD source on fxr and found this macro:
> http://fxr.watson.org/fxr/source/sys/socketvar.h#L197
> 
> Which is used by the generic socket poll here:
> http://fxr.watson.org/fxr/source/kern/uipc_socket.c#L2731
> 
> You can look throughout that listing and so_rcv.sb_lowat is always what
> is compared against for determining rcv buf readability.
> 
> You might also want to look at the socket(7) man page which implies that
> what Linux currently does is exceptional & incorrect:
> 
>        SO_RCVLOWAT and SO_SNDLOWAT
>               Specify the minimum number of bytes in  the  buffer  until
>               the  socket  layer  will  pass  the  data  to the protocol
>               (SO_SNDLOWAT) or  the  user  on  receiving  (SO_RCVLOWAT).
>               These two values are initialised to 1.  SO_SNDLOWAT is not
>               changeable on Linux (setsockopt fails with the error  ENO-
>               PROTOOPT).   SO_RCVLOWAT  is  changeable  only since Linux
>               2.4.  The select(2) and poll(2) system calls currently  do
>               not  respect  the SO_RCVLOWAT setting on Linux, and mark a
>               socket readable when even a single byte of data is  avail-
>               able.   A subsequent read from the socket will block until
>               SO_RCVLOWAT bytes are available.
> 

I've been working on my application further and finally got around to
testing it under the assumption that poll() ignores SO_RCVLOWAT. To my
surprise, even my recv() calls with the MSG_PEEK flag set are not blocking
until SO_RCVLOWAT bytes are available.  They block without MSG_PEEK, but
not with it.
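
For anyone who wants to reproduce this, here is a minimal sketch of the two calls involved. The helper names set_rcvlowat() and peek_queued() are made up for illustration; only setsockopt(), recv() and the flags are real API. Passing 0 as extra_flags gives the blocking peek described above, while MSG_DONTWAIT just reports what is already queued.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: ask the socket layer not to hand data to a
 * blocking read until at least `lowat` bytes are queued. */
int set_rcvlowat(int fd, int lowat)
{
    assert(lowat > 0);
    return setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof lowat);
}

/* Hypothetical helper: look at queued data without consuming it.
 * extra_flags may be 0 (blocking peek) or MSG_DONTWAIT. */
ssize_t peek_queued(int fd, void *buf, size_t len, int extra_flags)
{
    return recv(fd, buf, len, MSG_PEEK | extra_flags);
}
```

Since the peek does not consume anything, calling peek_queued() twice in a row on an idle connection should report the same bytes both times.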

Upon further investigation, I found this in tcp_recvmsg() in tcp.c (2.6.26.5):

1306         target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);

...snip...

1371                 if (copied >= target && !sk->sk_backlog.tail)
1372                         break;
1373 
1374                 if (copied) {
1375                         if (sk->sk_err ||
1376                             sk->sk_state == TCP_CLOSE ||
1377                             (sk->sk_shutdown & RCV_SHUTDOWN) ||
1378                             !timeo ||
1379                             signal_pending(current) ||
1380                             (flags & MSG_PEEK))
1381                                 break;
1382                 } else {


So line #1380 drops out without satisfying copied >= target if MSG_PEEK is
set, and if you look at the remainder of the function, it assumes that
it needs to clean up buffers before waiting for more.  So fixing this one
is likely not as trivial as fixing poll, since the rest of the function
has to be massaged so that it does not try to free things when in MSG_PEEK
mode.
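
To make the early exit concrete, here is a toy userspace model of just the break conditions quoted in the excerpt. It is not kernel code: struct loop_state and loop_breaks() are invented for illustration, the sk_err / TCP_CLOSE / RCV_SHUTDOWN checks are folded into one flag, and the else branch that the excerpt cuts off is simply treated as "keep waiting".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>   /* MSG_PEEK */

/* Invented stand-in for the state the excerpt tests. */
struct loop_state {
    int  copied;           /* bytes copied so far */
    int  target;           /* value derived from SO_RCVLOWAT */
    bool backlog_pending;  /* models sk->sk_backlog.tail != NULL */
    bool err_or_closed;    /* models the sk_err / TCP_CLOSE / RCV_SHUTDOWN checks */
    bool timed_out;        /* models !timeo */
    bool signal_pending;
    int  flags;            /* recv() flags */
};

/* Returns true where the quoted lines would hit a `break`. */
bool loop_breaks(const struct loop_state *s)
{
    assert(s != NULL);
    if (s->copied >= s->target && !s->backlog_pending)
        return true;                                  /* lines 1371-1372 */
    if (s->copied)
        return s->err_or_closed || s->timed_out ||
               s->signal_pending ||
               (s->flags & MSG_PEEK) != 0;            /* lines 1374-1381 */
    return false;  /* else branch is not shown in the excerpt */
}
```

With copied = 1, target = 16 and MSG_PEEK set, the model breaks out even though the target is not met; with the same state and no MSG_PEEK it keeps waiting.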

Once again, this deviates from FreeBSD behavior.

At this point, for my application to work on Linux without burning CPU like
mad, I basically have to sleep and periodically check the socket with the
TCP socket ioctl SIOCINQ to see if more data has arrived. :(

Regards,
Vito Caputo