lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090108220345.GB24243@1wt.eu>
Date:	Thu, 8 Jan 2009 23:03:45 +0100
From:	Willy Tarreau <w@....eu>
To:	Jens Axboe <jens.axboe@...cle.com>
Cc:	David Miller <davem@...emloft.net>,
	Jarek Poplawski <jarkao2@...il.com>,
	Ben Mansell <ben@...s.com>, Ingo Molnar <mingo@...e.hu>,
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH] tcp: splice as many packets as possible at once

On Thu, Jan 08, 2009 at 08:44:24PM +0100, Jens Axboe wrote:
> On Thu, Jan 08 2009, Willy Tarreau wrote:
> > Jens,
> > 
> > here's the other patch I was talking about, for better behaviour of
> > non-blocking splice(). Ben Mansell also confirms similar improvements
> > in his tests, where non-blocking splice() initially showed half of
> > read()/write() performance. Ben, would you mind adding a Tested-By
> > line ?
> 
> Looks very good to me, I don't see any valid reason to have that !timeo
> break. So feel free to add my acked-by to that patch and send it to
> Dave.

Done, thanks.

Also, I tried to follow the code path from the userspace splice() call and
the kernel code. While splicing from socket to a pipe seems obvious till
tcp_splice_read(), I have a hard time finding the correct way from the pipe
to the socket.

It *seems* to me that we proceed like this, I'd like someone to confirm :

sys_splice()
  do_splice()
    do_splice_from()
      generic_splice_sendpage()      [ I assumed this from socket_file_ops ]
        splice_from_pipe(,,pipe_to_sendpage)
          __splice_from_pipe(,,pipe_to_sendpage)
            pipe_to_sendpage()
              tcp_sendpage()         [ I assumed this from inet_stream_ops ]
                do_tcp_sendpages()   (if NETIF_F_SG) or sock_no_sendpage()

I hope that I guessed it right because it does not appear obvious to me. It
will help me try to understand better the corruption problem.

Now for the read part, am I wrong tcp_splice_read() will return -EAGAIN
if the pipe is full ? It seems to me that this will be brought up by
splice_to_pipe(), via __tcp_splice_read, tcp_read_sock, tcp_splice_data_recv
and skb_splice_bits.

If so, this can cause serious trouble because if there is data available in
a socket, poll() will return POLLIN. Then we call splice(sock, pipe) but the
pipe is full, so we get -EAGAIN, which in turn makes the application think
that we need to poll again, and since no data has moved, we'll immediately
call splice() again. In my early experiments, I'm pretty sure I already came
across such a situation. Shouldn't we return -EBUSY or -ENOSPC in this case ?
I can work on patches if there is a consensus around those issues, and this
will also help me understand better the splice internals. I just don't want
to work in the wrong direction for nothing.

Thanks,
Willy

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ