Message-ID: <CAHk-=wgXvRKwsHUjA9T9Tw6n5x1pCO6B+4kk0GAx+oQ5qhUyRw@mail.gmail.com>
Date: Fri, 10 Feb 2023 09:47:28 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Dave Chinner <david@...morbit.com>
Cc: Stefan Metzmacher <metze@...ba.org>, Jens Axboe <axboe@...nel.dk>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Linux API Mailing List <linux-api@...r.kernel.org>,
io-uring <io-uring@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Al Viro <viro@...iv.linux.org.uk>,
Samba Technical <samba-technical@...ts.samba.org>
Subject: Re: copy on write for splice() from file to pipe?
On Fri, Feb 10, 2023 at 9:23 AM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> And when it comes to networking, in general things like TCP checksums
> etc should be ok even with data that isn't stable. When doing things
> by hand, networking should always use the "copy-and-checksum"
> functions that do the checksum while copying (so even if the source
> data changes, the checksum is going to be the checksum for the data
> that was copied).
>
> And in many (most?) smarter network cards, the card itself does the
> checksum, again on the data as it is transferred from memory.
>
> So it's not like "networking needs a stable source" is some really
> _fundamental_ requirement for things like that to work.
>
> But it may well be that we have situations where some network driver
> does the checksumming separately from then copying the data.
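
To illustrate the "copy-and-checksum" point quoted above, here is a minimal
userspace sketch. It is not the kernel's actual helper: the name
copy_and_csum() and its signature are made up for illustration. Each source
byte is read once, stored, and folded into a 16-bit Internet checksum, so the
result describes the bytes that landed in the destination even if the source
changes while the copy is running.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (an assumption for illustration, not a kernel API):
 * copy len bytes from src to dst and return the 16-bit one's-complement
 * Internet checksum of the bytes that were actually stored in dst.
 */
static uint16_t copy_and_csum(uint8_t *dst, const volatile uint8_t *src,
                              size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        /* volatile: each source byte is read exactly once */
        uint8_t hi = src[i];
        uint8_t lo = src[i + 1];

        dst[i] = hi;
        dst[i + 1] = lo;
        /* checksum the values we stored, not a second read of src */
        sum += ((uint32_t)hi << 8) | lo;
    }
    if (i < len) {
        /* odd trailing byte is treated as if padded with a zero byte */
        uint8_t hi = src[i];

        dst[i] = hi;
        sum += (uint32_t)hi << 8;
    }
    /* fold the carries back into the low 16 bits */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A driver that instead checksums the source first and copies it afterwards
reads the data twice, and that is the case where an unstable source can
produce a checksum that does not match what was sent.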
Ok, so I decided to try to take a look.

Somebody who actually does networking (and drivers in particular)
should probably check this, but it *looks* like the IPv4 TCP case
(just to pick the one I looked at) goes through
tcp_sendpage_locked(), which does
        if (!(sk->sk_route_caps & NETIF_F_SG))
                return sock_no_sendpage_locked(sk, page, offset, size, flags);
which basically says "if you can't handle fragmented socket buffers,
do that 'no_sendpage' case".

So that will basically end up just falling back to a kernel
'sendmsg()', which does a copy and then it's stable.

But for the networks that *can* handle fragmented socket buffers, it
then calls do_tcp_sendpages() instead, which just creates a skb
fragment of the page (with tcp_build_frag()).
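
The difference between the two paths can be modelled in userspace. This is
only a sketch under stated assumptions: MODEL_F_SG, struct model_buf,
model_send() and model_release() are all hypothetical names standing in for
the capability check and the two outcomes described above, and none of it is
kernel code. Without the scatter-gather capability the data is copied and so
becomes stable; with it, the buffer just points at the caller's page.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_F_SG 0x1u  /* hypothetical stand-in for NETIF_F_SG */

/* Hypothetical model of "what will be transmitted later". */
struct model_buf {
    const uint8_t *data;  /* bytes the device would eventually read */
    uint8_t *owned;       /* non-NULL when data is a private copy */
    size_t len;
};

/* Returns 0 on success, -1 if the copy path cannot allocate memory. */
static int model_send(struct model_buf *b, unsigned int caps,
                      const uint8_t *page, size_t len)
{
    b->len = len;
    b->owned = NULL;

    if (!(caps & MODEL_F_SG)) {
        /* fallback path: take a private copy, which is stable from now on */
        b->owned = malloc(len ? len : 1);
        if (!b->owned)
            return -1;
        memcpy(b->owned, page, len);
        b->data = b->owned;
        return 0;
    }

    /* fragment path: only reference the page, later writes stay visible */
    b->data = page;
    return 0;
}

static void model_release(struct model_buf *b)
{
    free(b->owned);
    b->owned = NULL;
    b->data = NULL;
    b->len = 0;
}
```

If the page is modified after model_send() returns, the copied buffer still
holds the old bytes while the referencing buffer sees the new ones. That is
why the second path depends on the checksum being computed as the data is
transferred.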
I wonder if that case should just require NETIF_F_HW_CSUM?
Linus