netdev - Re: Adding Support for SG,GSO,GRO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1291998379.3580.154.camel@edumazet-laptop>
Date:	Fri, 10 Dec 2010 17:26:19 +0100
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Michał Mirosław <mirqus@...il.com>
Cc:	David Lamparter <equinox@...c24.net>,
	David Miller <davem@...emloft.net>, bhutchings@...arflare.com,
	srk@...com, netdev@...r.kernel.org, Jens Axboe <axboe@...nel.dk>
Subject: Re: Adding Support for SG,GSO,GRO

Le vendredi 10 décembre 2010 à 17:01 +0100, Michał Mirosław a écrit :
> W dniu 10 grudnia 2010 15:31 użytkownik David Lamparter
> <equinox@...c24.net> napisał:
> > On Fri, Dec 10, 2010 at 03:18:11PM +0100, Michał Mirosław wrote:
> >> I'm trying to understand the dependency because it looks artificial for me.
> >
> > You have the data you want to send in the RAM, somewhere, possibly
> > scattered. The application calls sendfile(). The kernel puts the
> > transmission in the network card's queue, which might already have lots
> > of entries.
> >
> > A millisecond later - an eternity for the CPU - the card decides to do
> > the transmission.
> >
> > However, the data might have changed in the meantime.
> >
> > sendfile() is defined so that it works asynchronously, that means if you
> > change the data while it is in the queue, you get unpredictable results.
> >
> > But, what you should NOT get is packets with an invalid checksum.
> > Whatever data you are sending, it needs to have a correct checksum.
> >
> > Now, if the card does the checksum itself, everything is fine. But what
> > are you supposed to do if the card can't checksum? Call back the kernel
> > at the point where the card does the TX? That's pointless (and racy).
> > Pre-calculate the Checksum at submission time? Doesn't work, you would
> > have to make a copy of the data, so it doesn't change anymore, so the
> > checksum stays correct. But not copying the data is the whole point of
> > sendfile().
> 
> The question is do we really want good checksum for bogus data?

A frame must be sent with good checksum, or you violate very basic
transport rule. You mix several layers here.

>  I
> think that what matters is that good data (it had not changed between
> queuing and sending, and so the checksum does not depend on whether
> hardware or software calculated it) need to be accompanied by good
> checksum. For broken data it would be even better to not send
> anything, but that's not always possible. Bad checksum in this case is
> actually a good thing as it clearly shows that something is broken in
> the sender and avoids accepting the data as valid at the receiving
> end.
> 

a crc checksum is very lazy, and not guarantee the file you received is
consistent. Is a "per frame" checksum, not a "per file" checksum. If you
want, add a MD5 sum and all be fine.

> sendfile() is supposed to replace read()/write() loops to avoid
> copying data buffers. Whatever the optimizations, it has no additional
> cavat that it might corrupt the transfer. Fixing the checksum in case
> data gets corrupted is not the right thing, I think.
> 

You cannot fix the dam thing, or you MUST play VM games, that we cant to
avoid at the very beginning.

> The change of file data might happen if sendfile() submits page from
> pagecache when something writes to the file, or when an application
> modifies vmsplice()-submitted memory. In current kernel sendfile() is
> a wrapper around splice(), and so has a kernel pipe buffer between
> file and the socket. Can this hide the possible data changes?

No. Nothing in sendfile() API states that underlying file cannot change.

If we said : "We dont allow page content to change", then we _must_ add
special logic to detect a buggy application wont change the page while
we compute checksum or between this computation and final DMA to device.
 
That means playing VM games, and that is expensive, particularly in
multi threaded apps.

We instead say that : As we dont want to make this expensive checks, an
application is allowed to write into the page (but what should it, it
would be very stupid no ???), and a sendfile() wont be stopped/stalled
because of invalid checksum (because checksums are done after the page
content is fetched by NIC hardware)

But but but : the receiver of sendfile() get a file with possible
inconsistent content. In order to avoid this, higher level should take
locks (samba does), or make sure file is readonly.

> Unless I totally misunderstood, you say that we accept bogus data to
> being sent using sendfile(). If yes, then we might as well allow
> broken checksum (CPU calculated, before data changed and then was sent
> to network).
> 
> 


> If the splice/sendfile is taken out of the picture, are there any
> other scenarios, when data pages could be changed after
> ndo_start_xmit() entry and before TX DMA completion (between dma_map
> .. dma_unmap)? And is it really what can happen with splice/sendfile?
> 

If you want to perform checksum before DMA to device, then you also
should copy the page into private storage (a bounce buffer), to make
sure checksum wont be invalid.

You add one copy. Your choice, but not ours.




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html