lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <591a70bf016b4317add2d936696abc0f@AcuMS.aculab.com>
Date:   Thu, 21 Sep 2023 14:04:59 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'David Howells' <dhowells@...hat.com>, Jens Axboe <axboe@...nel.dk>
CC:     Al Viro <viro@...iv.linux.org.uk>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Christoph Hellwig <hch@....de>,
        "Christian Brauner" <christian@...uner.io>,
        Matthew Wilcox <willy@...radead.org>,
        "Jeff Layton" <jlayton@...nel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v5 00/11] iov_iter: Convert the iterator macros into
 inline funcs

From: David Howells
> Sent: 20 September 2023 23:22
...
>  (8) Move the copy-and-csum code to net/ where it can be in proximity with
>      the code that uses it.  This eliminates the code if CONFIG_NET=n and
>      allows for the slim possibility of it being inlined.
> 
>  (9) Fold memcpy_and_csum() in to its two users.
> 
> (10) Move csum_and_copy_from_iter_full() out of line and merge in
>      csum_and_copy_from_iter() since the former is the only caller of the
>      latter.

I thought that the real idea behind these was to do the checksum
at the same time as the copy to avoid loading the data into the L1
data-cache twice - especially for long buffers.
I wonder how often there are multiple iov[] that actually make
it better than just check summing the linear buffer?

I had a feeling that check summing of udp data was done during
copy_to/from_user, but the code can't be the copy-and-csum here
for that because it is missing support form odd-length buffers.

Intel x86 desktop chips can easily checksum at 8 bytes/clock
(But probably not with the current code!).
(I've got ~12 bytes/clock using adox and adcx but that loop
is entirely horrid and it would need run-time patching.
Especially since I think some AMD cpu execute them very slowly.)

OTOH 'rep movs[bq]' copy will copy 16 bytes/clock (32 if the
destination is 32 byte aligned - it pretty much won't be).

So you'd need a csum-and-copy loop that did 16 bytes every
three clocks to get the same throughput for long buffers.
In principle splitting the 'adc memory' into two instructions
is the same number of u-ops - but I'm sure I've tried to do
that and failed and the extra memory write can happen in
parallel with everything else.
So I don't think you'll get 16 bytes in two clocks - but you
might get it is three.

OTOH for a cpu where memcpy is code loop summing the data in
the copy loop is likely to be a gain.

But I suspect doing the checksum and copy at the same time
got 'all to complicated' to actually implement fully.
With most modern ethernet chips checksumming receive pacakets
does it really get used enough for the additional complexity?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ