Message-ID: <20250302114951.6eff96d7@pumpkin>
Date: Sun, 2 Mar 2025 11:49:51 +0000
From: David Laight <david.laight.linux@...il.com>
To: Eric Biggers <ebiggers@...nel.org>
Cc: Hannes Reinecke <hare@...e.de>, Christoph Hellwig <hch@....de>, Sagi
 Grimberg <sagi@...mberg.me>, Chaitanya Kulkarni <kch@...dia.com>,
 linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] nvmet-tcp: switch to using the crc32c library

On Wed, 26 Feb 2025 19:01:22 +0000
Eric Biggers <ebiggers@...nel.org> wrote:

...
> I have patches for nvme-tls almost ready too.  Just been taking my time since
> I've been updating all other users of "crc32" and "crc32c" in the kernel too.
> And I need to decide what to do about skb_copy_and_hash_datagram_iter().

I've wondered if any of the 'copy and xxx' functions are actually worth the
extra complexity they add.

Non-Atom Intel cpus will copy at 32 bytes/clock provided the destination
is 32-byte aligned (so for an skb copy you may want to copy a few bytes of
'headroom' to align the copy). I'm not sure how other cpus behave.
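
As a rough illustration of the headroom idea, here is a minimal userspace
sketch (not kernel code, and not what the skb helpers do today). The names
align_slack() and copy_with_headroom() are hypothetical, and the sketch
assumes the caller guarantees that 'headroom' bytes before both pointers are
valid to read and overwrite. A real memcpy() handles alignment internally,
so this only shows the pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COPY_ALIGN 32u	/* assumed alignment wanted for the destination */

/* Number of bytes between the previous 32-byte boundary and dst. */
static size_t align_slack(const void *dst)
{
	return (size_t)((uintptr_t)dst & (COPY_ALIGN - 1));
}

/*
 * Copy len bytes from src to dst, starting the copy up to 31 bytes early
 * so that the destination of the copy is 32-byte aligned.
 * Assumption: the caller guarantees 'headroom' valid bytes before both
 * dst and src. If there is not enough headroom, fall back to a plain copy.
 */
static void copy_with_headroom(uint8_t *dst, const uint8_t *src,
			       size_t len, size_t headroom)
{
	size_t slack = align_slack(dst);

	assert(dst != NULL && src != NULL);
	if (slack > headroom)
		slack = 0;
	memcpy(dst - slack, src - slack, len + slack);
}
```

The extra bytes copied into the headroom are simply ignored by the consumer;
whether the aligned copy is actually faster would need measuring on the cpu
in question.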

The 'and xxx' algorithm is likely to run faster without having to worry
about writes. Many cpus can do more than one read/clock, but only one write.

I guess the main benefit is for buffers that are larger than the L1 cache
(or half the cache size if you do the copy first).
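
To make the comparison concrete, here is a minimal userspace sketch (not the
kernel implementation) of the two strategies: a fused 'copy and checksum'
loop, and a plain memcpy() followed by a read-only checksum pass. All
function names are hypothetical. The checksum is an RFC 1071 style 16-bit
ones'-complement sum, left uninverted, and the sketch assumes buffers of at
most 65535 bytes so the 32-bit accumulator cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 32-bit accumulator into a 16-bit ones'-complement sum. */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Read-only pass: the loop issues loads but no stores to the buffer. */
static uint16_t csum_only(const uint8_t *src, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(len <= 65535);
	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)src[i] << 8 | src[i + 1];
	if (len & 1)
		sum += (uint32_t)src[len - 1] << 8;
	return csum_fold(sum);
}

/* Fused variant: every iteration both stores to dst and accumulates. */
static uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(len <= 65535);
	for (i = 0; i + 1 < len; i += 2) {
		dst[i] = src[i];
		dst[i + 1] = src[i + 1];
		sum += (uint32_t)src[i] << 8 | src[i + 1];
	}
	if (len & 1) {
		dst[len - 1] = src[len - 1];
		sum += (uint32_t)src[len - 1] << 8;
	}
	return csum_fold(sum);
}

/* Separate variant: bulk copy first, then checksum the (now cached) copy. */
static uint16_t copy_then_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
	memcpy(dst, src, len);
	return csum_only(dst, len);
}
```

Both variants return the same sum and leave the same bytes in dst. Which one
is faster would have to be measured, with buffer sizes on either side of the
L1 cache size.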

It is likely worse for the 'iter' functions (which scatter-gather copy a
linear kernel buffer). They have to allow for the unusual case of multiple
fragments - and I'd guess the initial fragments are likely to be short.

That said, I'm not at all sure of the point of doing the IP checksum with
the user copy. My guess is it helped NFS (8k UDP datagrams).
These days most high-performance Ethernet hardware supports checksum offload.
So RX UDP datagrams (which probably rarely matter) have a valid checksum
and there is no point making send() checksum the transmit data.

I ought to double check whether the TX data is always checksummed in send().
I don't remember a conditional - and you pretty much never need it.
UDP TX datagrams are going to be short (no userspace NFS) and the normal path
transmits on the caller's stack - so the data is likely to be in the right
cache if the checksum is needed.

	David
