linux-kernel - Re: [PATCH] nvmet-tcp: switch to using the crc32c library


Message-ID: <20250302114951.6eff96d7@pumpkin>
Date: Sun, 2 Mar 2025 11:49:51 +0000
From: David Laight <david.laight.linux@...il.com>
To: Eric Biggers <ebiggers@...nel.org>
Cc: Hannes Reinecke <hare@...e.de>, Christoph Hellwig <hch@....de>, Sagi
 Grimberg <sagi@...mberg.me>, Chaitanya Kulkarni <kch@...dia.com>,
 linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] nvmet-tcp: switch to using the crc32c library

On Wed, 26 Feb 2025 19:01:22 +0000
Eric Biggers <ebiggers@...nel.org> wrote:

...
> I have patches for nvme-tls almost ready too.  Just been taking my time since
> I've been updating all other users of "crc32" and "crc32c" in the kernel too.
> And I need to decide what to do about skb_copy_and_hash_datagram_iter().

I've wondered if any of the 'copy and xxx' functions are actually worth the
extra complexity they add.

Intel cpus (other than Atom) will copy at 32 bytes/clock provided the
destination is 32 byte aligned (so for an skb copy you may want to copy a few
bytes of 'headroom' to align the copy). I'm not sure how other cpus behave.
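
As a rough userspace sketch of the 'headroom' idea (not kernel code): start
the copy a few bytes early so that the destination pointer is 32 byte aligned.
The names align_pad() and copy_with_headroom() are made up for illustration,
and the sketch assumes the caller guarantees 'headroom' bytes are readable
before src and are don't-care scratch before dst. Whether memcpy() actually
runs faster this way depends on the cpu and the libc.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COPY_ALIGN 32u	/* assumed alignment for the fast copy path */

/* Distance (0..31) from the previous 32 byte boundary to dst. */
static size_t align_pad(const void *dst)
{
	return (size_t)((uintptr_t)dst & (COPY_ALIGN - 1));
}

/*
 * Hypothetical helper: copy len bytes from src to dst, but begin the copy
 * 'pad' bytes early so the destination is 32 byte aligned.
 * The bytes in front of dst are overwritten with whatever precedes src,
 * so they must be headroom the caller does not care about.
 * If there is not enough headroom, fall back to an unaligned copy.
 */
static void copy_with_headroom(void *dst, const void *src, size_t len,
			       size_t headroom)
{
	size_t pad = align_pad(dst);

	if (pad > headroom)
		pad = 0;

	memcpy((uint8_t *)dst - pad, (const uint8_t *)src - pad, len + pad);
}
```

The payload ends up at the same address either way; only the start of the
copy moves.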

The 'and xxx' algorithm is likely to run faster without having to worry
about writes. Many cpus can do more than 1 read/clock, but only one write.

I guess the main benefit is for buffers that are larger than the l1-cache
(or half the cache size if you do the copy first).
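
To make the comparison concrete, here is a minimal userspace sketch of the two
shapes, using a plain 16-bit ones' complement sum (RFC 1071 style) as the
'xxx'. It is not the kernel's csum code; csum_fold(), copy_then_csum() and
copy_and_csum() are names invented for this example, and the byte-at-a-time
loops are only there to show the structure, not to be fast.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold the wide accumulator down to a 16-bit ones' complement sum. */
static uint16_t csum_fold(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Two passes: memcpy() first, then a read-only sum over the copy. */
static uint16_t copy_then_csum(void *dst, const void *src, size_t len)
{
	const uint8_t *p = dst;
	uint64_t sum = 0;
	size_t i;

	memcpy(dst, src, len);

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	if (i < len)			/* odd trailing byte, zero padded */
		sum += (uint32_t)p[i] << 8;

	return csum_fold(sum);
}

/* One pass: every iteration does loads, a store and the add. */
static uint16_t copy_and_csum(void *dst, const void *src, size_t len)
{
	const uint8_t *s = src;
	uint8_t *d = dst;
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		d[i] = s[i];
		d[i + 1] = s[i + 1];
		sum += (uint32_t)s[i] << 8 | s[i + 1];
	}
	if (i < len) {			/* odd trailing byte, zero padded */
		d[i] = s[i];
		sum += (uint32_t)s[i] << 8;
	}

	return csum_fold(sum);
}
```

Both return the same sum. The two-pass version reads the data twice, which
only costs real memory bandwidth once the buffer no longer fits in the
l1-cache.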

It is likely worse for the 'iter' functions (which scatter-gather copy a
linear kernel buffer). They have to allow for the unusual case of multiple
fragments - and I'd guess the initial fragments are likely to be short.

Although I'm not at all sure of the point of doing the IP checksum with
the user copy. My guess is it helped NFS (8k UDP datagrams).
These days most high performance ethernet hardware supports checksum offload.
So RX UDP datagrams (which probably rarely matter) have a valid checksum
and there is no point making send() checksum the transmit data.

I ought to double check whether the TX data is always checksummed in send().
I don't remember a conditional - and you pretty much never need it.
UDP TX datagrams are going to be short (no userspace NFS) and the normal path
transmits on the caller's stack - so the data is likely to be in the right
cache if the checksum is needed.

	David