Message-ID: <CAF=yD-L3aXM17=hsJBoauWJ6Dqq16ykcnv8sg-Fn_Td_FsOafA@mail.gmail.com>
Date:   Sat, 23 Sep 2023 08:59:25 +0200
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     David Howells <dhowells@...hat.com>
Cc:     David Laight <David.Laight@...lab.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, Jens Axboe <axboe@...nel.dk>,
        Al Viro <viro@...iv.linux.org.uk>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Christoph Hellwig <hch@....de>,
        Christian Brauner <christian@...uner.io>,
        Matthew Wilcox <willy@...radead.org>,
        Jeff Layton <jlayton@...nel.org>,
        linux-fsdevel@...r.kernel.org, linux-block@...r.kernel.org,
        linux-mm@...ck.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 00/11] iov_iter: Convert the iterator macros into
 inline funcs

On Fri, Sep 22, 2023 at 2:01 PM David Howells <dhowells@...hat.com> wrote:
>
> David Laight <David.Laight@...LAB.COM> wrote:
>
> > >  (8) Move the copy-and-csum code to net/ where it can be in proximity with
> > >      the code that uses it.  This eliminates the code if CONFIG_NET=n and
> > >      allows for the slim possibility of it being inlined.
> > >
> > >  (9) Fold memcpy_and_csum() into its two users.
> > >
> > > (10) Move csum_and_copy_from_iter_full() out of line and merge in
> > >      csum_and_copy_from_iter() since the former is the only caller of the
> > >      latter.
> >
> > I thought that the real idea behind these was to do the checksum
> > at the same time as the copy to avoid loading the data into the L1
> > data-cache twice - especially for long buffers.
> > I wonder how often there are multiple iov[] that actually make
> > it better than just checksumming the linear buffer?
>
> It also reduces the overhead for finding the data to checksum in the case the
> packet gets split since we're doing the checksumming as we copy - but with a
> linear buffer, that's negligible.
>
> > I had a feeling that checksumming of UDP data was done during
> > copy_to/from_user, but the code can't be the copy-and-csum here
> > for that because it is missing support for odd-length buffers.
>
> Is there a bug there?
>
> > Intel x86 desktop chips can easily checksum at 8 bytes/clock
> > (But probably not with the current code!).
> > (I've got ~12 bytes/clock using adox and adcx but that loop
> > is entirely horrid and it would need run-time patching.
> > Especially since I think some AMD CPUs execute them very slowly.)
> >
> > OTOH 'rep movs[bq]' copy will copy 16 bytes/clock (32 if the
> > destination is 32 byte aligned - it pretty much won't be).
> >
> > So you'd need a csum-and-copy loop that did 16 bytes every
> > three clocks to get the same throughput for long buffers.
> > In principle splitting the 'adc memory' into two instructions
> > is the same number of u-ops - but I'm sure I've tried to do
> > that and failed and the extra memory write can happen in
> > parallel with everything else.
> > So I don't think you'll get 16 bytes in two clocks - but you
> > might get it in three.
> >
> > OTOH for a cpu where memcpy is a code loop, summing the data in
> > the copy loop is likely to be a gain.
> >
> > But I suspect doing the checksum and copy at the same time
> > got 'all too complicated' to actually implement fully.
> > With most modern ethernet chips checksumming receive packets,
> > does it really get used enough for the additional complexity?
>
> You may be right.  That's more a question for the networking folks than for
> me.  It's entirely possible that the checksumming code is just not used on
> modern systems these days.
>
> Maybe Willem can comment since he's the UDP maintainer?

Perhaps these days it is more relevant to embedded systems than
high-end servers.
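
For readers following the thread, here is a minimal sketch of what "checksum while copying" means for a linear buffer, including the odd-length case raised above. It is not the kernel's csum_and_copy_from_iter() or any code from the patch series: the names copy_and_csum() and csum_fold64() are hypothetical, the loop is plain portable C working a byte pair at a time, and it makes no attempt to reach the bytes/clock figures discussed. It computes the 16-bit ones' complement Internet checksum (RFC 1071 style) in the same pass that moves the data, so each source byte is loaded only once.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a wide accumulator down to a 16-bit ones' complement sum. */
static uint16_t csum_fold64(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Hypothetical helper: copy len bytes from src to dst and return the
 * Internet checksum of the data, computed during the copy.
 * Words are summed in network (big-endian) byte order. The 64-bit
 * accumulator is only folded at the end, which is fine for any
 * realistic buffer length.
 */
static uint16_t copy_and_csum(void *dst, const void *src, size_t len)
{
    const uint8_t *s = src;
    uint8_t *d = dst;
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        d[i] = s[i];
        d[i + 1] = s[i + 1];
        sum += ((uint32_t)s[i] << 8) | s[i + 1];
    }

    /* Odd-length buffer: the last byte is summed as if padded with zero. */
    if (i < len) {
        d[i] = s[i];
        sum += (uint32_t)s[i] << 8;
    }

    return (uint16_t)~csum_fold64(sum);
}
```

Whether a fused loop like this beats a fast bulk copy followed by a separate checksum pass depends on the CPU, as the quoted figures suggest; the sketch only shows the structure of the technique, not a tuned implementation.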
