Date:   Wed, 15 Nov 2023 11:53:08 -0500
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     David Howells <dhowells@...hat.com>
Cc:     kernel test robot <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev,
        lkp@...el.com, linux-kernel@...r.kernel.org,
        Christian Brauner <brauner@...nel.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>,
        Christian Brauner <christian@...uner.io>,
        Matthew Wilcox <willy@...radead.org>,
        David Laight <David.Laight@...lab.com>, ying.huang@...el.com,
        feng.tang@...el.com, fengwei.yin@...el.com
Subject: Re: [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput
 -16.9% regression

On Wed, 15 Nov 2023 at 10:28, David Howells <dhowells@...hat.com> wrote:
>
> But the outcome is a bit variable and the result spaces overlap considerably.
> I certainly don't see a 17% performance reduction.  Now, this may be due to
> hardware differences.  The CPU I'm using is an Intel i3-4170 - which is a few
> years old at this point.

I tried to look at the perf profile changes in the original report,
and very little of it makes sense to me.

Having looked at quite a lot of those in the past (although certainly
less than Oliver), that's *usually* a result of a test that is unstable.

In this case, though, I think the big difference is

  -11.0  perf-profile.self.cycles-pp.memcpy_orig
  +14.7  perf-profile.self.cycles-pp.copy_page_from_iter_atomic

which is a bit odd. It looks like the old code used to use a regular
out-of-line memcpy (and that machine doesn't have FSRM), and the new
code for some reason does it inline.

I wonder if gcc somehow decided to inline "memcpy()" in
memcpy_from_iter() as a "rep movsb" because of other inlining changes?

[ Goes out to look ]

Yup, I think that's exactly what happened. Gcc seems to decide that it
might be a small memcpy(), and seems to do at least part of it
directly.
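
For illustration only (a stand-alone userspace sketch, not the actual
memcpy_from_iter() code): the effect I mean is the usual gcc behaviour
of expanding memcpy() inline once it can see a small or constant size,
instead of calling the out-of-line version:

  #include <string.h>

  struct hdr { char bytes[16]; };

  void copy_hdr(struct hdr *dst, const struct hdr *src)
  {
          /* Small constant size: at -O2 gcc expands this inline
           * (plain moves, or "rep movsb" depending on tuning)
           * rather than calling memcpy out of line. */
          memcpy(dst, src, sizeof(*dst));
  }

  void copy_buf(void *dst, const void *src, size_t len)
  {
          /* Size unknown at compile time: this normally stays an
           * out-of-line memcpy call (memcpy_orig in that profile). */
          memcpy(dst, src, len);
  }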

So I *think* this all is mainly an artifact of gcc having changed code
generation due to the code re-organization.

         Linus
