Message-ID: <CAHk-=wjNv9NK=6rTNuQUkBfngw-jvP81esrV5ZLq0RBR9qaOuA@mail.gmail.com>
Date: Thu, 16 Nov 2023 17:36:47 -0500
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: David Howells <dhowells@...hat.com>
Cc: Borislav Petkov <bp@...en8.de>,
kernel test robot <oliver.sang@...el.com>,
oe-lkp@...ts.linux.dev, lkp@...el.com,
linux-kernel@...r.kernel.org,
Christian Brauner <brauner@...nel.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>,
Christian Brauner <christian@...uner.io>,
Matthew Wilcox <willy@...radead.org>,
David Laight <David.Laight@...lab.com>, ying.huang@...el.com,
feng.tang@...el.com, fengwei.yin@...el.com
Subject: Re: [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput
-16.9% regression
On Thu, 16 Nov 2023 at 16:13, David Howells <dhowells@...hat.com> wrote:
>
>
> Okay, I disabled RETPOLINE, which seems like it should be the important one.
> With inlined memcpy:
Yeah, your machine really seems to hate the out-of-line call version.
It is also not unlikely that the benchmark is the perfect example of
that kind of "bad memory copy benchmark" where the actual results of
the copy are never used or touched. It's one case that sometimes makes
"rep movs" look (somewhat artificially) good, just because the
optimized rep string will do cacheline copies in L2. So if you never
touch the source or the destination of the copy, it never even gets
brought into the L1.
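
To make the "copy whose results are never touched" case concrete, here is a
minimal userspace sketch in C. It is not taken from the benchmark or from the
kernel. The helper names (now_ns, touch_lines, bench_copy) and the 64-byte
cache line size are assumptions made for illustration. It times the same
memcpy loop in two modes: copy only, and copy followed by reading one byte per
cache line of the destination.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define CACHE_LINE 64 /* assumed line size; typical on x86, not guaranteed */

/* Hypothetical helper: monotonic clock in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Read one byte per cache line so the data actually has to be loaded. */
static uint64_t touch_lines(const unsigned char *buf, size_t len)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < len; i += CACHE_LINE)
        sum += buf[i];
    return sum;
}

/*
 * Copy len bytes from src to dst, iters times, and return the elapsed
 * nanoseconds. With touch == 0 the destination is never read, which is
 * the "bad benchmark" shape. With touch != 0 each copy is followed by a
 * read of the destination, accumulated into *sink so it cannot be
 * optimized away.
 */
static uint64_t bench_copy(unsigned char *dst, const unsigned char *src,
                           size_t len, int iters, int touch, uint64_t *sink)
{
    uint64_t start;

    assert(dst && src && sink);

    start = now_ns();
    for (int i = 0; i < iters; i++) {
        memcpy(dst, src, len);
        /* Compiler barrier (GCC/Clang extension): keep the copy alive. */
        __asm__ volatile("" : : "r"(dst) : "memory");
        if (touch)
            *sink += touch_lines(dst, len);
    }
    return now_ns() - start;
}
```

One way to use it is to call bench_copy() twice on the same pair of buffers,
once with touch set to 0 and once with it set to 1, and compare the returned
times. Whether any gap shows up, and how large it is, depends on the CPU, on
the buffer size relative to the cache sizes, and on how the C library
implements memcpy (it may not use "rep movs" at all), so this only
illustrates the shape of the problem and does not reproduce the kernel result.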
Linus