[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wjrwcJVBJjjqD4cMrYhH=A-fmD0SQJyW7r7noyQAXnBSg@mail.gmail.com>
Date: Fri, 21 Jun 2024 13:53:27 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Matthew Wilcox <willy@...radead.org>
Cc: Christian Brauner <brauner@...nel.org>, Al Viro <viro@...iv.linux.org.uk>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>, "the arch/x86 maintainers" <x86@...nel.org>,
Linux ARM <linux-arm-kernel@...ts.infradead.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, kernel test robot <lkp@...el.com>
Subject: Re: FYI: path walking optimizations pending for 6.11
On Fri, 21 Jun 2024 at 13:04, Matthew Wilcox <willy@...radead.org> wrote:
>
> What I was reacting to in your email was this:
>
> : And on my arm64 machine, it turns out that the best optimization for the
> : load I tested would be to make that hash table smaller to actually be a
> : bit denser in the cache, But that's such a load-dependent optimization
> : that I'm not doing this.
>
> And that's exactly what rosebush does; it starts out incredibly small
> (512 bytes) and then resizes as the buckets overflow. So if you suspect
> that a denser hashtable would give you better performance, then maybe
> it'll help.
Well, I was more going "ok, on the exact load _I_ was running, it
would probably help to use a smaller hash table", but I suspect that
in real life our actual hash tables are better.
My benchmark is somewhat real-world in that yes, I benchmark what I
do. But what I do is ridiculously limited. Using git and building
kernels and running a web browser for email does not require 64GB of
RAM.
But that's what I have in what is now my "small" machine, literally
because I wanted to populate every memory channel.
Not because I needed the size, but because I wanted the memory channel
bandwidth.
IOW, my machines tend to be completely over-specced wrt memory. The
kernel build can use about as many cores as you can throw at it, but
even with multiple trees, and everything cached, and hundreds of
parallel compilers going, I just don't use that much RAM. The kernel
build system is pretty damn lean (ask the poor people who do GUI tools
with C++ and the situation changes, but the kernel build is actually
pretty good on resource use).
So the kernel - very reasonably - provisions me with a big hash table,
because I literally have memory to waste.
And it turns out that since _all_ I do on the arm64 box in particular
(it's headless, so not even a web browser) is to build kernels. I
could "tweak" the config for that.
But while it might benchmark better, it would likely not be better in reality.
I'm going to be on the road this weekend, but if you have something
that you think is past the "debug build" stage and is worth
benchmarking, I can try to run it on my machines next week.
Linus
Powered by blists - more mailing lists