Message-ID: <CAHk-=wgycEJkRApK+Cd2BOuy9Wqb054zN8FO1JY21q6Ao4qRNQ@mail.gmail.com>
Date:   Wed, 17 Mar 2021 16:13:32 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Nicholas Piggin <npiggin@...il.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Anton Blanchard <anton@...abs.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Rasmus Villemoes <linux@...musvillemoes.dk>,
        Ingo Molnar <mingo@...nel.org>
Subject: Re: [PATCH v2] Increase page and bit waitqueue hash size

On Wed, Mar 17, 2021 at 3:23 PM Nicholas Piggin <npiggin@...il.com> wrote:
>
> Might take some time to get a system and run tests. We actually had
> difficulty recreating it before this patch too, so it's kind of
> hard to say _that_ was the exact case that previously ran badly and
> is now fixed. We thought just the statistical nature of collisions
> and page / lock contention made things occasionally line up and
> tank.

Yeah, it probably depends a lot on the exact page allocation patterns
and just bad luck.  Plus the actual spinlocks will end up with false
sharing anyway, so you might need to have both contention on multiple
pages on the same hash queue _and_ perhaps some action on other pages
that just hash close-by so that they make the cache coherency protocol
have to work harder..

And then 99% of the time it all just works out fine, because we only
need to look at the hashed spinlocks when we have any contention on
the page lock in the first place, and that generally should be
something we should avoid for other reasons.

But it is perhaps also a sign that while 256 entries is "ridiculously
small", it might be the case that it's not necessarily much of a real
problem in the end, and I get the feeling that growing it by several
orders of magnitude is overkill.

In fact, the problems we have had with the page wait queue have often
been because we did too much page locking in the first place.

I actually have a couple of patches in my "still thinking about it
tree" (that have been there for six months by now, so maybe not
thinking too _actively_ about it) which remove some overly stupid page
locking.

For example, look at "do_shared_fault()". Currently __do_fault()
always locks the result page. Then if you have a page_mkwrite()
function, we immediately unlock it again. And then we lock it again in
do_page_mkwrite().

Does that particular case matter? I really have no idea. But we
basically lock it twice for what looks like absolutely zero reason
other than calling conventions.

Bah. I'll just attach my three "not very actively thinking about it"
patches here in case anybody else wants to not actively think about
them.

I've actually been running these on my own machine since October,
since the kernel I actually boot on my main desktop always includes my
"thinking about it" patches.

The two first patches should fix the double lock thing I mentioned.

The third patch should be a no-op right now, but the thinking is
outlined in the commit message: why do we even lock pages in
filemap_fault? I'm not actually convinced we need to. I think we could
return an un-uptodate page unlocked, and instead do the checking at pte
install time (I had some discussions with Kirill about instead doing
it at pte install time, and depending entirely on memory ordering
constraints wrt page truncation that *does* take the page lock, and
then does various other checks - including the page mapcount and
taking the ptl lock - under that lock).

Anyway, I don't mind the "make the hash table larger" regardless of
this, but I do want it to be documented a bit more.

               Linus

View attachment "0001-mm-move-final-page-locking-out-of-__do_fault-helper-.patch" of type "text/x-patch" (2744 bytes)

View attachment "0002-mm-don-t-lock-the-page-only-to-immediately-unlock-it.patch" of type "text/x-patch" (2276 bytes)

View attachment "0003-mm-do_cow_fault-does-not-need-the-source-page-to-be-.patch" of type "text/x-patch" (1984 bytes)
