lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sun, 25 Dec 2016 13:51:17 -0800
From:   Linus Torvalds <>
To:     Nicholas Piggin <>
Cc:     Dave Hansen <>,
        Bob Peterson <>,
        Linux Kernel Mailing List <>,
        Steven Whitehouse <>,
        Andrew Lutomirski <>,
        Andreas Gruenbacher <>,
        Peter Zijlstra <>,
        linux-mm <>,
        Mel Gorman <>
Subject: Re: [PATCH 2/2] mm: add PageWaiters indicating tasks are waiting for
 a page bit

On Sat, Dec 24, 2016 at 7:00 PM, Nicholas Piggin <> wrote:
> Add a new page flag, PageWaiters, to indicate the page waitqueue has
> tasks waiting. This can be tested rather than testing waitqueue_active
> which requires another cacheline load.

Ok, I applied this one too. I think there's room for improvement, but
I don't think it's going to help to just wait another release cycle
and hope something happens.

Example room for improvement from a profile of unlock_page():

   46.44 │      lock   andb $0xfe,(%rdi)
   34.22 │      mov    (%rdi),%rax

this has the old "do atomic op on a byte, then load the whole word"
issue that we used to have with the nasty zone lookup code too. And it
causes a horrible pipeline hickup because the load will not forward
the data from the (partial) store.

 Its' really a misfeature of our asm optimizations of the atomic bit
ops. Using "andb" is slightly smaller, but in this case in particular,
an "andq" would be a ton faster, and the mask still fits in an imm8,
so it's not even hugely larger.

But it might also be a good idea to simply use a "cmpxchg" loop here.
That also gives atomicity guarantees that we don't have with the
"clear bit and then load the value".

Regardless, I think this is worth more people looking at and testing.
And merging it is probably the best way for that to happen.


Powered by blists - more mailing lists