Date: Sun, 25 Dec 2016 13:51:17 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Nicholas Piggin <npiggin@...il.com>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>,
	Bob Peterson <rpeterso@...hat.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Steven Whitehouse <swhiteho@...hat.com>,
	Andrew Lutomirski <luto@...nel.org>,
	Andreas Gruenbacher <agruenba@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-mm <linux-mm@...ck.org>,
	Mel Gorman <mgorman@...hsingularity.net>
Subject: Re: [PATCH 2/2] mm: add PageWaiters indicating tasks are waiting for a page bit

On Sat, Dec 24, 2016 at 7:00 PM, Nicholas Piggin <npiggin@...il.com> wrote:
> Add a new page flag, PageWaiters, to indicate the page waitqueue has
> tasks waiting. This can be tested rather than testing waitqueue_active
> which requires another cacheline load.

Ok, I applied this one too. I think there's room for improvement, but
I don't think it's going to help to just wait another release cycle
and hope something happens.

Example room for improvement from a profile of unlock_page():

   46.44 │      lock   andb $0xfe,(%rdi)
   34.22 │      mov    (%rdi),%rax

this has the old "do atomic op on a byte, then load the whole word"
issue that we used to have with the nasty zone lookup code too. And it
causes a horrible pipeline hiccup because the load will not forward
the data from the (partial) store.

It's really a misfeature of our asm optimizations of the atomic bit
ops. Using "andb" is slightly smaller, but in this case in particular,
an "andq" would be a ton faster, and the mask still fits in an imm8,
so it's not even hugely larger.

But it might also be a good idea to simply use a "cmpxchg" loop here.
That also gives atomicity guarantees that we don't have with the
"clear bit and then load the value".

Regardless, I think this is worth more people looking at and testing.
And merging it is probably the best way for that to happen.

                Linus
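
For illustration only, the "cmpxchg loop" idea mentioned above can be sketched in portable C11 atomics rather than kernel code. The function name, the EX_ macros and the bit positions below are hypothetical (bit 0 for the lock matches the 0xfe mask in the profile; the waiters bit position is an assumption), and this is not the code that was merged. The point it shows is that a single compare-and-exchange both clears the lock bit and reports whether the waiters bit was set in the same atomic step, so no separate load of the word is needed after a byte-sized store.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout, for illustration only. */
#define EX_PG_LOCKED   (1UL << 0)  /* matches the 0xfe mask in the profile */
#define EX_PG_WAITERS  (1UL << 7)  /* assumed position of the waiters flag */

/*
 * Clear the lock bit with release ordering and return whether the
 * waiters bit was set in the value the lock bit was cleared from.
 * The full word is read and written, so there is no partial-store
 * followed by a full-width load.
 */
static bool ex_unlock_and_test_waiters(_Atomic unsigned long *flags)
{
    unsigned long old = atomic_load_explicit(flags, memory_order_relaxed);

    /* On failure, 'old' is refreshed with the current value; retry. */
    while (!atomic_compare_exchange_weak_explicit(flags, &old,
                                                  old & ~EX_PG_LOCKED,
                                                  memory_order_release,
                                                  memory_order_relaxed))
        ;

    return (old & EX_PG_WAITERS) != 0;
}
```

A caller would wake the page's waitqueue only when the function returns true. Whether a loop like this actually beats a word-sized "lock and" followed by a load is something that would need profiling on real hardware.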