Date: Mon, 06 Jun 2022 13:48:00 -0700
From: "Andy Lutomirski" <luto@...nel.org>
To: "Nadav Amit" <nadav.amit@...il.com>, "Dave Hansen" <dave.hansen@...ux.intel.com>
Cc: "Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
	"Nadav Amit" <namit@...are.com>,
	"Peter Zijlstra (Intel)" <peterz@...radead.org>,
	"Ingo Molnar" <mingo@...nel.org>,
	"Thomas Gleixner" <tglx@...utronix.de>,
	"the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible

On Mon, Jun 6, 2022, at 11:01 AM, Nadav Amit wrote:
> From: Nadav Amit <namit@...are.com>
>
> On extreme TLB shootdown storms, the mm's tlb_gen cacheline is highly
> contended and reading it should (arguably) be avoided as much as
> possible.
>
> Currently, flush_tlb_func() reads the mm's tlb_gen unconditionally,
> even when it is not necessary (e.g., the mm was already switched).
> This is wasteful.
>
> Moreover, one of the existing optimizations is to read the mm's
> tlb_gen to see if there are additional in-flight TLB invalidations and
> flush the entire TLB in such a case. However, if the request's tlb_gen
> was already flushed, the benefit of checking the mm's tlb_gen is
> likely to be offset by the overhead of the check itself.
>
> Running will-it-scale with tlb_flush1_threads shows a considerable
> benefit on a 56-core Skylake (up to +24%):

Acked-by: Andy Lutomirski <luto@...nel.org>

But...

I'm suspicious that the analysis is missing something.

Under this kind of workload, there are a whole bunch of flushes being
initiated, presumably in parallel. Each flush does an RMW on mm_tlb_gen
(which will make the cacheline exclusive on the initiating CPU), and
each flush sends out an IPI whose handler reads mm_tlb_gen (which makes
the cacheline shared) when it updates the local tlb_gen. So you're
doing (at least!) an E->S and an S->E transition per flush. Your patch
doesn't change this.

But your patch does add a whole new case in which the IPI handler
simply doesn't flush! I think it takes either quite a bit of racing or
a well-timed context switch to hit that case, but, if you hit it, you
skip both the flush and the read of mm_tlb_gen.

Have you tested what happens if you do something like your patch but
also make the mm_tlb_gen read unconditional? I'm curious whether
there's more to the story than you're seeing.

You could also contemplate a somewhat evil hack in which you don't read
mm_tlb_gen even when you *do* flush and instead use f->new_tlb_gen.
That would potentially do a bit of extra flushing but would avoid the
flush path causing the E->S transition. (That may be of dubious value
for real workloads, since I don't think there's a credible way to avoid
having context switches read mm_tlb_gen.)

--Andy
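For readers following the thread without arch/x86/mm/tlb.c open, here is
a minimal userspace sketch of the access pattern under discussion. It is
not the kernel code: the names (start_flush, handle_flush, a plain
uint64_t standing in for the per-CPU tlb_gen) are made up for
illustration, and C11 atomics stand in for atomic64_t and the per-CPU
machinery. The only points it models are where the initiator's RMW and
the handler's read touch the shared mm_tlb_gen line, and the new early
return that skips both the flush and the read when the requested
generation is already covered locally.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

static _Atomic uint64_t mm_tlb_gen = 1;   /* shared, contended cacheline */
static uint64_t local_tlb_gen = 1;        /* per-CPU copy, private */

/* Initiating CPU: the RMW pulls the mm_tlb_gen cacheline Exclusive. */
static uint64_t start_flush(void)
{
	return atomic_fetch_add(&mm_tlb_gen, 1) + 1;  /* the request's new_tlb_gen */
}

/* IPI side: with the patch, a request whose generation is already
 * covered locally skips both the flush and the read of mm_tlb_gen. */
static void handle_flush(uint64_t new_tlb_gen)
{
	if (new_tlb_gen <= local_tlb_gen) {
		puts("skip: no flush, no mm_tlb_gen read");
		return;
	}
	/* Only now touch the shared line (drags it back to Shared). */
	uint64_t gen = atomic_load(&mm_tlb_gen);
	local_tlb_gen = gen;   /* model of flushing up to gen */
	printf("flushed up to generation %llu\n", (unsigned long long)gen);
}

int main(void)
{
	uint64_t g = start_flush();
	handle_flush(g);   /* first delivery: reads mm_tlb_gen and flushes */
	handle_flush(g);   /* late/duplicate request: skipped entirely */
	return 0;
}

Built with any C11 compiler, the second handle_flush() call takes the
skip path, which is the new case the email is asking about.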