Date: Fri, 9 Dec 2022 11:13:28 +0800
From: Guo Ren <guoren@...nel.org>
To: Palmer Dabbelt <palmer@...osinc.com>, geomatsi@...il.com,
	philipp.tomsich@...ll.eu, Paul Walmsley <paul.walmsley@...ive.com>,
	anup@...infault.org
Cc: Conor Dooley <conor.dooley@...rochip.com>, heiko@...ech.de,
	alex@...ti.fr, Christoph Hellwig <hch@....de>, ajones@...tanamicro.com,
	gary@...yguo.net, jszhang@...nel.org, linux-riscv@...ts.infradead.org,
	linux-kernel@...r.kernel.org, guoren@...ux.alibaba.com,
	apatel@...tanamicro.com
Subject: Re: [PATCH V3] riscv: asid: Fixup stale TLB entry cause application crash

On Fri, Dec 9, 2022 at 7:30 AM Palmer Dabbelt <palmer@...osinc.com> wrote:
>
> On Fri, 18 Nov 2022 12:57:21 PST (-0800), geomatsi@...il.com wrote:
> > Hi Guo Ren,
> >
> >
> >> After use_asid_allocator is enabled, the userspace application will
> >> crash from stale TLB entries, because cpumask_clear_cpu alone,
> >> without local_flush_tlb_all, can't guarantee that a CPU's TLB
> >> entries are fresh. set_mm_asid would then let the userspace
> >> application read a stale value through a stale TLB entry, while
> >> set_mm_noasid is okay.
> >
> > ... [snip]
> >
> >> +	/*
> >> +	 * The mm_cpumask indicates which harts' TLBs contain the virtual
> >> +	 * address mapping of the mm. Compared to noasid, using asid
> >> +	 * can't guarantee that stale TLB entries are invalidated, because
> >> +	 * the asid mechanism doesn't flush the TLB on every switch_mm,
> >> +	 * for performance. So when using asid, keep all CPUs' footprints
> >> +	 * in cpumask() until the mm is reset.
> >> +	 */
> >> +	cpumask_set_cpu(cpu, mm_cpumask(next));
> >> +	if (static_branch_unlikely(&use_asid_allocator)) {
> >> +		set_mm_asid(next, cpu);
> >> +	} else {
> >> +		cpumask_clear_cpu(cpu, mm_cpumask(prev));
> >> +		set_mm_noasid(next);
> >> +	}
> >> }
> >
> > I observe similar user-space crashes on my SMP systems with ASID
> > enabled. My attempt to fix the issue was a bit different, see the
> > following patch:
> >
> > https://lore.kernel.org/linux-riscv/20220829205219.283543-1-geomatsi@gmail.com/
> >
> > In brief, the idea was borrowed from the flush_icache_mm handling:
> > - keep track of CPUs not running the task
> > - perform a per-ASID TLB flush on such CPUs only when the task is
> >   switched there
>
> That way looks better to me: leaking hartids in the ASID allocator might
> make the crashes go away, but it's just going to end up trending towards
> flushing everything, and that doesn't seem like the right long-term
> solution.

The penalty in switch_mm is too heavy!
 - If the system has multiple NUMA nodes, it makes switch_mm_fast flush
   on harts unnecessarily.
 - Even if flush_range covers just one entry, it ends up causing a full
   flush_tlb_all_asid.
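For context, the gap between those two costs comes down to the sfence.vma
operands. A rough sketch of the two local flush primitives (modeled on the
kernel's RISC-V helpers of that era; exact names and definitions may
differ):

	/* Flush all local TLB entries tagged with the given ASID. */
	static inline void local_flush_tlb_all_asid(unsigned long asid)
	{
		__asm__ __volatile__ ("sfence.vma x0, %0"
				      : : "r" (asid) : "memory");
	}

	/*
	 * Flush a single page for the given ASID -- much cheaper when
	 * only one entry needs to be invalidated.
	 */
	static inline void local_flush_tlb_page_asid(unsigned long addr,
						     unsigned long asid)
	{
		__asm__ __volatile__ ("sfence.vma %0, %1"
				      : : "r" (addr), "r" (asid) : "memory");
	}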
switch_mm_fast:
	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) |
		  ((cntx & asid_mask) << SATP_ASID_SHIFT) |
		  satp_mode);

	if (need_flush_tlb)
		local_flush_tlb_all();
+#ifdef CONFIG_SMP
+	else {
+		cpumask_t *mask = &mm->context.tlb_stale_mask;
+
+		if (cpumask_test_cpu(cpu, mask)) {
+			cpumask_clear_cpu(cpu, mask);
+			/* penalty in the switch_mm fast path */
+			local_flush_tlb_all_asid(cntx & asid_mask);
+		}
+	}
+#endif

And see:

static void __sbi_tlb_flush_range(struct mm_struct *mm, unsigned long start,
				  unsigned long size, unsigned long stride)
{
+	struct cpumask *pmask = &mm->context.tlb_stale_mask;
	struct cpumask *cmask = mm_cpumask(mm);
	unsigned int cpuid;
	bool broadcast;
@@ -44,6 +29,15 @@ static void __sbi_tlb_flush_range(struct mm_struct *mm, unsigned long start,
	if (static_branch_unlikely(&use_asid_allocator)) {
		unsigned long asid = atomic_long_read(&mm->context.id);

+		/*
+		 * TLB will be immediately flushed on harts concurrently
+		 * executing this MM context. TLB flush on other harts
+		 * is deferred until this MM context migrates there.
+		 */
+		cpumask_setall(pmask);
		^^^^^^^^^^^^^^^^^^^^^
		This marks all harts across all NUMA nodes! Most of those
		flushes are deferred to switch_mm_fast, so the penalty lands
		on harts that never touched the mm.
+		cpumask_clear_cpu(cpuid, pmask);
+		cpumask_andnot(pmask, pmask, cmask);

Please reconsider a bit and make a smarter decision: only penalize the
harts that have touched the mm, not all of them, and only flush the whole
TLB when that is really needed (see the sketch below).
__sbi_tlb_flush_range is the slow path; keeping the fast path's
performance is worth more than improving a slow one.
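Something like this minimal sketch is what I have in mind -- hypothetical
code, not from any posted patch. It assumes a per-mm footprint_mask that
accumulates every hart that has ever run the mm (set in switch_mm, only
cleared when the mm's context is reset):

	if (static_branch_unlikely(&use_asid_allocator)) {
		unsigned long asid = atomic_long_read(&mm->context.id);

		/* Defer flushes only to harts that have touched the mm. */
		cpumask_copy(pmask, &mm->context.footprint_mask);
		cpumask_clear_cpu(cpuid, pmask);
		/* Harts currently running the mm get the SBI flush now. */
		cpumask_andnot(pmask, pmask, cmask);

That way the deferred flush in switch_mm_fast only ever lands on harts
that may actually hold stale entries for this ASID, however many NUMA
nodes the system has.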
> So I've got that one on for-next, sorry I missed it before.
>
> Thanks!
>
> >
> > Your patch also works fine in my tests, fixing those crashes. I have a
> > question though, regarding the removed cpumask_clear_cpu: how are CPUs
> > that are no longer running the task removed from its mm_cpumask? If
> > they are not removed, then flush_tlb_mm/flush_tlb_page will broadcast
> > unnecessary TLB flushes to those CPUs when ASID is enabled.
> >
> > Regards,
> > Sergey

--
Best Regards
 Guo Ren