linux-kernel - Re: [BUG?] X86 arch_tlbbatch_flush() seems to be lacking mm_tlb_flush

Message-Id: <2C85D898-7D10-4E5C-9B2C-017B202C7026@gmail.com>
Date:   Sun, 16 Oct 2022 08:31:42 +0300
From:   Nadav Amit <nadav.amit@...il.com>
To:     Linus Torvalds <torvalds@...uxfoundation.org>
Cc:     Jann Horn <jannh@...gle.com>, Andy Lutomirski <luto@...nel.org>,
        Linux-MM <linux-mm@...ck.org>, Mel Gorman <mgorman@...e.de>,
        Rik van Riel <riel@...hat.com>,
        kernel list <linux-kernel@...r.kernel.org>,
        Kees Cook <keescook@...omium.org>,
        Ingo Molnar <mingo@...nel.org>,
        Sasha Levin <sasha.levin@...cle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [BUG?] X86 arch_tlbbatch_flush() seems to be lacking
 mm_tlb_flush_nested() integration

On Oct 16, 2022, at 2:47 AM, Linus Torvalds <torvalds@...uxfoundation.org> wrote:

> On Fri, Oct 14, 2022 at 8:51 PM Nadav Amit <nadav.amit@...il.com> wrote:
>> Unless I am missing something, flush_tlb_batched_pending() would be
>> called and do the flushing at this point, no?
> 
> Ahh, yes.
> 
> That seems to be doing the right thing, although looking a bit more at
> it, I think it might be improved.
> 
> At least in the zap_pte_range() case, instead of doing a synchronous
> TLB flush if there are pending batched flushes, it might be better if
> flush_tlb_batched_pending() would set the "need_flush_all" bit in the
> mmu_gather structure.
> 
> That would possibly avoid that extra TLB flush entirely - since
> *normally* zap_page_range() will cause a TLB flush anyway.
> 
> Maybe it doesn't matter.

It seems possible and simple.
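
To make the suggestion concrete, here is a rough userspace toy model, not
kernel code. It only compares how many flushes the synchronous and the
deferred variants issue. All toy_* names are hypothetical; only the idea of a
"need_flush_all" bit and of a pending batched flush comes from the discussion
above. It also assumes, without checking, that nothing depends on the flush
happening before the gather finishes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration; these are NOT the kernel's structures. */
struct toy_mm {
    bool tlb_flush_batched;  /* a batched flush is still outstanding */
    unsigned int flushes;    /* counts flushes issued, for comparison */
};

struct toy_gather {
    struct toy_mm *mm;
    bool need_flush;         /* the zap itself unmapped something */
    bool need_flush_all;     /* modelled after the mmu_gather bit named above */
};

static void toy_flush_mm(struct toy_mm *mm)
{
    assert(mm);
    mm->flushes++;
    mm->tlb_flush_batched = false;
}

/* Synchronous variant: flush right away if a batched flush is pending. */
static void toy_batched_pending_sync(struct toy_gather *tlb)
{
    assert(tlb && tlb->mm);
    if (tlb->mm->tlb_flush_batched)
        toy_flush_mm(tlb->mm);
}

/* Deferred variant: only record that the final flush must cover everything. */
static void toy_batched_pending_deferred(struct toy_gather *tlb)
{
    assert(tlb && tlb->mm);
    if (tlb->mm->tlb_flush_batched)
        tlb->need_flush_all = true;
}

/* End of the gather: one flush covers both the zap and the pending batch. */
static void toy_gather_finish(struct toy_gather *tlb)
{
    assert(tlb && tlb->mm);
    if (tlb->need_flush || tlb->need_flush_all)
        toy_flush_mm(tlb->mm);
    tlb->need_flush = tlb->need_flush_all = false;
}

/* Runs one zap with a batched flush pending; returns the flush count. */
static unsigned int toy_zap_once(bool deferred)
{
    struct toy_mm mm = { .tlb_flush_batched = true, .flushes = 0 };
    struct toy_gather tlb = { .mm = &mm };

    if (deferred)
        toy_batched_pending_deferred(&tlb);
    else
        toy_batched_pending_sync(&tlb);
    tlb.need_flush = true;   /* the zap removed at least one PTE */
    toy_gather_finish(&tlb);
    return mm.flushes;
}
```

In this model the synchronous path issues two flushes for one zap, while the
deferred path folds the pending batch into the flush that the zap triggers
anyway and issues one.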

But in general, there are still various unnecessary TLB flushes due to the
TLB batching. Specifically, ptep_clear_flush() might flush unnecessarily
when pte_accessible() finds tlb_flush_pending holding a non-zero value.
Worse, the complexity of the code is high.

To simplify the TLB flushing mechanism and eliminate the unnecessary TLB
flushes, it is possible to track the “completed” TLB generation (i.e., one
that was flushed). Tracking pending TLB flushes can be done at VMA or
page-table granularity instead of mm granularity to avoid unnecessary flushes
on ptep_clear_flush(). Andy also suggested having a queue of the pending TLB
flushes.
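
As a minimal single-threaded sketch of the "completed generation" idea (a
userspace toy, not the patches in [1] and not kernel code): each deferred
flush takes a new generation number, each tracked range (standing in for a
VMA or a page table) remembers the generation of its last deferred flush, and
a range needs a flush only if that generation is newer than the last
completed one. All toy_* names and fields are hypothetical, and the real
thing would need atomics and memory ordering that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy address space: generation counters plus a flush counter. */
struct toy_mm {
    uint64_t tlb_gen;            /* bumped for every deferred flush */
    uint64_t tlb_gen_completed;  /* newest generation known to be flushed */
    unsigned int flushes;
};

/* Toy tracked range, standing in for a VMA or a page table. */
struct toy_range {
    uint64_t pending_gen;        /* generation of its last deferred flush */
};

/* A PTE in this range changed and the flush was deferred (batched). */
static void toy_defer_flush(struct toy_mm *mm, struct toy_range *r)
{
    assert(mm && r);
    r->pending_gen = ++mm->tlb_gen;
}

/* A full flush completes every generation issued before it started. */
static void toy_flush_all(struct toy_mm *mm)
{
    assert(mm);
    uint64_t gen = mm->tlb_gen;  /* snapshot taken before the flush */

    mm->flushes++;
    if (gen > mm->tlb_gen_completed)
        mm->tlb_gen_completed = gen;
}

/* Only a range with a deferred flush newer than "completed" needs one. */
static bool toy_range_needs_flush(const struct toy_mm *mm,
                                  const struct toy_range *r)
{
    assert(mm && r);
    return r->pending_gen > mm->tlb_gen_completed;
}
```

With two ranges a and b, deferring a flush on a makes
toy_range_needs_flush() true for a but leaves it false for b, whereas a single
mm-wide pending flag would have reported both as needing a flush. The price
is the extra per-range field that has to be read and written.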

The main problem is that each of the aforementioned enhancements can add
some cache references, and therefore might induce additional overheads. I
sent some patches before [1], which I can revive. The main question is
whether we can prioritize simplicity and unification of the various
TLB-flush batching mechanisms over (probably very small) performance gains.

[1] https://lore.kernel.org/linux-mm/20210131001132.3368247-1-namit@vmware.com/