linux-kernel - Re: [PATCH 2/2] mm/mprotect: do not flush on permission promotion

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1952fc7c-fb21-7d0e-661b-afa59b4580e5@redhat.com>
Date:   Thu, 7 Oct 2021 19:07:24 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Nadav Amit <namit@...are.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>, Peter Xu <peterx@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Andrew Cooper <andrew.cooper3@...rix.com>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Will Deacon <will@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
        Nick Piggin <npiggin@...il.com>,
        "x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH 2/2] mm/mprotect: do not flush on permission promotion

On 07.10.21 18:16, Nadav Amit wrote:
> 
>> On Oct 7, 2021, at 5:13 AM, David Hildenbrand <david@...hat.com> wrote:
>>
>> On 25.09.21 22:54, Nadav Amit wrote:
>>> From: Nadav Amit <namit@...are.com>
>>> Currently, using mprotect() to unprotect a memory region or uffd to
>>> unprotect a memory region causes a TLB flush. At least on x86, as
>>> protection is promoted, no TLB flush is needed.
>>> Add an arch-specific pte_may_need_flush() which tells whether a TLB
>>> flush is needed based on the old PTE and the new one. Implement an x86
>>> pte_may_need_flush().
>>> For x86, PTE protection promotion or changes of software bits does
>>> require a flush, also add logic that considers the dirty-bit. Changes to
>>> the access-bit do not trigger a TLB flush, although architecturally they
>>> should, as Linux considers the access-bit as a hint.
>>
>> Is the added LOC worth the benefit? IOW, do we have some benchmark that really benefits from that?
> 
> So you ask whether the added ~10 LOC (net) worth the benefit?

I read  "3 files changed, 46 insertions(+), 1 deletion(-)" to optimize 
something without proof, so I naturally have to ask. So this is just a 
"usually we optimize and show numbers to proof" comment.

> 
> Let’s start with the cost of this patch.
> 
> If you ask about complexity, I think that it is a rather simple
> patch and documented as needed. Please be more concrete if you
> think otherwise.

It is most certainly added complexity, although documented cleanly.

> 
> If you ask about the runtime overhead, my experience is that
> such code, which mostly does bit operations, has negligible cost.
> The execution time of mprotect code, and other similar pieces of
> code, is mostly dominated by walking the page-tables & getting
> the pages (which might require cold or random memory accesses),
> acquiring the locks, and of course the TLB flushes that this
> patch tries to eliminate.

I'm absolutely not concerned about runtime overhead :)

> 
> As for the benefit: TLB flush on x86 of a single PTE has an
> overhead of ~200 cycles. If a TLB shootdown is needed, for instance
> on multithreaded applications, this overhead can grow to few
> microseconds or even more, depending on the number of sockets,
> whether the workload runs in a VM (and worse if CPUs are
> overcommitted) and so on.
> 
> This overhead is completely unnecessary on many occasions. If
> you run mprotect() to add permissions, or as I noted in my case,
> to do something similar using userfaultfd. Note that the
> potentially unnecessary TLB flush/shootdown takes place while
> you hold the mmap-lock for write in the case of mprotect(),
> thereby potentially preventing other threads from making
> progress during that time.
> 
> On my in-development workload it was a considerable overhead
> (I didn’t collect numbers though). Basically, I track dirty
> pages using uffd, and every page-fault that can be easily
> resolved by unprotecting cause a TLB flush/shootdown.

Any numbers would be helpful.

> 
> If you want, I will write a microbenchmarks and give you numbers.
> If you look for further optimizations (although you did not indicate
> so), such as doing the TLB batching from do_mprotect_key(),
> (i.e. batching across VMAs), we can discuss it and apply it on
> top of these patches.

I think this patch itself is sufficient if we can show a benefit; I do 
wonder if existing benchmarks could already show a benefit, I feel like 
they should if this makes a difference. Excessive mprotect() usage 
(protect<>unprotect) isn't something unusual.

-- 
Thanks,

David / dhildenb