[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <A6EB667E-590C-4B6C-A932-EF7C5F711755@gmail.com>
Date: Wed, 20 Jul 2022 10:25:46 -0700
From: Nadav Amit <nadav.amit@...il.com>
To: David Hildenbrand <david@...hat.com>
Cc: Linux MM <linux-mm@...ck.org>, LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Mike Rapoport <rppt@...ux.ibm.com>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Andrew Cooper <andrew.cooper3@...rix.com>,
Andy Lutomirski <luto@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Peter Xu <peterx@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Will Deacon <will@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
Nick Piggin <npiggin@...il.com>
Subject: Re: [RFC PATCH 03/14] mm/mprotect: allow exclusive anon pages to be
writable
On Jul 20, 2022, at 8:19 AM, David Hildenbrand <david@...hat.com> wrote:
> On 18.07.22 14:02, Nadav Amit wrote:
>> From: Nadav Amit <namit@...are.com>
>>
>> Anonymous pages might have the dirty bit clear, but this should not
>> prevent mprotect from making them writable if they are exclusive.
>> Therefore, skip the test whether the page is dirty in this case.
>>
>> Cc: Andrea Arcangeli <aarcange@...hat.com>
>> Cc: Andrew Cooper <andrew.cooper3@...rix.com>
>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>> Cc: Andy Lutomirski <luto@...nel.org>
>> Cc: Dave Hansen <dave.hansen@...ux.intel.com>
>> Cc: David Hildenbrand <david@...hat.com>
>> Cc: Peter Xu <peterx@...hat.com>
>> Cc: Peter Zijlstra <peterz@...radead.org>
>> Cc: Thomas Gleixner <tglx@...utronix.de>
>> Cc: Will Deacon <will@...nel.org>
>> Cc: Yu Zhao <yuzhao@...gle.com>
>> Cc: Nick Piggin <npiggin@...il.com>
>> Signed-off-by: Nadav Amit <namit@...are.com>
>> ---
>> mm/mprotect.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>> index 34c2dfb68c42..da5b9bf8204f 100644
>> --- a/mm/mprotect.c
>> +++ b/mm/mprotect.c
>> @@ -45,7 +45,7 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma,
>>
>> VM_BUG_ON(!(vma->vm_flags & VM_WRITE) || pte_write(pte));
>>
>> - if (pte_protnone(pte) || !pte_dirty(pte))
>> + if (pte_protnone(pte))
>> return false;
>>
>> /* Do we need write faults for softdirty tracking? */
>> @@ -66,7 +66,8 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma,
>> page = vm_normal_page(vma, addr, pte);
>> if (!page || !PageAnon(page) || !PageAnonExclusive(page))
>> return false;
>> - }
>> + } else if (!pte_dirty(pte))
>> + return false;
>>
>> return true;
>> }
>
> When I wrote that code, I was wondering how often that would actually
> happen in practice -- and if we care about optimizing that. Do you have
> a gut feeling in which scenarios this would happen and if we care?
>
> If the page is in the swapcache and was swapped out, you'd be requiring
> a writeback even though nobody modified the page and possibly isn't
> going to do so in the near future.
So here is my due diligence: I did not really encounter a scenario in which
it showed up. When I looked at your code, I assumed this was an oversight
and not a thoughtful decision. For me the issue is more of the discrepancy
between how a certain page is handled before and after it was pages out.
The way that I see it, there is a tradeoff in the way dirty-bit should
be handled:
(1) Writable-clean PTEs introduce some non-negligible overhead.
(2) Marking a PTE dirty speculatively would require a write back.
… But this tradeoff should not affect whether a PTE is writable, i.e.,
mapping the PTE as writable-clean should not cause a writeback. In other
words, if you are concerned about unnecessary writebacks, which I think is a
fair concern, then do not set the dirty-bit. In a later patch I try to avoid
TLB flushes on clean-writable entries that are write-protected.
So I do not think that the writeback you mentioned should be a real issue.
Yet if you think that using the fact that the page is not-dirty is a good
hueristics to avoid future TLB flushes (for P->NP; as I said there is a
solution for RW->RO), or if you are concerned about the cost of
vm_normal_page(), perhaps those are valid concerned (although I do not think
so).
--
[ Regarding (1): After some discussions with Peter and reading more code, I
thought at some point that perhaps avoiding having writable-clean PTE as
much as possible makes sense [*], since setting the dirty-bit costs ~550
cycles and a page fault is not a lot more than 1000. But with all the
mitigations (and after adding IBRS for retbless) page-fault entry is kind of
expensive.
[*] At least on x86 ]
Powered by blists - more mailing lists