Message-ID: <878qmpwl0i.fsf@linux.dev>
Date: Wed, 21 May 2025 17:23:57 -0700
From: Roman Gushchin <roman.gushchin@...ux.dev>
To: Jann Horn <jannh@...gle.com>
Cc: Hugh Dickins <hughd@...gle.com>,  Andrew Morton
 <akpm@...ux-foundation.org>,  Peter Zijlstra <peterz@...radead.org>,
  linux-kernel@...r.kernel.org,  linux-mm@...ck.org,  Will Deacon
 <will@...nel.org>,  "Aneesh Kumar K.V" <aneesh.kumar@...nel.org>,  Nick
 Piggin <npiggin@...il.com>,  linux-arch@...r.kernel.org
Subject: Re: [PATCH v5] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP
 vmas into free_pgtables()

Jann Horn <jannh@...gle.com> writes:

> On Tue, May 20, 2025 at 11:50 PM Roman Gushchin
> <roman.gushchin@...ux.dev> wrote:
>> Commit b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas")
>> added a forced TLB flush to tlb_end_vma(), which is required to avoid
>> a race between munmap() and unmap_mapping_range(). However, it added
>> some overhead to other paths where tlb_end_vma() is used but vmas are
>> not removed, e.g. madvise(MADV_DONTNEED).
>>
>> Fix this by moving the tlb flush out of tlb_end_vma() into new
>> tlb_flush_vmas() called from free_pgtables(), somewhat similar to the
>> stable version of the original commit:
>> commit 895428ee124a ("mm: Force TLB flush for PFNMAP mappings before
>> unlink_file_vma()").
>>
>> Note that if tlb->fullmm is set, no flush is required, as the whole
>> mm is about to be destroyed.
>>
>> Signed-off-by: Roman Gushchin <roman.gushchin@...ux.dev>
>> Cc: Jann Horn <jannh@...gle.com>
>> Cc: Peter Zijlstra <peterz@...radead.org>
>> Cc: Will Deacon <will@...nel.org>
>> Cc: "Aneesh Kumar K.V" <aneesh.kumar@...nel.org>
>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>> Cc: Nick Piggin <npiggin@...il.com>
>> Cc: Hugh Dickins <hughd@...gle.com>
>> Cc: linux-arch@...r.kernel.org
>> Cc: linux-mm@...ck.org
>>
>> ---
>> v5:
>>   - tlb_free_vma() -> tlb_free_vmas() to avoid extra checks
>>
>> v4:
>>   - naming/comments update (by Peter Z.)
>>   - check vma->vm_flags in tlb_free_vma() (by Peter Z.)
>>
>> v3:
>>   - added initialization of vma_pfn in __tlb_reset_range() (by Hugh D.)
>>
>> v2:
>>   - moved vma_pfn flag handling into tlb.h (by Peter Z.)
>>   - added comments (by Peter Z.)
>>   - fixed the vma_pfn flag setting (by Hugh D.)
>> ---
>>  include/asm-generic/tlb.h | 49 +++++++++++++++++++++++++++++++--------
>>  mm/memory.c               |  2 ++
>>  2 files changed, 41 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
>> index 88a42973fa47..8a8b9535a930 100644
>> --- a/include/asm-generic/tlb.h
>> +++ b/include/asm-generic/tlb.h
>> @@ -58,6 +58,11 @@
>>   *    Defaults to flushing at tlb_end_vma() to reset the range; helps when
>>   *    there's large holes between the VMAs.
>>   *
>> + *  - tlb_free_vmas()
>> + *
>> + *    tlb_free_vmas() marks the start of unlinking of one or more vmas
>> + *    and freeing page-tables.
>> + *
>>   *  - tlb_remove_table()
>>   *
>>   *    tlb_remove_table() is the basic primitive to free page-table directories
>> @@ -399,7 +404,10 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
>>          * Do not reset mmu_gather::vma_* fields here, we do not
>>          * call into tlb_start_vma() again to set them if there is an
>>          * intermediate flush.
>> +        *
>> +        * Except for vma_pfn, which only matters while there's a pending TLBI.
>>          */
>> +       tlb->vma_pfn = 0;
>
> This looks dodgy to me. Can you explain here in more detail why this
> is okay? Looking at current mainline, tlb->vma_pfn is only set to 1
> when tlb_start_vma() calls into tlb_update_vma_flags(); it is never
> set again after tlb_start_vma(), so I don't think it's legal to just
> clear it in the middle of a VMA.
>
> If we had something like this callgraph on a VM_MIXEDMAP mapping, with
> an intermediate TLB flush in the middle of the VM_MIXEDMAP mapping:
>
> tlb_start_vma()
>   [sets tlb->vma_pfn]
> zap_pte_range
>   tlb_flush_mmu_tlbonly
>     __tlb_reset_range
>       [clears tlb->vma_pfn]
> zap_pte_range
>   [zaps more PTEs and queues a pending TLB flush]
> tlb_end_vma()
> free_pgtables
>   tlb_free_vmas
>     [checks for tlb->vma_pfn]
>
> then tlb_free_vmas() will erroneously not do a flush when it should've
> done one, right?

Good catch!

>
> Why does it even matter to you whether tlb->vma_pfn ever gets reset?
> At worst you do one extra TLB flush, and only in a case involving a
> munmap() across multiple VMAs that mixes VM_PFNMAP/VM_MIXEDMAP and
> non-VM_PFNMAP/VM_MIXEDMAP VMAs (already a fairly unusual scenario on
> its own), where the region being operated on is large or fragmented
> enough that you already did at least one TLB flush, and the unmap is
> large enough (or has enough address space around it) that page tables
> are getting freed. One extra flush in a scenario like that isn't
> going to be a big deal.

Agree.

>
> If you wanted to do this properly, I think you could do something like:
>
>  - add another flag tlb->current_vma_pfn that tracks whether the
> current vma is pfnmap/mixedmap
>  - reset tlb->vma_pfn on TLB flush
>  - set tlb->vma_pfn again if a TLB flush is enqueued while
> tlb->current_vma_pfn is true
>
> But that seems way too complicated, so I would just delete these three
> lines from the patch.

Agree, except that tlb->vma_pfn still needs to be initialized somewhere.
That is primarily why these lines were added, based on feedback on the
original version. However, we missed the race you pointed out.

I guess instead tlb->vma_pfn can be initialized in __tlb_gather_mmu().

I'll send out the fixed version shortly.

Thank you!
