Date:   Mon, 1 Nov 2021 08:45:24 -0700
From:   Nadav Amit <nadav.amit@...il.com>
To:     Hugh Dickins <hughd@...gle.com>
Cc:     Linux-MM <linux-mm@...ck.org>, LKML <linux-kernel@...r.kernel.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Andrew Cooper <andrew.cooper3@...rix.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Will Deacon <will@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
        Nick Piggin <npiggin@...il.com>, x86@...nel.org
Subject: Re: [PATCH] mm: use correct VMA flags when freeing page-tables



> On Nov 1, 2021, at 12:28 AM, Hugh Dickins <hughd@...gle.com> wrote:
> 
> On Thu, 21 Oct 2021, Nadav Amit wrote:
> 
>> From: Nadav Amit <namit@...are.com>
>> 
>> Consistent use of the mmu_gather interface requires a call to
>> tlb_start_vma() and tlb_end_vma() for each VMA. free_pgtables() does not
>> follow this pattern.
>> 
>> Certain architectures need tlb_start_vma() to be called in order for
>> tlb_update_vma_flags() to update the VMA flags (tlb->vma_exec and
>> tlb->vma_huge), which are later used for the proper TLB flush to be
>> issued. Since tlb_start_vma() is not called, this can lead to the wrong
>> VMA flags being used when the flush is performed.
>> 
>> Specifically, the munmap syscall would call unmap_region(), which unmaps
>> the VMAs and then frees the page-tables. A flush is needed after
>> the page-tables are removed to prevent page-walk caches from holding
>> stale entries, but this flush would use the VMA flags of the last
>> VMA that was flushed. This does not appear to be right.
>> 
>> Use tlb_start_vma() and tlb_end_vma() to prevent this from happening.
>> This might lead to unnecessary calls to flush_cache_range() on certain
>> arch's. If needed, a new flag can be added to mmu_gather to indicate
>> that the flush is not needed.
>> 
>> Cc: Andrea Arcangeli <aarcange@...hat.com>
>> Cc: Andrew Cooper <andrew.cooper3@...rix.com>
>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>> Cc: Andy Lutomirski <luto@...nel.org>
>> Cc: Dave Hansen <dave.hansen@...ux.intel.com>
>> Cc: Peter Zijlstra <peterz@...radead.org>
>> Cc: Thomas Gleixner <tglx@...utronix.de>
>> Cc: Will Deacon <will@...nel.org>
>> Cc: Yu Zhao <yuzhao@...gle.com>
>> Cc: Nick Piggin <npiggin@...il.com>
>> Cc: x86@...nel.org
>> Signed-off-by: Nadav Amit <namit@...are.com>
>> ---
>> mm/memory.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>> 
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 12a7b2094434..056fbfdd3c1f 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -412,6 +412,8 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
>> 		unlink_anon_vmas(vma);
>> 		unlink_file_vma(vma);
>> 
>> +		tlb_start_vma(tlb, vma);
>> +
>> 		if (is_vm_hugetlb_page(vma)) {
>> 			hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
>> 				floor, next ? next->vm_start : ceiling);
>> @@ -429,6 +431,8 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
>> 			free_pgd_range(tlb, addr, vma->vm_end,
>> 				floor, next ? next->vm_start : ceiling);
>> 		}
>> +
>> +		tlb_end_vma(tlb, vma);
>> 		vma = next;
>> 	}
>> }
>> -- 
>> 2.25.1
> 
> No.
> 
> This is an experiment to see whether reviewers look at a wider context
> than is seen in the patch itself?  Let's take a look:
> 
> 		tlb_start_vma(tlb, vma);
> 
> 		if (is_vm_hugetlb_page(vma)) {
> 			hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
> 				floor, next ? next->vm_start : ceiling);
> 		} else {
> 			/*
> 			 * Optimization: gather nearby vmas into one call down
> 			 */
> 			while (next && next->vm_start <= vma->vm_end + PMD_SIZE
> 			       && !is_vm_hugetlb_page(next)) {
> 				vma = next;
> 				next = vma->vm_next;
> 				unlink_anon_vmas(vma);
> 				unlink_file_vma(vma);
> 			}
> 			free_pgd_range(tlb, addr, vma->vm_end,
> 				floor, next ? next->vm_start : ceiling);
> 		}
> 
> 		tlb_end_vma(tlb, vma);
> 		vma = next;
> 
> So, the vma may well have changed in between the new tlb_start_vma()
> and tlb_end_vma(): which defeats the intent of the patch.

Indeed, I introduced an embarrassing bug. Having said that, I do not
understand the experiment, or whether I conducted it or was the
object of it.
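
For reference, the mismatch can be reproduced in a small userspace
model. This is only a sketch: toy_vma, toy_free_pgtables() and the
recording stubs are hypothetical stand-ins and not kernel code,
TOY_PMD_SIZE is an arbitrary value, and the hugetlb branch is left
out. It keeps just the control flow of the patched loop and counts how
often the start and end calls receive different VMAs:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PMD_SIZE 0x200000UL	/* arbitrary; stands in for PMD_SIZE */

/* Hypothetical stand-in for struct vm_area_struct. */
struct toy_vma {
	unsigned long vm_start, vm_end;
	struct toy_vma *vm_next;
};

/* Arguments of the most recent start/end calls. */
static const struct toy_vma *last_start, *last_end;

static void toy_tlb_start_vma(const struct toy_vma *vma) { last_start = vma; }
static void toy_tlb_end_vma(const struct toy_vma *vma) { last_end = vma; }

/*
 * Mirrors only the control flow of the patched loop. Returns the number
 * of iterations in which start and end were called with different VMAs.
 */
static int toy_free_pgtables(struct toy_vma *vma)
{
	int mismatches = 0;

	while (vma) {
		struct toy_vma *next = vma->vm_next;

		assert(vma->vm_start < vma->vm_end);
		toy_tlb_start_vma(vma);

		/* Optimization: gather nearby vmas into one call down */
		while (next && next->vm_start <= vma->vm_end + TOY_PMD_SIZE) {
			vma = next;
			next = vma->vm_next;
		}

		/* vma may no longer be the one passed to the start call */
		toy_tlb_end_vma(vma);
		if (last_start != last_end)
			mismatches++;

		vma = next;
	}
	return mismatches;
}
```

With two adjacent VMAs followed by a distant one, the first iteration
starts on the first VMA and ends on the second, so the function
reports one mismatch; a single isolated VMA reports none.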

> 
> And I doubt that optimization should be dropped to suit the patch:
> you admit to limited understanding of those architectures which need
> tlb_start_vma(), me too; but they seem to have survived many years
> without it there in free_pgtables(), and I think that tlb_start_vma()
> is for when freeing pages, not for when freeing page tables.  Surely
> all architectures have to accommodate the fact that a single page
> table can be occupied by many different kinds of vma.

When it comes to TLB flushing, the assumption that something must be
working because it has been in the code for many years is
questionable, and I have encountered (and fixed) many such cases.

As I mentioned before, a flush is needed when page-tables are freed
in order to invalidate the page-walk caches after the page-tables are
dropped, in case a speculative page-walk takes place. I can post a v2
that fixes my embarrassing bug. I am not going to force this patch: I
just encountered the issue while modifying a different piece of code,
and the behavior seems very inconsistent.
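
To make the inconsistency concrete, here is a rough userspace model of
the stale-flags scenario from the commit message. It is a sketch under
stated assumptions, not kernel code: model_vma, model_mmu_gather and
the model_* functions are hypothetical names, MODEL_VM_EXEC is an
arbitrary bit, and the only behavior modeled is that the start call
records the exec and huge flags of the VMA it is given:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_VM_EXEC 0x4UL	/* arbitrary bit; stands in for VM_EXEC */

/* Hypothetical stand-in for the VMA properties that matter here. */
struct model_vma {
	unsigned long vm_flags;
	bool is_hugetlb;
};

/* Only the two fields discussed in the commit message. */
struct model_mmu_gather {
	bool vma_exec;
	bool vma_huge;
};

/* Models the flag update performed on behalf of tlb_start_vma(). */
static void model_tlb_start_vma(struct model_mmu_gather *tlb,
				const struct model_vma *vma)
{
	tlb->vma_huge = vma->is_hugetlb;
	tlb->vma_exec = (vma->vm_flags & MODEL_VM_EXEC) != 0;
}

/* True if a flush issued now would use flags that describe vma. */
static bool model_flags_match(const struct model_mmu_gather *tlb,
			      const struct model_vma *vma)
{
	return tlb->vma_huge == vma->is_hugetlb &&
	       tlb->vma_exec == ((vma->vm_flags & MODEL_VM_EXEC) != 0);
}

/*
 * Unmap pass: the start call runs for every VMA. Page-table freeing
 * pass: begins at vmas[0] without updating the gather state. Returns
 * whether the recorded flags still describe vmas[0] at that point.
 */
static bool model_unmap_then_free(const struct model_vma *vmas, size_t n)
{
	struct model_mmu_gather tlb = { false, false };
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		model_tlb_start_vma(&tlb, &vmas[i]);

	return model_flags_match(&tlb, &vmas[0]);
}
```

In this model, an executable VMA followed by a hugetlb VMA leaves the
gather state describing the hugetlb VMA when the page-tables of the
first one are freed, so the check returns false; with a single VMA
the flags still match.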
