Date:   Tue, 8 Aug 2023 12:16:50 +0200
From:   Alexandre Ghiti <alex@...ti.fr>
To:     Dylan Jhong <dylan@...estech.com>, paul.walmsley@...ive.com,
        palmer@...belt.com, aou@...s.berkeley.edu, ajones@...tanamicro.com,
        alexghiti@...osinc.com, anup@...infault.org, rppt@...nel.org,
        samuel@...lland.org, panqinglin2020@...as.ac.cn,
        sergey.matyukevich@...tacore.com, maz@...nel.org,
        linux-riscv@...ts.infradead.org, conor.dooley@...rochip.com,
        linux-kernel@...r.kernel.org
Cc:     ycliang@...estech.com, x5710999x@...il.com, tim609@...estech.com
Subject: Re: [PATCH 1/1] riscv: Implement arch_sync_kernel_mappings() for
 "preventive" TLB flush

Hi Dylan,

On 07/08/2023 10:23, Dylan Jhong wrote:
> Since RISC-V is an architecture that allows caching invalid entries in the TLB,
> it is necessary to issue a "preventive" SFENCE.VMA to ensure that each core obtains
> the correct kernel mapping.
>
> The patch implements TLB flushing in arch_sync_kernel_mappings(), ensuring that kernel
> page table mappings created via vmap/vmalloc() are updated before switching MM.
>
> Signed-off-by: Dylan Jhong <dylan@...estech.com>
> ---
>   arch/riscv/include/asm/page.h |  2 ++
>   arch/riscv/mm/tlbflush.c      | 12 ++++++++++++
>   2 files changed, 14 insertions(+)
>
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index b55ba20903ec..6c86ab69687e 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -21,6 +21,8 @@
>   #define HPAGE_MASK              (~(HPAGE_SIZE - 1))
>   #define HUGETLB_PAGE_ORDER      (HPAGE_SHIFT - PAGE_SHIFT)
>   
> +#define ARCH_PAGE_TABLE_SYNC_MASK	PGTBL_PTE_MODIFIED
> +
>   /*
>    * PAGE_OFFSET -- the first address of the first page of memory.
>    * When not using MMU this corresponds to the first free page in
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 77be59aadc73..d63364948c85 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -149,3 +149,15 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
>   	__flush_tlb_range(vma->vm_mm, start, end - start, PMD_SIZE);
>   }
>   #endif
> +
> +/*
> + * Since RISC-V is a microarchitecture that allows caching invalid entries in the TLB,
> + * it is necessary to issue a "preventive" SFENCE.VMA to ensure that each core obtains
> + * the correct kernel mapping. arch_sync_kernel_mappings() will ensure that kernel
> + * page table mappings created via vmap/vmalloc() are updated before switching MM.
> + */
> +void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
> +{
> +	if (start < VMALLOC_END && end > VMALLOC_START)


This test is too restrictive: it should also catch the range
[MODULES_VADDR; MODULES_END), sorry I did not notice that at first.
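
For illustration, the widened overlap check could look something like this
(untested sketch, 64-bit only since on rv32 the modules area lives inside
the vmalloc area):

void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
{
	/* Flush if the synced range overlaps the vmalloc or the modules area. */
	if ((start < VMALLOC_END && end > VMALLOC_START) ||
	    (start < MODULES_END && end > MODULES_VADDR))
		flush_tlb_all();
}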


> +		flush_tlb_all();
> +}
> \ No newline at end of file


I have to admit that I *think* both your patch and mine are wrong: one
of the problems that led to the removal of vmalloc_fault() is the
possibility for tracing functions to actually allocate vmalloc regions
in the vmalloc page fault path, which could give rise to nested
exceptions (see
https://lore.kernel.org/lkml/20200508144043.13893-1-joro@8bytes.org/).

Here, every time we allocate a vmalloc region, we send an IPI. If a
vmalloc allocation happens in this path (if it is traced, for example),
it will give rise to another IPI... and so on.

So I came to the conclusion that the only way to actually fix this issue
is by resolving the vmalloc faults very early in the page fault path (by
emitting an sfence.vma on uarchs that cache invalid entries), before the
kernel stack is even accessed. That's the best solution since it would
completely remove all the preventive sfence.vma in
flush_cache_vmap()/arch_sync_kernel_mappings(); we would instead rely on
faulting, which I assume should not happen often (?).
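
To give a rough idea of what I mean, a sketch below (not the actual
implementation: the real check has to happen in the low-level trap entry,
before the kernel stack is touched, and the helper name
vmalloc_fault_fixup() is just something I made up for the example):

/*
 * On a kernel fault in the vmalloc range, the mapping may already be
 * present in the kernel page table while this hart still holds a cached
 * invalid entry: a local sfence.vma is enough to resynchronize, and the
 * faulting access can simply be retried.
 */
static bool vmalloc_fault_fixup(unsigned long addr)
{
	if (addr < VMALLOC_START || addr >= VMALLOC_END)
		return false;

	local_flush_tlb_all();	/* sfence.vma on the local hart */
	return true;
}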

I'm implementing this solution, but I'm pretty sure it won't be ready 
for 6.5. In the meantime, we need either your patch or mine to fix your 
issue...
