Message-ID: <14437e403ed8fceacafe0a89521d3b731211156e.camel@physik.fu-berlin.de>
Date: Tue, 12 Aug 2025 18:43:31 +0200
From: John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>
To: Nadav Amit <namit@...are.com>, Peter Zijlstra <peterz@...radead.org>,
 Borislav Petkov <bp@...en8.de>, Andy Lutomirski <luto@...nel.org>,
 Ingo Molnar <mingo@...hat.com>
Cc: linux-kernel@...r.kernel.org, x86@...nel.org, hpa@...or.com,
 Thomas Gleixner <tglx@...utronix.de>, Nadav Amit <nadav.amit@...il.com>,
 Dave Hansen <dave.hansen@...ux.intel.com>, linux_dti@...oud.com,
 linux-integrity@...r.kernel.org, linux-security-module@...r.kernel.org,
 akpm@...ux-foundation.org, kernel-hardening@...ts.openwall.com,
 linux-mm@...ck.org, will.deacon@....com, ard.biesheuvel@...aro.org,
 kristen@...ux.intel.com, deneen.t.dock@...el.com,
 Rick Edgecombe <rick.p.edgecombe@...el.com>,
 Daniel Borkmann <daniel@...earbox.net>, Alexei Starovoitov <ast@...nel.org>,
 sparclinux <sparclinux@...r.kernel.org>, Sam James <sam@...too.org>,
 Andreas Larsson <andreas@...sler.com>,
 Anthony Yznaga <anthony.yznaga@...cle.com>
Subject: Re: [PATCH v5 18/23] bpf: Use vmalloc special flag

Hi,

On Thu, 2019-04-25 at 17:11 -0700, Nadav Amit wrote:
> From: Rick Edgecombe <rick.p.edgecombe@...el.com>
> 
> Use new flag VM_FLUSH_RESET_PERMS for handling freeing of special
> permissioned memory in vmalloc and remove places where memory was set RW
> before freeing which is no longer needed. Don't track if the memory is RO
> anymore because it is now tracked in vmalloc.
> 
> Cc: Daniel Borkmann <daniel@...earbox.net>
> Cc: Alexei Starovoitov <ast@...nel.org>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
> ---
>  include/linux/filter.h | 17 +++--------------
>  kernel/bpf/core.c      |  1 -
>  2 files changed, 3 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 14ec3bdad9a9..7d3abde3f183 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -20,6 +20,7 @@
>  #include <linux/set_memory.h>
>  #include <linux/kallsyms.h>
>  #include <linux/if_vlan.h>
> +#include <linux/vmalloc.h>
>  
>  #include <net/sch_generic.h>
>  
> @@ -503,7 +504,6 @@ struct bpf_prog {
>  	u16			pages;		/* Number of allocated pages */
>  	u16			jited:1,	/* Is our filter JIT'ed? */
>  				jit_requested:1,/* archs need to JIT the prog */
> -				undo_set_mem:1,	/* Passed set_memory_ro() checkpoint */
>  				gpl_compatible:1, /* Is filter GPL compatible? */
>  				cb_access:1,	/* Is control block accessed? */
>  				dst_needed:1,	/* Do we need dst entry? */
> @@ -733,27 +733,17 @@ bpf_ctx_narrow_access_ok(u32 off, u32 size, u32 size_default)
>  
>  static inline void bpf_prog_lock_ro(struct bpf_prog *fp)
>  {
> -	fp->undo_set_mem = 1;
> +	set_vm_flush_reset_perms(fp);
>  	set_memory_ro((unsigned long)fp, fp->pages);
>  }
>  
> -static inline void bpf_prog_unlock_ro(struct bpf_prog *fp)
> -{
> -	if (fp->undo_set_mem)
> -		set_memory_rw((unsigned long)fp, fp->pages);
> -}
> -
>  static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
>  {
> +	set_vm_flush_reset_perms(hdr);
>  	set_memory_ro((unsigned long)hdr, hdr->pages);
>  	set_memory_x((unsigned long)hdr, hdr->pages);
>  }
>  
> -static inline void bpf_jit_binary_unlock_ro(struct bpf_binary_header *hdr)
> -{
> -	set_memory_rw((unsigned long)hdr, hdr->pages);
> -}
> -
>  static inline struct bpf_binary_header *
>  bpf_jit_binary_hdr(const struct bpf_prog *fp)
>  {
> @@ -789,7 +779,6 @@ void __bpf_prog_free(struct bpf_prog *fp);
>  
>  static inline void bpf_prog_unlock_free(struct bpf_prog *fp)
>  {
> -	bpf_prog_unlock_ro(fp);
>  	__bpf_prog_free(fp);
>  }
>  
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index ff09d32a8a1b..c605397c79f0 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -848,7 +848,6 @@ void __weak bpf_jit_free(struct bpf_prog *fp)
>  	if (fp->jited) {
>  		struct bpf_binary_header *hdr = bpf_jit_binary_hdr(fp);
>  
> -		bpf_jit_binary_unlock_ro(hdr);
>  		bpf_jit_binary_free(hdr);
>  
>  		WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(fp));
> -- 
> 2.17.1
> 
> 

There are issues with the TLB management on sparc64 (primarily sun4u) that were introduced
by this patch. A typical backtrace after a crash looks like this:

[  122.085803] Unable to handle kernel NULL pointer dereference
[  122.160227] tsk->{mm,active_mm}->context = 000000000000009d
[  122.233502] tsk->{mm,active_mm}->pgd = fff0000231d14000
[  122.302118]               \|/ ____ \|/
[  122.302118]               "@'/ .. \`@"
[  122.302118]               /_| \__/ |_\
[  122.302118]                  \__U_/
[  122.495420] systemd(1): Oops [#1]
[  122.538874] CPU: 0 PID: 1 Comm: systemd Not tainted 5.2.0-3-sparc64 #1 Debian 5.2.17-1
[  122.642957] TSTATE: 0000004411001601 TPC: 000000000061cd94 TNPC: 000000000061cd98 Y: 00000000    Not tainted
[  122.772207] TPC: <vfs_getattr_nosec+0x34/0xc0>
[  122.830529] g0: 0000000000000000 g1: 00000000000007ff g2: 0000000000000000 g3: 00000000000007df
[  122.944902] g4: fff00002381771c0 g5: 0000000000000003 g6: fff0000238178000 g7: 0000000000000000
[  123.059275] o0: fff000023817be18 o1: 0000000000000000 o2: 0000000000000000 o3: fff000023817be18
[  123.173658] o4: 0000000000000000 o5: 0000000000000000 sp: fff000023817b341 ret_pc: 000000000061cd7c
[  123.292611] RPC: <vfs_getattr_nosec+0x1c/0xc0>
[  123.350933] l0: 0000010000204010 l1: fff0000101600e28 l2: e4e45b5b8ae44628 l3: 0000000000000000
[  123.465311] l4: 0000000000000000 l5: 0000000000000000 l6: 0000000000000000 l7: fff0000100bff140
[  123.579692] i0: fff000023817bd50 i1: fff000023817be18 i2: 0000000000000001 i3: 0000000000000900
[  123.694060] i4: 0000000000000000 i5: fff00002320c1210 i6: fff000023817b3f1 i7: 000000000061ce48
[  123.808439] I7: <vfs_getattr+0x28/0x40>
[  123.858759] Call Trace:
[  123.890785]  [000000000061ce48] vfs_getattr+0x28/0x40
[  123.957123]  [000000000061cf64] vfs_statx+0x84/0xc0
[  124.021173]  [000000000061d918] sys_statx+0x38/0x60
[  124.085226]  [0000000000406154] linux_sparc_syscall+0x34/0x44
[  124.160708] Disabling lock debugging due to kernel taint
[  124.230481] Caller[000000000061ce48]: vfs_getattr+0x28/0x40
[  124.303680] Caller[000000000061cf64]: vfs_statx+0x84/0xc0
[  124.374593] Caller[000000000061d918]: sys_statx+0x38/0x60
[  124.445503] Caller[0000000000406154]: linux_sparc_syscall+0x34/0x44
[  124.527857] Caller[fff00001013fde40]: 0xfff00001013fde40
[  124.597621] Instruction DUMP:
[  124.597623]  c2264000 
[  124.636505]  861027df 
[  124.667386]  c45f6028 
[  124.698267] <c458a050>
[  124.729148]  8408a401 
[  124.760031]  83789403 
[  124.790910]  c2264000 
[  124.821801]  c207600c 
[  124.852675]  80886800 
[  124.883556] 
[  124.954015] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[  125.054721] Press Stop-A (L1-A) from sun keyboard or send break
[  125.054721] twice on console to return to the boot prom
[  125.201103] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
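
As a rough cross-check of the trace above, the word marked with <> in the
instruction dump (c458a050) can be decoded by hand. The sketch below assumes
the standard SPARC V9 format-3 load/store encoding; the struct and function
names are made up for illustration. It decodes to op3 0x0b (LDX) with rs1 = 2,
rd = 2 and an immediate of 0x50, i.e. "ldx [%g2 + 0x50], %g2". With g2 being
zero in the register dump, that would be consistent with the reported NULL
pointer dereference.

```c
#include <stdint.h>

/* Fields of a SPARC V9 format-3 (load/store) instruction with an immediate. */
struct sparc_fmt3 {
    unsigned op;     /* bits 31..30: 3 means load/store */
    unsigned rd;     /* bits 29..25: destination register */
    unsigned op3;    /* bits 24..19: 0x0b is LDX (64-bit load) */
    unsigned rs1;    /* bits 18..14: base register */
    unsigned i;      /* bit 13: 1 means simm13 is used as the offset */
    int32_t  simm13; /* bits 12..0: sign-extended immediate */
};

/* Hypothetical helper: split a 32-bit instruction word into its fields. */
static struct sparc_fmt3 sparc_decode_fmt3(uint32_t insn)
{
    struct sparc_fmt3 d;
    uint32_t imm = insn & 0x1fff;

    d.op  = insn >> 30;
    d.rd  = (insn >> 25) & 0x1f;
    d.op3 = (insn >> 19) & 0x3f;
    d.rs1 = (insn >> 14) & 0x1f;
    d.i   = (insn >> 13) & 0x1;
    /* Sign-extend the 13-bit immediate. */
    d.simm13 = (imm & 0x1000) ? (int32_t)imm - 0x2000 : (int32_t)imm;
    return d;
}
```

Register numbers 0-7 map to %g0-%g7, 8-15 to %o0-%o7, 16-23 to %l0-%l7 and
24-31 to %i0-%i7, so calling sparc_decode_fmt3(0xc458a050) and reading rs1 and
rd as 2 gives %g2 for both.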

I suspect that the root cause lies in the following commit, which introduced VM_FLUSH_RESET_PERMS;
the flag may not work as expected on sun4u SPARC systems:

commit 868b104d7379e28013e9d48bdd2db25e0bdcf751
Author: Rick Edgecombe <rick.p.edgecombe@...el.com>
Date:   Thu Apr 25 17:11:36 2019 -0700

    mm/vmalloc: Add flag for freeing of special permsissions
   
    Add a new flag VM_FLUSH_RESET_PERMS, for enabling vfree operations to
    immediately clear executable TLB entries before freeing pages, and handle
    resetting permissions on the directmap. This flag is useful for any kind
    of memory with elevated permissions, or where there can be related
    permissions changes on the directmap. Today this is RO+X and RO memory.
   
    Although this enables directly vfreeing non-writeable memory now,
    non-writable memory cannot be freed in an interrupt because the allocation
    itself is used as a node on deferred free list. So when RO memory needs to
    be freed in an interrupt the code doing the vfree needs to have its own
    work queue, as was the case before the deferred vfree list was added to
    vmalloc.
   
    For architectures with set_direct_map_ implementations this whole operation
    can be done with one TLB flush when centralized like this. For others with
    directmap permissions, currently only arm64, a backup method using
    set_memory functions is used to reset the directmap. When arm64 adds
    set_direct_map_ functions, this backup can be removed.
   
    When the TLB is flushed to both remove TLB entries for the vmalloc range
    mapping and the direct map permissions, the lazy purge operation could be
    done to try to save a TLB flush later. However today vm_unmap_aliases
    could flush a TLB range that does not include the directmap. So a helper
    is added with extra parameters that can allow both the vmalloc address and
    the direct mapping to be flushed during this operation. The behavior of the
    normal vm_unmap_aliases function is unchanged.
   
    Suggested-by: Dave Hansen <dave.hansen@...el.com>
    Suggested-by: Andy Lutomirski <luto@...nel.org>
    Suggested-by: Will Deacon <will.deacon@....com>
    Signed-off-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
    Cc: <akpm@...ux-foundation.org>
    Cc: <ard.biesheuvel@...aro.org>
    Cc: <deneen.t.dock@...el.com>
    Cc: <kernel-hardening@...ts.openwall.com>
    Cc: <kristen@...ux.intel.com>
    Cc: <linux_dti@...oud.com>
    Cc: Borislav Petkov <bp@...en8.de>
    Cc: H. Peter Anvin <hpa@...or.com>
    Cc: Linus Torvalds <torvalds@...ux-foundation.org>
    Cc: Nadav Amit <nadav.amit@...il.com>
    Cc: Rik van Riel <riel@...riel.com>
    Cc: Thomas Gleixner <tglx@...utronix.de>
    Link: https://lkml.kernel.org/r/20190426001143.4983-17-namit@vmware.com
    Signed-off-by: Ingo Molnar <mingo@...nel.org>
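
To make the change in responsibility described in that commit message easier
to follow, here is a toy userspace model. It is not the kernel's code: every
name in it is hypothetical, and the exact ordering of the flush and the
permission reset is an assumption made for illustration. The point is only
that the caller marks the area once, and the free path then undoes the special
permissions itself instead of the caller setting the memory RW before freeing.

```c
#include <stddef.h>

enum toy_prot  { TOY_RW, TOY_RO };
enum toy_event { TOY_EV_FLUSH_TLB, TOY_EV_RESET_PERMS, TOY_EV_FREE_PAGES };

#define TOY_FLUSH_RESET_PERMS 0x1   /* stands in for VM_FLUSH_RESET_PERMS */
#define TOY_MAX_EVENTS 8

/* Hypothetical stand-in for a vmalloc area, with a log of free-path steps. */
struct toy_area {
    unsigned flags;
    enum toy_prot prot;
    enum toy_event log[TOY_MAX_EVENTS];
    size_t nlog;
};

static void toy_log(struct toy_area *a, enum toy_event ev)
{
    if (a->nlog < TOY_MAX_EVENTS)
        a->log[a->nlog++] = ev;
}

/* Caller side: mark the area once, then make it read-only. */
static void toy_lock_ro(struct toy_area *a)
{
    a->flags |= TOY_FLUSH_RESET_PERMS;
    a->prot = TOY_RO;
}

/* Free side: the allocator, not the caller, undoes the special permissions. */
static void toy_vfree(struct toy_area *a)
{
    if (a->flags & TOY_FLUSH_RESET_PERMS) {
        toy_log(a, TOY_EV_FLUSH_TLB);    /* drop stale translations */
        a->prot = TOY_RW;
        toy_log(a, TOY_EV_RESET_PERMS);  /* restore default permissions */
    }
    toy_log(a, TOY_EV_FREE_PAGES);
}
```

In this model an area that went through toy_lock_ro() logs a flush and a
permission reset before its pages are freed, while an unmarked area is freed
directly. If the flush step misbehaves on one architecture, every user of the
flag is affected at free time.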

The crash always happens when support for transparent huge pages is enabled (CONFIG_TRANSPARENT_HUGEPAGE=y
and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y), in particular on sun4u machines. The more modern sun4v machines
are much less affected, although I cannot rule out that the crashes that sometimes happen on them are
related to this bug.
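
For anyone who wants to check whether a running system is in the affected
configuration, a minimal sketch follows. It assumes the usual sysfs file
/sys/kernel/mm/transparent_hugepage/enabled, which lists the available modes
and marks the active one in square brackets, e.g. "[always] madvise never".
The helper names are made up for illustration. Note that
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS only selects the default; the mode can
still be changed at run time.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy the bracketed (active) token of a sysfs selector line such as
 * "always [madvise] never" into out. Returns 0 on success, -1 otherwise.
 */
static int thp_active_mode(const char *line, char *out, size_t outlen)
{
    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    size_t len;

    if (!open || !close || outlen == 0)
        return -1;
    len = (size_t)(close - open - 1);
    if (len >= outlen)
        return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/* Read the running kernel's THP mode; returns 0 on success, -1 otherwise. */
static int thp_read_mode(char *out, size_t outlen)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");

    if (!f)
        return -1;   /* THP not built in, or sysfs not available */
    if (!fgets(line, sizeof(line), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return thp_active_mode(line, out, outlen);
}
```

If thp_read_mode() fills the buffer with "always", the system matches the
configuration in which I see the crash.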

With THP enabled, the crash can be delayed either by reverting d563d678aa0b or, for example, by applying this crude hack:

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6dbcdceecae1..128118593b48 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2948,8 +2948,8 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
        }
        free_purged_blocks(&purge_list);
 
-       if (!__purge_vmap_area_lazy(start, end, false) && flush)
-               flush_tlb_kernel_range(start, end);
+       //      if (!__purge_vmap_area_lazy(start, end, false) && flush)
+       //      flush_tlb_kernel_range(start, end);
        mutex_unlock(&vmap_purge_lock);
 }

Please see also the discussion in [1].

Thanks,
Adrian

[1] https://lore.kernel.org/all/35f5ec4eda8a7dbeeb7df9ec0be5c0b062c509f7.camel@physik.fu-berlin.de/

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
