lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1c17a9e6-b04b-4754-8af5-521fcadba1bd@arm.com>
Date: Fri, 30 May 2025 11:33:52 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: Dev Jain <dev.jain@....com>, akpm@...ux-foundation.org, david@...hat.com,
 catalin.marinas@....com, will@...nel.org, yang@...amperecomputing.com
Cc: lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, vbabka@...e.cz,
 rppt@...nel.org, surenb@...gle.com, mhocko@...e.com, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, suzuki.poulose@....com, steven.price@....com,
 gshan@...hat.com, linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH 1/3] mm: Allow pagewalk without locks

On 30/05/2025 10:04, Dev Jain wrote:
> It is noted at [1] that KFENCE can manipulate kernel pgtable entries during
> softirqs. It does this by calling set_memory_valid() -> __change_memory_common().
> This being a non-sleepable context, we cannot take the init_mm mmap lock.
> Therefore, add PGWALK_NOLOCK to enable walk_page_range_novma() usage without
> locks.

It looks like riscv solved this problem by moving from walk_page_range_novma()
to apply_to_existing_page_range() in Commit fb1cf0878328 ("riscv: rewrite
__kernel_map_pages() to fix sleeping in invalid context"). That won't work for
us because the whole point is that we want to support changing permissions on
block mappings.

Yang:

Not directly relavent to this patch, but I do worry about the potential need to
split the range here though once Yang's series comes in - we would need to
allocate memory for pgtables atomically in softirq context. KFENCE is intended
to be enabled in production IIRC, so we can't just not allow block mapping when
KFENCE is enabled and will likely need to think of a solution for this?


> 
> [1] https://lore.kernel.org/linux-arm-kernel/89d0ad18-4772-4d8f-ae8a-7c48d26a927e@arm.com/
> 
> Signed-off-by: Dev Jain <dev.jain@....com>
> ---
>  include/linux/pagewalk.h |  2 ++
>  mm/pagewalk.c            | 12 ++++++++----
>  2 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
> index 9700a29f8afb..9bc8853ed3de 100644
> --- a/include/linux/pagewalk.h
> +++ b/include/linux/pagewalk.h
> @@ -14,6 +14,8 @@ enum page_walk_lock {
>  	PGWALK_WRLOCK = 1,
>  	/* vma is expected to be already write-locked during the walk */
>  	PGWALK_WRLOCK_VERIFY = 2,
> +	/* no lock is needed */
> +	PGWALK_NOLOCK = 3,

I'd imagine you either want to explicitly forbid this option for the other
entrypoints (i.e. the non- _novma variants) or you need to be able to handle
this option being passed in to the other functions, which you currently don't
do. I'd vote for explcitly disallowing (document it and return error if passed in).

>  };
>  
>  /**
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index e478777c86e1..9657cf4664b2 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -440,6 +440,8 @@ static inline void process_vma_walk_lock(struct vm_area_struct *vma,
>  	case PGWALK_RDLOCK:
>  		/* PGWALK_RDLOCK is handled by process_mm_walk_lock */
>  		break;
> +	default:
> +		break;
>  	}
>  #endif
>  }
> @@ -640,10 +642,12 @@ int walk_page_range_novma(struct mm_struct *mm, unsigned long start,
>  	 * specified address range from being freed. The caller should take
>  	 * other actions to prevent this race.
>  	 */
> -	if (mm == &init_mm)
> -		mmap_assert_locked(walk.mm);

Given apply_to_page_range() doesn't do any locking for kernel pgtable walking, I
can be convinced that it's also not required for our case using this framework.
But why does this framework believe it is necessary? Should the comment above
this at least be updated?

Thanks,
Ryan

> -	else
> -		mmap_assert_write_locked(walk.mm);
> +	if (ops->walk_lock != PGWALK_NOLOCK) {
> +		if (mm == &init_mm)
> +			mmap_assert_locked(walk.mm);
> +		else
> +			mmap_assert_write_locked(walk.mm);
> +	}
>  
>  	return walk_pgd_range(start, end, &walk);
>  }


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ