lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4d800744-7b88-41aa-9979-b245e8bf794b@intel.com>
Date: Tue, 11 Mar 2025 12:28:10 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Gwan-gyeong Mun <gwan-gyeong.mun@...el.com>, linux-kernel@...r.kernel.org
Cc: osalvador@...e.de, 42.hyeyoo@...il.com, byungchul@...com,
 dave.hansen@...ux.intel.com, luto@...nel.org, peterz@...radead.org,
 akpm@...ux-foundation.org, stable@...r.kernel.org, linux-mm@...ck.org,
 max.byungchul.park@...com, max.byungchul.park@...il.com
Subject: Re: [PATCH] mm: introduce include/linux/pgalloc.h

Leading off the comments... The subject is not _wrong_, but it only
describes a small piece of what is going on here. If you can't describe
a patch in the subject, it's also generally a good sign that the patch
needs breaking up.

On 3/11/25 04:44, Gwan-gyeong Mun wrote:
> The include/linux/pgalloc.h is going to be the home of generic page table
> population functions. Start with including asm/pgalloc.h to
> include/linux/pgalloc.h and adding functions that if an architecture needs
> to synchronize global pgds when populating the init_mm's pgd.
> 
> The newly introduced p4d_populate_kernel() and pgd_populate_kernel()
> functions synchronize global pgds after populating init_mm's p4d/pgd.
> 
> If the architecture requires to synchronize global pgds,
> add ARCH_USE_SYNC_GLOBAL_PGDS to the Kconfig of the required architecture
> and implement arch_sync_global_p4ds() and arch_sync_global_pgds() to the
> architecture.
> And this patch adds an implementation to the x86 architecture.

BTW, please don't use "this patch". Use imperative voice instead.

> Through using new introduced p4d_populate_kernel() and
> pgd_populate_kernel() functions, it addresses an Oops issues when
> performing test of loading XE GPU driver module after applying the GPU SVM
> and Xe SVM patch series[1] and the Dept patch series[2].
> 
> The issue occurs when loading the xe driver via modprobe [3], which adds
> a struct page for device memory via devm_memremap_pages().
> When a process leads the addition of a struct page to vmemmap
> (e.g. hot-plug), the page table update for the newly added vmemmap-based
> virtual address is updated first in init_mm's page table and then
> synchronized later.
> If the vmemmap-based virtual address is accessed through the process's
> page table before this sync, a page fault will occur.
> This addresses the issue by synchronizing with the global pgds immediately
> after populating the init_mm's pgd.
> 
> [1] https://lore.kernel.org/dri-devel/20250213021112.1228481-1-matthew.brost@intel.com/
> [2] https://lore.kernel.org/lkml/20240508094726.35754-1-byungchul@sk.com/
> [3]
> [   49.103630] xe 0000:00:04.0: [drm] Available VRAM: 0x0000000800000000, 0x00000002fb800000
> [   49.116710] BUG: unable to handle page fault for address: ffffeb3ff1200000
> [   49.117175] #PF: supervisor write access in kernel mode
> [   49.117511] #PF: error_code(0x0002) - not-present page
> [   49.117835] PGD 0 P4D 0
> [   49.118015] Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
> [   49.118366] CPU: 3 UID: 0 PID: 302 Comm: modprobe Tainted: G        W          6.13.0-drm-tip-test+ #62
> [   49.118976] Tainted: [W]=WARN
> [   49.119179] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> [   49.119710] RIP: 0010:vmemmap_set_pmd+0xff/0x230
> [   49.120011] Code: 77 22 02 a9 ff ff 1f 00 74 58 48 8b 3d 62 77 22 02 48 85 ff 0f 85 9a 00 00 00 48 8d 7d 08 48 89 e9 31 c0 48 89 ea 48 83 e7 f8 <48> c7 45 00 00 00 00 00 48 29 f9 48 c7 45 48 00 00 00 00 83 c1 50
> [   49.121158] RSP: 0018:ffffc900016d37a8 EFLAGS: 00010282
> [   49.121502] RAX: 0000000000000000 RBX: ffff888164000000 RCX: ffffeb3ff1200000
> [   49.121966] RDX: ffffeb3ff1200000 RSI: 80000000000001e3 RDI: ffffeb3ff1200008
> [   49.122499] RBP: ffffeb3ff1200000 R08: ffffeb3ff1280000 R09: 0000000000000000
> [   49.123032] R10: ffff88817b94dc48 R11: 0000000000000003 R12: ffffeb3ff1280000
> [   49.123566] R13: 0000000000000000 R14: ffff88817b94dc48 R15: 8000000163e001e3
> [   49.124096] FS:  00007f53ae71d740(0000) GS:ffff88843fd80000(0000) knlGS:0000000000000000
> [   49.124698] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   49.125129] CR2: ffffeb3ff1200000 CR3: 000000017c7d2000 CR4: 0000000000750ef0
> [   49.125662] PKRU: 55555554
> [   49.125880] Call Trace:
> [   49.126078]  <TASK>
> [   49.126252]  ? __die_body.cold+0x19/0x26
> [   49.126509]  ? page_fault_oops+0xa2/0x240
> [   49.126736]  ? preempt_count_add+0x47/0xa0
> [   49.126968]  ? search_module_extables+0x4a/0x80
> [   49.127224]  ? exc_page_fault+0x206/0x230
...

BTW, oopses are usually better in the cover letter or at least left
trimmed down to the bare essentials if you want to keep them in a commit
message.

> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index d581634c6a59..0e0606cc9d4f 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -427,6 +427,10 @@ config PGTABLE_LEVELS
>  	default 3 if X86_PAE
>  	default 2
>  
> +config ARCH_USE_SYNC_GLOBAL_PGDS
> +	def_bool y
> +	depends on X86_64 && (PGTABLE_LEVELS > 3)

Do we have a 3-level 64-bit paging mode I'm not aware of? ;)

I know it's not super obvious, but take a look at the relevant Kconfig
options:

config PGTABLE_LEVELS
        int
        default 5 if X86_5LEVEL
        default 4 if X86_64
        default 3 if X86_PAE
        default 2

config X86_PAE
        depends on X86_32 && !HIGHMEM4G

config X86_5LEVEL
        depends on X86_64

Expanding that back into the PGTABLE_LEVELS options:

        default 5 if X86_5LEVEL # depends on X86_64
        default 4 if X86_64
        default 3 if X86_PAE # depends on X86_32
        default 2 # can't be set if X86_64

So, 4 and 5 are 64-bit only and 2 and 3 are 32-bit only.

> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
...
> +#ifdef CONFIG_ARCH_USE_SYNC_GLOBAL_PGDS

What purpose does this #ifdef serve?

> +#if CONFIG_PGTABLE_LEVELS > 3
> +void arch_sync_global_p4ds(unsigned long start, unsigned long end)
> +{
> +	sync_global_pgds(start, end);
> +}
> +
> +void arch_sync_global_pgds(unsigned long start, unsigned long end) {}
> +
> +#if CONFIG_PGTABLE_LEVELS > 4
> +void arch_sync_global_p4ds(unsigned long start, unsigned long end) {}
> +
> +void arch_sync_global_pgds(unsigned long start, unsigned long end)
> +{
> +	sync_global_pgds(start, end);
> +}

Why does this need both a pgd and a p4d variant? I don't think it would
be the end of the world to have both {p4d,pgd}_populate_kernel() call
(just picking a random name) arch_sync_kernel_pagetables().

> new file mode 100644
> index 000000000000..072cca766245
> --- /dev/null
> +++ b/include/linux/pgalloc.h
> @@ -0,0 +1,26 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_PGALLOC_H
> +#define _LINUX_PGALLOC_H
> +#include <asm/pgalloc.h>
> +
> +#ifdef CONFIG_ARCH_USE_SYNC_GLOBAL_PGDS
> +void arch_sync_global_p4ds(unsigned long start, unsigned long end);
> +void arch_sync_global_pgds(unsigned long start, unsigned long end);
> +#else
> +static inline void arch_sync_global_p4ds(unsigned long start, unsigned long end) {}
> +static inline void arch_sync_global_pgds(unsigned long start, unsigned long end) {}
> +#endif
> +
> +static inline void p4d_populate_kernel(unsigned long addr, p4d_t *p4d, pud_t *pud)
> +{
> +	p4d_populate(&init_mm, p4d, pud);
> +	arch_sync_global_p4ds(addr, addr);
> +}
> +
> +static inline void pgd_populate_kernel(unsigned long addr, pgd_t *pgd, p4d_t *p4d)
> +{
> +	pgd_populate(&init_mm, pgd, p4d);
> +	arch_sync_global_pgds(addr, addr);
> +}

So, there's x86 code that calls sync_global_pgds() today. Does that code
just _stay_ there? Isn't it redundant now?

Why does arch_sync_global_pgds() have a start and end if they're always
the same value?

I think this wants to be at *LEAST* four patches:

1. Introduce and use {pgd,p4d}_populate_kernel()
2. Introduce arch_sync_global_*() (with a better name) to those
   functions along with dummy implementations
3. Have x86 define the arch_sync...() function(s)
4. Fix up the x86 code to remove superfluous sync_global_pgds() calls

Also, I see quite a few p*d_populate(&init_mm, ...) calls. Don't those
need to get converted over?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ