Message-ID: <aGK937QUSmKkXleq@pc636>
Date: Mon, 30 Jun 2025 18:39:59 +0200
From: Uladzislau Rezki <urezki@...il.com>
To: Vitaly Wool <vitaly.wool@...sulko.se>
Cc: Uladzislau Rezki <urezki@...il.com>, linux-mm@...ck.org,
	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
	Danilo Krummrich <dakr@...nel.org>,
	Alice Ryhl <aliceryhl@...gle.com>, rust-for-linux@...r.kernel.org
Subject: Re: [PATCH v8 1/4] mm/vmalloc: allow to set node and align in
 vrealloc

> 
>     On Jun 30, 2025, at 12:30 PM, Uladzislau Rezki <urezki@...il.com> wrote:
> 
>     On Sat, Jun 28, 2025 at 12:25:37PM +0200, Vitaly Wool wrote:
> 
>         Reimplement vrealloc() to be able to set node and alignment should
>         a user need to do so. Rename the function to vrealloc_node_align()
>         to better match what it actually does now and introduce macros for
>         vrealloc() and friends for backward compatibility.
> 
>         With that change we also provide the ability for the Rust part of
>         the kernel to set node and alignment in its allocations.
> 
>         Signed-off-by: Vitaly Wool <vitaly.wool@...sulko.se>
>         ---
>         include/linux/vmalloc.h | 12 +++++++++---
>         mm/vmalloc.c            | 20 ++++++++++++++++----
>         2 files changed, 25 insertions(+), 7 deletions(-)
> 
>         diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>         index fdc9aeb74a44..68791f7cb3ba 100644
>         --- a/include/linux/vmalloc.h
>         +++ b/include/linux/vmalloc.h
>         @@ -197,9 +197,15 @@ extern void *__vcalloc_noprof(size_t n, size_t size, gfp_t flags) __alloc_size(1
>         extern void *vcalloc_noprof(size_t n, size_t size) __alloc_size(1, 2);
>         #define vcalloc(...) alloc_hooks(vcalloc_noprof(__VA_ARGS__))
> 
>         -void * __must_check vrealloc_noprof(const void *p, size_t size, gfp_t flags)
>         - __realloc_size(2);
>         -#define vrealloc(...) alloc_hooks(vrealloc_noprof(__VA_ARGS__))
>         +void *__must_check vrealloc_node_align_noprof(const void *p, size_t size,
>         + unsigned long align, gfp_t flags, int nid) __realloc_size(2);
>         +#define vrealloc_node_noprof(_p, _s, _f, _nid) \
>         + vrealloc_node_align_noprof(_p, _s, 1, _f, _nid)
>         +#define vrealloc_noprof(_p, _s, _f) \
>         + vrealloc_node_align_noprof(_p, _s, 1, _f, NUMA_NO_NODE)
>         +#define vrealloc_node_align(...) alloc_hooks(vrealloc_node_align_noprof(__VA_ARGS__))
>         +#define vrealloc_node(...) alloc_hooks(vrealloc_node_noprof(__VA_ARGS__))
>         +#define vrealloc(...) alloc_hooks(vrealloc_noprof(__VA_ARGS__))
> 
>         extern void vfree(const void *addr);
>         extern void vfree_atomic(const void *addr);
>         diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>         index 6dbcdceecae1..d633ac0ff977 100644
>         --- a/mm/vmalloc.c
>         +++ b/mm/vmalloc.c
>         @@ -4089,12 +4089,15 @@ void *vzalloc_node_noprof(unsigned long size, int node)
>         EXPORT_SYMBOL(vzalloc_node_noprof);
> 
>         /**
>         - * vrealloc - reallocate virtually contiguous memory; contents remain unchanged
>         + * vrealloc_node_align_noprof - reallocate virtually contiguous memory; contents
>         + * remain unchanged
>          * @p: object to reallocate memory for
>          * @size: the size to reallocate
>         + * @align: requested alignment
>          * @flags: the flags for the page level allocator
>         + * @nid: node id
>          *
>         - * If @p is %NULL, vrealloc() behaves exactly like vmalloc(). If @size is 0 and
>         + * If @p is %NULL, vrealloc_XXX() behaves exactly like vmalloc(). If @size is 0 and
>          * @p is not a %NULL pointer, the object pointed to is freed.
>          *
>          * If __GFP_ZERO logic is requested, callers must ensure that, starting with the
>         @@ -4111,7 +4114,8 @@ EXPORT_SYMBOL(vzalloc_node_noprof);
>          * Return: pointer to the allocated memory; %NULL if @size is zero or in case of
>          *         failure
>          */
>         -void *vrealloc_noprof(const void *p, size_t size, gfp_t flags)
>         +void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align,
>         +  gfp_t flags, int nid)
>         {
>         struct vm_struct *vm = NULL;
>         size_t alloced_size = 0;
>         @@ -4135,6 +4139,13 @@ void *vrealloc_noprof(const void *p, size_t size, gfp_t flags)
>         if (WARN(alloced_size < old_size,
>          "vrealloc() has mismatched area vs requested sizes (%p)\n", p))
>         return NULL;
>         + if (WARN(nid != NUMA_NO_NODE && nid != page_to_nid(vmalloc_to_page(p)),
>         +  "vrealloc() has mismatched nids\n"))
>         + return NULL;
>         + if (WARN((uintptr_t)p & (align - 1),
>         +  "will not reallocate with a bigger alignment (0x%lx)\n",
>         +  align))
>         + return NULL;
> 
> 
>     IMO, IS_ALIGNED() should be used instead. We already have a macro for this
>     purpose, i.e. the idea is just to check that "p" is aligned to the
>     requested "align".
> 
>     Can you replace the (uintptr_t) cast with (ulong) or (unsigned long)?
>     This is how we mostly cast in vmalloc code.
> 
> 
> Thanks, noted.
> 
> 
>     WARN() is probably worth replacing. Use WARN_ON_ONCE() to prevent
>     flooding.
> 
> 
> I am not sure I totally agree, because:
> a) there’s already one WARN() in that block and I’m just following the pattern
> b) I don’t think this will be a frequent error.
> 
Could we just drop assumption (b)? Instead, we eliminate the possibility
of flooding altogether, so we do not spam the kernel log buffer :)
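
To illustrate the two review points (an IS_ALIGNED()-style check with an
unsigned long cast, and reporting a problem only once), here is a minimal
userspace sketch. It is not kernel code and not part of the patch:
IS_ALIGNED_UL, misaligned_warn_once() and warn_count are made-up names, and
the "warn once" logic only imitates the idea behind WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace stand-in for an IS_ALIGNED()-style check (hypothetical name).
 * The value is cast to unsigned long; @a must be a power of two.
 */
#define IS_ALIGNED_UL(x, a) \
	((((unsigned long)(x)) & ((unsigned long)(a) - 1)) == 0)

/* Number of reports actually printed (hypothetical, for illustration). */
static unsigned int warn_count;

/*
 * Returns true if @p is not aligned to @align. The report is printed
 * only the first time, so repeated offenders cannot flood the log.
 */
static bool misaligned_warn_once(const void *p, unsigned long align)
{
	static bool warned;
	bool bad;

	assert(align && !(align & (align - 1)));	/* power of two only */

	bad = !IS_ALIGNED_UL(p, align);
	if (bad && !warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "misaligned pointer %p (align 0x%lx)\n",
			p, align);
	}
	return bad;
}
```

The caller still sees the failure on every call (the return value), only
the logging is limited to the first occurrence.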

Also, there is another:

>
> + if (WARN(nid != NUMA_NO_NODE && nid != page_to_nid(vmalloc_to_page(p)),
> + "vrealloc() has mismatched nids\n"))
> + return NULL;
>
I can easily trigger this, with continuous kernel splats, after adding
vrealloc_alloc_test to the vmalloc test suite:

<snip>
[   53.517781] ------------[ cut here ]------------
[   53.517787] vrealloc() has mismatched nids
[   53.517817] WARNING: CPU: 46 PID: 2213 at mm/vmalloc.c:4198 vrealloc_node_align_noprof+0x11b/0x230
[   53.517829] Modules linked in: test_vmalloc(E+) binfmt_misc(E) ppdev(E) parport_pc(E) parport(E) bochs(E) snd_pcm(E) sg(E) drm_client_lib(E) snd_timer(E) drm_shmem_helper(E) evdev(E) joydev(E) snd(E) drm_kms_helper(E) vga16fb(E) soundcore(E) serio_raw(E) button(E) pcspkr(E) vgastate(E) drm(E) dm_mod(E) fuse(E) loop(E) configfs(E) efi_pstore(E) qemu_fw_cfg(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sd_mod(E) ata_generic(E) ata_piix(E) libata(E) i2c_piix4(E) scsi_mod(E) psmouse(E) floppy(E) e1000(E) i2c_smbus(E) scsi_common(E)
[   53.517879] CPU: 46 UID: 0 PID: 2213 Comm: vmalloc_test/10 Kdump: loaded Tainted: G        W   E       6.16.0-rc1+ #263 PREEMPT(undef)
[   53.517886] Tainted: [W]=WARN, [E]=UNSIGNED_MODULE
[   53.517887] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   53.517889] RIP: 0010:vrealloc_node_align_noprof+0x11b/0x230
[   53.517894] Code: 89 4c 24 08 e8 76 b0 ff ff 4c 8b 4c 24 08 48 8b 00 48 c1 e8 36 41 39 c4 0f 84 64 ff ff ff 48 c7 c7 90 c4 28 a2 e8 25 a8 d3 ff <0f> 0b 31 ed eb 95 65 8b 05 f8 cf 90 01 a9 00 ff ff 00 0f 85 dd 00
[   53.517897] RSP: 0018:ffffa6db87f27e08 EFLAGS: 00010282
[   53.517900] RAX: 0000000000000000 RBX: ffffa6db9a315000 RCX: 0000000000000000
[   53.517902] RDX: 0000000000000002 RSI: 0000000000000001 RDI: 00000000ffffffff
[   53.517904] RBP: 000000000000a000 R08: 0000000000000000 R09: 0000000000000003
[   53.517905] R10: ffffa6db87f27ca0 R11: ffff98c5fff0a368 R12: 0000000000000002
[   53.517908] R13: ffff98c201d06a80 R14: 0000000000009000 R15: 0000000000000001
[   53.517912] FS:  0000000000000000(0000) GS:ffff98c24cf17000(0000) knlGS:0000000000000000
[   53.517914] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.517916] CR2: 00007fe515c11390 CR3: 000000084bf03000 CR4: 00000000000006f0
[   53.517920] Call Trace:
[   53.517923]  <TASK>
[   53.517928]  ? __pfx_vrealloc_alloc_test+0x10/0x10 [test_vmalloc]
[   53.517937]  vrealloc_alloc_test+0x22/0x60 [test_vmalloc]
[   53.517941]  test_func+0xd5/0x1d0 [test_vmalloc]
[   53.517946]  ? __pfx_test_func+0x10/0x10 [test_vmalloc]
[   53.517949]  kthread+0x109/0x240
[   53.517955]  ? finish_task_switch.isra.0+0x85/0x2a0
[   53.517960]  ? __pfx_kthread+0x10/0x10
[   53.517963]  ? __pfx_kthread+0x10/0x10
[   53.517966]  ret_from_fork+0x87/0xf0
[   53.517971]  ? __pfx_kthread+0x10/0x10
[   53.517974]  ret_from_fork_asm+0x1a/0x30
[   53.517980]  </TASK>
[   53.517981] ---[ end trace 0000000000000000 ]---
<snip>

Please drop that WARN(). The motivation is that we should still serve the
memory request: processes can migrate between NUMA nodes, and they still
have to be able to allocate memory.

Moreover, in the current vrealloc() implementation, memory is fully reallocated
on a new NUMA node in any case and the old allocation is released after copying
the data. So it does not matter if the NUMA node has changed.
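
A minimal userspace model of that argument, assuming the "allocate new,
copy, release old" behaviour described above. It is only a sketch and does
not reflect the real vmalloc internals: struct sketch_area, SKETCH_NO_NODE
and the sketch_*() helpers are made-up names, and the "node" is just a
number stored next to the buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_NO_NODE (-1)	/* hypothetical stand-in for NUMA_NO_NODE */

struct sketch_area {
	int nid;		/* node the buffer pretends to live on */
	size_t size;
	unsigned char *data;
};

static struct sketch_area *sketch_alloc_node(size_t size, int nid)
{
	struct sketch_area *a = malloc(sizeof(*a));

	if (!a)
		return NULL;
	a->data = calloc(1, size ? size : 1);
	if (!a->data) {
		free(a);
		return NULL;
	}
	/* Pretend node 0 is the local node when no node is requested. */
	a->nid = (nid == SKETCH_NO_NODE) ? 0 : nid;
	a->size = size;
	return a;
}

static void sketch_free(struct sketch_area *a)
{
	if (a) {
		free(a->data);
		free(a);
	}
}

/*
 * The requested node applies to the new allocation only. The node of
 * the old area is never compared, because the old area is released
 * after its contents have been copied.
 */
static struct sketch_area *sketch_realloc_node(struct sketch_area *old,
					       size_t size, int nid)
{
	struct sketch_area *fresh = sketch_alloc_node(size, nid);

	if (!fresh)
		return NULL;	/* the old area stays intact on failure */
	if (old) {
		memcpy(fresh->data, old->data,
		       old->size < size ? old->size : size);
		sketch_free(old);
	}
	return fresh;
}
```

In this model a buffer first allocated on node 0 can be reallocated with
nid 2 without any mismatch check: the result lives on node 2 and keeps the
old contents.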

--
Uladzislau Rezki
