lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJuCfpHZfRhsjmHVMUREOTVafRodMH85hzLdFFunneK=4ir0-w@mail.gmail.com>
Date: Fri, 16 May 2025 10:03:52 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: David Wang <00107082@....com>
Cc: kent.overstreet@...ux.dev, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org
Subject: Re: BUG: unable to handle page fault for address

Hi David,

On Fri, May 16, 2025 at 6:13 AM David Wang <00107082@....com> wrote:
>
> Hi,
>
> I caught a page fault when I was changing my nvidia driver:
> (This happens randomly, I can reproduce it with about 1/3 probability)
>
> [Fri May 16 12:05:41 2025] BUG: unable to handle page fault for address: ffff9d28984c3000
> [Fri May 16 12:05:41 2025] #PF: supervisor read access in kernel mode
> [Fri May 16 12:05:41 2025] #PF: error_code(0x0000) - not-present page
> ...
> [Fri May 16 12:05:41 2025] RIP: 0010:release_module_tags+0x103/0x1b0
> ...
> [Fri May 16 12:05:41 2025] Call Trace:
> [Fri May 16 12:05:41 2025]  <TASK>
> [Fri May 16 12:05:41 2025]  codetag_unload_module+0x135/0x160
> [Fri May 16 12:05:41 2025]  free_module+0x19/0x1a0
> ...
> (full kernel logs are pasted at the end.)
>
> Using a image with DEBUG_INFO, the calltrack parses as:
>
> RIP: 0010:release_module_tags (./include/linux/alloc_tag.h:134 lib/alloc_tag.c:352 lib/alloc_tag.c:573)
> [Fri May 16 12:05:41 2025] codetag_unload_module (lib/codetag.c:355)
> [Fri May 16 12:05:41 2025] free_module (kernel/module/main.c:1305)
> [Fri May 16 12:05:41 2025] __do_sys_delete_module (kernel/module/main.c:795)
>
> The offending lines in my codebase:
>         126 static inline struct alloc_tag_counters alloc_tag_read(struct alloc_tag *tag)
>         127 {
>         ...
>         131
>         132         for_each_possible_cpu(cpu) {
>         133                 counter = per_cpu_ptr(tag->counters, cpu);
> >>>>    134                 v.bytes += counter->bytes;   <--------------here
>         135                 v.calls += counter->calls;
>
>
> Nvidia drivers are out-tree... there could be some strange behavior in it causes this.. but,
> when I check the code, I got concerned about lifecycle of tag->counters.
> Based on following defination:
>         108 #define DEFINE_ALLOC_TAG(_alloc_tag)                                            \
>         109         static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr);      \
>         110         static struct alloc_tag _alloc_tag __used __aligned(8)                  \
>         111         __section(ALLOC_TAG_SECTION_NAME) = {                                   \
>         112                 .ct = CODE_TAG_INIT,                                            \
>         113                 .counters = &_alloc_tag_cntr };
>         114
> _alloc_tag_cntr is the data referenced by tag->counters, but they are in different section,
> and alloc_tag only prepare storage for section ALLOC_TAG_SECTION_NAME.
> right?
> Then what happens to those ".data..percpu" section when module is unloaded?
> Is it safe to keep using those ".data..percpu" section after module unloaded,
> or even during module is unloading?

Yes, I think you are right, free_module() calls percpu_modfree() which
would free the per-cpu memory allocated for the module. Before
0db6f8d7820a ("alloc_tag: load module tags into separate contiguous
memory") we would not unload the module if there were tags which were
still in use. After that change we load module tags into separate
memory, so I expected this to work but due to this external reference
it indeed should lead to UAF.
I think the simplest way to fix this would be to bypass
percpu_modfree() inside free_module() when there are module tags still
referenced, store mod->percpu inside alloc_tag_module_section and free
it inside clean_unused_module_areas_locked() once we know the counters
are not used anymore. I'll take a stab at it and will send a patch for
testing today.
Thanks,
Suren.

>
> I was expecting page fault when I make following experiments:
>
> 1.Load module A
> 2.Load module B
> 3.module B alloc memory, and handover the memory to A
> 4.unload module B
> (memory profiling report module B has memory not freed)
> ... after a while....
> 5.unload module A, where A free the memory. (when A kfree the memory,
> the counters used to be in module B's ".data..percpu" section should
> be referenced, it that section is gone, a pagefault should happen).
>
> But, after several trials, not page fault reported....
> Would kernel keeps ".data..percpu" since ALLOC_TAG_SECTION_NAME has reference to it,
> or I just need wait longer for kernel to "purge" those sections.
>
>
> Full logs:
> [Fri May 16 12:02:28 2025] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  550.144.03  Mon Dec 30 17:10:10 UTC 2024
> [Fri May 16 12:02:28 2025] [drm] [nvidia-drm] [GPU ID 0x00002600] Loading driver
> [Fri May 16 12:02:28 2025] [drm] Initialized nvidia-drm 0.0.0 for 0000:26:00.0 on minor 0
> [Fri May 16 12:03:45 2025] [drm] [nvidia-drm] [GPU ID 0x00002600] Unloading driver
> [Fri May 16 12:03:46 2025] nvidia-modeset: Unloading
> [Fri May 16 12:03:46 2025] nvidia-nvlink: Unregistered Nvlink Core, major device number 240
> [Fri May 16 12:04:38 2025] VFIO - User Level meta-driver version: 0.3
> [Fri May 16 12:04:38 2025] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
>
> [Fri May 16 12:04:38 2025] nvidia 0000:26:00.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=none:owns=none
> [Fri May 16 12:04:38 2025] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  570.144  Thu Apr 10 20:33:29 UTC 2025
> [Fri May 16 12:04:38 2025] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  570.144  Thu Apr 10 20:03:03 UTC 2025
> [Fri May 16 12:04:38 2025] nvidia_drm: Unknown symbol drm_client_setup (err -2)
> [Fri May 16 12:04:40 2025] nvidia-modeset: Unloading
> [Fri May 16 12:04:40 2025] nvidia-nvlink: Unregistered Nvlink Core, major device number 239
> [Fri May 16 12:05:40 2025] VFIO - User Level meta-driver version: 0.3
> [Fri May 16 12:05:40 2025] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
>
> [Fri May 16 12:05:40 2025] nvidia 0000:26:00.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=none:owns=none
> [Fri May 16 12:05:40 2025] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  570.144  Thu Apr 10 20:33:29 UTC 2025
> [Fri May 16 12:05:40 2025] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  570.144  Thu Apr 10 20:03:03 UTC 2025
> [Fri May 16 12:05:40 2025] [drm] [nvidia-drm] [GPU ID 0x00002600] Loading driver
> [Fri May 16 12:05:40 2025] [drm] Initialized nvidia-drm 0.0.0 for 0000:26:00.0 on minor 0
> [Fri May 16 12:05:40 2025] [drm] [nvidia-drm] [GPU ID 0x00002600] Unloading driver
> [Fri May 16 12:05:40 2025] nvidia-modeset: Unloading
> [Fri May 16 12:05:41 2025] nvidia-nvlink: Unregistered Nvlink Core, major device number 239
> [Fri May 16 12:05:41 2025] BUG: unable to handle page fault for address: ffff9d28984c3000
> [Fri May 16 12:05:41 2025] #PF: supervisor read access in kernel mode
> [Fri May 16 12:05:41 2025] #PF: error_code(0x0000) - not-present page
> [Fri May 16 12:05:41 2025] PGD 163001067 P4D 163001067 PUD 0
> [Fri May 16 12:05:41 2025] Oops: Oops: 0000 [#1] SMP NOPTI
> [Fri May 16 12:05:41 2025] CPU: 0 UID: 0 PID: 35898 Comm: modprobe Tainted: G           OE       6.15.0-rc6-linan-0 #587 PREEMPT(voluntary)
> [Fri May 16 12:05:41 2025] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> [Fri May 16 12:05:41 2025] Hardware name: Micro-Star International Co., Ltd. MS-7B89/B450M MORTAR MAX (MS-7B89), BIOS 2.L0 07/18/2024
> [Fri May 16 12:05:41 2025] RIP: 0010:release_module_tags+0x103/0x1b0
> [Fri May 16 12:05:41 2025] Code: e3 4c 03 64 24 08 4c 39 e3 72 bf 8b 0d 86 02 f6 00 31 ed 31 c0 eb 17 48 63 f0 49 8b 54 24 20 83 c0 01 48 8b 34 f5 40 e6 d4 95 <48> 03 2c 16 89 ce 48 63 d0 48 c7 c7 c0 fd d4 95 e8 78 3f f5 ff 8b
> [Fri May 16 12:05:41 2025] RSP: 0018:ffffb2eb40bdfe00 EFLAGS: 00010202
> [Fri May 16 12:05:41 2025] RAX: 0000000000000001 RBX: ffffffffc04609d7 RCX: 0000000000000008
> [Fri May 16 12:05:41 2025] RDX: 0000000000000000 RSI: ffff9d28984c3000 RDI: ffffffff95d4fdc0
> [Fri May 16 12:05:41 2025] RBP: 0000000000000000 R08: ffff9d25035aed08 R09: 000000000000fcf8
> [Fri May 16 12:05:41 2025] R10: ffffb2eb40bdfe00 R11: 0000000000000001 R12: ffffffffc04609b0
> [Fri May 16 12:05:41 2025] R13: ffffffffc19ef000 R14: ffff9d2502ab6870 R15: 0000000000000000
> [Fri May 16 12:05:41 2025] FS:  00007fe9ee051040(0000) GS:ffff9d28984c3000(0000) knlGS:0000000000000000
> [Fri May 16 12:05:41 2025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Fri May 16 12:05:41 2025] CR2: ffff9d28984c3000 CR3: 000000011278d000 CR4: 0000000000350ef0
> [Fri May 16 12:05:41 2025] Call Trace:
> [Fri May 16 12:05:41 2025]  <TASK>
> [Fri May 16 12:05:41 2025]  codetag_unload_module+0x135/0x160
> [Fri May 16 12:05:41 2025]  free_module+0x19/0x1a0
> [Fri May 16 12:05:41 2025]  __do_sys_delete_module+0x274/0x310
> [Fri May 16 12:05:41 2025]  ? srso_return_thunk+0x5/0x5f
> [Fri May 16 12:05:41 2025]  ? fpregs_assert_state_consistent+0x21/0x50
> [Fri May 16 12:05:41 2025]  ? srso_return_thunk+0x5/0x5f
> [Fri May 16 12:05:41 2025]  do_syscall_64+0x4b/0x120
> [Fri May 16 12:05:41 2025]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [Fri May 16 12:05:41 2025] RIP: 0033:0x7fe9ed928a67
> [Fri May 16 12:05:41 2025] Code: 73 01 c3 48 8b 0d 99 83 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 69 83 0c 00 f7 d8 64 89 01 48
> [Fri May 16 12:05:41 2025] RSP: 002b:00007ffe17759568 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> [Fri May 16 12:05:41 2025] RAX: ffffffffffffffda RBX: 0000564f850b7e70 RCX: 00007fe9ed928a67
> [Fri May 16 12:05:41 2025] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000564f850b7ed8
> [Fri May 16 12:05:41 2025] RBP: 0000000000000000 R08: 1999999999999999 R09: 0000000000000000
> [Fri May 16 12:05:41 2025] R10: 00007fe9ed99bac0 R11: 0000000000000206 R12: 0000000000000000
> [Fri May 16 12:05:41 2025] R13: 0000000000000000 R14: 00007ffe177595a0 R15: 00007ffe1775aa60
> [Fri May 16 12:05:41 2025]  </TASK>
> [Fri May 16 12:05:41 2025] Modules linked in: vfio_pci_core(E-) vfio(E) drm_client_lib(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) binfmt_misc(E) edac_mce_amd(E) kvm_amd(E) snd_hda_codec_realtek(E) nls_ascii(E) iwlmvm(E) nls_cp437(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) kvm(E) snd_hda_scodec_component(E) mac80211(E) irqbypass(E) uvcvideo(E) vfat(E) snd_hda_intel(E) libarc4(E) ghash_clmulni_intel(E) fat(E) videobuf2_vmalloc(E) snd_intel_dspcfg(E) sha256_ssse3(E) uvc(E) snd_usb_audio(E) sha1_ssse3(E) snd_hda_codec(E) videobuf2_memops(E) iwlwifi(E) drm_ttm_helper(E) videobuf2_v4l2(E) aesni_intel(E) snd_usbmidi_lib(E) snd_hda_core(E) ttm(E) snd_rawmidi(E) gf128mul(E) snd_pcsp(E) snd_seq_device(E) ppdev(E) snd_hwdep(E) crypto_simd(E) drm_kms_helper(E) videodev(E) cryptd(E) cfg80211(E) snd_pcm(E) videobuf2_common(E) rapl(E) snd_timer(E) sp5100_tco(E) drm(E) mc(E) wmi_bmof(E) k10temp(E) tpm_crb(E) rfkill(E) snd(E) sha3_generic(E) video(E) soundcore(E) jitterentropy_rng(E) sha512_ssse3(E) tpm_tis(E) ccp(E)
> [Fri May 16 12:05:41 2025]  sha512_generic(E) tpm_tis_core(E) joydev(E) drbg(E) tpm(E) ansi_cprng(E) libaescfb(E) ecdh_generic(E) ecc(E) parport_pc(E) parport(E) sg(E) rng_core(E) evdev(E) msr(E) loop(E) fuse(E) efi_pstore(E) dm_mod(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) blake2b_generic(E) efivarfs(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) raid1(E) raid0(E) md_mod(E) hid_generic(E) usbhid(E) hid(E) sd_mod(E) ahci(E) xhci_pci(E) libahci(E) nvme(E) xhci_hcd(E) libata(E) i2c_piix4(E) i2c_smbus(E) r8169(E) nvme_core(E) realtek(E) usbcore(E) scsi_mod(E) scsi_common(E) usb_common(E) wmi(E) gpio_amdpt(E) gpio_generic(E) button(E) [last unloaded: vfio_pci_core(E)]
> [Fri May 16 12:05:41 2025] CR2: ffff9d28984c3000
> [Fri May 16 12:05:41 2025] ---[ end trace 0000000000000000 ]---
> [Fri May 16 12:05:41 2025] RIP: 0010:release_module_tags+0x103/0x1b0
> [Fri May 16 12:05:41 2025] Code: e3 4c 03 64 24 08 4c 39 e3 72 bf 8b 0d 86 02 f6 00 31 ed 31 c0 eb 17 48 63 f0 49 8b 54 24 20 83 c0 01 48 8b 34 f5 40 e6 d4 95 <48> 03 2c 16 89 ce 48 63 d0 48 c7 c7 c0 fd d4 95 e8 78 3f f5 ff 8b
> [Fri May 16 12:05:41 2025] RSP: 0018:ffffb2eb40bdfe00 EFLAGS: 00010202
> [Fri May 16 12:05:41 2025] RAX: 0000000000000001 RBX: ffffffffc04609d7 RCX: 0000000000000008
> [Fri May 16 12:05:41 2025] RDX: 0000000000000000 RSI: ffff9d28984c3000 RDI: ffffffff95d4fdc0
> [Fri May 16 12:05:41 2025] RBP: 0000000000000000 R08: ffff9d25035aed08 R09: 000000000000fcf8
> [Fri May 16 12:05:41 2025] R10: ffffb2eb40bdfe00 R11: 0000000000000001 R12: ffffffffc04609b0
> [Fri May 16 12:05:41 2025] R13: ffffffffc19ef000 R14: ffff9d2502ab6870 R15: 0000000000000000
> [Fri May 16 12:05:41 2025] FS:  00007fe9ee051040(0000) GS:ffff9d28984c3000(0000) knlGS:0000000000000000
> [Fri May 16 12:05:41 2025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Fri May 16 12:05:41 2025] CR2: ffff9d28984c3000 CR3: 000000011278d000 CR4: 0000000000350ef0
> [Fri May 16 12:05:41 2025] note: modprobe[35898] exited with irqs disabled
> [Fri May 16 12:05:41 2025] note: modprobe[35898] exited with preempt_count 1
>
>
>
>
> Thanks
> David
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ