Message-ID: <233aab47.38f2.196df28812a.Coremail.00107082@163.com>
Date: Sun, 18 May 2025 00:51:35 +0800 (CST)
From: "David Wang" <00107082@....com>
To: "Suren Baghdasaryan" <surenb@...gle.com>
Cc: kent.overstreet@...ux.dev, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: BUG: unable to handle page fault for address
At 2025-05-18 00:39:30, "Suren Baghdasaryan" <surenb@...gle.com> wrote:
>On Sat, May 17, 2025 at 12:02 AM David Wang <00107082@....com> wrote:
>>
>>
>> At 2025-05-17 08:11:24, "Suren Baghdasaryan" <surenb@...gle.com> wrote:
>> >On Fri, May 16, 2025 at 10:03 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
>> >>
>> >> Hi David,
>> >>
>> >> On Fri, May 16, 2025 at 6:13 AM David Wang <00107082@....com> wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > I caught a page fault when I was changing my nvidia driver:
>> >> > (This happens randomly, I can reproduce it with about 1/3 probability)
>> >> >
>> >> > [Fri May 16 12:05:41 2025] BUG: unable to handle page fault for address: ffff9d28984c3000
>> >> > [Fri May 16 12:05:41 2025] #PF: supervisor read access in kernel mode
>> >> > [Fri May 16 12:05:41 2025] #PF: error_code(0x0000) - not-present page
>> >> > ...
>> >> > [Fri May 16 12:05:41 2025] RIP: 0010:release_module_tags+0x103/0x1b0
>> >> > ...
>> >> > [Fri May 16 12:05:41 2025] Call Trace:
>> >> > [Fri May 16 12:05:41 2025] <TASK>
>> >> > [Fri May 16 12:05:41 2025] codetag_unload_module+0x135/0x160
>> >> > [Fri May 16 12:05:41 2025] free_module+0x19/0x1a0
>> >> > ...
>> >> > (full kernel logs are pasted at the end.)
>> >> >
>> >> > Using an image with DEBUG_INFO, the call trace parses as:
>> >> >
>> >> > RIP: 0010:release_module_tags (./include/linux/alloc_tag.h:134 lib/alloc_tag.c:352 lib/alloc_tag.c:573)
>> >> > [Fri May 16 12:05:41 2025] codetag_unload_module (lib/codetag.c:355)
>> >> > [Fri May 16 12:05:41 2025] free_module (kernel/module/main.c:1305)
>> >> > [Fri May 16 12:05:41 2025] __do_sys_delete_module (kernel/module/main.c:795)
>> >> >
>> >> > The offending lines in my codebase:
>> >> > 126 static inline struct alloc_tag_counters alloc_tag_read(struct alloc_tag *tag)
>> >> > 127 {
>> >> > ...
>> >> > 131
>> >> > 132 for_each_possible_cpu(cpu) {
>> >> > 133 counter = per_cpu_ptr(tag->counters, cpu);
>> >> > >>>> 134 v.bytes += counter->bytes; <--------------here
>> >> > 135 v.calls += counter->calls;
>> >> >
>> >> >
>> >> > Nvidia drivers are out-of-tree... there could be some strange behavior in them that causes this... but,
>> >> > when I checked the code, I got concerned about the lifecycle of tag->counters.
>> >> > Based on the following definition:
>> >> > 108 #define DEFINE_ALLOC_TAG(_alloc_tag) \
>> >> > 109 static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr); \
>> >> > 110 static struct alloc_tag _alloc_tag __used __aligned(8) \
>> >> > 111 __section(ALLOC_TAG_SECTION_NAME) = { \
>> >> > 112 .ct = CODE_TAG_INIT, \
>> >> > 113 .counters = &_alloc_tag_cntr };
>> >> > 114
>> >> > _alloc_tag_cntr is the data referenced by tag->counters, but the two live in different sections,
>> >> > and alloc_tag only prepares storage for the ALLOC_TAG_SECTION_NAME section,
>> >> > right?
>> >> > Then what happens to that ".data..percpu" section when the module is unloaded?
>> >> > Is it safe to keep using that ".data..percpu" section after the module is unloaded,
>> >> > or even while the module is unloading?
>> >>
>> >> Yes, I think you are right, free_module() calls percpu_modfree() which
>> >> would free the per-cpu memory allocated for the module. Before
>> >> 0db6f8d7820a ("alloc_tag: load module tags into separate contiguous
>> >> memory") we would not unload the module if there were tags which were
>> >> still in use. After that change we load module tags into separate
>> >> memory, so I expected this to work, but due to this external reference
>> >> it indeed should lead to a UAF.
>> >> I think the simplest way to fix this would be to bypass
>> >> percpu_modfree() inside free_module() when there are module tags still
>> >> referenced, store mod->percpu inside alloc_tag_module_section and free
>> >> it inside clean_unused_module_areas_locked() once we know the counters
>> >> are not used anymore. I'll take a stab at it and will send a patch for
>> >> testing today.
>> >
>> >Ok, I went with another implementation: dynamically allocating
>> >percpu memory for modules at module load time. This has the added
>> >advantage of not needing extra PERCPU_MODULE_RESERVE currently
>> >required for memory allocation tagging to work.
>> >David, the patch is posted at [1]. Please give it a try and let me
>> >know if the fix works for you.
>> >Thanks,
>> >Suren.
>> >
>> >[1] https://lore.kernel.org/all/20250517000739.5930-1-surenb@google.com/
>> >
>> >
>> >> Thanks,
>> >> Suren.
>> >>
>>
>> Hi, the patch does fix my issue.
>> I now have another, similar concern about a module's RO data.
>> The codetag is defined as:
>> 24 struct codetag {
>> 25 unsigned int flags; /* used in later patches */
>> 26 unsigned int lineno;
>> 27 const char *modname;
>> 28 const char *function;
>> 29 const char *filename;
>> 30 } __aligned(8);
>>
>> Those modname/function/filename pointers would refer to the RO data section, right?
>> When a module is unloaded, its RO data section is released at some point.
>> My question is: is it safe to use the RO data during module unload? These
>> lines seem to access that data:
>>
>> + pr_info("%s:%u module %s func:%s has %llu allocated at module unload\n",
>> + tag->ct.filename, tag->ct.lineno, tag->ct.modname,
>> + tag->ct.function, counter.bytes);
>
>These lines are called from release_module_tags() using this call chain:
>
>delete_module()
> free_module()
> codetag_unload_module()
> release_module_tags()
>
>and codetag_unload_module() is called at the very beginning of
>free_module(), when no other module memory has been freed yet. So,
>from what I understand, this should be safe.
>After we unload the module these pointers inside the tags will be
>dangling, but we should not be using them anymore since we do not
>report unloaded modules. Do you see some usage that I missed?

Why is .data..percpu different? The page fault I caught when reinstalling the nvidia drivers was also
raised from release_module_tags().
Is the .data..percpu section released earlier?
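
To make the per-cpu lifetime concern concrete, here is a minimal user-space sketch of how I
understand the problem (fake_tag/fake_counters are made-up names for illustration, not the real
kernel structures): the tag survives in separately managed memory, but the counters it points at
are freed together with the module's per-cpu area, so any later read through tag->counters is a
use-after-free.

    /* Hypothetical user-space analogue of the alloc_tag counter lifetime;
     * none of these names are the real kernel symbols. */
    #include <stdio.h>
    #include <stdlib.h>

    struct fake_counters { unsigned long bytes, calls; };

    struct fake_tag {
            struct fake_counters *counters; /* stands in for the .data..percpu reference */
    };

    int main(void)
    {
            /* "module load": counters placed in the module's per-cpu area */
            struct fake_counters *percpu = calloc(1, sizeof(*percpu));
            /* the tag itself is kept in separate, longer-lived memory */
            struct fake_tag *tag = calloc(1, sizeof(*tag));

            tag->counters = percpu;
            tag->counters->bytes = 4096;    /* an allocation is still outstanding */

            /* "module unload": the per-cpu area is freed... */
            free(percpu);

            /* ...but the tag outlives it, so a later read of the counters
             * (as in release_module_tags()) is a use-after-free */
            printf("%lu bytes still allocated?\n", tag->counters->bytes);

            free(tag);
            return 0;
    }
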
Thanks
David