Message-ID: <31bd3395-cfe3-4af5-bc1c-fa8d26629b93@intel.com>
Date: Fri, 27 Jun 2025 15:16:07 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Jirka Hladky <jhladky@...hat.com>, Abhigyan ghosh
<zscript.team.zs@...il.com>
CC: <linux-kernel@...r.kernel.org>, Chen Yu <yu.chen.surf@...mail.com>
Subject: Re: [BUG] Kernel panic in __migrate_swap_task() on 6.16-rc2 (NULL
pointer dereference)
Hi Jirka,
On 6/27/2025 5:46 AM, Jirka Hladky wrote:
> Hi Chen and all,
>
> we have now verified that the following commit causes a kernel panic
> discussed in this thread:
>
> ad6b26b6a0a79 sched/numa: add statistics of numa balance task
>
> Reverting this commit fixes the issue.
>
> I'm happy to help debug this further or test a proposed fix.
>
Thanks very much for your report. It looks like there is a
race condition: a swap-task candidate is chosen, but its
mm_struct is released because the task exits, so later, when
the task swap is performed, p->mm is NULL, which causes the
problem:
CPU0                                   CPU1
:
task_numa_migrate
  task_numa_find_cpu
    task_numa_compare
      # a normal task p is chosen
      env->best_task = p
                                       # p exits:
                                       exit_signals(p);
                                         p->flags |= PF_EXITING
                                       exit_mm()
                                         p->mm = NULL;
migrate_swap_stop
  __migrate_swap_task(arg->src_task, arg->dst_cpu)
    count_memcg_event_mm(p->mm, NUMA_TASK_SWAP)  # p->mm is NULL
Could you please check whether the following debug patch works?
If no issue is found after you have run several tests, could you
please also provide the contents of
/sys/kernel/debug/tracing/trace
BTW, would it be possible to share your test scripts for stress-ng
and stream? In theory, the stress-ng fork test case should
trigger this issue more easily.
thanks,
Chenyu
> Thank you!
> Jirka
>
> On Wed, Jun 18, 2025 at 1:34 PM Jirka Hladky <jhladky@...hat.com> wrote:
>>
>> Hi Abhigyan,
>>
>> The testing is done on bare metal. The kernel panics occur after
>> several hours of benchmarking.
>>
>> Out of 20 servers, the problem has occurred on 6 of them:
>> intel-sapphire-rapids-gold-6448y-2s
>> intel-emerald-rapids-platinum-8558-2s
>> amd-epyc5-turin-9655p-1s
>> amd-epyc4-zen4c-bergamo-9754-1s
>> amd-epyc3-milan-7713-2s
>> intel-skylake-2s
>>
>> The number in the name is the CPU model. 1s: single socket, 2s: dual socket.
>>
>> We were not able to find a clear pattern. It appears to be a race
>> condition of some kind.
>>
>> We run various performance benchmarks, including Linpack, Stream, NAS
>> (https://www.nas.nasa.gov/software/npb.html), and Stress-ng. Testing
>> is conducted with various thread counts and settings. All benchmarks
>> together run for ~24 hours; one benchmark takes ~4 hours. Please
>> also note that we repeat the benchmarks to collect performance
>> statistics. In many cases, kernel panic has occurred when the
>> benchmark was repeated.
>>
>> Crash occurred while running these tests:
>> Stress_ng: Starting test 'fork' (#29 out of 41), number of threads 32,
>> iteration 1 out of 5
>> SPECjbb2005: Starting DEFAULT run with 4 SPECJBB2005 instances, each
>> with 24 warehouses, iteration 2 out of 3
>> Stress_ng: test 'sem' (#30 out of 41), number of threads 24, iteration
>> 2 out of 5
>> Stress_ng: test 'sem' (#30 out of 41), number of threads 64, iteration
>> 4 out of 5
>> SPECjbb2005: SINGLE run with 1 SPECJBB2005 instance, with 128
>> warehouses, iteration 2 out of 3
>> Linpack: Benchmark-utils/linpackd, iteration 3, testType affinityRun,
>> number of threads 128
>> NAS: NPB_sources/bin/is.D.x
>>
>> There is no clear benchmark triggering the kernel panic. Looping
>> Stress_ng's sem test looks, however, like it's worth trying.
>>
>> I hope this helps. Please let me know if there's anything I can help
>> with to pinpoint the problem.
>>
>> Thanks
>> Jirka
>>
>>
>> On Wed, Jun 18, 2025 at 7:19 AM Abhigyan ghosh
>> <zscript.team.zs@...il.com> wrote:
>>>
>>> Hi Jirka,
>>>
>>> Thanks for the detailed report.
>>>
>>> I'm curious about the specific setup in which this panic was triggered. Could you share more about the exact configuration or parameters you used for running `stress-ng` or Linpack? For instance:
>>>
>>> - How many threads/cores were used?
>>> - Was it running inside a VM, container, or bare-metal?
>>> - Was this under any thermal throttling or power-saving mode?
>>>
>>> I'd like to try reproducing it locally to study the failure further.
>>>
>>> Best regards,
>>> Abhigyan Ghosh
>>>
>>> On 18 June 2025 1:35:30 am IST, Jirka Hladky <jhladky@...hat.com> wrote:
>>>> Hi all,
>>>>
>>>> I’ve encountered a reproducible kernel panic on 6.16-rc1 and 6.16-rc2
>>>> involving a NULL pointer dereference in `__migrate_swap_task()` during
>>>> CPU migration. This occurred on various AMD and Intel systems while
>>>> running a CPU-intensive workload (Linpack, Stress_ng - it's not
>>>> specific to a benchmark).
>>>>
>>>> Full trace below:
>>>> ---
>>>> BUG: kernel NULL pointer dereference, address: 00000000000004c8
>>>> #PF: supervisor read access in kernel mode
>>>> #PF: error_code(0x0000) - not-present page
>>>> PGD 4078b99067 P4D 4078b99067 PUD 0
>>>> Oops: Oops: 0000 [#1] SMP NOPTI
>>>> CPU: 74 UID: 0 PID: 466 Comm: migration/74 Kdump: loaded Not tainted
>>>> 6.16.0-0.rc2.24.eln149.x86_64 #1 PREEMPT(lazy)
>>>> Hardware name: GIGABYTE R182-Z91-00/MZ92-FS0-00, BIOS M07 09/03/2021
>>>> Stopper: multi_cpu_stop+0x0/0x130 <- migrate_swap+0xa7/0x120
>>>> RIP: 0010:__migrate_swap_task+0x2f/0x170
>>>> Code: 41 55 4c 63 ee 41 54 55 53 48 89 fb 48 83 87 a0 04 00 00 01 65
>>>> 48 ff 05 e7 14 dd 02 48 8b af 50 0a 00 00 66 90 e8 61 93 07 00 <48> 8b
>>>> bd c8 04 00 00 e8 85 11 35 00 48 85 c0 74 12 ba 01 00 00 00
>>>> RSP: 0018:ffffce79cd90bdd0 EFLAGS: 00010002
>>>> RAX: 0000000000000001 RBX: ffff8e9c7290d1c0 RCX: 0000000000000000
>>>> RDX: ffff8e9c71e83680 RSI: 000000000000001b RDI: ffff8e9c7290d1c0
>>>> RBP: 0000000000000000 R08: 00056e36392913e7 R09: 00000000002ab980
>>>> R10: ffff8eac2fcb13c0 R11: ffff8e9c77997410 R12: ffff8e7c2fcf12c0
>>>> R13: 000000000000001b R14: ffff8eac71eda944 R15: ffff8eac71eda944
>>>> FS: 0000000000000000(0000) GS:ffff8eac9db4a000(0000) knlGS:0000000000000000
>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> CR2: 00000000000004c8 CR3: 0000003072388003 CR4: 0000000000f70ef0
>>>> PKRU: 55555554
>>>> Call Trace:
>>>> <TASK>
>>>> migrate_swap_stop+0xe8/0x190
>>>> multi_cpu_stop+0xf3/0x130
>>>> ? __pfx_multi_cpu_stop+0x10/0x10
>>>> cpu_stopper_thread+0x97/0x140
>>>> ? __pfx_smpboot_thread_fn+0x10/0x10
>>>> smpboot_thread_fn+0xf3/0x220
>>>> kthread+0xfc/0x240
>>>> ? __pfx_kthread+0x10/0x10
>>>> ? __pfx_kthread+0x10/0x10
>>>> ret_from_fork+0xf0/0x110
>>>> ? __pfx_kthread+0x10/0x10
>>>> ret_from_fork_asm+0x1a/0x30
>>>> </TASK>
>>>> ---
>>>>
>>>> **Kernel Version:**
>>>> 6.16.0-0.rc2.24.eln149.x86_64 (Fedora rawhide)
>>>> https://koji.fedoraproject.org/koji/buildinfo?buildID=2732950
>>>>
>>>> **Reproducibility:**
>>>> Happened multiple times during routine CPU-intensive operations. It
>>>> happens with various benchmarks (Stress_ng, Linpack) after several
>>>> hours of performance testing. `migration/*` kernel threads hit a NULL
>>>> dereference in `__migrate_swap_task`.
>>>>
>>>> **System Info:**
>>>> - Platform: GIGABYTE R182-Z91-00 (dual socket EPYC)
>>>> - BIOS: M07 09/03/2021
>>>> - Config: Based on Fedora’s debug kernel (`PREEMPT(lazy)`)
>>>>
>>>> **Crash Cause (tentative):**
>>>> NULL dereference at offset `0x4c8` from a task struct pointer in
>>>> `__migrate_swap_task`. Possibly an uninitialized or freed
>>>> `task_struct` field.
>>>>
>>>> Please let me know if you’d like me to test a patch or if you need
>>>> more details.
>>>>
>>>> Thanks,
>>>> Jirka
>>>>
>>>>
>>>
>>> aghosh
>>>
>>
>>
>> --
>> -Jirka
>
>
>