linux-kernel - Re: [BUG] KCSAN: data-race in tick_nohz_idle_stop_tick / tick_nohz_next

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <83b3c900-d4eb-457f-99e6-cb21e0eae4cb@seco.com>
Date: Thu, 7 Mar 2024 14:43:04 -0500
From: Sean Anderson <sean.anderson@...o.com>
To: Mirsad Todorovac <mirsad.todorovac@....unizg.hr>,
 linux-kernel@...r.kernel.org
Cc: Frederic Weisbecker <frederic@...nel.org>,
 Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...nel.org>
Subject: Re: [BUG] KCSAN: data-race in tick_nohz_idle_stop_tick /
 tick_nohz_next_event

On 8/18/23 10:15, Mirsad Todorovac wrote:
>
>
> On 8/17/23 21:32, Mirsad Todorovac wrote:
>> Hi,
>>
>> This is your friendly bug reporter. :-)
>>
>> The environment is the vanilla torvalds kernel on Ubuntu 22.04 LTS and a Ryzen 7950X assembled box.
>>
>> [ 7145.500079] ==================================================================
>> [ 7145.500109] BUG: KCSAN: data-race in tick_nohz_idle_stop_tick / tick_nohz_next_event
>>
>> [ 7145.500139] write to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 26:
>> [ 7145.500155]  tick_nohz_idle_stop_tick+0x3b1/0x4a0
>> [ 7145.500173]  do_idle+0x1e3/0x250
>> [ 7145.500186]  cpu_startup_entry+0x20/0x30
>> [ 7145.500199]  start_secondary+0x129/0x160
>> [ 7145.500216]  secondary_startup_64_no_verify+0x17e/0x18b
>>
>> [ 7145.500245] read to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 16:
>> [ 7145.500261]  tick_nohz_next_event+0xe7/0x1e0
>> [ 7145.500277]  tick_nohz_get_sleep_length+0xa7/0xe0
>> [ 7145.500293]  menu_select+0x82/0xb90
>> [ 7145.500309]  cpuidle_select+0x44/0x60
>> [ 7145.500324]  do_idle+0x1c2/0x250
>> [ 7145.500336]  cpu_startup_entry+0x20/0x30
>> [ 7145.500350]  start_secondary+0x129/0x160
>> [ 7145.500367]  secondary_startup_64_no_verify+0x17e/0x18b
>>
>> [ 7145.500392] value changed: 0x0000001a -> 0xffffffff
>>
>> [ 7145.500411] Reported by Kernel Concurrency Sanitizer on:
>> [ 7145.500421] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G             L     6.5.0-rc6-net-cfg-kcsan-00038-g16931859a650 #35
>> [ 7145.500439] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
>> [ 7145.500449] ==================================================================
>>
>> Please find dmesg log (what was available in the ring buffer) and the lshw output attached.
>>
>> NOTE: There are no proprietary drivers loaded and Canonical livepatch isn't working for this kernel,
>> probably previous ACPI KCSAN BUGs turn this taint on.
>>
>> Thank you very much for your time evaluating this report.
>
> P.S.
>
> According to Mr. Heo, I will add the decoded stacktrace to this bug report, which ought to have been done
> in the first place :-/
>
> [ 7145.500079] ==================================================================
> [ 7145.500109] BUG: KCSAN: data-race in tick_nohz_idle_stop_tick / tick_nohz_next_event
>
> [ 7145.500139] write to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 26:
> [ 7145.500155] tick_nohz_idle_stop_tick (kernel/time/tick-sched.c:904 kernel/time/tick-sched.c:1130)
> [ 7145.500173] do_idle (kernel/sched/idle.c:215 kernel/sched/idle.c:282)
> [ 7145.500186] cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
> [ 7145.500199] start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
> [ 7145.500216] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:441)
>
> [ 7145.500245] read to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 16:
> [ 7145.500261] tick_nohz_next_event (kernel/time/tick-sched.c:868)
> [ 7145.500277] tick_nohz_get_sleep_length (kernel/time/tick-sched.c:1250)
> [ 7145.500293] menu_select (drivers/cpuidle/governors/menu.c:283)
> [ 7145.500309] cpuidle_select (drivers/cpuidle/cpuidle.c:359)
> [ 7145.500324] do_idle (kernel/sched/idle.c:208 kernel/sched/idle.c:282)
> [ 7145.500336] cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
> [ 7145.500350] start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
> [ 7145.500367] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:441)
>
> [ 7145.500392] value changed: 0x0000001a -> 0xffffffff
>
> [ 7145.500411] Reported by Kernel Concurrency Sanitizer on:
> [ 7145.500421] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G             L     6.5.0-rc6-net-cfg-kcsan-00038-g16931859a650 #35
> [ 7145.500439] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
> [ 7145.500449] ==================================================================
>
> The code seems to be:
>
>
>     /*
>      * If this CPU is the one which updates jiffies, then give up
>      * the assignment and let it be taken by the CPU which runs
>      * the tick timer next, which might be this CPU as well. If we
>      * don't drop this here the jiffies might be stale and
>      * do_timer() never invoked. Keep track of the fact that it
>      * was the one which had the do_timer() duty last.
>      */
>     if (cpu == tick_do_timer_cpu) {
>         tick_do_timer_cpu = TICK_DO_TIMER_NONE;
> →        ts->do_timer_last = 1;
>     } else if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
>         ts->do_timer_last = 0;
>
>     }
>
> but I am not sure why this happens.
>
> Maybe a WRITE_ONCE(ts->do_timer_last, 1) is required?
>
> Or is it another KCSAN false positive ...

I have seen a few more KCSAN reports regarding this variable:

[  153.285199] ==================================================================
[  153.292529] BUG: KCSAN: data-race in tick_nohz_next_event+0x15c/0x29c
[  153.299118]
[  153.300685] race at unknown origin, with read to 0xffffffc082aa1048 of 4 bytes by task 0 on cpu 2:
[  153.309760]  tick_nohz_next_event+0x15c/0x29c
[  153.314256]  tick_nohz_get_sleep_length+0xd8/0x118
[  153.319186]  menu_select+0x1e0/0xc08
[  153.322875]  cpuidle_select+0x4c/0x60
[  153.326686]  do_idle+0x1e0/0x2b4
[  153.330062]  cpu_startup_entry+0x34/0x3c
[  153.334133]  secondary_start_kernel+0x1e8/0x21c
[  153.338821]  __secondary_switched+0xb8/0xbc
[  153.343143]
[  153.344706] no locks held by swapper/2/0.
[  153.348802] irq event stamp: 129586
[  153.352370] hardirqs last  enabled at (129585): [<ffffffc081acaa2c>] el1_interrupt+0x40/0x50
[  153.360964] hardirqs last disabled at (129586): [<ffffffc0800d8fec>] do_idle+0xc8/0x2b4
[  153.369132] softirqs last  enabled at (129560): [<ffffffc08001071c>] __do_softirq+0x47c/0x500
[  153.377786] softirqs last disabled at (129549): [<ffffffc080017134>] ____do_softirq+0x10/0x1c
[  153.386457]
[  153.388019] value changed: 0xffffffff -> 0x00000001
[  153.392993]
[  153.394555] Reported by Kernel Concurrency Sanitizer on:
[  153.399947] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.6.20+ #102
[  153.406595] Hardware name: xlnx,zynqmp (DT)
[  153.410866] ==================================================================

and:

[   75.362055] ==================================================================
[   75.369461] BUG: KCSAN: data-race in tick_sched_do_timer / tick_sched_do_timer
[   75.376854]
[   75.378426] write to 0xffffffc082aa1048 of 4 bytes by interrupt on cpu 1:
[   75.385335]  tick_sched_do_timer+0x124/0x168
[   75.389744]  tick_sched_timer+0x48/0xc8
[   75.393720]  __hrtimer_run_queues+0x200/0x5f0
[   75.398198]  hrtimer_interrupt+0x1b0/0x3f4
[   75.402417]  arch_timer_handler_phys+0x40/0x54
[   75.406983]  handle_percpu_devid_irq+0xf8/0x20c
[   75.411679]  generic_handle_domain_irq+0x48/0x64
[   75.416435]  gic_handle_irq+0xc4/0x100
[   75.420298]  call_on_irq_stack+0x24/0x4c
[   75.424360]  do_interrupt_handler+0xc0/0xc4
[   75.428692]  el1_interrupt+0x34/0x50
[   75.432415]  el1h_64_irq_handler+0x18/0x24
[   75.436668]  el1h_64_irq+0x64/0x68
[   75.440184]  percpu_counter_add_batch+0xd8/0x1b8
[   75.444967]  set_pte_range+0x14c/0x2c8
[   75.448838]  filemap_map_pages+0x2dc/0xa38
[   75.453091]  __handle_mm_fault+0x1304/0x17dc
[   75.457492]  handle_mm_fault+0xa0/0x374
[   75.461450]  do_page_fault+0x1f8/0x620
[   75.465365]  do_translation_fault+0x118/0x138
[   75.469887]  do_mem_abort+0x5c/0xe4
[   75.473516]  el0_da+0x5c/0xac
[   75.476623]  el0t_64_sync_handler+0xe4/0x158
[   75.481050]  el0t_64_sync+0x190/0x194
[   75.484834]
[   75.486405] 5 locks held by (d-logind)/293:
[   75.490693]  #0: ffffff88076753c0 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x15c/0x5c0
[   75.500441]  #1: ffffffc082b042f0 (rcu_read_lock){....}-{1:2}, at: __handle_mm_fault+0x1260/0x17dc
[   75.509755]  #2: ffffffc082b042f0 (rcu_read_lock){....}-{1:2}, at: filemap_map_pages+0x0/0xa38
[   75.518748]  #3: ffffffc082b042f0 (rcu_read_lock){....}-{1:2}, at: __pte_offset_map+0x0/0x2a0
[   75.527662]  #4: ffffff880762b018 (ptlock_ptr(ptdesc)){+.+.}-{2:2}, at: __pte_offset_map_lock+0xa4/0x264
[   75.537488] irq event stamp: 1761
[   75.540890] hardirqs last  enabled at (1760): [<ffffffc080bc1f78>] percpu_counter_add_batch+0x160/0x1b8
[   75.550465] hardirqs last disabled at (1761): [<ffffffc081aca990>] el1_interrupt+0x24/0x50
[   75.558893] softirqs last  enabled at (1534): [<ffffffc08001071c>] __do_softirq+0x47c/0x500
[   75.567382] softirqs last disabled at (1529): [<ffffffc080017134>] ____do_softirq+0x10/0x1c
[   75.575889]
[   75.577460] read to 0xffffffc082aa1048 of 4 bytes by interrupt on cpu 2:
[   75.584273]  tick_sched_do_timer+0x48/0x168
[   75.588596]  tick_sched_timer+0x48/0xc8
[   75.592572]  __hrtimer_run_queues+0x200/0x5f0
[   75.597050]  hrtimer_interrupt+0x1b0/0x3f4
[   75.601269]  arch_timer_handler_phys+0x40/0x54
[   75.605835]  handle_percpu_devid_irq+0xf8/0x20c
[   75.610531]  generic_handle_domain_irq+0x48/0x64
[   75.615287]  gic_handle_irq+0xc4/0x100
[   75.619150]  call_on_irq_stack+0x24/0x4c
[   75.623204]  do_interrupt_handler+0xc0/0xc4
[   75.627535]  el1_interrupt+0x34/0x50
[   75.631258]  el1h_64_irq_handler+0x18/0x24
[   75.635512]  el1h_64_irq+0x64/0x68
[   75.639027]  lock_acquire+0x13c/0x2f0
[   75.642820]  count_memcg_event_mm.part.0+0x70/0x2a8
[   75.647829]  handle_mm_fault+0x1d8/0x374
[   75.651874]  do_page_fault+0x1f8/0x620
[   75.655789]  do_mem_abort+0x5c/0xe4
[   75.659417]  el0_da+0x5c/0xac
[   75.662524]  el0t_64_sync_handler+0xe4/0x158
[   75.666951]  el0t_64_sync+0x190/0x194
[   75.670736]
[   75.672307] 2 locks held by systemd/1:
[   75.676152]  #0: ffffff88042627b0 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x15c/0x5c0
[   75.685900]  #1: ffffffc082b042f0 (rcu_read_lock){....}-{1:2}, at: count_memcg_event_mm.part.0+0x0/0x2a8
[   75.695726] irq event stamp: 576347
[   75.699302] hardirqs last  enabled at (576346): [<ffffffc08001af70>] local_daif_restore+0x38/0x54
[   75.708338] hardirqs last disabled at (576347): [<ffffffc081aca990>] el1_interrupt+0x24/0x50
[   75.716940] softirqs last  enabled at (576161): [<ffffffc08001071c>] __do_softirq+0x47c/0x500
[   75.725602] softirqs last disabled at (576154): [<ffffffc080017134>] ____do_softirq+0x10/0x1c
[   75.734283]
[   75.735854] value changed: 0xffffffff -> 0x00000000
[   75.740836]
[   75.742407] Reported by Kernel Concurrency Sanitizer on:
[   75.747808] CPU: 2 PID: 1 Comm: systemd Not tainted 6.6.20+ #101
[   75.754290] Hardware name: xlnx,zynqmp (DT)
[   75.758563] ==================================================================

The justification seems to be in tick_sched_do_timer:

        /*
         * Check if the do_timer duty was dropped. We don't care about
         * concurrency: This happens only when the CPU in charge went
         * into a long sleep. If two CPUs happen to assign themselves to
         * this duty, then the jiffies update is still serialized by
         * jiffies_lock.
         *
         * If nohz_full is enabled, this should not happen because the
         * tick_do_timer_cpu never relinquishes.
         */

with the other assignment being in tick_nohz_stop_tick. I'm not familiar
enough with this code to say whether we should be using READ/WRITE_ONCE
or maybe just data_race (as would be implied by the comment above).

--Sean

[Embedded World 2024, SECO SpA]<https://www.messe-ticket.de/Nuernberg/embeddedworld2024/Register/ew24517689>