[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3993c468-fdac-2ff2-c3ee-9784c098694c@alu.unizg.hr>
Date: Fri, 18 Aug 2023 16:15:49 +0200
From: Mirsad Todorovac <mirsad.todorovac@....unizg.hr>
To: linux-kernel@...r.kernel.org
Cc: Frederic Weisbecker <frederic@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>
Subject: Re: [BUG] KCSAN: data-race in tick_nohz_idle_stop_tick /
tick_nohz_next_event
On 8/17/23 21:32, Mirsad Todorovac wrote:
> Hi,
>
> This is your friendly bug reporter. :-)
>
> The environment is the vanilla torvalds kernel on Ubuntu 22.04 LTS and a Ryzen 7950X assembled box.
>
> [ 7145.500079] ==================================================================
> [ 7145.500109] BUG: KCSAN: data-race in tick_nohz_idle_stop_tick / tick_nohz_next_event
>
> [ 7145.500139] write to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 26:
> [ 7145.500155] tick_nohz_idle_stop_tick+0x3b1/0x4a0
> [ 7145.500173] do_idle+0x1e3/0x250
> [ 7145.500186] cpu_startup_entry+0x20/0x30
> [ 7145.500199] start_secondary+0x129/0x160
> [ 7145.500216] secondary_startup_64_no_verify+0x17e/0x18b
>
> [ 7145.500245] read to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 16:
> [ 7145.500261] tick_nohz_next_event+0xe7/0x1e0
> [ 7145.500277] tick_nohz_get_sleep_length+0xa7/0xe0
> [ 7145.500293] menu_select+0x82/0xb90
> [ 7145.500309] cpuidle_select+0x44/0x60
> [ 7145.500324] do_idle+0x1c2/0x250
> [ 7145.500336] cpu_startup_entry+0x20/0x30
> [ 7145.500350] start_secondary+0x129/0x160
> [ 7145.500367] secondary_startup_64_no_verify+0x17e/0x18b
>
> [ 7145.500392] value changed: 0x0000001a -> 0xffffffff
>
> [ 7145.500411] Reported by Kernel Concurrency Sanitizer on:
> [ 7145.500421] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G L 6.5.0-rc6-net-cfg-kcsan-00038-g16931859a650 #35
> [ 7145.500439] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
> [ 7145.500449] ==================================================================
>
> Please find dmesg log (what was available in the ring buffer) and the lshw output attached.
>
> NOTE: There are no proprietary drivers loaded and Canonical livepatch isn't working for this kernel,
> probably previous ACPI KCSAN BUGs turn this taint on.
>
> Thank you very much for your time evaluating this report.
P.S.
According to Mr. Heo, I will add the decoded stacktrace to this bug report, which ought to have been done
in the first place :-/
[ 7145.500079] ==================================================================
[ 7145.500109] BUG: KCSAN: data-race in tick_nohz_idle_stop_tick / tick_nohz_next_event
[ 7145.500139] write to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 26:
[ 7145.500155] tick_nohz_idle_stop_tick (kernel/time/tick-sched.c:904 kernel/time/tick-sched.c:1130)
[ 7145.500173] do_idle (kernel/sched/idle.c:215 kernel/sched/idle.c:282)
[ 7145.500186] cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
[ 7145.500199] start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
[ 7145.500216] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:441)
[ 7145.500245] read to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 16:
[ 7145.500261] tick_nohz_next_event (kernel/time/tick-sched.c:868)
[ 7145.500277] tick_nohz_get_sleep_length (kernel/time/tick-sched.c:1250)
[ 7145.500293] menu_select (drivers/cpuidle/governors/menu.c:283)
[ 7145.500309] cpuidle_select (drivers/cpuidle/cpuidle.c:359)
[ 7145.500324] do_idle (kernel/sched/idle.c:208 kernel/sched/idle.c:282)
[ 7145.500336] cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
[ 7145.500350] start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
[ 7145.500367] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:441)
[ 7145.500392] value changed: 0x0000001a -> 0xffffffff
[ 7145.500411] Reported by Kernel Concurrency Sanitizer on:
[ 7145.500421] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G L 6.5.0-rc6-net-cfg-kcsan-00038-g16931859a650 #35
[ 7145.500439] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[ 7145.500449] ==================================================================
The code seems to be:
/*
* If this CPU is the one which updates jiffies, then give up
* the assignment and let it be taken by the CPU which runs
* the tick timer next, which might be this CPU as well. If we
* don't drop this here the jiffies might be stale and
* do_timer() never invoked. Keep track of the fact that it
* was the one which had the do_timer() duty last.
*/
if (cpu == tick_do_timer_cpu) {
tick_do_timer_cpu = TICK_DO_TIMER_NONE;
→ ts->do_timer_last = 1;
} else if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
ts->do_timer_last = 0;
}
but I am not sure why this happens.
Maybe a WRITE_ONCE(ts->do_timer_last, 1) is required?
Or is it another KCSAN false positive ...
Thank you very much for investigating.
Best regards,
Mirsad Todorovac
Powered by blists - more mailing lists