lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3993c468-fdac-2ff2-c3ee-9784c098694c@alu.unizg.hr>
Date:   Fri, 18 Aug 2023 16:15:49 +0200
From:   Mirsad Todorovac <mirsad.todorovac@....unizg.hr>
To:     linux-kernel@...r.kernel.org
Cc:     Frederic Weisbecker <frederic@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>
Subject: Re: [BUG] KCSAN: data-race in tick_nohz_idle_stop_tick /
 tick_nohz_next_event



On 8/17/23 21:32, Mirsad Todorovac wrote:
> Hi,
> 
> This is your friendly bug reporter. :-)
> 
> The environment is the vanilla torvalds kernel on Ubuntu 22.04 LTS and a Ryzen 7950X assembled box.
> 
> [ 7145.500079] ==================================================================
> [ 7145.500109] BUG: KCSAN: data-race in tick_nohz_idle_stop_tick / tick_nohz_next_event
> 
> [ 7145.500139] write to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 26:
> [ 7145.500155]  tick_nohz_idle_stop_tick+0x3b1/0x4a0
> [ 7145.500173]  do_idle+0x1e3/0x250
> [ 7145.500186]  cpu_startup_entry+0x20/0x30
> [ 7145.500199]  start_secondary+0x129/0x160
> [ 7145.500216]  secondary_startup_64_no_verify+0x17e/0x18b
> 
> [ 7145.500245] read to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 16:
> [ 7145.500261]  tick_nohz_next_event+0xe7/0x1e0
> [ 7145.500277]  tick_nohz_get_sleep_length+0xa7/0xe0
> [ 7145.500293]  menu_select+0x82/0xb90
> [ 7145.500309]  cpuidle_select+0x44/0x60
> [ 7145.500324]  do_idle+0x1c2/0x250
> [ 7145.500336]  cpu_startup_entry+0x20/0x30
> [ 7145.500350]  start_secondary+0x129/0x160
> [ 7145.500367]  secondary_startup_64_no_verify+0x17e/0x18b
> 
> [ 7145.500392] value changed: 0x0000001a -> 0xffffffff
> 
> [ 7145.500411] Reported by Kernel Concurrency Sanitizer on:
> [ 7145.500421] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G             L     6.5.0-rc6-net-cfg-kcsan-00038-g16931859a650 #35
> [ 7145.500439] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
> [ 7145.500449] ==================================================================
> 
> Please find dmesg log (what was available in the ring buffer) and the lshw output attached.
> 
> NOTE: There are no proprietary drivers loaded and Canonical livepatch isn't working for this kernel,
> probably previous ACPI KCSAN BUGs turn this taint on.
> 
> Thank you very much for your time evaluating this report.

P.S.

According to Mr. Heo, I will add the decoded stacktrace to this bug report, which ought to have been done
in the first place :-/

[ 7145.500079] ==================================================================
[ 7145.500109] BUG: KCSAN: data-race in tick_nohz_idle_stop_tick / tick_nohz_next_event

[ 7145.500139] write to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 26:
[ 7145.500155] tick_nohz_idle_stop_tick (kernel/time/tick-sched.c:904 kernel/time/tick-sched.c:1130)
[ 7145.500173] do_idle (kernel/sched/idle.c:215 kernel/sched/idle.c:282)
[ 7145.500186] cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
[ 7145.500199] start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
[ 7145.500216] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:441)

[ 7145.500245] read to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 16:
[ 7145.500261] tick_nohz_next_event (kernel/time/tick-sched.c:868)
[ 7145.500277] tick_nohz_get_sleep_length (kernel/time/tick-sched.c:1250)
[ 7145.500293] menu_select (drivers/cpuidle/governors/menu.c:283)
[ 7145.500309] cpuidle_select (drivers/cpuidle/cpuidle.c:359)
[ 7145.500324] do_idle (kernel/sched/idle.c:208 kernel/sched/idle.c:282)
[ 7145.500336] cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
[ 7145.500350] start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
[ 7145.500367] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:441)

[ 7145.500392] value changed: 0x0000001a -> 0xffffffff

[ 7145.500411] Reported by Kernel Concurrency Sanitizer on:
[ 7145.500421] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G             L     6.5.0-rc6-net-cfg-kcsan-00038-g16931859a650 #35
[ 7145.500439] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[ 7145.500449] ==================================================================

The code seems to be:


	/*
	 * If this CPU is the one which updates jiffies, then give up
	 * the assignment and let it be taken by the CPU which runs
	 * the tick timer next, which might be this CPU as well. If we
	 * don't drop this here the jiffies might be stale and
	 * do_timer() never invoked. Keep track of the fact that it
	 * was the one which had the do_timer() duty last.
	 */
	if (cpu == tick_do_timer_cpu) {
		tick_do_timer_cpu = TICK_DO_TIMER_NONE;
→		ts->do_timer_last = 1;
	} else if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
		ts->do_timer_last = 0;
	
	}

but I am not sure why this happens.

Maybe a WRITE_ONCE(ts->do_timer_last, 1) is required?

Or is it another KCSAN false positive ...

Thank you very much for investigating.

Best regards,
Mirsad Todorovac

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ