lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANpmjNPpOym1eHYQBK4TyGgsDA=WujRJeR3aMpZPa6Y7ahtgKA@mail.gmail.com>
Date:   Fri, 4 Dec 2020 20:53:36 +0100
From:   Marco Elver <elver@...gle.com>
To:     Naresh Kamboju <naresh.kamboju@...aro.org>
Cc:     open list <linux-kernel@...r.kernel.org>,
        kasan-dev <kasan-dev@...glegroups.com>, rcu@...r.kernel.org,
        lkft-triage@...ts.linaro.org,
        Peter Zijlstra <peterz@...radead.org>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>, fweisbec@...il.com,
        Arnd Bergmann <arnd@...db.de>
Subject: Re: BUG: KCSAN: data-race in tick_nohz_next_event / tick_nohz_stop_tick

On Fri, 4 Dec 2020 at 20:04, Naresh Kamboju <naresh.kamboju@...aro.org> wrote:
> LKFT started testing KCSAN enabled kernel from the linux next tree.
> Here we have found BUG: KCSAN: data-race in tick_nohz_next_event /
> tick_nohz_stop_tick

Thank you for looking into KCSAN. Would it be possible to collect
these reports in a moderation queue for now?

I'm currently trying to work out a strategy on how to best proceed
with all the data races in the kernel. We do know there are plenty. On
syzbot's internal moderation queue, we're currently looking at >300 of
them (some here:
https://syzkaller.appspot.com/upstream?manager=ci2-upstream-kcsan-gce).
Part of this strategy involves prioritizing certain concurrency bug
classes. Let's define the following buckets for concurrency bugs:

A. Data race, where failure due to current compilers is unlikely
(supposedly "benign"); merely marking the accesses
appropriately is sufficient. Finding a crash for these will
require a miscompilation, but otherwise look "benign" at the
C-language level.

B. Race-condition bugs where the bug manifests as a data race,
too -- simply marking things doesn't fix the problem. These
are the types of bugs where a data race would point out a
more severe issue.

C. Race-condition bugs where the bug never manifests as a data
race. An example of these might be 2 threads that acquire the
necessary locks, yet some interleaving of them still results
in a bug (e.g. because the logic inside the critical sections
is buggy). These are harder to detect with KCSAN as-is, and
require using ASSERT_EXCLUSIVE_ACCESS() or
ASSERT_EXCLUSIVE_WRITER() in the right place. See
https://lwn.net/Articles/816854/.

One problem currently is that the kernel has quite a lot type-(A)
reports if we run KCSAN, which makes it harder to identify bugs of type
(B) and (C). My wish for the future is that we can get to a place, where
the kernel has almost no unintentional (A) issues, so that we primarily
find (B) and (C) bugs.

The report below looks to be of type (A). Generally, the best strategy
for resolving these is to send a patch, and not a report. However, be
aware that sometimes it is really quite difficult to say if we're
looking at a type (A) or (B) issue, in which case it may still be fair
to send a report and briefly describe what you think is happening
(because that'll increase the likelihood of getting a response). I
recommend also reading "Developer/Maintainer data-race strategies" in
https://lwn.net/Articles/816854/ -- specifically note "[...] you
should not respond to KCSAN reports by mindlessly adding READ_ONCE(),
data_race(), and WRITE_ONCE(). Instead, a patch addressing a KCSAN
report must clearly identify the fix's approach and why that approach
is appropriate."

I recommend reading https://lwn.net/Articles/816850/ for more details.

> This report is from an x86_64 machine clang-11 linux next 20201201.
> Since we are running for the first time we do not call this regression.
>
> [   47.811425] BUG: KCSAN: data-race in tick_nohz_next_event /
> tick_nohz_stop_tick
> [   47.818738]
> [   47.820239] write to 0xffffffffa4cbe920 of 4 bytes by task 0 on cpu 2:
> [   47.826766]  tick_nohz_stop_tick+0x8b/0x310
> [   47.830951]  tick_nohz_idle_stop_tick+0xcb/0x170
> [   47.835571]  do_idle+0x193/0x250
> [   47.838804]  cpu_startup_entry+0x25/0x30
> [   47.842728]  start_secondary+0xa0/0xb0
> [   47.846482]  secondary_startup_64_no_verify+0xc2/0xcb
> [   47.851531]
> [   47.853034] read to 0xffffffffa4cbe920 of 4 bytes by task 0 on cpu 3:
> [   47.859473]  tick_nohz_next_event+0x165/0x1e0
> [   47.863831]  tick_nohz_get_sleep_length+0x94/0xd0
> [   47.868539]  menu_select+0x250/0xac0
> [   47.872116]  cpuidle_select+0x47/0x50
> [   47.875781]  do_idle+0x17c/0x250
> [   47.879015]  cpu_startup_entry+0x25/0x30
> [   47.882942]  start_secondary+0xa0/0xb0
> [   47.886694]  secondary_startup_64_no_verify+0xc2/0xcb
> [   47.891743]
> [   47.893234] Reported by Kernel Concurrency Sanitizer on:
> [   47.898541] CPU: 3 PID: 0 Comm: swapper/3 Not tainted
> 5.10.0-rc6-next-20201201 #2
> [   47.906017] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> 2.2 05/23/2018

This report should have line numbers, otherwise it's impossible to say
which accesses are racing.

[ For those curious, this is the same report on syzbot's moderation
queue, with line numbers:
https://syzkaller.appspot.com/bug?id=d835c53d1a5e27922fcd1fbefc926a74790156cb
]

Thanks,
-- Marco

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ