Message-ID: <20210319172334.GN2696@paulmck-ThinkPad-P72>
Date: Fri, 19 Mar 2021 10:23:34 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc: Marco Elver <elver@...gle.com>, Theodore Ts'o <tytso@....edu>,
Dmitry Vyukov <dvyukov@...gle.com>,
syzbot <syzbot+30774a6acf6a2cf6d535@...kaller.appspotmail.com>,
Jan Kara <jack@...e.com>, linux-ext4@...r.kernel.org,
LKML <linux-kernel@...r.kernel.org>,
syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
Jan Kara <jack@...e.cz>
Subject: Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle

On Fri, Mar 19, 2021 at 11:15:42PM +0900, Tetsuo Handa wrote:
> On 2021/03/12 0:54, Marco Elver wrote:
> >> But the more we could have the compiler automatically figure out
> >> things without needing an explicit tag, it would seem to me that this
> >> would be better, since manual tagging is going to be more error-prone.
> >
> > What you're alluding to here would go much further than a data race
> > detector ("data race" is still just defined by the memory model). The
> > wish that there was a static analysis tool that would automatically
> > understand the "concurrency semantics as intended by the developer" is
> > something that'd be nice to have, but just doesn't seem realistic.
> > Because how can a tool tell what the developer intended, without input
> > from that developer?
>
> Input from developers is very important for not only compilers and tools
> but also allowing bug-explorers to understand what is happening.
> ext4 currently has
>
> possible deadlock in start_this_handle (2)
> https://syzkaller.appspot.com/bug?id=38c060d5757cbc13fdffd46e80557c645fbe79ba
>
> where even the maintainers cannot understand what is happening.
> How can bug explorers know the implicit logic that maintainers believe to be safe and correct?
> It is possible that some oversight in implicit logic is the cause of
> "possible deadlock in start_this_handle (2)".
> Making implicit assumptions clear helps understanding.

Just to be clear, the above diagnostic is from lockdep rather than KCSAN.
According to the sample crash report, different code paths acquire
jbd2_handle and __fs_reclaim_map in different orders. It looks
to me like __fs_reclaim_map isn't really a lock, but rather a mode
indicator. If so, lockdep should set it up accordingly, perhaps
in a manner similar to rcu_lock_map.

> Will "KCSAN: data-race in start_this_handle / start_this_handle" be addressed by marking?
> syzbot is already waiting for
> "KCSAN: data-race in jbd2_journal_dirty_metadata / jbd2_journal_dirty_metadata" at
> https://syzkaller.appspot.com/bug?id=5eb10023f53097f003e72c6a7c1a6f14b7c22929 .
The first thing is to work out what the code should be doing. What KCSAN
is saying is that a variable is being locklessly updated. Is it really
OK for that variable to be locklessly updated? If not, a larger fix
is required.
For more information, please see Marco's LWN series:
https://lwn.net/Articles/816850/ and https://lwn.net/Articles/816854/
Alternatively, you can refer to the documentation being proposed for
the Linux kernel tree:
https://lore.kernel.org/lkml/20210304004543.25364-3-paulmck@kernel.org/
> > If there's worry marking accesses is error-prone, then that might be a
> > signal that the concurrency design is too complex (or the developer
> > hasn't considered all cases).
> >
> > For that reason, we need to mark accesses to tell the compiler and
> > tooling where to expect concurrency, so that 1) the compiler generates
> > correct code, and 2) tooling such as KCSAN can double-check what the
> > developer intended is actually what's happening.
>
> and 3) bug-explorers can understand what the developers are assuming/missing.
If the above information doesn't help the bug explorers, please let me
know.
Thanx, Paul