linux-ext4 - Re: [syzbot] KCSAN: data-race in start_this_handle / start_this

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20210319172334.GN2696@paulmck-ThinkPad-P72>
Date:   Fri, 19 Mar 2021 10:23:34 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     Marco Elver <elver@...gle.com>, Theodore Ts'o <tytso@....edu>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        syzbot <syzbot+30774a6acf6a2cf6d535@...kaller.appspotmail.com>,
        Jan Kara <jack@...e.com>, linux-ext4@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Jan Kara <jack@...e.cz>
Subject: Re: [syzbot] KCSAN: data-race in start_this_handle /
 start_this_handle

On Fri, Mar 19, 2021 at 11:15:42PM +0900, Tetsuo Handa wrote:
> On 2021/03/12 0:54, Marco Elver wrote:
> >> But the more we could have the compiler automatically figure out
> >> things without needing an explicit tag, it would seem to me that this
> >> would be better, since manual tagging is going to be more error-prone.
> > 
> > What you're alluding to here would go much further than a data race
> > detector ("data race" is still just defined by the memory model). The
> > wish that there was a static analysis tool that would automatically
> > understand the "concurrency semantics as intended by the developer" is
> > something that'd be nice to have, but just doesn't seem realistic.
> > Because how can a tool tell what the developer intended, without input
> > from that developer?
> 
> Input from developers is very important for not only compilers and tools
> but also allowing bug-explorers to understand what is happening.
> ext4 currently has
> 
>   possible deadlock in start_this_handle (2)
>   https://syzkaller.appspot.com/bug?id=38c060d5757cbc13fdffd46e80557c645fbe79ba
> 
> which even maintainers cannot understand what is happening.
> How can bug-explorers know implicit logic which maintainers believe safe and correct?
> It is possible that some oversight in implicit logic is the cause of
> "possible deadlock in start_this_handle (2)".
> Making implicit assumptions clear helps understanding.

Just to be clear, the above diagnostic is from lockdep rather than KCSAN.

According to the sample crash result, different code paths acquire
jdb2_handle and the __fs_reclaim_map in different orders.  It looks
to me that __fs_reclaim_map isn't really a lock, but rather a mode
indicator.  If so, lockdep should set it up accordingly, perhaps
in a manner similar to rcu_lock_map.

> Will "KCSAN: data-race in start_this_handle / start_this_handle" be addressed by marking?
> syzbot is already waiting for
> "KCSAN: data-race in jbd2_journal_dirty_metadata / jbd2_journal_dirty_metadata" at
> https://syzkaller.appspot.com/bug?id=5eb10023f53097f003e72c6a7c1a6f14b7c22929 .

The first thing is to work out what the code should be doing.  What KCSAN
is saying is that a variable is being locklessly updated.  Is it really
OK for that variable to be locklessly updated?  If not, a larger fix
is required.

For more information, please see Marco's LWN series:
https://lwn.net/Articles/816850/ and https://lwn.net/Articles/816854/

Alternatively, you can refer to the documentation being proposed for
the Linux kernel tree:

https://lore.kernel.org/lkml/20210304004543.25364-3-paulmck@kernel.org/

> > If there's worry marking accesses is error-prone, then that might be a
> > signal that the concurrency design is too complex (or the developer
> > hasn't considered all cases).
> > 
> > For that reason, we need to mark accesses to tell the compiler and
> > tooling where to expect concurrency, so that 1) the compiler generates
> > correct code, and 2) tooling such as KCSAN can double-check what the
> > developer intended is actually what's happening.
> 
> and 3) bug-explorers can understand what the developers are assuming/missing.

If the above information doesn't help the bug explorers, please let me
know.

							Thanx, Paul