linux-kernel - Re: WARNING: suspicious RCU usage in modeset

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKMK7uEOojWe=KEAkdor4fqWh8=Q6wZTYRNBPWaACaen-iyi0Q@mail.gmail.com>
Date:   Fri, 18 Dec 2020 17:23:55 +0100
From:   Daniel Vetter <daniel.vetter@...ll.ch>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     "Paul E. McKenney" <paulmck@...nel.org>,
        syzbot <syzbot+972b924c988834e868b2@...kaller.appspotmail.com>,
        Josh Triplett <josh@...htriplett.org>, rcu@...r.kernel.org,
        Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        Geert Uytterhoeven <geert@...ux-m68k.org>,
        "Gustavo A. R. Silva" <gustavoars@...nel.org>,
        Linux Fbdev development list <linux-fbdev@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Nathan Chancellor <natechancellor@...il.com>,
        Peter Rosin <peda@...ntia.se>,
        Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>
Subject: Re: WARNING: suspicious RCU usage in modeset_lock

On Fri, Dec 18, 2020 at 5:10 PM Steven Rostedt <rostedt@...dmis.org> wrote:
>
> On Thu, 17 Dec 2020 11:03:20 +0100
> Daniel Vetter <daniel.vetter@...ll.ch> wrote:
>
> > I think we're tripping over the might_sleep() all the mutexes have,
> > and that's not as good as yours, but good enough to catch a missing
> > rcu_read_unlock(). That's kinda why I'm baffled, since like almost
> > every 2nd function in the backtrace grabbed a mutex and it was all
> > fine until the very last.
> >
> > I think it would be really nice if the rcu checks could retain (in
> > debugging only) the backtrace of the outermost rcu_read_lock, so we
> > could print that when something goes wrong in cases where it's leaked.
> > For normal locks lockdep does that already (well not full backtrace I
> > think, just the function that acquired the lock, but that's often
> > enough). I guess that doesn't exist yet?
> >
> > Also yes without reproducer this is kinda tough nut to crack.
>
> I'm looking at drm_client_modeset_commit_atomic(), where it triggered after
> the "retry:" label, which to get to, does a bit of goto spaghetti, with
> a -EDEADLK detected and a goto backoff, which calls goto retry, and then
> the next mutex taken is the one that triggers the bug.

This is standard drm locking spaghetti using ww_mutexes. Enable
CONFIG_DEBUG_WW_MUTEX_SLOWPATH and you'll hit this all the time, in
all kinds of situations. We're using this all the time because it's
way too easy to to get the error cases wrong.

> As this is hard to reproduce, but reproducible by a fuzzer, I'm guessing
> there's some error return path somewhere in there that doesn't release an
> rcu_read_lock().

We're maybe a bit too happy to use funny locking schemes like
ww_mutex, but at least to my knowledge there's no rcu anywhere near
these. Or preempt disable fwiw (which I think the consensus is the
more likely culprit). So I have no idea how this leaks.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch