Message-ID: <CACT4Y+b1NsdnC1hqk54Y8zEs7r3y7+EnAqbG1eBmuhji_bfFqw@mail.gmail.com>
Date:   Wed, 31 Mar 2021 11:57:23 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     syzbot <syzbot+1a33233ccd8201ec2322@...kaller.appspotmail.com>,
        Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Jens Axboe <axboe@...nel.dk>,
        Christian Brauner <christian@...uner.io>,
        LKML <linux-kernel@...r.kernel.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>
Subject: Re: [syzbot] WARNING: suspicious RCU usage in copy_page_range

On Wed, Mar 31, 2021 at 9:31 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Wed, Mar 31, 2021 at 08:11:38AM +0200, Dmitry Vyukov wrote:
> > On Wed, Mar 31, 2021 at 12:26 AM syzbot
> > <syzbot+1a33233ccd8201ec2322@...kaller.appspotmail.com> wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit:    db24726b Merge tag 'integrity-v5.12-fix' of git://git.kern..
> > > git tree:       upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=16c16b7cd00000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=daeff30c2474a60f
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=1a33233ccd8201ec2322
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+1a33233ccd8201ec2322@...kaller.appspotmail.com
> >
> > I think this is a LOCKDEP issue. +LOCKDEP maintainers.
> >
> > Another bug happened on another thread ("WARNING: possible circular
> > locking dependency detected"). Lockdep disabled lock tracking
> > ("debug_locks = 0" in the report), which probably made it miss
> > rcu_unlock somewhere, but it did not turn off reporting yet and
> > produced the false positive first.
> >
> > I think if LOCKDEP disables lock tracking, it must also disable
> > reporting of issues that require lock tracking. That would avoid false
> > positives.
>
> Still early and brain hasn't really booted yet, but features that
> require lock tracking are supposed to check debug_locks.
>
> And afaict debug_lockdep_rcu_enabled(), which is called by
> RCU_LOCKDEP_WARN(), which is called by rcu_sleep_check() does just that.

Right... yet it somehow happens.
Looking at a dozen reports, all with 2 concurrent lockdep splats
and "debug_locks = 0" in the report, I am pretty sure there is some
kind of race in lockdep.
I see at least 2 places where lockdep can falsely assume that the RCU
lock is held:
https://elixir.bootlin.com/linux/v5.12-rc5/source/kernel/locking/lockdep.c#L5543
https://elixir.bootlin.com/linux/v5.12-rc5/source/kernel/rcu/update.c#L105
Both do this to "avoid false positives", but for "Illegal context
switch in RCU-bh read-side critical section" it can actually lead to
false positives, right?
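
To make the concern concrete, here is a minimal userspace sketch, not kernel code. It assumes a "held" query that answers 1 whenever tracking is off, as the two links above appear to do. Every `model_*` name is hypothetical and exists only for illustration.

```c
#include <stdbool.h>

/* Hypothetical model state; none of this is taken from the kernel. */
static bool model_tracking_enabled = true; /* stand-in for "lock tracking is on" */
static int  model_held_depth;              /* stand-in for tracked read-side nesting */

/* Answers "held" when tracking is off, so "must hold" checks stay quiet. */
static int model_lock_is_held(void)
{
    if (!model_tracking_enabled)
        return 1;
    return model_held_depth > 0;
}

/* Positive check: warn if the caller does NOT hold the lock. */
static bool model_assert_held_warns(void)
{
    return !model_lock_is_held();
}

/* Negative check: warn if the caller DOES hold the lock (e.g. before sleeping). */
static bool model_sleep_check_warns(void)
{
    return model_lock_is_held();
}
```

With tracking on and nothing held, model_sleep_check_warns() is false. Once model_tracking_enabled is cleared, the same call returns true even though nothing is held: the default that silences the positive check is what trips the negative one.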

Is there something else that turns off tracking before setting
debug_locks=0? Perhaps we get into that window where tracking is
disabled, but debug_locks is not reset yet?

lockdep_enabled() returns false if the lockdep_recursion variable is set:
https://elixir.bootlin.com/linux/v5.12-rc5/source/kernel/locking/lockdep.c#L87

but lockdep_lock() sets it _before_ taking the lock:
https://elixir.bootlin.com/linux/v5.12-rc5/source/kernel/locking/lockdep.c#L111

Is it possible that lockdep_recursion is set, then the task is
rescheduled and another task sees the wrong value for lockdep_recursion?
Shouldn't lockdep_recursion be set _after_ arch_spin_unlock(&__lock)?
Though, I assume lockdep_lock() is called frequently and not only on
reports, so if my reasoning were correct, it would produce false
positives all the time, not necessarily only on concurrent reports...
this does not agree with the observed failure mode...
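
The ordering I am asking about can be modelled as a single-threaded userspace sketch, not kernel code. It assumes the recursion marker is per-CPU and is raised before the lock is taken. It deliberately does not model whether the kernel can actually be preempted at that point. All `model_*` names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of one CPU's state; not kernel code. */
struct model_cpu {
    unsigned int recursion; /* stand-in for the per-CPU lockdep_recursion */
    bool graph_lock_taken;  /* stand-in for holding the internal lock */
};

static bool model_debug_locks = true; /* stand-in for debug_locks */

/* Tracking counts as enabled only if the global flag is set and no recursion is marked. */
static bool model_lockdep_enabled(const struct model_cpu *cpu)
{
    return model_debug_locks && cpu->recursion == 0;
}

/* Step 1 of the ordering in question: mark recursion first. */
static void model_lock_mark(struct model_cpu *cpu)
{
    cpu->recursion++;
}

/* Step 2: only then take the lock. */
static void model_lock_acquire(struct model_cpu *cpu)
{
    cpu->graph_lock_taken = true;
}

/* Release in reverse order. */
static void model_unlock(struct model_cpu *cpu)
{
    cpu->graph_lock_taken = false;
    cpu->recursion--;
}

/*
 * What another task would observe if it ran on the same CPU between
 * step 1 and step 2 (the hypothetical window).
 */
static bool model_enabled_seen_in_window(void)
{
    struct model_cpu cpu = { 0, false };
    bool seen;

    model_lock_mark(&cpu);
    seen = model_lockdep_enabled(&cpu); /* hypothetical preemption point */
    model_lock_acquire(&cpu);
    model_unlock(&cpu);
    return seen;
}
```

In this model the observer sees tracking as disabled while model_debug_locks is still true, which is the window described above. As noted, if that window were reachable on every lockdep_lock() call, it should show up far more often than only next to concurrent reports.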
