Message-ID: <CACT4Y+Z8yErbDWPxv5tX0hnw7cTa6nJjg5f=MWMYS-2X91TZ9w@mail.gmail.com>
Date: Thu, 9 Jul 2020 12:13:44 +0200
From: Dmitry Vyukov <dvyukov@...gle.com>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: syzbot <syzbot+0f719294463916a3fc0e@...kaller.appspotmail.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: KASAN: stack-out-of-bounds Read in csd_lock_record
On Tue, Jul 7, 2020 at 6:26 PM Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Tue, Jul 07, 2020 at 05:51:48PM +0200, Dmitry Vyukov wrote:
> > On Sat, Jul 4, 2020 at 8:34 PM Dmitry Vyukov <dvyukov@...gle.com> wrote:
> > >
> > > On Sat, Jul 4, 2020 at 6:45 PM Paul E. McKenney <paulmck@...nel.org> wrote:
> > > >
> > > > On Fri, Jul 03, 2020 at 04:31:22PM -0700, syzbot wrote:
> > > > > Hello,
> > > > >
> > > > > syzbot found the following crash on:
> > > > >
> > > > > HEAD commit: 9e50b94b Add linux-next specific files for 20200703
> > > > > git tree: linux-next
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1024b405100000
> > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
> > > > > compiler: gcc (GCC) 10.1.0-syz 20200507
> > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16dc490f100000
> > > > >
> > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > > > Reported-by: syzbot+0f719294463916a3fc0e@...kaller.appspotmail.com
> > > >
> > > > Good catch! A call to csd_lock_record() was on the wrong side of a
> > > > call to csd_unlock().
> > >
> > > Thanks for taking a look.
> > >
> > > > > But it is folded into another commit for bisectability reasons, so
> > > > "Reported-by" would not make sense. I have instead added this to the
> > > > commit log:
> > > >
> > > > [ paulmck: Fix for syzbot+0f719294463916a3fc0e@...kaller.appspotmail.com ]
> > > > Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
> > > > Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com
> > >
> > > > This should work; as far as I remember, syzbot looks for the email+hash
> > > > anywhere in the commit.
> > > FWIW Tested-by can make sense as well.
> >
> > Paul, there is also some spike of stalls in smp_call_function,
> > if you look at the top ones at:
> > https://syzkaller.appspot.com/upstream#open
> >
> > Can these be caused by the same root cause?
> > > I am not sure which trees the bug was/is present in... This seems to only
> > happen on linux-next and nowhere else. But these stalls equally happen
> > on mainline...
>
> I would be surprised, given that the csd_unlock() was before the faulting
> reference. But then again, I have been surprised before.

Yes, it seems unrelated.
It looks like something broke in the kernel recently, and now, instead
of diagnosing a stall on one CPU, it reports it as a stall in
smp_call_function on another CPU. This produces a large number of
assorted stall reports which are not very actionable...

> You aren't running scftorture with its longwait parameter set to a
> non-zero value, are you? In that case, stalls are expected behavior.
> This is to support testing the CSD lock diagnostics in -rcu. Which isn't
> in mainline yet, so maybe I am asking a stupid question.

Since I don't know what scftorture/longwait is, I guess I am not running it :)

> If these are repeatable, one thing to try is to build the kernel with
> CSD_LOCK_WAIT_DEBUG=y. This requires c6c67d89c059 ("smp: Add source and
> destination CPUs to __call_single_data") and 216d15e0d870 ("kernel/smp:
> Provide CSD lock timeout diagnostics") from the -rcu tree's "dev" branch.
> This will dump out the smp_call_function() function that was to be
> invoked, on the off-chance that the problem is something like lock
> contention in that function.

Here are some with reproducers:
https://syzkaller.appspot.com/bug?id=8a1e95291152ce5afea43c103a1fd62a257fcf4b
https://syzkaller.appspot.com/bug?id=5e3ac329b6304aacc6304cfaab1a514bca12ce82
https://syzkaller.appspot.com/bug?id=a01b4478f89e19cee91531f7c2b7751f0caf8c0c
https://syzkaller.appspot.com/bug?id=e4caef9fc41d0c019c532a4257faec129699a42e

But the question is whether CSD_LOCK_WAIT_DEBUG=y is useful in
general. Should we enable it all the time?
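
On the earlier point that syzbot looks for the email+hash anywhere in
the commit: below is a minimal sketch of what such a scan could look
like. It is not syzbot's actual implementation. The function name
find_syzbot_hashes, the regular expression, and the sample subject line
are assumptions made for illustration. The domain part of the address
is deliberately not checked, since it appears truncated in this thread.

```python
import re

# Assumption: a reporter address has the shape "syzbot+<hex hash>@<domain>".
# Only the part up to the "@" is matched; the domain is ignored.
SYZBOT_REF = re.compile(r"syzbot\+([0-9a-f]+)@")


def find_syzbot_hashes(commit_message):
    """Return the distinct bug hashes referenced anywhere in the message.

    The position of the reference does not matter: a Reported-by tag,
    a Tested-by tag, or a free-form note in the log all match.
    Hashes are returned in order of first appearance.
    """
    seen = []
    for bug_hash in SYZBOT_REF.findall(commit_message):
        if bug_hash not in seen:
            seen.append(bug_hash)
    return seen


# Hypothetical subject line; the bracketed note is the one quoted above.
commit_log = """\
smp: example subject line (hypothetical)

[ paulmck: Fix for syzbot+0f719294463916a3fc0e@...kaller.appspotmail.com ]
"""

print(find_syzbot_hashes(commit_log))
```

With a scan like this, the bracketed note in the commit log would be
picked up the same way a Reported-by or Tested-by tag would.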