Date:   Tue, 6 Dec 2022 09:47:29 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Dmitry Vyukov <dvyukov@...gle.com>
Cc:     Dave Chinner <david@...morbit.com>, frederic@...nel.org,
        quic_neeraju@...cinc.com, Josh Triplett <josh@...htriplett.org>,
        RCU <rcu@...r.kernel.org>,
        syzbot <syzbot+912776840162c13db1a3@...kaller.appspotmail.com>,
        djwong@...nel.org, linux-kernel@...r.kernel.org,
        linux-xfs@...r.kernel.org, syzkaller-bugs@...glegroups.com,
        syzkaller <syzkaller@...glegroups.com>
Subject: Re: [syzbot] KASAN: use-after-free Read in xfs_qm_dqfree_one

On Tue, Dec 06, 2022 at 05:19:10PM +0100, Dmitry Vyukov wrote:
> On Tue, 6 Dec 2022 at 16:32, Paul E. McKenney <paulmck@...nel.org> wrote:
> >
> > On Tue, Dec 06, 2022 at 12:06:10PM +0100, Dmitry Vyukov wrote:
> > > On Tue, 6 Dec 2022 at 04:34, Dave Chinner <david@...morbit.com> wrote:
> > > >
> > > > On Mon, Dec 05, 2022 at 07:12:15PM -0800, syzbot wrote:
> > > > > Hello,
> > > > >
> > > > > syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> > > > > INFO: rcu detected stall in corrupted
> > > > >
> > > > > rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P4122 } 2641 jiffies s: 2877 root: 0x0/T
> > > > > rcu: blocking rcu_node structures (internal RCU debug):
> > > >
> > > > I'm pretty sure this has nothing to do with the reproducer - the
> > > > console log here:
> > > >
> > > > > Tested on:
> > > > >
> > > > > commit:         bce93322 proc: proc_skip_spaces() shouldn't think it i..
> > > > > git tree:       https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1566216b880000
> > > >
> > > > indicates that syzbot is screwing around with bluetooth, HCI,
> > > > netdevsim, bridging, bonding, etc.
> > > >
> > > > There's no evidence that it actually ran the reproducer for the bug
> > > > reported in this thread - there's no record of a single XFS
> > > > filesystem being mounted in the log....
> > > >
> > > > It looks like someone else also tried a private patch to fix this
> > > > problem (which was obviously broken) and it failed with exactly the
> > > > same RCU warnings. That was run from the same commit id as the
> > > > original reproducer, so this looks like either syzbot is broken or
> > > > there's some other completely unrelated problem that syzbot is
> > > > tripping over here.
> > > >
> > > > Over to the syzbot people to debug the syzbot failure....
> > >
> > > Hi Dave,
> > >
> > > It's not uncommon for a single program to trigger multiple bugs.
> > > That's what happens here. The rcu stall issue is reproducible with
> > > this test program.
> > > In such cases you can either submit more test requests, or test manually.
> > >
> > > I think there is an RCU expedited stall detection.
> > > For some reason CONFIG_RCU_EXP_CPU_STALL_TIMEOUT is limited to 21
> > > seconds, and that's not enough for reliable flake-free stress testing.
> > > We bump other timeouts to 100+ seconds.
> > > +RCU maintainers, do you mind removing the overly restrictive limit on
> > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT?
> > > Or you think there is something to fix in the kernel to not stall? I
> > > see the test writes to
> > > /proc/sys/vm/drop_caches, maybe there is some issue in that code.
> >
> > Like this?
> >
> > If so, I don't see why not.  And in that case, may I please have
> > your Tested-by or similar?
> 
> I've tried with this patch and RCU_EXP_CPU_STALL_TIMEOUT=80000.
> Running the test program I got some kernel BUG in XFS and no RCU
> errors/warnings.
> 
> Tested-by: Dmitry Vyukov <dvyukov@...gle.com>

Applied, thank you both!

I expect to push this into the v6.3 merge window, that is, not the
one coming up real soon now, but the one after that.

							Thanx, Paul

> Thanks
> 
> > At the same time, I am sure that there are things in the kernel that
> > should be adjusted to avoid stalls, but I recognize that different
> > developers in different situations will have different issues that they
> > choose to focus on.  ;-)
> >
> >                                                         Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> > index 49da904df6aa6..2984de629f749 100644
> > --- a/kernel/rcu/Kconfig.debug
> > +++ b/kernel/rcu/Kconfig.debug
> > @@ -82,7 +82,7 @@ config RCU_CPU_STALL_TIMEOUT
> >  config RCU_EXP_CPU_STALL_TIMEOUT
> >         int "Expedited RCU CPU stall timeout in milliseconds"
> >         depends on RCU_STALL_COMMON
> > -       range 0 21000
> > +       range 0 300000
> >         default 0
> >         help
> >           If a given expedited RCU grace period extends more than the
