lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Thu, 9 Apr 2020 15:35:02 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Jason Gunthorpe <jgg@...pe.ca>
Cc:     Leon Romanovsky <leon@...nel.org>,
        syzbot <syzbot+9627a92b1f9262d5d30c@...kaller.appspotmail.com>,
        RDMA mailing list <linux-rdma@...r.kernel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        netdev <netdev@...r.kernel.org>,
        Rafael Wysocki <rafael@...nel.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>
Subject: Re: WARNING in ib_umad_kill_port

On Tue, Apr 7, 2020 at 4:35 PM Jason Gunthorpe <jgg@...pe.ca> wrote:
>
> On Tue, Apr 07, 2020 at 02:39:42PM +0200, Dmitry Vyukov wrote:
> > On Tue, Apr 7, 2020 at 1:55 PM Jason Gunthorpe <jgg@...pe.ca> wrote:
> > >
> > > On Tue, Apr 07, 2020 at 11:56:30AM +0200, Dmitry Vyukov wrote:
> > > > > I'm not sure what could be done wrong here to elicit this:
> > > > >
> > > > >  sysfs group 'power' not found for kobject 'umad1'
> > > > >
> > > > > ??
> > > > >
> > > > > I've seen another similar sysfs related trigger that we couldn't
> > > > > figure out.
> > > > >
> > > > > Hard to investigate without a reproducer.
> > > >
> > > > Based on all of the sysfs-related bugs I've seen, my bet would be on
> > > > some races. E.g. one thread registers devices, while another
> > > > unregisters these.
> > >
> > > I did check that the naming is ordered right, at least we won't be
> > > concurrently creating and destroying umadX sysfs of the same names.
> > >
> > > I'm also fairly sure we can't be destroying the parent at the same
> > > time as this child.
> > >
> > > Do you see the above commonly? Could it be some driver core thing? Or
> > > is it more likely something wrong in umad?
> >
> > Mmmm... I can't say, I am looking at some bugs very briefly. I've
> > noticed that sysfs comes up periodically (or was it some other similar
> > fs?).
>
> Hmm..
>
> Looking at the git history I see several cases where there are
> ordering problems. I wonder if the rdma parent device is being
> destroyed before the rdma devices complete destruction?
>
> I see the syzkaller is creating a bunch of virtual net devices, and I
> assume it has created a software rdma device on one of these virtual
> devices.
>
> So I'm guessing that it is also destroying a parent? But I can't guess
> which.. Some simple tests with veth suggest it is OK because the
> parent is virtual. But maybe bond or bridge or something?
>
> The issue in rdma is that unregistering a netdev triggers an async
> destruction of the RDMA devices. This has to be async because the
> netdev notification is delivered with RTNL held, and a rdma device
> cannot be destroyed while holding RTNL.
>
> So there is a race, I suppose, where the netdev can complete
> destruction while rdma continues, and if someone deletes the sysfs
> holding the netdev before rdma completes, I'm going to guess, that we
> hit this warning?
>
> Could it be? I would love to know what netdev the rdma device was
> created on, but it doesn't seem to show in the trace :\
>
> This theory could be made more likely by adding a sleep to
> ib_unregister_work() to increase the race window - is there some way
> to get syzkaller to search for a reproducer with that patch?


Bad it happened in kthread context. Otherwise it's usually possible to
pinpoint the test based on process name.

syz-repro utility will do reproduction process with a any kernel you give it:
https://github.com/google/syzkaller/blob/master/docs/reproducing_crashes.md

Or it's possible to run individual programs, or whole log with
syz-execprog utility:
https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md

Or maybe you could pinpoint the guilty test program by hand in the log
(it's probably somewhere closer to the end):
https://syzkaller.appspot.com/x/log.txt?x=119dd16de00000

Powered by blists - more mailing lists