[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20181210102058.GO21184@phenom.ffwll.local>
Date:   Mon, 10 Dec 2018 11:20:58 +0100
From:   Daniel Vetter <daniel@...ll.ch>
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     Daniel Vetter <daniel.vetter@...ll.ch>,
        LKML <linux-kernel@...r.kernel.org>,
        DRI Development <dri-devel@...ts.freedesktop.org>,
        Ramalingam C <ramalingam.c@...el.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Daniel Vetter <daniel.vetter@...el.com>
Subject: Re: [PATCH] drivers/base: use a worker for sysfs unbind
On Mon, Dec 10, 2018 at 11:18:32AM +0100, Daniel Vetter wrote:
> On Mon, Dec 10, 2018 at 11:06:34AM +0100, Greg Kroah-Hartman wrote:
> > On Mon, Dec 10, 2018 at 09:46:53AM +0100, Daniel Vetter wrote:
> > > Drivers might want to remove some sysfs files, which needs the same
> > > locks and ends up angering lockdep. Relevant snippet of the stack
> > > trace:
> > > 
> > >   kernfs_remove_by_name_ns+0x3b/0x80
> > >   bus_remove_driver+0x92/0xa0
> > >   acpi_video_unregister+0x24/0x40
> > >   i915_driver_unload+0x42/0x130 [i915]
> > >   i915_pci_remove+0x19/0x30 [i915]
> > >   pci_device_remove+0x36/0xb0
> > >   device_release_driver_internal+0x185/0x250
> > >   unbind_store+0xaf/0x180
> > >   kernfs_fop_write+0x104/0x190
> > > 
> > > I've stumbled over this because some new patches by Ram connect the
> > > snd-hda-intel unload (where we do use sysfs unbind) with the locking
> > > chains in the i915 unload code (but without creating a new loop),
> > > which upset our CI. But the bug is already there and can be easily
> > > reproduced by unbind i915 directly.
> > 
> > This is odd, why wouldn't any driver hit this issue?  And why now since
> > you say this is triggerable today?
> 
> The above backtrace is triggered by unbinding i915 on current upstream
> kernels. Note: Will crash later on rather badly in the
> fbdev/fbcon/vtconsole hell, but that's separate issue (which can be worked
> around by first unbinding fbcon manually through sysfs).
> 
> > I know scsi was doing some strange things like trying to remove the
> > device itself from a sysfs callback on the device, which requires it to
> > just call a different kobject function created just for that type of
> > thing.  Would that also make sense to do here instead of your workqueue?
> 
> Note how we blow up on unregistering sw device instances supported by i915
> in entirely different subsystems. I guess most drivers just have sysfs
> files for their own stuff, where this is done as you describe. The problem
> is that there's an awful lot of unrelated stuff hanging off i915.
> 
> Or maybe acpi_video is busted, and should be using a different function.
> You haven't said which one, and I have no idea which one it is ...
> 
> And in case the context wasn't clear: This is unbinding the i915 pci
> driver which triggers the above lockdep splat recursion.
btw another option for "fixing" this would be to annotate the mutex_lock
in kernfs_remove_by_name_ns as recursive. Which just shuts up lockdep (and
might hide some real bugs), but would get the job done since there's not
actually a deadlock here. Just lockdep being annoyed.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Powered by blists - more mailing lists
 
