[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPcyv4jFaNTfpKq6hQsFrWXTwHz-wG+5K4m-hmsPeteLX4p5AQ@mail.gmail.com>
Date: Mon, 20 Sep 2021 14:55:10 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Luis Chamberlain <mcgrof@...nel.org>
Cc: Tejun Heo <tj@...nel.org>, Greg KH <gregkh@...uxfoundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Minchan Kim <minchan@...nel.org>, jeyu@...nel.org,
shuah <shuah@...nel.org>, Randy Dunlap <rdunlap@...radead.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Masahiro Yamada <masahiroy@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>, yzaikin@...gle.com,
Nathan Chancellor <nathan@...nel.org>, ojeda@...nel.org,
Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
vitor@...saru.org, elver@...gle.com,
Jarkko Sakkinen <jarkko@...nel.org>,
Alexander Potapenko <glider@...gle.com>,
rf@...nsource.cirrus.com,
Stephen Hemminger <stephen@...workplumber.org>,
David Laight <David.Laight@...lab.com>, bvanassche@....org,
jolsa@...nel.org,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
trishalfonso@...gle.com, andreyknvl@...il.com,
Jiri Kosina <jikos@...nel.org>, mbenes@...e.com,
Nitin Gupta <ngupta@...are.org>,
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Reinette Chatre <reinette.chatre@...el.com>,
Fenghua Yu <fenghua.yu@...el.com>,
Borislav Petkov <bp@...en8.de>, X86 ML <x86@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>, lizefan.x@...edance.com,
Johannes Weiner <hannes@...xchg.org>,
Daniel Vetter <daniel.vetter@...ll.ch>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Krzysztof WilczyĆski <kw@...ux.com>,
senozhatsky@...omium.org, Christoph Hellwig <hch@....de>,
Joe Perches <joe@...ches.com>, hkallweit1@...il.com,
Jens Axboe <axboe@...nel.dk>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Kees Cook <keescook@...omium.org>,
Steven Rostedt <rostedt@...dmis.org>,
Peter Zijlstra <peterz@...radead.org>,
linux-spdx@...r.kernel.org,
Linux Doc Mailing List <linux-doc@...r.kernel.org>,
linux-block@...r.kernel.org,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
linux-kselftest@...r.kernel.org, cgroups@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
copyleft-next@...ts.fedorahosted.org
Subject: Re: [PATCH v7 09/12] sysfs: fix deadlock race with module removal
On Mon, Sep 20, 2021 at 2:17 PM Luis Chamberlain <mcgrof@...nel.org> wrote:
>
> On Mon, Sep 20, 2021 at 01:52:21PM -0700, Dan Williams wrote:
> > On Fri, Sep 17, 2021 at 10:05 PM Luis Chamberlain <mcgrof@...nel.org> wrote:
> > > This deadlock was first reported with the zram driver, however the live
> > > patching folks have acknowledged they have observed this as well with
> > > live patching, when a live patch is removed. I was then able to
> > > reproduce easily by creating a dedicated selftests.
> > >
> > > A sketch of how this can happen follows:
> > >
> > > CPU A CPU B
> > > whatever_store()
> > > module_unload
> > > mutex_lock(foo)
> > > mutex_lock(foo)
> > > del_gendisk(zram->disk);
> > > device_del()
> > > device_remove_groups()
> >
> > This flow seems possible to trigger with:
> >
> > echo $dev > /sys/bus/$bus/drivers/$driver/unbind
> >
> > I am missing why module pinning
>
> The aspect of try_module_get() which comes to value to prevent the
> deadlock is it ensures kernfs ops do not run once exit is on the way.
>
> > is part of the solution when it's the
> > device_del() path that is racing?
>
> But its not, the device_del() path will yield until the kernfs op
> completes. It is fine to wait there.
>
> The deadlock happens if a module exit routine uses a lock which is
> also used on a sysfs op. If the lock was first held by module exit,
> and module exit is waiting for the kernfs op to complete, and the
> kernfs op is waiting to hold the same lock then the exit will wait
> forever.
>
> > Module removal is just a more coarse
> > grained way to trigger unbind => device_del().
>
> Right, but the device_del() path is not sharing a lock with the sysfs op.
The deadlock in the example comes from holding a lock over
device_del() that is also taken in a kernfs op. For example, the code
above looks like something that runs from driver.remove(), not
exclusively module_exit(). Yes, module_exit() may trigger
driver.remove() via driver_unregister(), but there's other ways to
trigger driver.remove() that do not involve module_exit().
> > Isn't the above a bug
> > in the driver, not missing synchronization in kernfs?
>
> We can certainly take the position as an alternative:
>
> "thou shalt not use a lock on exit which is also used on a syfs op"
>
> However that seems counter intuitive, specially if we can resolve the
> issue easily with a try_module_get().
Again, I don't see how try_module_get() can affect the ABBA failure
case of holding a lock over device_del() that is also held inside
sysfs op. I agree that the problem is subtle. Does lockdep not
complain about this case? If it's going to be avoided in the core it
seems try_module_get() does not completely cover the hole that
unsuspecting driver writers might fall into.
Powered by blists - more mailing lists