Message-ID: <CAB=NE6WjupsJFwsj94sC_j3gcYn2Qo0sx1=tMv=WUZ83jq_DFw@mail.gmail.com>
Date: Tue, 21 Sep 2021 08:48:49 -0700
From: Luis Chamberlain <mcgrof@...nel.org>
To: David Laight <David.Laight@...lab.com>
Cc: "tj@...nel.org" <tj@...nel.org>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"minchan@...nel.org" <minchan@...nel.org>,
"jeyu@...nel.org" <jeyu@...nel.org>,
"shuah@...nel.org" <shuah@...nel.org>,
"rdunlap@...radead.org" <rdunlap@...radead.org>,
"rafael@...nel.org" <rafael@...nel.org>,
"masahiroy@...nel.org" <masahiroy@...nel.org>,
"ndesaulniers@...gle.com" <ndesaulniers@...gle.com>,
"yzaikin@...gle.com" <yzaikin@...gle.com>,
"nathan@...nel.org" <nathan@...nel.org>,
"ojeda@...nel.org" <ojeda@...nel.org>,
"vitor@...saru.org" <vitor@...saru.org>,
"elver@...gle.com" <elver@...gle.com>,
"jarkko@...nel.org" <jarkko@...nel.org>,
"glider@...gle.com" <glider@...gle.com>,
"rf@...nsource.cirrus.com" <rf@...nsource.cirrus.com>,
"stephen@...workplumber.org" <stephen@...workplumber.org>,
"bvanassche@....org" <bvanassche@....org>,
"jolsa@...nel.org" <jolsa@...nel.org>,
"andriy.shevchenko@...ux.intel.com"
<andriy.shevchenko@...ux.intel.com>,
"trishalfonso@...gle.com" <trishalfonso@...gle.com>,
"andreyknvl@...il.com" <andreyknvl@...il.com>,
"jikos@...nel.org" <jikos@...nel.org>,
"mbenes@...e.com" <mbenes@...e.com>,
"ngupta@...are.org" <ngupta@...are.org>,
"sergey.senozhatsky.work@...il.com"
<sergey.senozhatsky.work@...il.com>,
"reinette.chatre@...el.com" <reinette.chatre@...el.com>,
"fenghua.yu@...el.com" <fenghua.yu@...el.com>,
"bp@...en8.de" <bp@...en8.de>, "x86@...nel.org" <x86@...nel.org>,
"hpa@...or.com" <hpa@...or.com>,
"lizefan.x@...edance.com" <lizefan.x@...edance.com>,
"hannes@...xchg.org" <hannes@...xchg.org>,
"daniel.vetter@...ll.ch" <daniel.vetter@...ll.ch>,
"bhelgaas@...gle.com" <bhelgaas@...gle.com>,
"kw@...ux.com" <kw@...ux.com>,
"dan.j.williams@...el.com" <dan.j.williams@...el.com>,
"senozhatsky@...omium.org" <senozhatsky@...omium.org>,
"hch@....de" <hch@....de>, "joe@...ches.com" <joe@...ches.com>,
"hkallweit1@...il.com" <hkallweit1@...il.com>,
"axboe@...nel.dk" <axboe@...nel.dk>,
"jpoimboe@...hat.com" <jpoimboe@...hat.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"keescook@...omium.org" <keescook@...omium.org>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"linux-spdx@...r.kernel.org" <linux-spdx@...r.kernel.org>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"copyleft-next@...ts.fedorahosted.org"
<copyleft-next@...ts.fedorahosted.org>,
Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Subject: Re: [PATCH v7 09/12] sysfs: fix deadlock race with module removal

On Tue, Sep 21, 2021 at 1:24 AM David Laight <David.Laight@...lab.com> wrote:
>
> From: Luis Chamberlain
> > Sent: 17 September 2021 20:47
> >
> > When sysfs attributes use a lock also used on module removal we can
> > race to deadlock. This happens when for instance a sysfs file on
> > a driver is used, then at the same time we have module removal call
> > trigger. The module removal call code holds a lock, and then the sysfs
> > file entry waits for the same lock. While holding the lock the module
> > removal tries to remove the sysfs entries, but these cannot be removed
> > yet as one is waiting for a lock. This won't complete as the lock is
> > already held. Likewise module removal cannot complete, and so we deadlock.
>
> Isn't the real problem the race between a sysfs file action and the
> removal of the sysfs node?
Nope, that is taken care of by kernfs.
> This isn't really related to module unload - except that may
> well remove some sysfs nodes.
Nope, the issue is a deadlock that can happen due to a shared lock on
module removal and a driver sysfs operation.
> This is the same problem as removing any other kind of driver callback.
> There are three basic solutions:
> 1) Use a global lock - not usually useful.
> 2) Have the remove call sleep until any callbacks are complete.
> 3) Have the remove just request removal and have a final
> callback (from a different context).
Kernfs already does a sort of combination of 1) and 2), except that
for 1) it uses atomic reference counts rather than a global lock.
> If the remove can sleep (as in 2) then there is a requirement
> on the driver code to not hold any locks across the 'remove'
> that can be acquired during the callbacks.
And this is the part that kernfs has no control over, since the removal
and the sysfs operation are implementation-specific.
> Now, for sysfs, you probably only want to sleep the remove code
> while a read/write is in progress - not just because the node
> is open.
> That probably requires marking an open node 'invalid' and
> deferring delete to close.
This is already done by kernfs.
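
For readers unfamiliar with that mechanism, here is a rough userspace
model of the idea, not the kernel code itself. It is loosely modeled on
how kernfs tracks "active" references with a single atomic counter: an
operation may enter only while the counter is non-negative, and removal
adds a large negative bias so that new operations are refused while
removal waits for the ones in flight. All toy_* names are hypothetical,
and the wait itself is reduced to a predicate the remover would sleep on.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bias value: makes the counter negative once removal starts. */
#define TOY_DEACTIVATED_BIAS (INT_MIN + 1)

/* Hypothetical stand-in for a node; only the active counter is modeled. */
struct toy_node {
	atomic_int active;	/* >= 0: live, < 0: deactivated */
};

static void toy_node_init(struct toy_node *n)
{
	atomic_init(&n->active, 0);
}

/* Enter a read/write: increment the counter unless it is negative. */
static bool toy_get_active(struct toy_node *n)
{
	int old = atomic_load(&n->active);

	while (old >= 0) {
		/* On failure, old is reloaded and the sign is rechecked. */
		if (atomic_compare_exchange_weak(&n->active, &old, old + 1))
			return true;
	}
	return false;	/* node is being removed, refuse the operation */
}

/* Leave a read/write. */
static void toy_put_active(struct toy_node *n)
{
	atomic_fetch_sub(&n->active, 1);
}

/* Start removal: from now on toy_get_active() fails. */
static void toy_deactivate(struct toy_node *n)
{
	int old = atomic_fetch_add(&n->active, TOY_DEACTIVATED_BIAS);

	assert(old >= 0);	/* deactivating twice would be a bug */
	(void)old;
}

/* Removal may finish once only the bias is left in the counter. */
static bool toy_drained(struct toy_node *n)
{
	return atomic_load(&n->active) == TOY_DEACTIVATED_BIAS;
}
```

In this model an open file holds nothing; only an operation between
toy_get_active() and toy_put_active() delays removal. What the model
cannot express is the driver-side problem discussed here: if the
remover holds a lock that the in-flight operation needs, toy_drained()
never becomes true.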
> None of this requires a reference count on the module.
You are missing the other aspect of try_module_get(): it also lets you
check whether module exit has been entered. By using try_module_get()
you let module exit trump proceeding with an operation, which also
prevents the driver-specific sysfs operation from contending for a
lock it shares with module exit.
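
As an illustration only, the gate described above can be modeled in
userspace C. This is a sketch under stated assumptions, not the patch
and not the kernel's struct module: the toy_* names are hypothetical,
the base reference of 1 and the "going" flag are simplifications, and
the shared driver lock is only indicated by a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a module, not the kernel's struct module. */
struct toy_module {
	atomic_int refcnt;	/* starts at 1: base reference while loaded */
	atomic_bool going;	/* set once module exit has been entered */
};

static void toy_module_init(struct toy_module *m)
{
	atomic_init(&m->refcnt, 1);
	atomic_init(&m->going, false);
}

/* Take a reference unless the module is already on its way out. */
static bool toy_try_module_get(struct toy_module *m)
{
	int old;

	if (atomic_load(&m->going))
		return false;

	old = atomic_load(&m->refcnt);
	while (old != 0) {
		/* On failure, old is reloaded and rechecked against zero. */
		if (atomic_compare_exchange_weak(&m->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

static void toy_module_put(struct toy_module *m)
{
	int old = atomic_fetch_sub(&m->refcnt, 1);

	assert(old > 1);	/* the base reference must remain */
	(void)old;
}

/* Enter module exit: only possible while no extra references are held. */
static bool toy_try_stop_module(struct toy_module *m)
{
	int expected = 1;

	if (!atomic_compare_exchange_strong(&m->refcnt, &expected, 0))
		return false;
	atomic_store(&m->going, true);
	return true;
}

/* sysfs-style store: give up early instead of sleeping on the shared lock. */
static int toy_attr_store(struct toy_module *m, int *value, int new_value)
{
	if (!toy_try_module_get(m))
		return -1;	/* the kernel would return an errno here */

	/* A real driver would take the lock shared with its exit path here. */
	*value = new_value;

	toy_module_put(m);
	return 0;
}
```

Once toy_try_stop_module() has succeeded, toy_attr_store() fails
without ever reaching the point where the shared lock would be taken,
so the operation can no longer block the exit path.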
Luis