Message-ID: <20210622170006.3c2jgi4aa4edrkax@garbanzo>
Date: Tue, 22 Jun 2021 10:00:06 -0700
From: Luis Chamberlain <mcgrof@...nel.org>
To: Greg KH <gregkh@...uxfoundation.org>
Cc: minchan@...nel.org, jeyu@...nel.org, ngupta@...are.org,
sergey.senozhatsky.work@...il.com, axboe@...nel.dk,
mbenes@...e.com, jpoimboe@...hat.com, tglx@...utronix.de,
keescook@...omium.org, jikos@...nel.org, rostedt@...dmis.org,
peterz@...radead.org, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 2/3] zram: fix deadlock with sysfs attribute usage and
 driver removal

On Tue, Jun 22, 2021 at 06:51:13PM +0200, Greg KH wrote:
> On Tue, Jun 22, 2021 at 09:40:27AM -0700, Luis Chamberlain wrote:
> > On Tue, Jun 22, 2021 at 06:27:52PM +0200, Greg KH wrote:
> > > On Tue, Jun 22, 2021 at 08:27:13AM -0700, Luis Chamberlain wrote:
> > > > On Tue, Jun 22, 2021 at 09:41:23AM +0200, Greg KH wrote:
> > > > > On Mon, Jun 21, 2021 at 04:36:34PM -0700, Luis Chamberlain wrote:
> > > > > > + ssize_t __ret; \
> > > > > > + if (!try_module_get(THIS_MODULE)) \
> > > > >
> > > > > try_module_get(THIS_MODULE) is always racy and probably does not do what
> > > > > you want it to do. You always want to get/put module references from
> > > > > code that is NOT the code calling these functions.
> > > >
> > > > In this case, we want it to trump module removal if it succeeds. That's all.
> > >
> > > True, but either you stop the race, or you do not, right? If you are so
> > > invested in your load/unload test, this should show up with this code
> > > eventually as well.
> >
> > I still do not see how the race is possible given the goal to prevent
> > module removal if a sysfs file is being used. If rmmod is taking
> > place, this simply will bail out.
> >
> > > > > > + return -ENODEV; \
> > > > > > + __ret = _name ## _store(dev, attr, buf, len); \
> > > > > > + module_put(THIS_MODULE); \
> > > > >
> > > > > This too is going to be racy.
> > > > >
> > > > > While fun to poke at, I still think this is pointless.
> > > >
> > > > If you have a better idea, which does not "DOS" module removal, please
> > > > let me know!
> > >
> > > I have yet to understand why you think that the load/unload in a loop is
> > > a valid use case.
> >
> > That is dependent upon the infrastructure tests built for a driver.
> >
> > In the case of fstests and blktests we have drivers which *always* get
> > removed and loaded on each test. Take for instance scsi_debug, which
> > creates / destroys virtual devices on a per-test basis. Likewise, to build
> > confidence that failure rate is as close as possible to 0, one must run
> > a test as many times as possible in a loop. And, to build confidence in
> > a test, in some situations one ends up running modprobe / rmmod in a
> > loop.
> >
> > In this case a customer does have a complex system of tests, and by looking
> > at the crash logs I managed to simplify the way to reproduce it using
> > simple shell scripts.
>
> And is _this_ change needed even with the changes in patch 1/3?

Oh absolutely. This patch is needed 100%. Without it, it is actually
pretty trivial to deadlock as noted in my instructions on how to
reproduce.

> I think that commit fixes your issues given that you will not unload the
> module until after the sysfs devices are removed from the system. Have
> you tried that alone with your test?

I have tried that, and it does not resolve the deadlock.

That is *why* I have been insisting that this is a real issue, and why I
decided to try to implement something generic instead, after livepatch
folks hinted that they had also observed a similar deadlock and would
appreciate a generic solution.
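As an aside on the wrapper quoted above: its intended behaviour can be
sketched in userspace C, roughly as below. This is only a single-threaded
model under stated assumptions, not the kernel implementation. The names
fake_module, fake_try_module_get(), fake_module_put(),
fake_module_remove(), fake_store() and guarded_store() are all
hypothetical stand-ins; only the try_module_get() / _store() /
module_put() ordering and the -ENODEV return are taken from the quoted
patch. Because nothing runs concurrently here, the sketch says nothing
either way about the race Greg raises.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical userspace stand-in for a module's reference state. */
struct fake_module {
	int refcnt;	/* references currently held */
	bool going;	/* set once removal has started */
};

/* Models try_module_get(): fails once removal is under way. */
bool fake_try_module_get(struct fake_module *m)
{
	if (m->going)
		return false;
	m->refcnt++;
	return true;
}

/* Models module_put(): drops one previously taken reference. */
void fake_module_put(struct fake_module *m)
{
	assert(m->refcnt > 0);
	m->refcnt--;
}

/*
 * Models a non-forced rmmod (modelling assumption): refused while any
 * reference is held, otherwise the module is marked as going away.
 */
bool fake_module_remove(struct fake_module *m)
{
	if (m->refcnt > 0)
		return false;
	m->going = true;
	return true;
}

/* Hypothetical attribute store routine: pretends to consume the buffer. */
ssize_t fake_store(const char *buf, size_t len)
{
	(void)buf;
	return (ssize_t)len;
}

/* Same shape as the quoted macro: get, call the store, put. */
ssize_t guarded_store(struct fake_module *m, const char *buf, size_t len)
{
	ssize_t ret;

	if (!fake_try_module_get(m))
		return -ENODEV;
	ret = fake_store(buf, len);
	fake_module_put(m);
	return ret;
}
```

With a zero-initialised struct fake_module, guarded_store() returns the
length written and leaves the count at zero. While a reference is held,
fake_module_remove() is refused; once it has succeeded, the same
guarded_store() call returns -ENODEV.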
Luis