Message-ID: <20210622170006.3c2jgi4aa4edrkax@garbanzo>
Date:   Tue, 22 Jun 2021 10:00:06 -0700
From:   Luis Chamberlain <mcgrof@...nel.org>
To:     Greg KH <gregkh@...uxfoundation.org>
Cc:     minchan@...nel.org, jeyu@...nel.org, ngupta@...are.org,
        sergey.senozhatsky.work@...il.com, axboe@...nel.dk,
        mbenes@...e.com, jpoimboe@...hat.com, tglx@...utronix.de,
        keescook@...omium.org, jikos@...nel.org, rostedt@...dmis.org,
        peterz@...radead.org, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 2/3] zram: fix deadlock with sysfs attribute usage and
 driver removal

On Tue, Jun 22, 2021 at 06:51:13PM +0200, Greg KH wrote:
> On Tue, Jun 22, 2021 at 09:40:27AM -0700, Luis Chamberlain wrote:
> > On Tue, Jun 22, 2021 at 06:27:52PM +0200, Greg KH wrote:
> > > On Tue, Jun 22, 2021 at 08:27:13AM -0700, Luis Chamberlain wrote:
> > > > On Tue, Jun 22, 2021 at 09:41:23AM +0200, Greg KH wrote:
> > > > > On Mon, Jun 21, 2021 at 04:36:34PM -0700, Luis Chamberlain wrote:
> > > > > > +	ssize_t __ret; \
> > > > > > +	if (!try_module_get(THIS_MODULE)) \
> > > > > 
> > > > > try_module_get(THIS_MODULE) is always racy and probably does not do what
> > > > > you want it to do.  You always want to get/put module references from
> > > > > code that is NOT the code calling these functions.
> > > > 
> > > > In this case, we want it to trump module removal if it succeeds. That's all.
> > > 
> > > True, but either you stop the race, or you do not right?  If you are so
> > > invested in your load/unload test, this should show up with this code
> > > eventually as well.
> > 
> > I still do not see how the race is possible given the goal to prevent
> > module removal if a sysfs file is being used. If rmmod is taking
> > place, this simply will bail out.
> > 
> > > > > > +		return -ENODEV; \
> > > > > > +	__ret = _name ## _store(dev, attr, buf, len); \
> > > > > > +	module_put(THIS_MODULE); \
> > > > > 
> > > > > This too is going to be racy.
> > > > > 
> > > > > While fun to poke at, I still think this is pointless.
> > > > 
> > > > If you have a better idea, which does not "DOS" module removal, please
> > > > let me know!
> > > 
> > > I have yet to understand why you think that the load/unload in a loop is
> > > a valid use case.
> > 
> > That is dependent upon the infrastructure tests built for a driver.
> > 
> > In the case of fstests and blktests we have drivers which *always* get
> > removed and loaded on each test. Take for instance scsi_debug, which
> > creates / destroys virtual devices on a per-test basis. Likewise, to build
> > confidence that failure rate is as close as possible to 0, one must run
> > a test as many times as possible in a loop. And, to build confidence in
> > a test, in some situations one ends up running modprobe / rmmod in a
> > loop.
> > 
> > In this case a customer does have a complex system of tests, and by looking
> > at the crash logs I managed to simplify the way to reproduce it using
> > simple shell scripts.
> 
> And is _this_ change needed even with the changes in patch 1/3?

Oh absolutely. This patch is needed 100%. Without it, it is actually
pretty trivial to deadlock as noted in my instructions on how to
reproduce.

> I think that commit fixes your issues given that you will not unload the
> module until after the sysfs devices are removed from the system.  Have
> you tried that alone with your test?

I have tried that, and it does not resolve the deadlock.

That is *why* I have been insisting that this is a real issue, and why I
decided to try to implement something generic instead: the livepatch
folks hinted that they had also observed a similar deadlock, and that
they would appreciate a generic solution.

  Luis
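
For readers following the thread, the guarded-store pattern quoted above
(take a module reference, call the real store routine, drop the reference,
and bail out with -ENODEV if the reference cannot be taken) can be sketched
in userspace roughly as follows. This is only a single-threaded model, not
kernel code and not the patch itself: struct fake_module, the fake_*()
helpers, demo_store() and guarded_store() are hypothetical names invented
for illustration, and the error code returned by fake_rmmod() is a choice
made for this model. It does not reproduce the race or the deadlock
discussed in this thread; it only shows the intended ordering.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical userspace stand-in for a module's state; not kernel code. */
struct fake_module {
	int refcnt;	/* references currently held */
	bool going;	/* set once removal has started */
};

/* Models try_module_get(): refuse a new reference once removal has begun. */
static bool fake_try_module_get(struct fake_module *mod)
{
	if (mod->going)
		return false;
	mod->refcnt++;
	return true;
}

/* Models module_put(): drop a reference taken earlier. */
static void fake_module_put(struct fake_module *mod)
{
	assert(mod->refcnt > 0);
	mod->refcnt--;
}

/* Models rmmod: refused while references are held (error code is a model choice). */
static int fake_rmmod(struct fake_module *mod)
{
	if (mod->refcnt > 0)
		return -EWOULDBLOCK;
	mod->going = true;
	return 0;
}

/* Stand-in for the driver's real _name ## _store() routine. */
static ssize_t demo_store(const char *buf, size_t len)
{
	(void)buf;
	return (ssize_t)len;
}

/* The wrapper: bail out if removal is under way, otherwise pin and call through. */
static ssize_t guarded_store(struct fake_module *mod, const char *buf, size_t len)
{
	ssize_t ret;

	if (!fake_try_module_get(mod))
		return -ENODEV;
	ret = demo_store(buf, len);
	fake_module_put(mod);
	return ret;
}
```

In this model, a guarded_store() call made before fake_rmmod() goes through
and leaves the reference count unchanged, fake_rmmod() is refused while a
reference is held, and any guarded_store() call made after a successful
fake_rmmod() returns -ENODEV.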
