Message-ID: <YGtDzH0dEfEngCij@google.com>
Date: Mon, 5 Apr 2021 10:07:24 -0700
From: Minchan Kim <minchan@...nel.org>
To: Luis Chamberlain <mcgrof@...nel.org>
Cc: keescook@...omium.org, dhowells@...hat.com, hch@...radead.org,
mbenes@...e.com, gregkh@...uxfoundation.org, ngupta@...are.org,
sergey.senozhatsky.work@...il.com, axboe@...nel.dk,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] zram: fix crashes due to use of cpu hotplug
multistate
On Thu, Apr 01, 2021 at 11:59:25PM +0000, Luis Chamberlain wrote:
> On Mon, Mar 22, 2021 at 03:12:01PM -0700, Minchan Kim wrote:
> > On Mon, Mar 22, 2021 at 08:41:56PM +0000, Luis Chamberlain wrote:
> > >
> > > I would not call it *every* sysfs knob, as not all of them deal with
> > > things related to CPU hotplug multistate, right? Note that using just
> > > try_module_get() alone (that is, the second patch only) does not fix the
> > > race I am describing above.
> >
> > It wouldn't be a CPU hotplug multistate issue; I'd rather call it a
> > "zram instance race" bug.
> > What happens in this case?
> >
> > CPU 1                               CPU 2
> >
> > destroy_devices
> > ..
> >                                     compact_store()
> >                                     zram = dev_to_zram(dev);
> > idr_for_each(zram_remove_cb
> >   zram_remove
> >   ..
> >   kfree(zram)
> >                                     down_read(&zram->init_lock);
> >
> >
> > CPU 1                               CPU 2
> >
> > hot_remove_store
> >                                     compact_store()
> >                                     zram = dev_to_zram(dev);
> >   zram_remove
> >   kfree(zram)
> >                                     down_read(&zram->init_lock);
> >
> > So, for me, we need to close the race between zram instance creation/removal
> > and sysfs rather than focusing on the CPU hotplug issue.
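For reference, a minimal sketch of what a sysfs knob such as compact_store()
looks like in the mainline driver around this time (simplified; the exact body
may differ), showing where the dev_to_zram() and down_read(&zram->init_lock)
steps in the diagrams above come from:

static ssize_t compact_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t len)
{
	struct zram *zram = dev_to_zram(dev);	/* zram may already be freed */

	down_read(&zram->init_lock);		/* use-after-free in the races above */
	if (!init_done(zram)) {
		up_read(&zram->init_lock);
		return -EINVAL;
	}

	zs_compact(zram->mem_pool);
	up_read(&zram->init_lock);

	return len;
}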
>
> Sure, that's a good point.
>
> The race I noted, which ends up in a deadlock, is only possible if a
> shared lock is used both on removal and on the sysfs knobs.
>
> At first glance, the issue you describe above *seems* to be just a matter
> of the proper care driver developers must take with the structures they use.
> It is certainly yet another issue we need to address, and if we can
> generalize a solution, even better. I now recall I *think* I spotted that
> race a while ago and mentioned it to Kees and David Howells, but I didn't
> have a solution for it yet. More on this below.
>
> The issue you point out is real; however, you cannot disregard the
> possible CPU hotplug race either. It is a design consideration which
> the CPU hotplug multistate support warns about -- consider driver removal.
> I agree that perhaps solving this "zram instance race" can fix the
> hotplug race as well. If we can solve all 3 issues in one shot, even
> better. But let's evaluate that prospect...
>
> > Maybe we could reuse zram_index_mutex, converting it into a
> > rw_semaphore. What do you think?
>
> Although ideal, given it would knock 3 birds with 1 stone, it ends up
> actually making the sysfs attributes rather useless in light of the
> requirements for each of the races. Namely, the sysfs deadlock race
> *must* use a try-lock approach, just as in the try_module_get() case.
> It must use this approach so it can immediately bail out if our module
> is being removed, i.e. we are on our __exit path. By trying to
> repurpose zram_index_mutex we end up doing too much with it
> and making the sysfs attributes rather fragile for most uses:
>
> Consider disksize_show(), that would have to become:
>
> static ssize_t disksize_show(struct device *dev, struct device_attribute *attr, char *buf)
> {
> 	struct zram *zram = dev_to_zram(dev);
> +	u64 disksize;
>
> +	down_read(&zram_index_rwlock);
> +	disksize = zram->disksize;
> +	up_read(&zram_index_rwlock);
> -	return scnprintf(buf, PAGE_SIZE, "%llu\n", zram->disksize);
> +	return scnprintf(buf, PAGE_SIZE, "%llu\n", disksize);
> }
>
> What's wrong with this?
>
> It can block during a write, yes, but there is a type of write which
> will make this crash after the read lock is acquired: when the instance
> is removed. What if we try down_read_trylock()?
>
> static ssize_t disksize_show(struct device *dev, struct device_attribute *attr, char *buf)
> {
> 	struct zram *zram = dev_to_zram(dev);
> +	u64 disksize;
>
> +	if (!down_read_trylock(&zram_index_rwlock))
> +		return -ENODEV;
> +
> +	disksize = zram->disksize;
> +	up_read(&zram_index_rwlock);
> -	return scnprintf(buf, PAGE_SIZE, "%llu\n", zram->disksize);
> +	return scnprintf(buf, PAGE_SIZE, "%llu\n", disksize);
> }
>
> What's wrong with this?
>
> If it got the lock, it should be OK, as it prevents writers from
> taking the lock for a bit. But then this just becomes pretty fragile:
> it will fail whenever another read or write is happening, perhaps
> triggering quite a few regressions in tests.
>
> And if we use write_trylock() we end up with the same fragile nature
> of failing the read with ENODEV for any silly thing going on with the
> driver.
>
> And come to think of it, the last patch I had sent with a new
> DECLARE_RWSEM(zram_unload) also has this same issue, making most
> sysfs attributes rather fragile.
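To make the fragility concrete, that zram_unload idea presumably boils down to
something like the sketch below. This is a hypothetical reconstruction, not
the actual patch; the attribute body and the exit-path handling are only
illustrative:

/* Hypothetical reconstruction -- not the actual patch. */
static DECLARE_RWSEM(zram_unload);

static ssize_t compact_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t len)
{
	struct zram *zram;

	/* Bail out whenever the write side holds the semaphore. */
	if (!down_read_trylock(&zram_unload))
		return -ENODEV;

	zram = dev_to_zram(dev);
	down_read(&zram->init_lock);
	if (init_done(zram))
		zs_compact(zram->mem_pool);
	up_read(&zram->init_lock);

	up_read(&zram_unload);
	return len;
}

static void __exit zram_exit(void)
{
	/* New attribute calls now fail with -ENODEV; anything that raced
	 * in earlier already holds the read side and finishes first. */
	down_write(&zram_unload);
	destroy_devices();
	up_write(&zram_unload);
}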
Thanks for looking into it this way. I agree the single zram_index_rwlock is
not the right approach to fix it. However, I still hope we can find a more
generic solution to fix them all at once, since I see this as a zram instance
racing problem.
An approach I am considering is to make struct zram include a kobject
and then have the zram sysfs attributes auto-populated under that kobject.
That way, the zram sysfs lifetime is tied to the kobject. With it, the sysfs
race problem I mentioned above should be gone. Furthermore, zram_remove
should fail while the zram object still has other live users
(i.e., zram->kobject->refcount > 1), so module_exit will fail, too.
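A rough sketch of that shape, with a reduced struct and illustrative names
(not a real patch), just to show the intended lifetime:

/* Rough sketch only; the struct is reduced and the names are illustrative. */
struct zram {
	struct zs_pool *mem_pool;
	struct gendisk *disk;
	struct rw_semaphore init_lock;
	u64 disksize;
	struct kobject kobj;		/* owns the zram sysfs lifetime */
};

static void zram_kobj_release(struct kobject *kobj)
{
	struct zram *zram = container_of(kobj, struct zram, kobj);

	/* Only reached once the last sysfs user drops its reference,
	 * so this can no longer race with attribute readers. */
	kfree(zram);
}

static struct kobj_type zram_ktype = {
	.release	= zram_kobj_release,
	.sysfs_ops	= &kobj_sysfs_ops,
};

static int zram_remove(struct zram *zram)
{
	/* Refuse removal while anything else still holds a reference,
	 * which in turn makes module_exit() back off as well. */
	if (kref_read(&zram->kobj.kref) > 1)
		return -EBUSY;

	kobject_put(&zram->kobj);
	return 0;
}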
One of the problems I see is how I could place the new zram object's
attribute group for the zram knobs under /sys/block/zram0, since the block
layer has already created the zram0 kobject via device_add_disk.
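One direction for that, sketched under the assumption that parenting the new
kobject to the disk's device kobject is acceptable (note this yields a
subdirectory such as /sys/block/zram0/zram_attrs/ rather than attributes
directly inside /sys/block/zram0; the "zram_attrs" name is made up):

static int zram_register_kobj(struct zram *zram)
{
	int err;

	/* Parent the zram kobject to the gendisk's device kobject that
	 * device_add_disk() already registered as /sys/block/zram0. */
	err = kobject_init_and_add(&zram->kobj, &zram_ktype,
				   &disk_to_dev(zram->disk)->kobj, "zram_attrs");
	if (err)
		kobject_put(&zram->kobj);
	return err;
}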