Message-ID: <YXIIBPMdA547wy1b@T590>
Date:   Fri, 22 Oct 2021 08:38:28 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Luis Chamberlain <mcgrof@...nel.org>
Cc:     Jens Axboe <axboe@...nel.dk>, linux-block@...r.kernel.org,
        Minchan Kim <minchan@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        linux-kernel@...r.kernel.org, ming.lei@...hat.com
Subject: Re: [PATCH V2 2/4] zram: don't fail to remove zram during unloading module

On Thu, Oct 21, 2021 at 04:50:55PM -0700, Luis Chamberlain wrote:
> On Wed, Oct 20, 2021 at 09:55:46AM +0800, Ming Lei wrote:
> > When zram module is started to be unloaded, no one should use all zram
> > disks at that time. But disk's show() or store() attribute method may be
> > running, this way is expected because device_del()(called from
> > del_gendisk) will drain all pending show()/store().
> 
> How about:
> 
> ----------
> When the zram module is being unloaded, no one should be using the
> zram disks. However even while being unloaded the zram module's
> sysfs attributes might be poked at to re-configure the zram module.

're-configure the zram module' is confusing; it should be 're-configure
the zram device'.

> This is expected, and kernfs ensures that these operations complete
> before device_del() completes.

Otherwise, the above is pretty good, and I will take that in V3.

> ----------
> 
> > But reset_store() may set ->claim which will fail zram_remove(), when
> > this happens, the zram device will be leaked and the warning of 'Error:
> > Removing state 63 which has instances left.' is triggered during
> > unloading module.
> 
> Note: the "Removing state 63 which has instances left" does not
> necessarily mean the struct zram is leaked. It just means that the per
> cpu struct zcomp instances are leaked, given the CPU hotplug removal
> callback is in charge of cleaning them up. That this gave us a hint that
> we do in fact leak a struct zram device as well is separate from what
> the message means, but I do agree it should be *that* because as you
> noted, LTP does not yet try to make spaghetti with hot_add_show().
> 
> So, how about:
> 
> ----------
> When the reset sysfs op (reset_store()) gets called the zram->claim will
> be set, and this prevents zram_remove() from removing a zram device. If one
> is unloading the module and one races to run the reset sysfs op we end
> up leaking the struct zram device. We learned about this issue through
> the error "Error: Removing state 63 which has instances left". While
> this just means that the per-CPU struct zcomp instances are leaked
> when we're removing the CPU hotplug multistate callback, the
> test case (running LTP zram02.sh twice in different terminals) that
> allows us to reproduce this issue only mucks with resetting the devices,
> not hot_add_show(). Setting zram->claim will prevent zram_remove() from
> completing successfully, and so on module removal zram_remove_cb() does
> not tell us when it failed to remove the full struct zram device and
> it leaks.
> ----------

The commit log is there to help people understand the change; making it
too long or exposing too many details may not serve that purpose, since
we also communicate through the code. I believe the following words are
clear enough to explain the problem, and they are short, straightforward,
and won't terrify people, :-)

	But reset_store() may set ->claim, which will make zram_remove() fail.
	When this happens, zram_reset_device() is bypassed and zram->comp can't
	be destroyed, so the warning 'Error: Removing state 63 which has
	instances left.' is triggered during module unloading.
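The behaviour described above can be sketched as a tiny userspace model
(plain C, no kernel API; the struct and function names below are purely
illustrative, not the actual zram code):

```c
#include <stdbool.h>

/* Simplified model of the zram_remove() change under discussion.
 * "claim" stands for zram->claim, set by a concurrent reset_store();
 * "comp_alive" stands for the per-CPU zcomp state that must be torn
 * down before the CPU hotplug multistate callback is removed. */
struct fake_zram {
	bool claim;
	bool comp_alive;
};

/* Old behaviour: removal fails with -EBUSY when ->claim is set, so the
 * zcomp state leaks and "Removing state 63 which has instances left"
 * fires on module unload. */
int fake_zram_remove_old(struct fake_zram *z)
{
	if (z->claim)
		return -16; /* -EBUSY; z->comp_alive stays true: leaked */
	z->comp_alive = false;
	return 0;
}

/* Fixed behaviour: when ->claim is set, the racing reset_store() has
 * already reset the device (and del_gendisk() waits for it to finish),
 * so removal simply proceeds and nothing is left behind. */
int fake_zram_remove_new(struct fake_zram *z)
{
	z->comp_alive = false; /* reset done either here or by reset_store() */
	return 0;
}
```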

> 
> This begs the question, should we not then also make zram_remove_cb()
> chatty on failure?

When zram_remove() is run from the module unloading path, it should not
fail, because no one owns any zram disk while the module is being
unloaded; that is why I added WARN_ON_ONCE() in zram_remove_cb(), as
documentation, since zram_remove() is called from two contexts. But it
can fail in the hot-removal case.
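The two calling contexts can be modelled in userspace as well (again a
hedged sketch: WARN_ON_ONCE_MODEL and the fake_* helpers are invented
for illustration and are not kernel API):

```c
#include <stdio.h>

/* Sketch of why WARN_ON_ONCE() sits in zram_remove_cb(): on the
 * module-unload path a removal failure is unexpected and only warrants
 * a one-shot warning, while the hot-removal path legitimately returns
 * the error to its caller. */
static int warn_fired;

#define WARN_ON_ONCE_MODEL(cond) do {				\
		if ((cond) && !warn_fired) {			\
			warn_fired = 1;				\
			fprintf(stderr, "WARNING: %s\n", #cond);\
		}						\
	} while (0)

int fake_zram_remove(int claimed)
{
	return claimed ? -16 /* -EBUSY */ : 0;
}

/* Unload path: no one should own a disk, so a failure only warns. */
void fake_zram_remove_cb(int claimed)
{
	WARN_ON_ONCE_MODEL(fake_zram_remove(claimed) != 0);
}

/* Hot-removal path: failure is a normal outcome for the caller. */
int fake_hot_remove(int claimed)
{
	return fake_zram_remove(claimed);
}

int fake_warn_count(void)
{
	return warn_fired;
}
```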

> 
> > Fixes the issue by not failing zram_remove() if ->claim is set, and
> > we actually need to do nothing in case that zram_reset() is running
> > since del_gendisk() will wait until zram_reset() is done.
> 
> Sure this all looks like a reasonable alternative to the issue without
> using a generic lock. It does beg the questions if this suffices for
> the other oddball sysfs ops, but we can take that up with time.

->claim is only set in reset_store(), so this is enough to avoid
zram_remove() failing while the module is being unloaded.

> 
> > Reported-by: Luis Chamberlain <mcgrof@...nel.org>
> > Signed-off-by: Ming Lei <ming.lei@...hat.com>
> 
> In so far as the code is concerned:
> 
> Reviewed-by: Luis Chamberlain <mcgrof@...nel.org>

Thanks for the review!

-- 
Ming
