Message-ID: <YWjJ0O7K+31Iz3ox@bombadil.infradead.org>
Date: Thu, 14 Oct 2021 17:22:40 -0700
From: Luis Chamberlain <mcgrof@...nel.org>
To: Ming Lei <ming.lei@...hat.com>
Cc: tj@...nel.org, gregkh@...uxfoundation.org,
akpm@...ux-foundation.org, minchan@...nel.org, jeyu@...nel.org,
shuah@...nel.org, bvanassche@....org, dan.j.williams@...el.com,
joe@...ches.com, tglx@...utronix.de, keescook@...omium.org,
rostedt@...dmis.org, linux-spdx@...r.kernel.org,
linux-doc@...r.kernel.org, linux-block@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kselftest@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8 11/12] zram: fix crashes with cpu hotplug multistate
On Fri, Oct 15, 2021 at 07:52:04AM +0800, Ming Lei wrote:
> On Thu, Oct 14, 2021 at 01:24:32PM -0700, Luis Chamberlain wrote:
> > On Thu, Oct 14, 2021 at 10:11:46AM +0800, Ming Lei wrote:
> > > On Thu, Oct 14, 2021 at 09:55:48AM +0800, Ming Lei wrote:
> > > > On Mon, Sep 27, 2021 at 09:38:04AM -0700, Luis Chamberlain wrote:
> > >
> > > ...
> > >
> > > >
> > > > Hello Luis,
> > > >
> > > > Can you test the following patch and see if the issue can be addressed?
> > > >
> > > > Please see the idea from the inline comment.
> > > >
> > > > Also, zram_index_mutex isn't needed in the zram disk's store() methods,
> > > > unlike in your patch, so the deadlock issue you are addressing in this
> > > > series can be avoided.
> > > >
> > > >
> > > > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > > > index fcaf2750f68f..3c17927d23a7 100644
> > > > --- a/drivers/block/zram/zram_drv.c
> > > > +++ b/drivers/block/zram/zram_drv.c
> > > > @@ -1985,11 +1985,17 @@ static int zram_remove(struct zram *zram)
> > > >
> > > > /* Make sure all the pending I/O are finished */
> > > > fsync_bdev(bdev);
> > > > - zram_reset_device(zram);
> > > >
> > > > pr_info("Removed device: %s\n", zram->disk->disk_name);
> > > >
> > > > del_gendisk(zram->disk);
> > > > +
> > > > + /*
> > > > + * Reset the device after the gendisk is removed, so no further
> > > > + * sysfs store can come in; then we can really reset the device here.
> > > > + */
> > > > + zram_reset_device(zram);
> > > > +
> > > > blk_cleanup_disk(zram->disk);
> > > > kfree(zram);
> > > > return 0;
> > > > @@ -2073,7 +2079,12 @@ static int zram_remove_cb(int id, void *ptr, void *data)
> > > > static void destroy_devices(void)
> > > > {
> > > > class_unregister(&zram_control_class);
> > > > +
> > > > + /* hold the global lock so no new device can be added */
> > > > + mutex_lock(&zram_index_mutex);
> > > > idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
> > > > + mutex_unlock(&zram_index_mutex);
> > > > +
> > >
> > > Actually, zram_index_mutex isn't needed when calling zram_remove_cb(),
> > > since the zram-control sysfs interface has been removed and userspace
> > > can't add new devices any more. The issue should then be fixed by the
> > > following one-line change; please test it:
> > >
> > > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > > index fcaf2750f68f..96dd641de233 100644
> > > --- a/drivers/block/zram/zram_drv.c
> > > +++ b/drivers/block/zram/zram_drv.c
> > > @@ -1985,11 +1985,17 @@ static int zram_remove(struct zram *zram)
> > >
> > > /* Make sure all the pending I/O are finished */
> > > fsync_bdev(bdev);
> > > - zram_reset_device(zram);
> > >
> > > pr_info("Removed device: %s\n", zram->disk->disk_name);
> > >
> > > del_gendisk(zram->disk);
> > > +
> > > + /*
> > > + * Reset the device after the gendisk is removed, so no further
> > > + * sysfs store can come in; then we can really reset the device here.
> > > + */
> > > + zram_reset_device(zram);
> > > +
> > > blk_cleanup_disk(zram->disk);
> > > kfree(zram);
> > > return 0;
> >
> > Sorry, but nope: the cpu hotplug multistate issue is still present, and
> > we eventually end up with page faults. I tried both patches.
>
> In theory, disksize_store() can't come in after del_gendisk() returns,
> so zram_reset_device() should clean up everything; that is the issue
> you described in the commit log.
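
To make the lifecycle under discussion concrete, below is a simplified
sketch (condensed, not the verbatim kernel code) of where the cpuhp
multistate instance is added and removed, based on
drivers/block/zram/zcomp.c; if a late disksize_store() can still
register an instance after zram_reset_device() has run, that instance
is the leftover node:

/*
 * Simplified sketch of the cpuhp multistate lifecycle in zram
 * (condensed from drivers/block/zram/zcomp.c; stream allocation
 * and error handling omitted).
 */

/* disksize_store() -> zcomp_create() -> zcomp_init() */
static int zcomp_init(struct zcomp *comp)
{
	/* Registers this compressor as a hotplug instance ("cpuhp node"). */
	return cpuhp_state_add_instance(CPUHP_ZCOMP_PREPARE, &comp->node);
}

/* zram_reset_device() -> zcomp_destroy() */
void zcomp_destroy(struct zcomp *comp)
{
	/*
	 * Must pair with zcomp_init(); if it doesn't, the instance
	 * outlives the device and later hotplug callbacks touch freed
	 * memory, which would explain the page faults reported above.
	 */
	cpuhp_state_remove_instance(CPUHP_ZCOMP_PREPARE, &comp->node);
	kfree(comp);
}

Moving zram_reset_device() after del_gendisk() is meant to guarantee
the remove side of this pairing always runs last, since del_gendisk()
waits for in-flight sysfs stores to finish and prevents new ones.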
>
> We need to understand the exact reason why there is still a cpuhp node
> left. Can you share the exact steps for reproducing the issue?
> Otherwise we may have to trace it and narrow down the reason.
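
If the leak still reproduces after that reordering, one way to narrow
it down, purely a hypothetical debugging aid rather than anything
posted in this thread, is to count every add/remove of the instance
and check that the counts balance across a modprobe/rmmod cycle:

/*
 * Hypothetical debug instrumentation (not part of any posted patch):
 * count cpuhp instance adds/removes so an unbalanced pair shows up
 * in dmesg after a module load/unload cycle. The wrapper names are
 * made up for illustration.
 */
static atomic_t zcomp_nodes = ATOMIC_INIT(0);

static int zcomp_hp_add(struct zcomp *comp)
{
	int ret = cpuhp_state_add_instance(CPUHP_ZCOMP_PREPARE, &comp->node);

	if (!ret)
		pr_info("zcomp: node %p added, live=%d\n",
			&comp->node, atomic_inc_return(&zcomp_nodes));
	return ret;
}

static void zcomp_hp_remove(struct zcomp *comp)
{
	cpuhp_state_remove_instance(CPUHP_ZCOMP_PREPARE, &comp->node);
	pr_info("zcomp: node %p removed, live=%d\n",
		&comp->node, atomic_dec_return(&zcomp_nodes));
}

If the last "live" value printed before module unload is nonzero, some
path registered an instance without a matching removal.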
See my commit log for my own fix for this issue.
Luis