Message-ID: <20240410172837.GO3492@suse.cz>
Date: Wed, 10 Apr 2024 19:28:37 +0200
From: David Sterba <dsterba@...e.cz>
To: Jan Kara <jack@...e.cz>
Cc: Matthew Wilcox <willy@...radead.org>, Yu Kuai <yukuai1@...weicloud.com>,
	axboe@...nel.dk, roger.pau@...rix.com, colyli@...e.de,
	kent.overstreet@...il.com, joern@...ybastard.org,
	miquel.raynal@...tlin.com, richard@....at, vigneshr@...com,
	sth@...ux.ibm.com, hoeppner@...ux.ibm.com, hca@...ux.ibm.com,
	gor@...ux.ibm.com, agordeev@...ux.ibm.com, jejb@...ux.ibm.com,
	martin.petersen@...cle.com, clm@...com, josef@...icpanda.com,
	dsterba@...e.com, viro@...iv.linux.org.uk, brauner@...nel.org,
	nico@...xnic.net, xiang@...nel.org, chao@...nel.org, tytso@....edu,
	adilger.kernel@...ger.ca, jack@...e.com, konishi.ryusuke@...il.com,
	akpm@...ux-foundation.org, hare@...e.de, p.raghav@...sung.com,
	linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
	xen-devel@...ts.xenproject.org, linux-bcache@...r.kernel.org,
	linux-mtd@...ts.infradead.org, linux-s390@...r.kernel.org,
	linux-scsi@...r.kernel.org, linux-bcachefs@...r.kernel.org,
	linux-btrfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	linux-erofs@...ts.ozlabs.org, linux-ext4@...r.kernel.org,
	linux-nilfs@...r.kernel.org, yukuai3@...wei.com,
	yi.zhang@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH RFC v3 for-6.8/block 09/17] btrfs: use bdev apis

On Thu, Jan 04, 2024 at 12:49:58PM +0100, Jan Kara wrote:
> On Sat 23-12-23 17:31:55, Matthew Wilcox wrote:
> > On Thu, Dec 21, 2023 at 04:57:04PM +0800, Yu Kuai wrote:
> > > @@ -3674,16 +3670,17 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
> > >  		 * Drop the page of the primary superblock, so later read will
> > >  		 * always read from the device.
> > >  		 */
> > > -		invalidate_inode_pages2_range(mapping,
> > > -				bytenr >> PAGE_SHIFT,
> > > +		invalidate_bdev_range(bdev, bytenr >> PAGE_SHIFT,
> > >  				(bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT);
> > >  	}
> > >  
> > > -	page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS);
> > > -	if (IS_ERR(page))
> > > -		return ERR_CAST(page);
> > > +	nofs_flag = memalloc_nofs_save();
> > > +	folio = bdev_read_folio(bdev, bytenr);
> > > +	memalloc_nofs_restore(nofs_flag);
> > 
> > This is the wrong way to use memalloc_nofs_save/restore.  They should be
> > used at the point that the filesystem takes/releases whatever lock is
> > also used during reclaim.  I don't know btrfs well enough to suggest
> > what lock is missing these annotations.
> 
> In principle I agree with you but in this particular case I agree the ask
> is just too big. I suspect it is one of btrfs btree locks or maybe
> chunk_mutex but I doubt even btrfs developers know and maybe it is just a
> cargo cult. And it is not like this would be the first occurrence of this
> anti-pattern in btrfs - see e.g. device_list_add(), add_missing_dev(),
> btrfs_destroy_delalloc_inodes() (here the wrapping around
> invalidate_inode_pages2() looks really weird), and many others...

The pattern is intentional: it is a temporary solution until we can
implement scoped NOFS. Functions that allocate memory get converted
from GFP_NOFS to GFP_KERNEL, and where they are called from a context
that either holds big locks or can recursively enter the filesystem, the
call is protected by memalloc_nofs_save()/memalloc_nofs_restore(). This
should not be surprising.

What may not be obvious is which locks or allocating functions are
involved. That depends on analysis of the function call chain, and
usually there is enough evidence for why the protection is needed.
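
To make the pattern concrete, here is a minimal user-space sketch of how
the save/restore pair lets a callee use GFP_KERNEL while a sensitive
caller still gets NOFS behaviour. It is only a model, not kernel or btrfs
code: every model_* name and the MODEL_GFP_* bit values are made up for
illustration, and a plain static variable stands in for the per-task
PF_MEMALLOC_NOFS flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model; the kernel's real values differ. */
#define MODEL_GFP_IO     0x1u
#define MODEL_GFP_FS     0x2u
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_IO)

/* Stand-in for the per-task PF_MEMALLOC_NOFS bit (assumption of this model). */
static bool model_task_nofs;

/* Enter a NOFS scope and return the previous state so scopes can nest. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_nofs;

	model_task_nofs = true;
	return old;
}

/* Leave a NOFS scope by restoring the state returned by the matching save. */
static void model_nofs_restore(unsigned int old)
{
	assert(model_task_nofs);	/* restore only makes sense inside a scope */
	model_task_nofs = (old != 0);
}

/* Mask an allocation would effectively use: FS is dropped inside a scope. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	return model_task_nofs ? (gfp & ~MODEL_GFP_FS) : gfp;
}

/* Callee already converted to GFP_KERNEL; reports the mask it ends up with. */
static unsigned int model_read_folio(void)
{
	return model_effective_gfp(MODEL_GFP_KERNEL);
}

/* Caller that must not recurse into the filesystem: it wraps the call. */
static unsigned int model_read_super(void)
{
	unsigned int nofs_flag = model_nofs_save();
	unsigned int used = model_read_folio();

	model_nofs_restore(nofs_flag);
	return used;
}
```

In this model, model_read_folio() called on its own allocates with the
full GFP_KERNEL mask, while the same function called through
model_read_super() effectively allocates with NOFS. The wrapped section
here is a single call, which is the interim form; a fully scoped
conversion would place the save/restore where the relevant lock is taken
and released.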