[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aafabc6e-fd98-f927-44d7-3ef76e2acaf8@huaweicloud.com>
Date: Wed, 6 Dec 2023 14:50:56 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Christoph Hellwig <hch@...radead.org>,
Yu Kuai <yukuai1@...weicloud.com>
Cc: axboe@...nel.dk, roger.pau@...rix.com, colyli@...e.de,
kent.overstreet@...il.com, joern@...ybastard.org,
miquel.raynal@...tlin.com, richard@....at, vigneshr@...com,
sth@...ux.ibm.com, hoeppner@...ux.ibm.com, hca@...ux.ibm.com,
gor@...ux.ibm.com, agordeev@...ux.ibm.com, jejb@...ux.ibm.com,
martin.petersen@...cle.com, clm@...com, josef@...icpanda.com,
dsterba@...e.com, nico@...xnic.net, xiang@...nel.org,
chao@...nel.org, tytso@....edu, adilger.kernel@...ger.ca,
agruenba@...hat.com, jack@...e.com, konishi.ryusuke@...il.com,
willy@...radead.org, akpm@...ux-foundation.org, hare@...e.de,
p.raghav@...sung.com, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, xen-devel@...ts.xenproject.org,
linux-bcache@...r.kernel.org, linux-mtd@...ts.infradead.org,
linux-s390@...r.kernel.org, linux-scsi@...r.kernel.org,
linux-bcachefs@...r.kernel.org, linux-btrfs@...r.kernel.org,
linux-erofs@...ts.ozlabs.org, linux-ext4@...r.kernel.org,
gfs2@...ts.linux.dev, linux-nilfs@...r.kernel.org,
yi.zhang@...wei.com, yangerkun@...wei.com,
"yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH -next RFC 01/14] block: add some bdev apis
Hi,
在 2023/12/06 14:14, Christoph Hellwig 写道:
>> +void invalidate_bdev_range(struct block_device *bdev, pgoff_t start,
>> + pgoff_t end)
>> +{
>> + invalidate_mapping_pages(bdev->bd_inode->i_mapping, start, end);
>> +}
>> +EXPORT_SYMBOL_GPL(invalidate_bdev_range);
>
> All these could probably use kerneldoc comments.
Ok, and thanks for reviewing the patchset!
>
> For this one I really don't like it existing at all, but we'll have to
> discuss that in the btrfs patch.
>
>> +loff_t bdev_size(struct block_device *bdev)
>> +{
>> + loff_t size;
>> +
>> + spin_lock(&bdev->bd_size_lock);
>> + size = i_size_read(bdev->bd_inode);
>> + spin_unlock(&bdev->bd_size_lock);
>> +
>> + return size;
>> +}
>> +EXPORT_SYMBOL_GPL(bdev_size);
>
> No need for this one. The callers can simply use bdev_nr_bytes.
Ok, I'll replace it with bdev_nr_bytes.
>
>> +struct folio *bdev_read_folio(struct block_device *bdev, pgoff_t index)
>> +{
>> + return read_mapping_folio(bdev->bd_inode->i_mapping, index, NULL);
>> +}
>> +EXPORT_SYMBOL_GPL(bdev_read_folio);
>> +
>> +struct folio *bdev_read_folio_gfp(struct block_device *bdev, pgoff_t index,
>> + gfp_t gfp)
>> +{
>> + return mapping_read_folio_gfp(bdev->bd_inode->i_mapping, index, gfp);
>> +}
>> +EXPORT_SYMBOL_GPL(bdev_read_folio_gfp);
>
> I think we can just drop bdev_read_folio_gfp. Half of the callers simply
> pass GPK_KERNEL, and the other half passes GFP_NOFS and could just use
> memalloc_nofs_save().
I'm a litter confused, so there are 3 use cases:
1) use GFP_USER, default gfp from bdev_alloc.
2) use GFP_KERNEL
3) use GFP_NOFS
I understand that you're suggesting memalloc_nofs_save() to distinguish
2 and 3, but how can I distinguish 1?
>
>> +void bdev_balance_dirty_pages_ratelimited(struct block_device *bdev)
>> +{
>> + return balance_dirty_pages_ratelimited(bdev->bd_inode->i_mapping);
>> +}
>> +EXPORT_SYMBOL_GPL(bdev_balance_dirty_pages_ratelimited);
>
> Hmm, this is just used for block2mtd, and feels a little too low-level
> to me, as block2mtd really should be using the normal fileread/write
> APIs. I guess we'll have to live with it for now if we want to expedite
> killing off bd_inode.
>
>> +void bdev_correlate_mapping(struct block_device *bdev,
>> + struct address_space *mapping)
>> +{
>> + mapping->host = bdev->bd_inode;
>> +}
>> +EXPORT_SYMBOL_GPL(bdev_correlate_mapping);
>
> Maybe associated insted of correlate? Either way this basically
> fully exposes the bdev inode again :(
>
>> +gfp_t bdev_gfp_constraint(struct block_device *bdev, gfp_t gfp)
>> +{
>> + return mapping_gfp_constraint(bdev->bd_inode->i_mapping, gfp);
>> +}
>> +EXPORT_SYMBOL_GPL(bdev_gfp_constraint);
>
> The right fix here is to:
>
> - use memalloc_nofs_save in extet instead of using
> mapping_gfp_constraint to clear it from the mapping flags
> - remove __ext4_sb_bread_gfp and just have buffer.c helper that does
> the right thing (either by changing the calling conventions of an
> existing one, or adding a new one).
Thanks for the suggestions, but I'm not sure how to do this yet, I must
read more ext4 code.
>
>> +/*
>> + * The del_gendisk() function uninitializes the disk-specific data
>> + * structures, including the bdi structure, without telling anyone
>> + * else. Once this happens, any attempt to call mark_buffer_dirty()
>> + * (for example, by ext4_commit_super), will cause a kernel OOPS.
>> + * This is a kludge to prevent these oops until we can put in a proper
>> + * hook in del_gendisk() to inform the VFS and file system layers.
>> + */
>> +int bdev_ejected(struct block_device *bdev)
>> +{
>> + struct backing_dev_info *bdi = inode_to_bdi(bdev->bd_inode);
>> +
>> + return bdi->dev == NULL;
>> +}
>> +EXPORT_SYMBOL_GPL(bdev_ejected);
>
> And this code in ext4 should just go away entirely. The bdi should
> always be valid for a live bdev for years.
Sounds good, I was confused about this code as well.
>
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -1119,6 +1119,7 @@ void bio_add_folio_nofail(struct bio *bio, struct folio *folio, size_t len,
>> WARN_ON_ONCE(off > UINT_MAX);
>> __bio_add_page(bio, &folio->page, len, off);
>> }
>> +EXPORT_SYMBOL_GPL(bio_add_folio_nofail);
>
> How is this realted? The export is fine, but really should be a
> separate, well-documented commit.
This is used to replace __bio_add_page() in btrfs while converting page
to folio, please let me know if I should keep this, if so, I'll split
this into a new commit.
>
>>
>> +static inline u8 block_bits(struct block_device *bdev)
>> +{
>> + return bdev->bd_inode->i_blkbits;
>> +}
>
> Not sure we should need this. i_blkbits comes from the blocksize
> the fs set, so it should have other ways to get at it.
Yes, this is now only used for erofs, and erofs do call
sb_set_blocksize() while initializing, hence it's right there is other
way to get blkbits and this helper is not needed.
Thanks,
Kuai
> .
>
Powered by blists - more mailing lists