[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <115c3b08-aff1-dd97-fe6a-7901452ce62c@huaweicloud.com>
Date: Wed, 9 Apr 2025 17:27:11 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Christoph Hellwig <hch@....de>, Yu Kuai <yukuai1@...weicloud.com>
Cc: xni@...hat.com, colyli@...nel.org, axboe@...nel.dk, agk@...hat.com,
snitzer@...nel.org, mpatocka@...hat.com, song@...nel.org,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
dm-devel@...ts.linux.dev, linux-raid@...r.kernel.org, yi.zhang@...wei.com,
yangerkun@...wei.com, kbusch@...nel.org, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH RFC v2 00/14] md: introduce a new lockless bitmap
Hi,
在 2025/04/09 16:32, Christoph Hellwig 写道:
> On Sat, Mar 29, 2025 at 09:11:13AM +0800, Yu Kuai wrote:
>> The purpose here is to hide the low level bitmap IO implementation to
>> the API disk->submit_bio(), and the bitmap IO can be converted to buffer
>> IO to the bdev_file. This is the easiest way that I can think of to
>> resue the pagecache, with natural ability for dirty page writeback. I do
>> think about creating a new anon file and implement a new
>> file_operations, this will be much more complicated.
>
> I've started looking at this a bit now, sorry for the delay.
>
> As far as I can see you use the bitmap file just so that you have your
> own struct address_space and thus page cache instance and then call
> read_mapping_page and filemap_write_and_wait_range on it right?
Yes.
>
> For that you'd be much better of just creating your own trivial
> file_system_type with an inode fully controlled by your driver
> that has a trivial set of address_space ops instead of oddly
> mixing with the block layer.
Yes, this is exactly what I said implement a new file_operations(and
address_space ops), I wanted do this the easy way, just reuse the raw
block device ops, this way I just need to implement the submit_bio ops
for new hidden disk.
I can try with new fs type if we really think this solution is too
hacky, however, the code line will be much more. :(
>
> Note that either way I'm not sure using the page cache here is an
> all that good idea, as we're at the bottom of the I/O stack and
> thus memory allocations can very easily deadlock.
Yes, for the page from bitmap, this set do the easy way just read and
ping all realted pages while loading the bitmap. For two reasons:
1) We don't need to allocate and read pages from IO path;(In the first
RFC version, I'm using a worker to do that).
2) In the first RFC version, I find and get page in the IO path, turns
out page reference is an *atomic*, and the overhead is not acceptable;
And the only action from IO path is that if bitmap page is dirty,
filemap_write_and_wait_range() is called from async worker, the same as
old bitmap, to flush bitmap dirty pages.
>
> What speaks against using your own folios explicitly allocated at
> probe time and then just doing manual submit_bio on that? That's
> probably not much more code but a lot more robust.
I'm not quite sure if I understand you correctly. Do you means don't use
pagecache for bitmap IO, and manually create BIOs like the old bitmap,
meanwhile invent a new solution for synchronism instead of the global
spin_lock from old bitmap?
Thanks,
Kuai
>
> Also a high level note: the bitmap_operations aren't a very nice
> interface. A lot of methods are empty and should just be called
> conditionally. Or even better you'd do away with the expensive
> indirect calls and just directly call either the old or new
> bitmap code.
>
>> Meanwhile, bitmap file for the old bitmap will be removed sooner or
>> later, and this bdev_file implementation will compatible with bitmap
>> file as well.
>
> Which would also mean that at that point the operations vector would
> be pointless, so we might as well not add it to start with.
>
> .
>
Powered by blists - more mailing lists