Message-ID: <54ab4291-9152-44d1-bf6c-675b58cfcea8@huaweicloud.com>
Date: Sat, 19 Apr 2025 16:46:03 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Yu Kuai <yukuai1@...weicloud.com>, Christoph Hellwig <hch@....de>
Cc: xni@...hat.com, colyli@...nel.org, axboe@...nel.dk, agk@...hat.com,
snitzer@...nel.org, mpatocka@...hat.com, song@...nel.org,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
dm-devel@...ts.linux.dev, linux-raid@...r.kernel.org, yi.zhang@...wei.com,
yangerkun@...wei.com, kbusch@...nel.org, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH RFC v2 00/14] md: introduce a new lockless bitmap
Hi, Christoph!
On 2025/04/11 9:36, Yu Kuai wrote:
> Hi,
>
> On 2025/04/09 17:40, Christoph Hellwig wrote:
>> On Wed, Apr 09, 2025 at 05:27:11PM +0800, Yu Kuai wrote:
>>>> For that you'd be much better off just creating your own trivial
>>>> file_system_type with an inode fully controlled by your driver
>>>> that has a trivial set of address_space ops, instead of oddly
>>>> mixing with the block layer.
>>>
>>> Yes, this is exactly what I said about implementing a new file_operations
>>> (and address_space ops). I wanted to do this the easy way and just reuse
>>> the raw block device ops; that way I only need to implement the submit_bio
>>> op for the new hidden disk.
>>>
>>> I can try a new fs type if we really think this solution is too
>>> hacky; however, that will be a lot more lines of code. :(
>>
>> I don't think it should be much more. It'll also remove the rather
>> unexpected indirection through submit_bio. Just make sure you use
>> iomap for your operations, and implement the submit_io hook. That
>> will also be more efficient than the buffer_head based block ops
>> for writes.
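
(For what it's worth, the fs-side boilerplate does look small. Below is an
untested sketch of how I understand the suggestion; BITMAPFS_MAGIC, the
bitmapfs_* names and llbitmap_aops are made up for illustration:)

static int bitmapfs_init_fs_context(struct fs_context *fc)
{
	/* init_pseudo() wires up a minimal pseudo-filesystem superblock */
	if (!init_pseudo(fc, BITMAPFS_MAGIC))
		return -ENOMEM;
	return 0;
}

static struct file_system_type bitmapfs_type = {
	.name			= "md_bitmap",
	.init_fs_context	= bitmapfs_init_fs_context,
	.kill_sb		= kill_anon_super,
};

/* at probe: mnt = kern_mount(&bitmapfs_type), then one private inode */
static struct inode *bitmapfs_new_inode(struct vfsmount *mnt)
{
	struct inode *inode = alloc_anon_inode(mnt->mnt_sb);

	if (!IS_ERR(inode))
		inode->i_mapping->a_ops = &llbitmap_aops;
	return inode;
}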
>>
>>>>
>>>> Note that either way I'm not sure using the page cache here is
>>>> all that good an idea, as we're at the bottom of the I/O stack and
>>>> thus memory allocations can very easily deadlock.
>>>
>>> Yes, for the bitmap pages, this set does it the easy way: just read and
>>> pin all related pages while loading the bitmap, for two reasons:
>>>
>>> 1) We don't need to allocate and read pages from the IO path (in the
>>> first RFC version, I was using a worker to do that);
>>
>> You still depend on the worker, which will still deadlock.
>>
>>>> What speaks against using your own folios explicitly allocated at
>>>> probe time and then just doing manual submit_bio on that? That's
>>>> probably not much more code but a lot more robust.
>>>
>>> I'm not quite sure I understand you correctly. Do you mean: don't use
>>> the page cache for bitmap IO, manually create BIOs like the old bitmap
>>> does, and meanwhile invent a new synchronization scheme instead of the
>>> old bitmap's global spin_lock?
>>
>> Yes. Alternatively you need to pre-populate the page cache and keep
>> extra page references.
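
(If we did keep the page cache, I think the pre-populate variant would be an
untested sketch along these lines; read_mapping_page() returns the page with
a reference held, and we simply never drop it until the bitmap is unloaded.
llbitmap_populate_cache() is a made-up name:)

static int llbitmap_populate_cache(struct address_space *mapping,
				   int nr_pages)
{
	int i;

	for (i = 0; i < nr_pages; i++) {
		struct page *page = read_mapping_page(mapping, i, NULL);

		if (IS_ERR(page))
			return PTR_ERR(page);
		/*
		 * Keep the reference from read_mapping_page() so reclaim
		 * can never take the page away; drop all the references
		 * with put_page() on unload.
		 */
	}
	return 0;
}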
>
> OK, I'll think about self-managed pages and the IO path. Meanwhile, please
> let me know if you have questions about the other parts.
So, today I implemented a version, and I admit this way is much
simpler; it comes out about 200 lines shorter in total. Can you check
the following untested code and tell me if you agree with the
implementation? I'll start working on a new version if you do.
Thanks,
Kuai

static int llbitmap_rdev_page_io(struct md_rdev *rdev, struct page *page,
				 int idx, bool rw)
{
	struct bio_vec bvec;
	struct bio bio;
	int ret;

	/*
	 * A bio on the stack has no inline vecs behind it, so give
	 * bio_init() an explicit single-entry bio_vec table instead of
	 * bio.bi_inline_vecs.
	 */
	bio_init(&bio, rdev->bdev, &bvec, 1, REQ_SYNC | REQ_IDLE | REQ_META);

	if (rw)
		bio.bi_opf |= REQ_OP_WRITE;
	else
		bio.bi_opf |= REQ_OP_READ;

	__bio_add_page(&bio, page, PAGE_SIZE, 0);
	/* seek to the idx-th PAGE_SIZE chunk of the bitmap area */
	bio.bi_iter.bi_sector = rdev->sb_start +
				rdev->mddev->bitmap_info.offset +
				((sector_t)idx << PAGE_SECTORS_SHIFT);

	ret = submit_bio_wait(&bio);
	bio_uninit(&bio);
	if (ret)
		md_error(rdev->mddev, rdev);

	return ret;
}

static struct page *llbitmap_read_page(struct llbitmap *llbitmap, int idx)
{
	struct page *page = llbitmap->pages[idx];
	struct mddev *mddev = llbitmap->mddev;
	struct md_rdev *rdev;
	int err = -EIO;

	if (page)
		return page;

	page = alloc_page(GFP_KERNEL);
	if (!page)
		return ERR_PTR(-ENOMEM);

	/* read from the first working rdev that can serve the page */
	rdev_for_each(rdev, mddev) {
		if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags))
			continue;

		err = llbitmap_rdev_page_io(rdev, page, idx, READ);
		if (!err)
			break;
	}

	if (err) {
		__free_page(page);
		return ERR_PTR(err);
	}

	return page;
}

static int llbitmap_write_page(struct llbitmap *llbitmap, int idx)
{
	struct page *page = llbitmap->pages[idx];
	struct mddev *mddev = llbitmap->mddev;
	struct md_rdev *rdev;
	int err = -EIO;

	if (!page)
		return err;

	/* write to all working rdevs; succeed if at least one write did */
	rdev_for_each(rdev, mddev) {
		if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags))
			continue;

		if (!llbitmap_rdev_page_io(rdev, page, idx, WRITE))
			err = 0;
	}

	return err;
}

static bool llbitmap_dirty(struct llbitmap *llbitmap)
{
	int i;

	for (i = 0; i < llbitmap->nr_pages; ++i) {
		struct llbitmap_barrier *barrier = &llbitmap->barrier[i];

		if (test_bit(BitmapPageDirty, &barrier->flags))
			return true;
	}

	return false;
}

static void llbitmap_flush_dirty_page(struct llbitmap *llbitmap)
{
	int i;

	for (i = 0; i < llbitmap->nr_pages; ++i) {
		struct llbitmap_barrier *barrier = &llbitmap->barrier[i];

		if (!test_and_clear_bit(BitmapPageDirty, &barrier->flags))
			continue;

		llbitmap_write_page(llbitmap, i);
	}
}
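
And the callers would look something like the following (also untested):
llbitmap_cache_pages() runs once while loading the bitmap, so no allocation
or read happens in the IO path, and the daemon calls
llbitmap_flush_dirty_page() whenever llbitmap_dirty() returns true:

static int llbitmap_cache_pages(struct llbitmap *llbitmap)
{
	int i;

	/* read and pin everything up front, at load time */
	for (i = 0; i < llbitmap->nr_pages; i++) {
		struct page *page = llbitmap_read_page(llbitmap, i);

		if (IS_ERR(page))
			return PTR_ERR(page);
		llbitmap->pages[i] = page;
	}

	return 0;
}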
>
> Thanks,
> Kuai
>