[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ef2bd059-e32d-41a2-bb33-da0621d7ff02@suse.de>
Date: Mon, 21 Jul 2025 09:20:35 +0200
From: Hannes Reinecke <hare@...e.de>
To: Yu Kuai <yukuai1@...weicloud.com>, corbet@....net, agk@...hat.com,
snitzer@...nel.org, mpatocka@...hat.com, song@...nel.org
Cc: linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
dm-devel@...ts.linux.dev, linux-raid@...r.kernel.org, yi.zhang@...wei.com,
yangerkun@...wei.com, johnny.chenyi@...wei.com,
"yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH v3 11/11] md/md-llbitmap: introduce new lockless bitmap
On 7/21/25 09:12, Yu Kuai wrote:
> Hi,
>
> 在 2025/07/21 14:20, Hannes Reinecke 写道:
>> On 7/18/25 11:23, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@...wei.com>
>>>
>
>>> +
>>> +#define BITMAP_DATA_OFFSET 1024
>>> +
>>> +/* 64k is the max IO size of sync IO for raid1/raid10 */
>>> +#define MIN_CHUNK_SIZE (64 * 2)
>>> +
>>
>> Hmm? Which one is it?
>> Comment says 'max IO size', but it's called 'MIN_CHUNK_SIZE'...
>
> max IO size here means the internal recovery IO size for raid1/10,
> and we're handling at most one llbimtap bit(chunksie) at a time, so
> chunksize should be at least 64k, otherwise recovery IO size will be
> less.
>
Okay.
>>> +/*
>>> + * Dirtied bits that have not been accessed for more than 5s will be
>>> cleared
>>> + * by daemon.
>>> + */
>>> +#define BARRIER_IDLE 5
>>> +
>>
>> Should this be changeable, too?
>
> Yes, idealy this should. Perhaps a new sysfs api?
>
Yes, similarly to the daemon_delay one.
>>
>
>>> + if (!test_bit(LLPageDirty, &pctl->flags))
>>> + set_bit(LLPageDirty, &pctl->flags);
>>> +
>>> + /*
>>> + * The subpage usually contains a total of 512 bits. If any
>>> single bit
>>> + * within the subpage is marked as dirty, the entire sector will be
>>> + * written. To avoid impacting write performance, when multiple
>>> bits
>>> + * within the same sector are modified within a short time
>>> frame, all
>>> + * bits in the sector will be collectively marked as dirty at once.
>>> + */
>>
>> How short is the 'short timeframe'?
>> Is this the BARRIER_IDLE setting?
>> Please clarify.
>
> Yes, if the page is not accessed for BARRIER_IDLE seconds.
>
Please update the comment to refer to that.
>>> +static struct page *llbitmap_read_page(struct llbitmap *llbitmap,
>>> int idx)
>>> +{
>>> + struct mddev *mddev = llbitmap->mddev;
>>> + struct page *page = NULL;
>>> + struct md_rdev *rdev;
>>> +
>>> + if (llbitmap->pctl && llbitmap->pctl[idx])
>>> + page = llbitmap->pctl[idx]->page;
>>> + if (page)
>>> + return page;
>>> +
>>> + page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>> + if (!page)
>>> + return ERR_PTR(-ENOMEM);
>>> +
>>> + rdev_for_each(rdev, mddev) {
>>> + sector_t sector;
>>> +
>>> + if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags))
>>> + continue;
>>> +
>>> + sector = mddev->bitmap_info.offset +
>>> + (idx << PAGE_SECTORS_SHIFT);
>>> +
>>> + if (sync_page_io(rdev, sector, PAGE_SIZE, page, REQ_OP_READ,
>>> + true))
>>> + return page;
>>> +
>>> + md_error(mddev, rdev);
>>> + }
>>> +
>>> + __free_page(page);
>>> + return ERR_PTR(-EIO);
>>> +}
>>> +
>>
>> Have you considered moving to folios here?
>>
>
> Of course, however, because the md high level helpers is still using
> page, I'm thinking about using page for now, which is simpler, and
> moving to folios for all md code later.
>
>
Fair enough.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@...e.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
Powered by blists - more mailing lists