linux-kernel - Re: [PATCH v3 11/11] md/md-llbitmap: introduce new lockless bitmap

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <ef2bd059-e32d-41a2-bb33-da0621d7ff02@suse.de>
Date: Mon, 21 Jul 2025 09:20:35 +0200
From: Hannes Reinecke <hare@...e.de>
To: Yu Kuai <yukuai1@...weicloud.com>, corbet@....net, agk@...hat.com,
 snitzer@...nel.org, mpatocka@...hat.com, song@...nel.org
Cc: linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 dm-devel@...ts.linux.dev, linux-raid@...r.kernel.org, yi.zhang@...wei.com,
 yangerkun@...wei.com, johnny.chenyi@...wei.com,
 "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH v3 11/11] md/md-llbitmap: introduce new lockless bitmap

On 7/21/25 09:12, Yu Kuai wrote:
> Hi,
> 
> 在 2025/07/21 14:20, Hannes Reinecke 写道:
>> On 7/18/25 11:23, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@...wei.com>
>>>
> 
>>> +
>>> +#define BITMAP_DATA_OFFSET 1024
>>> +
>>> +/* 64k is the max IO size of sync IO for raid1/raid10 */
>>> +#define MIN_CHUNK_SIZE (64 * 2)
>>> +
>>
>> Hmm? Which one is it?
>> Comment says 'max IO size', but it's called 'MIN_CHUNK_SIZE'...
> 
> max IO size here means the internal recovery IO size for raid1/10,
> and we're handling at most one llbimtap bit(chunksie) at a time, so
> chunksize should be at least 64k, otherwise recovery IO size will be
> less.
> 
Okay.

>>> +/*
>>> + * Dirtied bits that have not been accessed for more than 5s will be 
>>> cleared
>>> + * by daemon.
>>> + */
>>> +#define BARRIER_IDLE 5
>>> +
>>
>> Should this be changeable, too?
> 
> Yes, idealy this should. Perhaps a new sysfs api?
> 
Yes, similarly to the daemon_delay one.

>>
> 
>>> +    if (!test_bit(LLPageDirty, &pctl->flags))
>>> +        set_bit(LLPageDirty, &pctl->flags);
>>> +
>>> +    /*
>>> +     * The subpage usually contains a total of 512 bits. If any 
>>> single bit
>>> +     * within the subpage is marked as dirty, the entire sector will be
>>> +     * written. To avoid impacting write performance, when multiple 
>>> bits
>>> +     * within the same sector are modified within a short time 
>>> frame, all
>>> +     * bits in the sector will be collectively marked as dirty at once.
>>> +     */
>>
>> How short is the 'short timeframe'?
>> Is this the BARRIER_IDLE setting?
>> Please clarify.
> 
> Yes, if the page is not accessed for BARRIER_IDLE seconds.
> 
Please update the comment to refer to that.

>>> +static struct page *llbitmap_read_page(struct llbitmap *llbitmap, 
>>> int idx)
>>> +{
>>> +    struct mddev *mddev = llbitmap->mddev;
>>> +    struct page *page = NULL;
>>> +    struct md_rdev *rdev;
>>> +
>>> +    if (llbitmap->pctl && llbitmap->pctl[idx])
>>> +        page = llbitmap->pctl[idx]->page;
>>> +    if (page)
>>> +        return page;
>>> +
>>> +    page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>> +    if (!page)
>>> +        return ERR_PTR(-ENOMEM);
>>> +
>>> +    rdev_for_each(rdev, mddev) {
>>> +        sector_t sector;
>>> +
>>> +        if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags))
>>> +            continue;
>>> +
>>> +        sector = mddev->bitmap_info.offset +
>>> +             (idx << PAGE_SECTORS_SHIFT);
>>> +
>>> +        if (sync_page_io(rdev, sector, PAGE_SIZE, page, REQ_OP_READ,
>>> +                 true))
>>> +            return page;
>>> +
>>> +        md_error(mddev, rdev);
>>> +    }
>>> +
>>> +    __free_page(page);
>>> +    return ERR_PTR(-EIO);
>>> +}
>>> +
>>
>> Have you considered moving to folios here?
>>
> 
> Of course, however, because the md high level helpers is still using
> page, I'm thinking about using page for now, which is simpler, and
> moving to folios for all md code later.
> 
> 
Fair enough.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@...e.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich