linux-kernel - Re: [PATCH v6 md-6.18 11/11] md/md-llbitmap: introduce new lockless bitmap

Message-ID: <dcec1dd2-903a-3569-30e4-7af916ecba4b@huaweicloud.com>
Date: Fri, 29 Aug 2025 09:03:30 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Li Nan <linan666@...weicloud.com>, Yu Kuai <yukuai1@...weicloud.com>,
 hch@...radead.org, corbet@....net, agk@...hat.com, snitzer@...nel.org,
 mpatocka@...hat.com, song@...nel.org, xni@...hat.com, hare@...e.de,
 colyli@...nel.org
Cc: linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 dm-devel@...ts.linux.dev, linux-raid@...r.kernel.org, yi.zhang@...wei.com,
 yangerkun@...wei.com, johnny.chenyi@...wei.com,
 "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH v6 md-6.18 11/11] md/md-llbitmap: introduce new lockless
 bitmap

Hi,

On 2025/08/28 19:24, Li Nan wrote:
> 
> 
> On 2025/8/26 16:52, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@...wei.com>
>>
>> Redundant data is used to enhance data fault tolerance, and the storage
>> method for redundant data varies depending on the RAID level. It is
>> important to maintain the consistency of redundant data.
>>
>> A bitmap is used to record which data blocks have been synchronized and
>> which ones need to be resynchronized or recovered. Each bit in the bitmap
>> represents a segment of data in the array. When a bit is set, it indicates
>> that the multiple redundant copies of that data segment may not be
>> consistent. Data synchronization can be performed based on the bitmap
>> after a power failure or after re-adding a disk. If there is no bitmap, a
>> full disk synchronization is required.
>>
>> Key Features:
>>
>>   - IO fastpath is lockless: if the user issues many writes to the same
>>   bitmap bit in a short time, only the first write has the additional
>>   overhead of updating the bitmap bit; the following writes have none;
>>   - only written data is resynced or recovered, which means that when
>>   creating a new array or replacing a disk with a new one, there is no
>>   need to do a full disk resync/recovery;
>>
>> Key Concept:
>>
>>   - State Machine:
>>
>> Each bit is one byte and can be in one of 6 different states, see
>> llbitmap_state. There are 8 different actions in total, see
>> llbitmap_action, that can change the state:
>>
>> llbitmap state machine: transitions between states
>>
>> |           | Startwrite | Startsync | Endsync | Abortsync |
>> | --------- | ---------- | --------- | ------- | --------- |
>> | Unwritten | Dirty      | x         | x       | x         |
>> | Clean     | Dirty      | x         | x       | x         |
>> | Dirty     | x          | x         | x       | x         |
>> | NeedSync  | x          | Syncing   | x       | x         |
>> | Syncing   | x          | Syncing   | Dirty   | NeedSync  |
>>
>> |           | Reload   | Daemon | Discard   | Stale     |
>> | --------- | -------- | ------ | --------- | --------- |
>> | Unwritten | x        | x      | x         | x         |
>> | Clean     | x        | x      | Unwritten | NeedSync  |
>> | Dirty     | NeedSync | Clean  | Unwritten | NeedSync  |
>> | NeedSync  | x        | x      | Unwritten | x         |
>> | Syncing   | NeedSync | x      | Unwritten | NeedSync  |
>>
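The two tables above can be read as one lookup table indexed by current state and action. The following is a minimal user-space sketch in C, not the code from the patch: the enum names, the table name, and next_state() are illustrative assumptions, "x" is modeled as "no transition, state unchanged", and only the five states that appear in the tables are included. It follows the tables literally, so the raid456 special case described in 2.2 below is not modeled.

```c
#include <assert.h>

/* Illustrative names only; the patch defines its own llbitmap_state and
 * llbitmap_action enums. */
enum bit_state {
	ST_UNWRITTEN, ST_CLEAN, ST_DIRTY, ST_NEEDSYNC, ST_SYNCING,
	ST_COUNT,
	ST_NONE = 0xff,		/* "x" in the tables: no transition */
};

enum bit_action {
	ACT_STARTWRITE, ACT_STARTSYNC, ACT_ENDSYNC, ACT_ABORTSYNC,
	ACT_RELOAD, ACT_DAEMON, ACT_DISCARD, ACT_STALE,
	ACT_COUNT,
};

#define X ST_NONE

/* Rows are current states; columns follow the order of enum bit_action:
 * Startwrite, Startsync, Endsync, Abortsync, Reload, Daemon, Discard, Stale. */
static const unsigned char transitions[ST_COUNT][ACT_COUNT] = {
	[ST_UNWRITTEN] = { ST_DIRTY, X, X, X,
			   X, X, X, X },
	[ST_CLEAN]     = { ST_DIRTY, X, X, X,
			   X, X, ST_UNWRITTEN, ST_NEEDSYNC },
	[ST_DIRTY]     = { X, X, X, X,
			   ST_NEEDSYNC, ST_CLEAN, ST_UNWRITTEN, ST_NEEDSYNC },
	[ST_NEEDSYNC]  = { X, ST_SYNCING, X, X,
			   X, X, ST_UNWRITTEN, X },
	[ST_SYNCING]   = { X, ST_SYNCING, ST_DIRTY, ST_NEEDSYNC,
			   ST_NEEDSYNC, X, ST_UNWRITTEN, ST_NEEDSYNC },
};

#undef X

/* Return the state after applying an action; the state is returned
 * unchanged when the table entry is "x". */
static enum bit_state next_state(enum bit_state s, enum bit_action a)
{
	unsigned char next;

	assert(s < ST_COUNT && a < ACT_COUNT);
	next = transitions[s][a];
	return next == ST_NONE ? s : (enum bit_state)next;
}
```

With this table, applying Startsync, Endsync and Daemon in turn to a NeedSync bit walks it through Syncing and Dirty to Clean, and Discard on an Unwritten bit leaves it Unwritten.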
>> Typical scenarios:
>>
>> 1) Create new array
>> All bits are set to Unwritten by default; if --assume-clean is set,
>> all bits are set to Clean instead.
>>
>> 2) write data: raid1/raid10 have a full copy of the data, while raid456
>> doesn't and relies on xor data
>>
>> 2.1) write new data to raid1/raid10:
>> Unwritten --StartWrite--> Dirty
>>
>> 2.2) write new data to raid456:
>> Unwritten --StartWrite--> NeedSync
>>
>> Because the initial recovery for raid456 is skipped, the xor data is not
>> built yet, so the bit must be set to NeedSync first; after the lazy initial
>> recovery is finished, the bit will finally be set to Dirty (see 5.1 and 5.4);
>>
>> 2.3) cover write
>> Clean --StartWrite--> Dirty
>>
>> 3) daemon, if the array is not degraded:
>> Dirty --Daemon--> Clean
>>
>> For a degraded array, the Dirty bit will never be cleared, which prevents
>> a full disk recovery when re-adding a removed disk.
>>
>> 4) discard
>> {Clean, Dirty, NeedSync, Syncing} --Discard--> Unwritten
>>
>> 5) resync and recover
>>
>> 5.1) common process
>> NeedSync --Startsync--> Syncing --Endsync--> Dirty --Daemon--> Clean
> 
> There are some issues with the Dirty state:
> 1. Dirty bits are not synced when a disk is re-added.
> 2. A bit remains Dirty even after a full recovery; it should be Clean.

We're setting new bits to Dirty for a degraded array, and there is no
further action to change the state to NeedSync before recovery onto the
new disk.

This can be fixed by setting new bits directly to NeedSync for a degraded
array; I will do this in the next version.
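
One possible reading of that fix, as a sketch only: the state a bit enters on Startwrite depends on the RAID level and on whether the array is degraded. This is not the code planned for the next version. The function name startwrite_target(), the enum, and the raid456/degraded parameters are hypothetical, and applying the degraded rule to Clean bits as well as Unwritten bits is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative state names, mirroring the states in the quoted tables. */
enum bit_state { ST_UNWRITTEN, ST_CLEAN, ST_DIRTY, ST_NEEDSYNC, ST_SYNCING };

/* Hypothetical helper: pick the state a bit moves to on Startwrite. */
static enum bit_state startwrite_target(enum bit_state cur, bool raid456,
					bool degraded)
{
	assert(cur <= ST_SYNCING);

	/* Startwrite only changes Unwritten and Clean bits. */
	if (cur != ST_UNWRITTEN && cur != ST_CLEAN)
		return cur;

	/* Assumed fix: in a degraded array, go straight to NeedSync so the
	 * bit is picked up when recovering onto a new or re-added disk. */
	if (degraded)
		return ST_NEEDSYNC;

	/* Scenario 2.2: the xor data is not built yet for new raid456 data. */
	if (raid456 && cur == ST_UNWRITTEN)
		return ST_NEEDSYNC;

	/* Scenarios 2.1 and 2.3. */
	return ST_DIRTY;
}
```

Under these assumptions, a write to a Clean bit in a healthy raid1 array still goes to Dirty, while the same write in a degraded array goes to NeedSync and then follows the common process in 5.1 once recovery starts.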

Thanks,
Kuai