Message-ID: <1bc1c43b-fc03-41ab-9540-af77c8f4b38d@linux.alibaba.com>
Date: Wed, 4 Dec 2024 20:09:55 +0800
From: Joseph Qi <joseph.qi@...ux.alibaba.com>
To: Heming Zhao <heming.zhao@...e.com>, ocfs2-devel@...ts.linux.dev
Cc: linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
gregkh@...uxfoundation.org, stable@...r.kernel.org
Subject: Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when
unmounting an ocfs2 volume"
On 12/4/24 7:34 PM, Heming Zhao wrote:
> On 12/4/24 17:28, Joseph Qi wrote:
>>
>>
>> On 12/4/24 2:46 PM, Heming Zhao wrote:
>>> On 12/4/24 11:47, Joseph Qi wrote:
>>>>
>>>>
>>>> On 12/4/24 11:32 AM, Heming Zhao wrote:
>>>>> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when
>>>>> unmounting an ocfs2 volume").
>>>>>
>>>>> In commit dfe6c5692fb5, the commit log stating "This bug has existed
>>>>> since the initial OCFS2 code." is incorrect. The correct introduction
>>>>> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()").
>>>>>
>>>>
>>>> Could you please elaborate on how it happens?
>>>> Also, it seems to be no different from the new version. So we may submit a
>>>> standalone revert patch to the stable kernels that got the backport (< 6.10).
>>>
>>> commit log from patch [2/2] should be revised.
>>> change: This bug has existed since the initial OCFS2 code.
>>> to : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()")
>>>
>>> ----
>>> See below for the details of patch [1/2].
>>>
>>> The following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e".
>>>
>>> static int ocfs2_sync_local_to_main()
>>> {
>>>     ... ...
>>>  1  while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start))
>>>  2         != -1) {
>>>  3      if ((bit_off < left) && (bit_off == start)) {
>>>  4          count++;
>>>  5          start++;
>>>  6          continue;
>>>  7      }
>>>  8      if (count) {
>>>  9          blkno = la_start_blk +
>>> 10              ocfs2_clusters_to_blocks(osb->sb,
>>> 11                                       start - count);
>>> 12
>>> 13          trace_ocfs2_sync_local_to_main_free();
>>> 14
>>> 15          status = ocfs2_release_clusters(handle,
>>> 16                                          main_bm_inode,
>>> 17                                          main_bm_bh, blkno,
>>> 18                                          count);
>>> 19          if (status < 0) {
>>> 20              mlog_errno(status);
>>> 21              goto bail;
>>> 22          }
>>> 23      }
>>> 24      if (bit_off >= left)
>>> 25          break;
>>> 26      count = 1;
>>> 27      start = bit_off + 1;
>>> 28  }
>>> 29
>>> 30  /* clear the contiguous bits until the end boundary */
>>> 31  if (count) {
>>> 32      blkno = la_start_blk +
>>> 33          ocfs2_clusters_to_blocks(osb->sb,
>>> 34                                   start - count);
>>> 35
>>> 36      trace_ocfs2_sync_local_to_main_free();
>>> 37
>>> 38      status = ocfs2_release_clusters(handle,
>>> 39                                      main_bm_inode,
>>> 40                                      main_bm_bh, blkno,
>>> 41                                      count);
>>> 42      if (status < 0)
>>> 43          mlog_errno(status);
>>> 44  }
>>>     ... ...
>>> }
>>>
>>> bug flow:
>>> 1. left is 10000, start is 0, the first bit_off is 9000, and all bits from 9000 to the end of the bitmap are zero.
>>> 2. once 'start' reaches 10000, ocfs2_find_next_zero_bit() returns 10000 (the 'left' value), so the condition at line 3 is false and the 'continue' is skipped.
>>> 3. code runs to line 8 with 'count' at 1000 (bits 9000 to 9999), and this block releases those 1000 clusters to main_bm.
>>> 4. code runs to line 24, where "bit_off >= left" is true and the loop breaks. At this point 'count' still holds its old value of 1000.
>>> 5. code runs to line 31, and this block releases the same space to main_bm for the same gd again.
>>>
>>> The kernel will likely report an error like the following:
>>> OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878
>>>
>>
>> Okay, IIUC, it seems we have to:
>> 1. revert commit dfe6c5692fb5 (in the stable kernels as well).
>> 2. fix 30dd3478c3cd in the following way:
>>
>> diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
>> index 5df34561c551..f0feadac2ef1 100644
>> --- a/fs/ocfs2/localalloc.c
>> +++ b/fs/ocfs2/localalloc.c
>> @@ -971,9 +971,9 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb,
>> start = count = 0;
>> left = le32_to_cpu(alloc->id1.bitmap1.i_total);
>> - while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <
>> + while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <=
>> left) {
>
> The ocfs2_find_next_zero_bit() always returns a value within the range [0, left],
You're right.
> do you like the following code?
>
> - while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <
> - left) {
> + for(;;) {
> + bit_off = ocfs2_find_next_zero_bit(bitmap, left, start);
>
Or simplify to:
	while (1) {
		bit_off = ocfs2_find_next_zero_bit(bitmap, left, start);
		...
	}
Thanks,
Joseph