[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <47fe189d-eadc-495d-984e-705fab58acad@suse.com>
Date: Wed, 4 Dec 2024 19:34:45 +0800
From: Heming Zhao <heming.zhao@...e.com>
To: Joseph Qi <joseph.qi@...ux.alibaba.com>, ocfs2-devel@...ts.linux.dev
Cc: linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
gregkh@...uxfoundation.org, stable@...r.kernel.org
Subject: Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when
unmounting an ocfs2 volume"
On 12/4/24 17:28, Joseph Qi wrote:
>
>
> On 12/4/24 2:46 PM, Heming Zhao wrote:
>> On 12/4/24 11:47, Joseph Qi wrote:
>>>
>>>
>>> On 12/4/24 11:32 AM, Heming Zhao wrote:
>>>> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when
>>>> unmounting an ocfs2 volume").
>>>>
>>>> In commit dfe6c5692fb5, the commit log stating "This bug has existed
>>>> since the initial OCFS2 code." is incorrect. The correct introduction
>>>> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()").
>>>>
>>>
>>> Could you please elaborate more how it happens?
>>> And it seems no difference with the new version. So we may submit a
>>> standalone revert patch to those backported stable kernels (< 6.10).
>>
>> commit log from patch [2/2] should be revised.
>> change: This bug has existed since the initial OCFS2 code.
>> to : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()")
>>
>> ----
>> See below for the details of patch [1/2].
>>
>> following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e".
>>
>> static int ocfs2_sync_local_to_main()
>> {
>> ... ...
>> 1 while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start))
>> 2 != -1) {
>> 3 if ((bit_off < left) && (bit_off == start)) {
>> 4 count++;
>> 5 start++;
>> 6 continue;
>> 7 }
>> 8 if (count) {
>> 9 blkno = la_start_blk +
>> 10 ocfs2_clusters_to_blocks(osb->sb,
>> 11 start - count);
>> 12
>> 13 trace_ocfs2_sync_local_to_main_free();
>> 14
>> 15 status = ocfs2_release_clusters(handle,
>> 16 main_bm_inode,
>> 17 main_bm_bh, blkno,
>> 18 count);
>> 19 if (status < 0) {
>> 20 mlog_errno(status);
>> 21 goto bail;
>> 22 }
>> 23 }
>> 24 if (bit_off >= left)
>> 25 break;
>> 26 count = 1;
>> 27 start = bit_off + 1;
>> 28 }
>> 29
>> 30 /* clear the contiguous bits until the end boundary */
>> 31 if (count) {
>> 32 blkno = la_start_blk +
>> 33 ocfs2_clusters_to_blocks(osb->sb,
>> 34 start - count);
>> 35
>> 36 trace_ocfs2_sync_local_to_main_free();
>> 37
>> 38 status = ocfs2_release_clusters(handle,
>> 39 main_bm_inode,
>> 40 main_bm_bh, blkno,
>> 41 count);
>> 42 if (status < 0)
>> 43 mlog_errno(status);
>> 44 }
>> ... ...
>> }
>>
>> bug flow:
>> 1. the left:10000, start:0, bit_off:9000, and there are zeros from 9000 to the end of bitmap.
>> 2. when 'start' is 9999, code runs to line 3, where bit_off is 10000 (the 'left' value), it doesn't trigger line 3.
>> 3. code runs to line 8 (where 'count' is 9999), this area releases 9999 bytes of space to main_bm.
>> 4. code runs to line 24, triggering "bit_off == left" and 'break' the loop. at this time, the 'count' still retains its old value 9999.
>> 5. code runs to line 31, this area code releases space to main_bm for the same gd again.
>>
>> kernel will report the following likely error:
>> OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878
>>
>
> Okay, IIUC, it seems we have to:
> 1. revert commit dfe6c5692fb5 (so does stable kernel).
> 2. fix 30dd3478c3cd in following way:
>
> diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
> index 5df34561c551..f0feadac2ef1 100644
> --- a/fs/ocfs2/localalloc.c
> +++ b/fs/ocfs2/localalloc.c
> @@ -971,9 +971,9 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb,
> start = count = 0;
> left = le32_to_cpu(alloc->id1.bitmap1.i_total);
>
> - while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <
> + while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <=
> left) {
The ocfs2_find_next_zero_bit() always returns a value within the range [0, left],
do you like the following code?
- while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <
- left) {
+ for(;;) {
+ bit_off = ocfs2_find_next_zero_bit(bitmap, left, start);
-Heming
> - if (bit_off == start) {
> + if ((bit_off < left) && (bit_off == start)) {
> count++;
> start++;
> continue;
> @@ -997,7 +997,8 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb,
> goto bail;
> }
> }
> -
> + if (bit_off >= left)
> + break;
> count = 1;
> start = bit_off + 1;
> }
>
> Thanks,
> Joseph
>
>
Powered by blists - more mailing lists