Message-ID: <79b86b7b-8a65-49b8-aa33-bb73de47ad37@suse.com>
Date: Wed, 4 Dec 2024 14:46:15 +0800
From: Heming Zhao <heming.zhao@...e.com>
To: Joseph Qi <joseph.qi@...ux.alibaba.com>, ocfs2-devel@...ts.linux.dev
Cc: linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
gregkh@...uxfoundation.org, Heming Zhao <heing.zhao@...e.com>,
stable@...r.kernel.org
Subject: Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when
unmounting an ocfs2 volume"
On 12/4/24 11:47, Joseph Qi wrote:
>
>
> On 12/4/24 11:32 AM, Heming Zhao wrote:
>> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when
>> unmounting an ocfs2 volume").
>>
>> In commit dfe6c5692fb5, the commit log stating "This bug has existed
>> since the initial OCFS2 code." is incorrect. The correct introduction
>> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()").
>>
>
> Could you please elaborate more how it happens?
> And it seems no difference with the new version. So we may submit a
> standalone revert patch to those backported stable kernels (< 6.10).

The commit log of patch [2/2] should be revised.
Change: This bug has existed since the initial OCFS2 code.
To:     This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()")
----
See below for the details of patch [1/2].
The following is the code before commit 30dd3478c3cd7 with commit dfe6c5692fb525e applied on top:
static int ocfs2_sync_local_to_main()
{
        ... ...
 1      while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start))
 2             != -1) {
 3              if ((bit_off < left) && (bit_off == start)) {
 4                      count++;
 5                      start++;
 6                      continue;
 7              }
 8              if (count) {
 9                      blkno = la_start_blk +
10                              ocfs2_clusters_to_blocks(osb->sb,
11                                                       start - count);
12
13                      trace_ocfs2_sync_local_to_main_free();
14
15                      status = ocfs2_release_clusters(handle,
16                                                      main_bm_inode,
17                                                      main_bm_bh, blkno,
18                                                      count);
19                      if (status < 0) {
20                              mlog_errno(status);
21                              goto bail;
22                      }
23              }
24              if (bit_off >= left)
25                      break;
26              count = 1;
27              start = bit_off + 1;
28      }
29
30      /* clear the contiguous bits until the end boundary */
31      if (count) {
32              blkno = la_start_blk +
33                      ocfs2_clusters_to_blocks(osb->sb,
34                                               start - count);
35
36              trace_ocfs2_sync_local_to_main_free();
37
38              status = ocfs2_release_clusters(handle,
39                                              main_bm_inode,
40                                              main_bm_bh, blkno,
41                                              count);
42              if (status < 0)
43                      mlog_errno(status);
44      }
        ... ...
}

Bug flow:
1. Suppose 'left' is 10000, 'start' is 0, and the first 'bit_off' is 9000, with zeros from bit 9000 to the end of the bitmap.
2. When 'start' reaches 10000, ocfs2_find_next_zero_bit() returns 10000 (the 'left' value), so the "bit_off < left" test at line 3 fails.
3. The code runs to line 8, where 'count' is 1000 (bits 9000 to 9999), and releases those 1000 clusters to main_bm.
4. The code runs to line 24, where "bit_off >= left" is true, and breaks out of the loop. At this point 'count' still holds its old value, 1000.
5. The code runs to line 31, which releases the same clusters to main_bm a second time for the same gd.

The kernel will then likely report an error such as:
OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878
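
The double release can be reproduced in user space with a small model of the loop. This is only a sketch under stated assumptions: find_next_zero_bit_model(), release_clusters_model(), struct release_log and sync_local_to_main_model() are hypothetical stand-ins, not kernel functions. The bitmap is a plain bool array, and the search helper is assumed to return 'size' (never -1) when no zero bit is found, which is the behaviour the flow above relies on. The 'trailing_release' flag switches the post-loop block (lines 30-44 above) on or off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for ocfs2_find_next_zero_bit(): returns the index
 * of the first zero bit in [start, size), or 'size' if there is none. */
static int find_next_zero_bit_model(const bool *bitmap, int size, int start)
{
	for (int i = start; i < size; i++)
		if (!bitmap[i])
			return i;
	return size;
}

/* Hypothetical record of what would be handed to ocfs2_release_clusters(). */
struct release_log {
	int calls;	/* number of release calls */
	int clusters;	/* total clusters released */
};

static void release_clusters_model(struct release_log *log, int first, int count)
{
	log->calls++;
	log->clusters += count;
	printf("release: first=%d count=%d\n", first, count);
}

/* Model of the pre-30dd3478c3cd loop; 'trailing_release' adds the
 * post-loop block from dfe6c5692fb5. */
static struct release_log sync_local_to_main_model(const bool *bitmap, int left,
						   bool trailing_release)
{
	struct release_log log = { 0, 0 };
	int start = 0, count = 0, bit_off;

	while ((bit_off = find_next_zero_bit_model(bitmap, left, start)) != -1) {
		if ((bit_off < left) && (bit_off == start)) {
			count++;
			start++;
			continue;
		}
		if (count)
			release_clusters_model(&log, start - count, count);
		if (bit_off >= left)
			break;	/* 'count' is not reset here */
		count = 1;
		start = bit_off + 1;
	}

	/* the block added by dfe6c5692fb5 */
	if (trailing_release && count)
		release_clusters_model(&log, start - count, count);

	return log;
}
```

With a 10-bit bitmap whose last three bits are zero, calling the model with trailing_release set to false releases the run once, while setting it to true releases the same run twice, which matches steps 3 and 5 above.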
thanks,
Heming