linux-kernel - Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <79b86b7b-8a65-49b8-aa33-bb73de47ad37@suse.com>
Date: Wed, 4 Dec 2024 14:46:15 +0800
From: Heming Zhao <heming.zhao@...e.com>
To: Joseph Qi <joseph.qi@...ux.alibaba.com>, ocfs2-devel@...ts.linux.dev
Cc: linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
 gregkh@...uxfoundation.org, Heming Zhao <heing.zhao@...e.com>,
 stable@...r.kernel.org
Subject: Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when
 unmounting an ocfs2 volume"

On 12/4/24 11:47, Joseph Qi wrote:
> 
> 
> On 12/4/24 11:32 AM, Heming Zhao wrote:
>> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when
>> unmounting an ocfs2 volume").
>>
>> In commit dfe6c5692fb5, the commit log stating "This bug has existed
>> since the initial OCFS2 code." is incorrect. The correct introduction
>> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()").
>>
> 
> Could you please elaborate more how it happens?
> And it seems no difference with the new version. So we may submit a
> standalone revert patch to those backported stable kernels (< 6.10).

commit log from patch [2/2] should be revised.
change: This bug has existed since the initial OCFS2 code.
to    : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()")

----
See below for the details of patch [1/2].

following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e".

    static int ocfs2_sync_local_to_main()
    {
    	... ...
  1  	while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start))
  2  	       != -1) {
  3  		if ((bit_off < left) && (bit_off == start)) {
  4  			count++;
  5  			start++;
  6  			continue;
  7  		}
  8  		if (count) {
  9  			blkno = la_start_blk +
10   				ocfs2_clusters_to_blocks(osb->sb,
11   							 start - count);
12
13   			trace_ocfs2_sync_local_to_main_free();
14
15   			status = ocfs2_release_clusters(handle,
16   							main_bm_inode,
17   							main_bm_bh, blkno,
18   							count);
19   			if (status < 0) {
20   				mlog_errno(status);
21   				goto bail;
22   			}
23   		}
24   		if (bit_off >= left)
25   			break;
26   		count = 1;
27   		start = bit_off + 1;
28   	}
29
30 	/* clear the contiguous bits until the end boundary */
31 	if (count) {
32 		blkno = la_start_blk +
33 			ocfs2_clusters_to_blocks(osb->sb,
34 					start - count);
35
36 		trace_ocfs2_sync_local_to_main_free();
37
38 		status = ocfs2_release_clusters(handle,
39 				main_bm_inode,
40 				main_bm_bh, blkno,
41 				count);
42 		if (status < 0)
43 			mlog_errno(status);
44  	}
    	... ...
    }

bug flow:
1. the left:10000, start:0, bit_off:9000, and there are zeros from 9000 to the end of bitmap.
2. when 'start' is 9999, code runs to line 3, where bit_off is 10000 (the 'left' value), it doesn't trigger line 3.
3. code runs to line 8 (where 'count' is 9999), this area releases 9999 bytes of space to main_bm.
4. code runs to line 24, triggering "bit_off == left" and 'break' the loop. at this time, the 'count' still retains its old value 9999.
5. code runs to line 31, this area code releases space to main_bm for the same gd again.

kernel will report the following likely error:
OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878

thanks,
Heming