lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87seyquhpi.fsf@brahms.olymp>
Date: Fri, 10 May 2024 10:41:45 +0100
From: Luis Henriques <luis.henriques@...ux.dev>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: Theodore Ts'o <tytso@....edu>,  linux-ext4@...r.kernel.org,
  linux-fsdevel@...r.kernel.org,  linux-mm@...ck.org,
  linux-kernel@...r.kernel.org,  adilger.kernel@...ger.ca,  jack@...e.cz,
  ritesh.list@...il.com,  hch@...radead.org,  djwong@...nel.org,
  willy@...radead.org,  zokeefe@...gle.com,  yi.zhang@...wei.com,
  chengzhihao1@...wei.com,  yukuai3@...wei.com,  wangkefeng.wang@...wei.com
Subject: Re: [PATCH v3 03/26] ext4: correct the hole length returned by
 ext4_map_blocks()

On Fri 10 May 2024 11:39:48 AM +08, Zhang Yi wrote;

> On 2024/5/10 1:23, Luis Henriques wrote:
>> On Thu 09 May 2024 12:39:53 PM -04, Theodore Ts'o wrote;
>> 
>>> On Thu, May 09, 2024 at 04:16:34PM +0100, Luis Henriques wrote:
>>>>
>>>> It's looks like it's easy to trigger an infinite loop here using fstest
>>>> generic/039.  If I understand it correctly (which doesn't happen as often
>>>> as I'd like), this is due to an integer overflow in the 'if' condition,
>>>> and should be fixed with the patch below.
>>>
>>> Thanks for the report.  However, I can't reproduce the failure, and
>>> looking at generic/039, I don't see how it could be relevant to the
>>> code path in question.  Generic/039 creates a test symlink with two
>>> hard links in the same directory, syncs the file system, and then
>>> removes one of the hard links, and then drops access to the block
>>> device using dmflakey.  So I don't see how the extent code would be
>>> involved at all.  Are you sure that you have the correct test listed?
>> 
>> Yep, I just retested and it's definitely generic/039.  I'm using a simple
>> test environment, with virtme-ng.
>> 
>>> Looking at the code in question in fs/ext4/extents.c:
>>>
>>> again:
>>> 	ext4_es_find_extent_range(inode, &ext4_es_is_delayed, hole_start,
>>> 				  hole_start + len - 1, &es);
>>> 	if (!es.es_len)
>>> 		goto insert_hole;
>>>
>>>   	 * There's a delalloc extent in the hole, handle it if the delalloc
>>>   	 * extent is in front of, behind and straddle the queried range.
>>>   	 */
>>>  -	if (lblk >= es.es_lblk + es.es_len) {
>>>  +	if (lblk >= ((__u64) es.es_lblk) + es.es_len) {
>>>   		/*
>>>   		 * The delalloc extent is in front of the queried range,
>>>   		 * find again from the queried start block.
>>> 		len -= lblk - hole_start;
>>> 		hole_start = lblk;
>>> 		goto again;
>>>
>>> lblk and es.es_lblk are both __u32.  So the infinite loop is
>>> presumably because es.es_lblk + es.es_len has overflowed.  This should
>>> never happen(tm), and in fact we have a test for this case which
>> 
>> If I instrument the code, I can see that es.es_len is definitely set to
>> EXT_MAX_BLOCKS, which will overflow.
>> 
>
> Thanks for the report. After looking at the code, I think the root
> cause of this issue is the variable es was not initialized on replaying
> fast commit. ext4_es_find_extent_range() will return directly when
> EXT4_FC_REPLAY flag is set, and then the es.len becomes stall.
>
> I can always reproduce this issue on generic/039 with
> MKFS_OPTIONS="-O fast_commit".
>
> This uninitialization problem originally existed in the old
> ext4_ext_put_gap_in_cache(), but it didn't trigger any real problem
> since we never check and use extent cache when replaying fast commit.
> So I suppose the correct fix would be to unconditionally initialize
> the es variable.

Oh, you're absolutely right -- the extent_status 'es' struct isn't being
initialized in that case.  I totally failed to see that.  And yes, I also
failed to mention I had 'fast_commit' feature enabled, sorry!

Thanks a lot for figuring this out, Yi.  I'm looking at this code and
trying to understand if it would be safe to call __es_find_extent_range()
when EXT4_FC_REPLAY is in progress.  Probably not, and probably better to
simply do:

	es->es_lblk = es->es_len = es->es_pblk = 0;

in that case.  I'll send out a patch later today.

Cheers,
-- 
Luis

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ