Message-ID: <e33bc4a3-4378-364e-c834-8bb479872fa4@linux.alibaba.com>
Date:   Sun, 22 Aug 2021 21:14:18 +0800
From:   Joseph Qi <joseph.qi@...ux.alibaba.com>
To:     Eric Whitney <enwlinux@...il.com>,
        Jeffle Xu <jefflexu@...ux.alibaba.com>
Cc:     tytso@....edu, adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org
Subject: Re: [PATCH] ext4: fix reserved space counter leakage



On 8/22/21 9:06 PM, Joseph Qi wrote:
> 
> 
> On 8/21/21 12:45 AM, Eric Whitney wrote:
>> * Jeffle Xu <jefflexu@...ux.alibaba.com>:
>>> When ext4_es_insert_delayed_block() returns an error, e.g. ENOMEM,
>>> the previously reserved space is not released in the error handling
>>> path, in which case @s_dirtyclusters_counter is left elevated. Since
>>> this delayed extent fails to be inserted into the extent status tree,
>>> the extra @s_dirtyclusters_counter won't be subtracted when the inode
>>> is written back and remains there forever.
>>>
>>> This can lead to /sys/fs/ext4/<dev>/delayed_allocation_blocks
>>> remaining non-zero even after syncfs is executed on the filesystem.
>>>
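For context, the path described above reserves a cluster first and only
then inserts the delayed extent into the extent status tree. A minimal
sketch of the shape of the fix, modeled on ext4_insert_delayed_block()
in fs/ext4/inode.c (an illustration of the fix as described, not
necessarily the literal patch):

/*
 * Sketch: release the reservation if the extent status tree insert
 * fails, so s_dirtyclusters_counter is not left permanently elevated.
 */
static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
{
	bool allocated = false;
	bool reserved = false;
	int ret;

	/* Charges one cluster to sbi->s_dirtyclusters_counter. */
	ret = ext4_da_reserve_space(inode);
	if (ret)
		return ret;
	reserved = true;

	/* (bigalloc cluster-sharing checks may set allocated instead) */

	ret = ext4_es_insert_delayed_block(inode, lblk, allocated);
	if (ret && reserved) {
		/*
		 * The delayed extent never made it into the tree, so
		 * writeback will never subtract this reservation;
		 * drop it here instead of leaking it.
		 */
		ext4_da_release_space(inode, 1);
	}

	return ret;
}

The point is simply that the release mirrors the reservation on the
same error path where the insert fails.
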
>>
>> Hi:
>>
>> I think the fix below looks fine.  However, this comment doesn't look right
>> to me.  Are you really seeing delayed_allocation_blocks values that remain
>> incorrectly elevated across last closes (or across file system unmounts and
>> remounts)?  s_dirtyclusters_counter isn't written out to stable storage -
>> it's an in-memory only variable that's created when a file is first opened
>> and destroyed on last close.
>>
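For reference, the counter in question is a per-superblock, in-memory
percpu counter; roughly (a simplified view of fs/ext4/ext4.h, details
may vary by kernel version):

/*
 * The counter backing delayed_allocation_blocks in sysfs.  It lives
 * in memory only and is never written to stable storage, so any
 * leaked value disappears at unmount.
 */
struct ext4_sb_info {
	/* ... many other fields ... */
	struct percpu_counter s_dirtyclusters_counter;
	/* ... */
};
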
> 
> Actually we've encountered a real case in our production environment,
> with about 20G of space lost (df - du = ~20G).
> After some investigation, we confirmed that it was caused by a leaked
> s_dirtyclusters_counter (~5M), and even if we sync manually, it
> remains. Since there were no error messages, we checked all the logic
> around s_dirtyclusters_counter and found this. We can also manually
> inject an error and reproduce the leaked s_dirtyclusters_counter (see
> the sketch below).
> 
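As a hypothetical illustration of such an injection (not code from the
thread): force a one-shot -ENOMEM at the top of
ext4_es_insert_delayed_block() in fs/ext4/extents_status.c, then check
the sysfs counter.

/*
 * Hypothetical debug hack, for testing only: fail the first
 * delayed-extent insert to reproduce the leak.
 */
static atomic_t es_fail_once = ATOMIC_INIT(1);

int ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
				 bool allocated)
{
	/* Simulate an allocation failure exactly once. */
	if (atomic_cmpxchg(&es_fail_once, 1, 0) == 1)
		return -ENOMEM;

	/* ... existing insertion logic unchanged ... */
}

Without the fix, /sys/fs/ext4/<dev>/delayed_allocation_blocks then
stays non-zero even after syncfs; with the fix the reservation is
released immediately on the failure and the counter stays accurate.
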

BTW, it's a runtime loss, not an on-disk one.
If we umount and then mount it again, it returns to normal. But the
application also has to be restarted...

Thanks,
Joseph
