linux-ext4 - Re: [PATCH] ext4: critical info format fix in __ext4_grp_locked

Message-ID: <4D883518.8070505@coly.li>
Date:	Tue, 22 Mar 2011 13:35:20 +0800
From:	Coly Li <i@...y.li>
To:	Ted Ts'o <tytso@....edu>
CC:	Tao Ma <tm@....ma>, Robin Dong <hao.bigrat@...il.com>,
	linux-ext4@...r.kernel.org
Subject: Re: [PATCH] ext4: critical info format fix in __ext4_grp_locked_error

On 2011-03-22 10:30, Tao Ma wrote:
> Hi Ted,
> On 03/22/2011 08:47 AM, Ted Ts'o wrote:
>> Applied to the ext4 patch queue.
>>
>> On Fri, Mar 18, 2011 at 05:58:03PM +0800, Robin Dong wrote:
>>> From: Robin Dong<sanbai@...bao.com>
>>>
>>> When we do performance testing on an ext4 filesystem, we observe a warning like this:
>>>
>>> "[ 1684.113205] EXT4-fs error (device sda7): ext4_mb_generate_buddy:718: group 259825901 blocks in bitmap, 26057 in gd"
>>>
>>> indeed, it should be
>>>
>>> "group 2598, 25901 blocks in bitmap, 26057 in gd"
>>
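(Illustration of the format problem quoted above. This is a minimal user-space sketch, not the kernel code: the helper name format_grp_error and its sep parameter are hypothetical and exist only to show how a missing separator after the group number fuses "2598" and "25901" into "259825901".)

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a message built from a group prefix followed
 * by the caller's text. `sep` is whatever is printed between the group
 * number and the rest of the message: "" reproduces the fused output,
 * ", " gives the intended one.
 * Returns the message length, or -1 if the buffer is too small.
 */
static int format_grp_error(char *buf, size_t len, const char *sep,
                            unsigned int group, unsigned int in_bitmap,
                            unsigned int in_gd)
{
    int n;

    assert(buf != NULL && sep != NULL && len > 0);
    n = snprintf(buf, len, "group %u%s%u blocks in bitmap, %u in gd",
                 group, sep, in_bitmap, in_gd);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

With the values from the log, an empty separator gives "group 259825901 blocks in bitmap, 26057 in gd", while ", " gives "group 2598, 25901 blocks in bitmap, 26057 in gd".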
[snip]
>>> This bug is found on upstream 2.6.36 kernel. We ran a 2.6.36 kernel
>>> on the online system with 8 Ext4 file systems. 2 of them are mounted
>>> with delayed allocation feature. This warning is only observed on
>>> delayed allocation enabled Ext4 file systems.
>>>
>>> This issue is not easy to reproduce. On two servers with 2.6.36
>>> kernel + ext4, after running 110+ days, the error started to appear
>>> in the kernel log. When we checked the error log, we found that the
>>> info format should be fixed, which is how this patch came about.
>>
>> Can you send more information about what sort of workloads your
>> servers are under, and any other information about how to reproduce
>> it?
> OK, so let me try to describe the situation here.
> This is a web cache server and we use squid to cache some data. This bug
> was found while we were testing the 2.6.36 vanilla kernel. We don't know
> for sure how to reproduce it since it showed up when the test server had
> run for about 100 days. And the bad thing is that the volume was
> reformatted for another test. :( But we have several machines here, and
> we are continuing our test, so if any error happens again, we promise
> that we will report what we find immediately.
>
> btw, when testing the 2.6.32 kernel, we found another error: a dir inode
> was corrupted, with an error message like
>
> Mar 16 11:15:28 cache161 kernel: [484403.699588] EXT4-fs error (device
> sda5): ext4_lookup: deleted inode referenced: 21496065
>
> This volume is also mounted with delayed allocation.
>

When we observed these 2 issues, the Ext4 file systems were mounted with the delalloc option.

-- 
Coly Li