Message-ID: <20180409234441.GD2608@thunk.org>
Date:   Mon, 9 Apr 2018 19:44:41 -0400
From:   "Theodore Y. Ts'o" <tytso@....edu>
To:     Liu Bo <obuil.liubo@...il.com>
Cc:     Liu Bo <bo.liu@...ux.alibaba.com>, Jan Kara <jack@...e.cz>,
        linux-ext4@...r.kernel.org
Subject: Re: kernel BUG at fs/ext4/mballoc.c:1911!

On Mon, Apr 09, 2018 at 10:25:46AM -0700, Liu Bo wrote:
> > (e) there're errors about reading this bitmap(group 8383) shown in the log,
> > crash> grep group e4b.txt
> >   bd_group = 8383
> >
> >  however when it comes to BUG_ON(k >= max), reading this bitmap has
> >  been successful, and it is the inconsistence between ->bb_counters
> >  and the buddy bitmap that ends up with the crash, but if the buddy
> >  bitmap was regenerated, bb_counters should match with the buddy
> >  bitmap.

What probably happened is that the page containing the actual allocation
bitmap was pushed out of memory due to memory pressure.  However, the
buddy bitmap was still cached in memory.  That's actually quite
possible since the buddy bitmap will often be referenced more
frequently than the allocation bitmap (for example, while searching
for free space of a specific size, and then having that block group
skipped when it's not available).

Since there was an I/O error reading the allocation bitmap, the buffer
is not valid.  So it's not surprising that the BUG_ON(k >= max) is
getting triggered.
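
To make the failure mode concrete, here is a rough user-space model of
the kind of scan that trips the assertion.  It is only a sketch: the
struct, the model_* names, and the one-byte-per-entry bitmap layout are
all made up for illustration and are not the kernel's data structures.
The per-order counter claims a free chunk exists, the scan then looks
for a zero entry in that order's buddy map, and if none is found the
two have diverged (the kernel asserts here; the model returns -1).

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_ORDER 4   /* hypothetical; chosen small for the example */

/* Hypothetical stand-in for per-group buddy state, not the kernel struct. */
struct model_group {
    unsigned counters[MODEL_MAX_ORDER];       /* claimed free chunks per order */
    unsigned char buddy[MODEL_MAX_ORDER][8];  /* one byte per entry: 1 = in use, 0 = free */
    size_t max[MODEL_MAX_ORDER];              /* number of valid entries per order */
};

/* Return the index of the first free (zero) entry, or max if there is none. */
static size_t model_find_next_zero(const unsigned char *map, size_t max)
{
    for (size_t k = 0; k < max; k++)
        if (map[k] == 0)
            return k;
    return max;
}

/*
 * Scan orders from min_order upward.
 * Returns 0 and fills *order_out / *k_out when a free chunk is found,
 * 1 when no order claims any free chunk, and -1 when a counter claims
 * a free chunk but the buddy map has none (the k >= max condition).
 */
static int model_simple_scan(const struct model_group *g, int min_order,
                             int *order_out, size_t *k_out)
{
    assert(min_order >= 0 && min_order < MODEL_MAX_ORDER);

    for (int i = min_order; i < MODEL_MAX_ORDER; i++) {
        if (g->counters[i] == 0)
            continue;               /* nothing claimed free at this order */

        size_t k = model_find_next_zero(g->buddy[i], g->max[i]);
        if (k >= g->max[i])
            return -1;              /* counters and bitmap disagree */

        *order_out = i;
        *k_out = k;
        return 0;
    }
    return 1;
}
```

With a counter of 1 at order 1 and a zero entry at index 2 of that
order's map, the scan returns 0 with order 1 and k 2.  Set that entry
to 1 without touching the counter and the same call returns -1.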

It's of course not desirable.  What should happen is that once we
realize that the allocation bitmap can't be read, we should mark the
block group as not being eligible for allocations via the
EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT, to avoid the BUG_ON from
triggering.
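
The shape of that change can be sketched in user space as follows.
This is an illustration under assumptions, not a proposed patch: the
model_* names, the flag value, and the read_error parameter are
hypothetical, and the real fix would have to go through the existing
ext4 helpers and locking.  The point is only that a failed bitmap read
flags the group, and the allocator's group selection then skips it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for the "block bitmap corrupt" state. */
#define MODEL_BBITMAP_CORRUPT (1u << 0)

/* Hypothetical, heavily reduced per-group info. */
struct model_group_info {
    unsigned flags;
    unsigned free_blocks;
};

/*
 * Model of loading a group's allocation bitmap.  read_error simulates
 * an I/O failure; on failure the group is flagged so later allocation
 * attempts can avoid it, and -EIO is returned to the caller.
 */
static int model_load_bitmap(struct model_group_info *grp, bool read_error)
{
    assert(grp != NULL);

    if (read_error) {
        grp->flags |= MODEL_BBITMAP_CORRUPT;
        return -EIO;
    }
    return 0;
}

/* A group is eligible only if it is not flagged and has free blocks. */
static bool model_group_usable(const struct model_group_info *grp)
{
    return !(grp->flags & MODEL_BBITMAP_CORRUPT) && grp->free_blocks > 0;
}

/* Return the index of the first eligible group, or -1 if there is none. */
static int model_pick_group(const struct model_group_info *groups, int ngroups)
{
    for (int i = 0; i < ngroups; i++)
        if (model_group_usable(&groups[i]))
            return i;
    return -1;
}
```

Once a group has been flagged this way, the scan never reaches its
buddy data, so a stale cached buddy bitmap can no longer be compared
against an allocation bitmap that failed to load.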

I'll put it on my TODO list.  Or feel free to try your hand at making
the change yourself, against the latest upstream kernel, and send a
proposed patch to the linux-ext4@...r.kernel.org mailing list.

Cheers,

						- Ted
