linux-ext4 - [Bug 195561] New: Suspicious persistent EXT4-fs error (device sda1): ext4_validate_block_bitmap:395: [Proc] bg 17: block 557056: invalid block bitmap


Message-ID: <bug-195561-13602@https.bugzilla.kernel.org/>
Date:   Mon, 24 Apr 2017 02:40:05 +0000
From:   bugzilla-daemon@...zilla.kernel.org
To:     linux-ext4@...nel.org
Subject: [Bug 195561] New: Suspicious persistent EXT4-fs error (device sda1):
 ext4_validate_block_bitmap:395: [Proc] bg 17: block 557056: invalid block
 bitmap

https://bugzilla.kernel.org/show_bug.cgi?id=195561

            Bug ID: 195561
           Summary: Suspicious persistent EXT4-fs error (device sda1):
                    ext4_validate_block_bitmap:395: [Proc] bg 17: block
                    557056: invalid block bitmap
           Product: File System
           Version: 2.5
    Kernel Version: 4.4 to 4.11
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: ext4
          Assignee: fs_ext4@...nel-bugs.osdl.org
          Reporter: issor.oruam@...il.com
        Regression: No

Created attachment 255963
  --> https://bugzilla.kernel.org/attachment.cgi?id=255963&action=edit
dmesg on Phy SATA HDD1

While testing Android 7.1 nougat-x86 (x86_64), several android-x86 community
members noticed that the EXT4 partition gets remounted read-only.
On Android 7.x this causes a boot loop with continuous kernel panics,
and recovery requires reinstalling the Android OS image on the EXT4 partitions.

In logcat we only see that everything stops working because the partition
has been remounted read-only.

Looking at the dmesg output, we see the following in the three attached logs,
one per test case:

Physical Sata HDD 1
Physical Sata HDD 2 
Virtualbox    vdi 3

January, 14th (ASUS motherboard with physical SATA HDD n.1)
[  842.760419] EXT4-fs error (device sda1): ext4_validate_block_bitmap:395:
comm Binder:1454_E: bg 17: block 557056: invalid block bitmap
[  842.873601] Aborting journal on device sda1-8.
[  842.908371] EXT4-fs (sda1): Remounting filesystem read-only
[  842.923638] EXT4-fs error (device sda1) in ext4_do_update_inode:4679:
Journal has aborted

March, 25th (ASUS motherboard with physical SATA HDD n.2, different from n.1)
[ 1510.269945] EXT4-fs error (device sda1): ext4_validate_block_bitmap:395:
comm main: bg 17: block 557056: invalid block bitmap
[ 1510.285464] Aborting journal on device sda1-8.
[ 1510.301047] EXT4-fs (sda1): Remounting filesystem read-only
[ 1510.323400] EXT4-fs error (device sda1) in ext4_do_update_inode:4679:
Journal has aborted

April, 25th (VirtualBox VM with vdi virtual drive n.3, different from n.1 and
n.2)
[ 1510.269945] EXT4-fs error (device sda1): ext4_validate_block_bitmap:395:
comm main: bg 17: block 557056: invalid block bitmap
[ 1510.285464] Aborting journal on device sda1-8.
[ 1510.301047] EXT4-fs (sda1): Remounting filesystem read-only
[ 1510.323400] EXT4-fs error (device sda1) in ext4_do_update_inode:4679:
Journal has aborted
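
The excerpts above can also be compared mechanically. The following is only a
minimal sketch, assuming the error lines keep the format shown in these logs;
the helper name bitmap_errors and the sample text are illustrative and not part
of the original report. It joins the mail-wrapped lines and extracts the
device, bg and block of each ext4_validate_block_bitmap error:

```python
import re

# Matches the ext4 bitmap validation error as printed in the excerpts above.
PATTERN = re.compile(
    r"EXT4-fs error \(device (?P<dev>\w+)\): "
    r"ext4_validate_block_bitmap:\d+:\s+"
    r"comm (?P<comm>.+?): bg (?P<bg>\d+): block (?P<block>\d+): "
    r"invalid block bitmap"
)

def bitmap_errors(dmesg_text):
    """Return a list of (device, bg, block) tuples found in dmesg text."""
    # Long lines are wrapped in the mail, so collapse all whitespace first.
    flat = " ".join(dmesg_text.split())
    return [(m["dev"], int(m["bg"]), int(m["block"]))
            for m in PATTERN.finditer(flat)]

# Sample copied from the first two excerpts (hypothetical test input).
sample = """\
[  842.760419] EXT4-fs error (device sda1): ext4_validate_block_bitmap:395:
comm Binder:1454_E: bg 17: block 557056: invalid block bitmap
[ 1510.269945] EXT4-fs error (device sda1): ext4_validate_block_bitmap:395:
comm main: bg 17: block 557056: invalid block bitmap
"""

errors = bitmap_errors(sample)
print(errors)
# A single distinct (device, bg, block) tuple means every error hit the same spot.
print(len(set(errors)) == 1)
```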

What they all have in common is the bg and block, which are exactly the same
in every attempt, regardless of the physical or virtual HDD used.
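
As a sanity check on how the reported bg and block relate, here is a small
sketch of the usual block-to-group arithmetic. The geometry is an assumption,
not something stated in this report: 4 KiB blocks, the default of 8 * block
size blocks per group, and a first data block of 0. The function name
locate_block is illustrative.

```python
# Assumed geometry (not stated in the report): 4 KiB blocks, default group size.
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE  # one bitmap block holds 8 bits per byte: 32768
FIRST_DATA_BLOCK = 0               # assumed 0, as is usual for blocks above 1 KiB

def locate_block(block):
    """Return (group index, offset inside the group) for a block number."""
    return divmod(block - FIRST_DATA_BLOCK, BLOCKS_PER_GROUP)

group, offset = locate_block(557056)
print(f"block 557056 -> bg {group}, offset {offset}")
# prints: block 557056 -> bg 17, offset 0
```

Under these assumptions block 557056 is the very first block of bg 17, which
matches the group number printed in the error message.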

The problem is intermittent, but happens quite frequently during initial Google
Play updates, so it may become a show stopper for Android and a series of
different OSes.

One catalyst for the issue is multithreading/process forking, which
Android 7.x uses far more than 6.0. Android 6.0 has no issue with the same
kernels. In my understanding there may be some sort of block/bg locking issue
leading to concurrent writing and validation of bitmaps.

Another possible contributing root cause may be the 64-bit kernel build:
on VirtualBox the issue is systematic with 64-bit builds and I have never seen
it with 32-bit builds. This would be consistent with the statements in [1].

Doing some research, I found references to this problem on different websites:
[1], [2] and [3].

[1] https://community.nxp.com/thread/447695

[2] https://jira.hpdd.intel.com/browse/LU-1026
(at the end EXT4 patch is mentioned)

[3]
https://github.com/tweag/lustre/blob/master/ldiskfs/kernel_patches/patches/rhel7/ext4-corrupted-inode-block-bitmaps-handling-patches.patch

The attached HACK workaround avoids the problem (tested on top of kernel
4.4.62), but it is not a solution: it uses ext4_warning() instead of
ext4_error() and tricks the callers by pretending there was no error.
We could even add a check on "bg == 16 && block == 557056",
but it would still be a hack to work around a bug in the EXT4 bitmap
validation code.

It is also confirmed that kernels 4.9, 4.10 and 4.11 are affected.

Mauro

-- 
You are receiving this mail because:
You are watching the assignee of the bug.