linux-kernel - ext3 file system livelock and file system corruption, 4.9.166 stable kernel

Message-ID: <CACMCwJ+qLHV+cjQC2kAnLQP6qVz1bJ75V8BqQBV5HE1edRC-AQ@mail.gmail.com>
Date:   Tue, 2 Apr 2019 13:08:45 +0300
From:   Jari Ruusu <jari.ruusu@...il.com>
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     "zhangyi (F)" <yi.zhang@...wei.com>,
        "Theodore Ts'o" <tytso@....edu>, Jan Kara <jack@...e.cz>,
        linux-kernel@...r.kernel.org
Subject: ext3 file system livelock and file system corruption, 4.9.166 stable kernel

To trigger this ext4 file system bug, you need a sparse file with
the right sparse pattern on an old-school ext3 file system. I tried
simpler ways to trigger it, but those attempts did not reproduce
the bug. I have provided a compressed sparse file that reliably
triggers it. Size of the compressed sparse file: 1667256 bytes.
Size of the uncompressed sparse file: 7369850880 bytes.
The following commands demonstrate the problem.

  wget http://www.elisanet.fi/jariruusu/123/sparse-demo.data.xz
  xz -d sparse-demo.data.xz
  mkfs -t ext3 -b 4096 -e remount-ro -O "^dir_index" /dev/sdc1
  mount -t ext3 /dev/sdc1 /mnt
  cp -v --sparse=always sparse-demo.data /mnt/aa
  cp -v --sparse=always sparse-demo.data /mnt/bb
  umount /mnt
  mount -t ext3 /dev/sdc1 /mnt
  cp -v --sparse=always /mnt/bb /mnt/aa

That last cp command reliably triggers the bug: the system
livelocks, and after a reset you have file system corruption to
deal with. Deeply unfunny.

The bug is caused by
"ext4: brelse all indirect buffer in ext4_ind_remove_space()"
upstream commit 674a2b27234d1b7afcb0a9162e81b2e53aeef217, from
<yi.zhang@...wei.com>, who provided a follow-up patch
"ext4: cleanup bh release code in ext4_ind_remove_space()"
upstream commit 5e86bdda41534e17621d5a071b294943cae4376e. The
problem with that follow-up patch is that it is almost criminally
mislabeled. It should have said "fixes ext3 livelock and file
system corrupting bug" or something like that, so that Greg KH &
Co would have understood that it must be backported to stable
kernels too. Now the bug appears to be in all/most stable kernels
already.

Below is the buggy patch that causes the problem. Look at those
new while loops. Neither loop body changes partial or partial2, so
once a while condition is true, it is ALWAYS true, and the loop
livelocks.

> --- a/fs/ext4/indirect.c
> +++ b/fs/ext4/indirect.c
> @@ -1385,10 +1385,14 @@ end_range:
>  					   partial->p + 1,
>  					   partial2->p,
>  					   (chain+n-1) - partial);
> -			BUFFER_TRACE(partial->bh, "call brelse");
> -			brelse(partial->bh);
> -			BUFFER_TRACE(partial2->bh, "call brelse");
> -			brelse(partial2->bh);
> +			while (partial > chain) {
> +				BUFFER_TRACE(partial->bh, "call brelse");
> +				brelse(partial->bh);
> +			}
> +			while (partial2 > chain2) {
> +				BUFFER_TRACE(partial2->bh, "call brelse");
> +				brelse(partial2->bh);
> +			}
>  			return 0;
>  		}
>
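
For illustration only, here is a minimal user-space sketch of a
release loop that terminates. It is not the kernel code and not
necessarily what the follow-up commit does: struct level, release()
and release_down_to() are hypothetical names, and release() merely
stands in for brelse(). The point is that the pointer steps back
toward the head of the chain on every pass, so the loop condition
eventually becomes false.

```c
#include <stddef.h>

/* Hypothetical stand-in for one entry of the indirect-block chain. */
struct level {
    int held;   /* 1 while the simulated buffer reference is held */
};

/* Simulated brelse(): drop the reference held by one chain entry. */
static void release(struct level *l)
{
    l->held = 0;
}

/*
 * Release every entry from `partial` down to, but not including,
 * `chain`. Returns the number of entries released.
 */
static int release_down_to(struct level *chain, struct level *partial)
{
    int released = 0;

    while (partial > chain) {
        release(partial);
        partial--;      /* the step the quoted loops lack */
        released++;
    }
    return released;
}
```

With a four-entry chain and partial pointing at the last entry, this
releases three entries and leaves the head of the chain untouched.
Without the decrement, the same call would never return.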

Greg & Co,
Please revert the above patch from stable kernels, or backport the
follow-up patch that fixes the problem.

-- 
Jari Ruusu  4096R/8132F189 12D6 4C3A DCDA 0AA4 27BD  ACDF F073 3C80 8132 F189