linux-ext4 - Data exposure on IO error

Message-ID: <20200731225621.GA7126@quack2.suse.cz>
Date:   Sat, 1 Aug 2020 00:56:21 +0200
From:   Jan Kara <jack@...e.cz>
To:     linux-ext4@...r.kernel.org
Cc:     rebello.anthony@...il.com
Subject: Data exposure on IO error

Hello!

In bug 207729 [1], Anthony reported a bug that can actually lead to stale
data exposure on an IO error. The problem is relatively simple. Suppose we
do:

  fd = open("file", O_WRONLY | O_CREAT | O_TRUNC, 0644);
  write(fd, buf, 4096);
  fsync(fd);

And an IO error happens when fsync writes the block of "file". The IO error
gets properly reported to userspace but otherwise the filesystem keeps
running. So the transaction creating "file" and allocating a block to it can
still commit. Then, when the page cache of "file" gets evicted, the user can
read stale block contents (provided the IO error was just temporary or
affected only writes).
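
For anyone wanting to observe the sequence from userspace, here is a minimal
sketch. The helper names write_and_sync() and reread_matches() are made up
for illustration, and it assumes the buffer fits in one 4096-byte block. On a
healthy disk both steps simply succeed; the stale data case needs an IO error
injected underneath the filesystem, which this sketch does not do.
posix_fadvise() is only advisory, so the kernel may or may not actually drop
the cached pages.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and fsync them.
 * Returns 0 on success or a negative errno value on failure. */
static int write_and_sync(const char *path, const void *buf, size_t len)
{
	assert(path != NULL && buf != NULL);

	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -errno;

	int ret = 0;
	ssize_t n = write(fd, buf, len);
	if (n < 0)
		ret = -errno;
	else if ((size_t)n != len)
		ret = -EIO;	/* short write, treated as an error here */
	else if (fsync(fd) < 0)
		ret = -errno;	/* e.g. -EIO when data writeback failed */

	if (close(fd) < 0 && ret == 0)
		ret = -errno;
	return ret;
}

/* Hypothetical helper: ask the kernel to drop cached pages of path, then
 * read len bytes back and compare them with expect.
 * Returns 1 on match, 0 on mismatch, negative errno on failure. */
static int reread_matches(const char *path, const void *expect, size_t len)
{
	char tmp[4096];

	if (len > sizeof(tmp))
		return -EINVAL;

	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* Advisory only: clean pages may be dropped so the read hits disk. */
	(void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

	ssize_t n = pread(fd, tmp, len, 0);
	int ret;
	if (n < 0)
		ret = -errno;
	else
		ret = ((size_t)n == len && memcmp(tmp, expect, len) == 0);

	close(fd);
	return ret;
}
```

In the scenario above, write_and_sync() would return -EIO, and a later
reread_matches() returning 0 would mean the file now shows contents other
than what was written.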

Now I understand that in the face of IO errors the behavior is really
undefined, but potential exposure of stale data seems worse than strictly
necessary. Also, if we run in data=ordered mode, especially if
data_err=abort is also set, the user would rightfully expect the filesystem
to get aborted when such an IO error happens, but that's not the case.
Generally data_err=abort seems a bit misnamed (and the manpage is wrong about
this mount option), since what it really does is abort the filesystem if the
jbd2 thread encounters an error when writing back ordered data. However, the
ordered data can be written back by other processes as well, and in that case
the error is just lost / reported to userspace but the filesystem doesn't get
aborted.

As I was thinking about it, it seems to me that in data=ordered mode, we
should just always abort the filesystem when writeback of a newly allocated
block fails, to avoid the stale data exposure mentioned above. And then we
could just deprecate the data_err= mount option because it wouldn't be of
any use anymore... What do people think?

								Honza

[1] https://bugzilla.kernel.org/show_bug.cgi?id=207729
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR