Message-ID: <20190701135607.GB6549@mit.edu>
Date: Mon, 1 Jul 2019 09:56:07 -0400
From: "Theodore Ts'o" <tytso@....edu>
To: Geert Uytterhoeven <geert@...ux-m68k.org>
Cc: Arthur Marsh <arthur.marsh@...ernode.on.net>,
Richard Weinberger <richard.weinberger@...il.com>,
LKML <linux-kernel@...r.kernel.org>,
Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: ext3/ext4 filesystem corruption under post 5.1.0 kernels
On Mon, Jul 01, 2019 at 02:43:14PM +0200, Geert Uytterhoeven wrote:
> Hi Ted,
>
> Despite this fix having been applied upstream, the kernel prints from
> time to time:
>
> EXT4-fs (sda1): error count since last fsck: 5
> EXT4-fs (sda1): initial error at time 1557931133:
> ext4_get_branch:171: inode 1980: block 27550
> EXT4-fs (sda1): last error at time 1558114349:
> ext4_get_branch:171: inode 1980: block 27550
>
> This happens even after a manual run of "e2fsck -f" (while it's mounted
> RO), which reports a clean file system.
Here's what's happening. When a newer kernel detects file system
corruption, it records the error in these superblock fields:
	__le32	s_error_count;		/* number of fs errors */
	__le32	s_first_error_time;	/* first time an error happened */
	__le32	s_first_error_ino;	/* inode involved in first error */
	__le64	s_first_error_block;	/* block involved of first error */
	__u8	s_first_error_func[32] __nonstring;	/* function where the error happened */
	__le32	s_first_error_line;	/* line number where error happened */
	__le32	s_last_error_time;	/* most recent time of an error */
	__le32	s_last_error_ino;	/* inode involved in last error */
	__le32	s_last_error_line;	/* line number where error happened */
	__le64	s_last_error_block;	/* block involved of last error */
	__u8	s_last_error_func[32] __nonstring;	/* function where the error happened */
When newer versions of e2fsck *fix* the corruption, they clear
these fields. It's basically a safety check, because *way* too many
ext4 users run with errors=continue (aka "don't worry, be happy"
mode), so this is a poke in the system logs to tell them that the file
system is corrupted, and that they really, *REALLY* should fix it
before they lose (more) data.
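For anyone who wants to look at these fields directly, here is a
minimal sketch in C that decodes them from a raw 1024-byte copy of the
primary superblock (which starts at byte offset 1024 on the device).
The byte offsets, the struct sb_error_info type, and the
parse_sb_errors() name are assumptions made for illustration and are
not taken from this message; the offsets follow the published ext4
on-disk layout and should be checked against your kernel's
struct ext4_super_block before relying on them.

```c
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: byte offsets inside the 1024-byte superblock, per the
 * published ext4 on-disk layout. Verify against your kernel headers. */
enum {
	SB_MAGIC             = 0x038,
	SB_ERROR_COUNT       = 0x194,
	SB_FIRST_ERROR_TIME  = 0x198,
	SB_FIRST_ERROR_INO   = 0x19C,
	SB_FIRST_ERROR_BLOCK = 0x1A0,
	SB_FIRST_ERROR_FUNC  = 0x1A8,
	SB_FIRST_ERROR_LINE  = 0x1C8,
	SB_LAST_ERROR_TIME   = 0x1CC,
	SB_LAST_ERROR_INO    = 0x1D0,
	SB_LAST_ERROR_LINE   = 0x1D4,
	SB_LAST_ERROR_BLOCK  = 0x1D8,
	SB_LAST_ERROR_FUNC   = 0x1E0
};

/* Hypothetical host-side view of one recorded error. */
struct sb_error {
	uint32_t time;      /* seconds since the epoch */
	uint32_t ino;
	uint64_t block;
	uint32_t line;
	char     func[33];  /* on-disk field is not NUL-terminated */
};

struct sb_error_info {
	uint32_t        error_count;
	struct sb_error first;
	struct sb_error last;
};

/* The superblock is little-endian on disk regardless of host CPU. */
static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t get_le64(const unsigned char *p)
{
	return (uint64_t)get_le32(p) | ((uint64_t)get_le32(p + 4) << 32);
}

static void get_func(const unsigned char *p, char out[33])
{
	memcpy(out, p, 32);
	out[32] = '\0';     /* add the terminator the disk format lacks */
}

/* Returns 0 on success, -1 if the ext2/3/4 magic (0xEF53) is missing. */
int parse_sb_errors(const unsigned char sb[1024], struct sb_error_info *out)
{
	if (sb[SB_MAGIC] != 0x53 || sb[SB_MAGIC + 1] != 0xEF)
		return -1;

	out->error_count = get_le32(sb + SB_ERROR_COUNT);

	out->first.time  = get_le32(sb + SB_FIRST_ERROR_TIME);
	out->first.ino   = get_le32(sb + SB_FIRST_ERROR_INO);
	out->first.block = get_le64(sb + SB_FIRST_ERROR_BLOCK);
	out->first.line  = get_le32(sb + SB_FIRST_ERROR_LINE);
	get_func(sb + SB_FIRST_ERROR_FUNC, out->first.func);

	out->last.time   = get_le32(sb + SB_LAST_ERROR_TIME);
	out->last.ino    = get_le32(sb + SB_LAST_ERROR_INO);
	out->last.block  = get_le64(sb + SB_LAST_ERROR_BLOCK);
	out->last.line   = get_le32(sb + SB_LAST_ERROR_LINE);
	get_func(sb + SB_LAST_ERROR_FUNC, out->last.func);
	return 0;
}
```

To use it, read 1024 bytes starting at offset 1024 from the (ideally
unmounted or read-only) device into a buffer and pass that buffer to
parse_sb_errors(). A nonzero error_count is what makes the kernel keep
printing the "error count since last fsck" message.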
> The inode and block numbers match the numbers printed due to the
> previous bug.
You can also see when the last file system error was detected via:
% date -d @1558114349
Fri 17 May 2019 01:32:29 PM EDT
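The same conversion can be done from a program. This is a minimal
sketch in C, assuming the value is the plain 32-bit seconds-since-epoch
count that the kernel prints; format_error_time() is a hypothetical
helper name, and it formats in UTC rather than local time, so the
result is four hours ahead of the EDT output above.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Format an error timestamp as "YYYY-MM-DD HH:MM:SS UTC".
 * Returns the number of characters written, or 0 on failure. */
size_t format_error_time(uint32_t secs, char *buf, size_t len)
{
	time_t t = (time_t)secs;
	struct tm *tm = gmtime(&t);   /* UTC, independent of the TZ setting */

	if (tm == NULL)
		return 0;
	return strftime(buf, len, "%Y-%m-%d %H:%M:%S UTC", tm);
}
```

For the "last error" value in the log, 1558114349, this yields
2019-05-17 17:32:29 UTC, which is the same instant as 01:32:29 PM EDT.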
> Do you have an idea what's wrong?
> Note that I run a very old version of e2fsck (from a decade ago).
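... and that's the problem. An e2fsck that old doesn't know about
these fields, so it never clears them, and the kernel keeps reporting
the old errors. If you're going to be using newer versions of the
kernel, you really should be using newer versions of e2fsprogs.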
There have been a lot of bug fixes in the last 10 years, and some of
them fix data corruption bugs.
- Ted