linux-ext4 - Re: [BUG REPORT] ext4: “errors=remount-ro” has become “errors=shutdown”?


Message-ID: <20250107070820.GJ6174@frogsfrogsfrogs>
Date: Mon, 6 Jan 2025 23:08:20 -0800
From: "Darrick J. Wong" <djwong@...nel.org>
To: Baokun Li <libaokun1@...wei.com>
Cc: Theodore Ts'o <tytso@....edu>, Jan Kara <jack@...e.cz>,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
	Christian Brauner <brauner@...nel.org>, sunyongjian1@...wei.com,
	Yang Erkun <yangerkun@...wei.com>, linux-fsdevel@...r.kernel.org,
	linux-xfs@...r.kernel.org
Subject: Re: [BUG REPORT] ext4: “errors=remount-ro” has become “errors=shutdown”?

On Tue, Jan 07, 2025 at 10:01:32AM +0800, Baokun Li wrote:
> On 2025/1/7 7:49, Darrick J. Wong wrote:
> > On Sat, Jan 04, 2025 at 10:41:28AM +0800, Baokun Li wrote:
> > > Hi Ted,
> > > 
> > > On 2025/1/3 23:54, Theodore Ts'o wrote:
> > > > On Fri, Jan 03, 2025 at 10:35:17AM -0500, Theodore Ts'o wrote:
> > > > > I don't see how setting the shutdown flag causes reads to fail.  That
> > > > > was true in an early version of the ext4 patch which implemented
> > > > > shutdown support, but one of the XFS developers (I don't remember if
> > > > > > it was Dave or Christoph) objected because XFS did not cause the
> > > > > read_pages function to fail.  Are you seeing this with an upstream
> > > > > kernel, or with a patched kernel?  The upstream kernel does *not* have
> > > > > the check in ext4_readpages() or ext4_read_folio() (post folio
> > > > > conversion).
> > > > OK, that's weird.  Testing on 6.13-rc4, I don't see the problem simulating an ext4 error:
> > > > 
> > > > root@...-xfstests:~# mke2fs -t ext4 -Fq /dev/vdc
> > > > /dev/vdc contains a ext4 file system
> > > > 	last mounted on /vdc on Fri Jan  3 10:38:21 2025
> > > > root@...-xfstests:~# mount -t ext4 -o errors=continue /dev/vdc /vdc
> > > We are discussing "errors=remount-ro," as the title states, not the
> > > continue mode. The key code leading to the behavior change is as follows,
> > > therefore the continue mode is not affected.
> > Hmm.  On the one hand, XFS has generally returned EIO (or ESHUTDOWN in a
> > couple of specialty cases) when the fs has been shut down.
> Indeed, this is the intended behavior during shutdown.
> > 
> > OTOH XFS also doesn't have errors=remount-ro; it just dies, which I
> > think has been its behavior for a long time.
> Yes. As an aside, is there any way for xfs to determine if -EIO is
> originating from a hardware error or if the filesystem has been shutdown?

XFS knows the difference, but nothing above it does.

> Or would you consider it useful to have the mount command display
> "shutdown" when the file system is being shut down?

Trouble is, will mount get confused and try to pass ",shutdown" as part
of a remount operation?  I suppose the fs is dead so what does it
matter...
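
For illustration only, here is a minimal userspace sketch of how a tool
could read the state back from the mount table. It assumes the usual
/proc/self/mounts field layout (device, mountpoint, fstype, options,
dump, pass). The function names are made up for this example, and the
"shutdown" option is hypothetical: no kernel reports it today, it is
only what the proposal above would add.

```python
def mount_options(mounts_text, mountpoint):
    """Return the option set of the last entry mounted at `mountpoint`, or None."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Assumed layout: device mountpoint fstype options dump pass
        if len(fields) >= 4 and fields[1] == mountpoint:
            # Later entries shadow earlier ones at the same mountpoint.
            found = set(fields[3].split(","))
    return found


def describe_state(opts):
    """Map an option set to a coarse state string."""
    if opts is None:
        return "not mounted"
    # HYPOTHETICAL: "shutdown" is not an option any kernel emits today.
    if "shutdown" in opts:
        return "shutdown"
    return "read-only" if "ro" in opts else "read-write"
```

Feeding it the contents of /proc/self/mounts and a mountpoint such as
/vdc would report "read-only" after errors=remount-ro has fired. Without
something like the hypothetical option, a shut-down filesystem is
indistinguishable from a healthy one at this level.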

> > To me, it doesn't sound unreasonable for ext* to allow reads after a
> > shutdown when errors=remount-ro since it's always had that behavior.
> Yes, a previous bug fix inadvertently changed the behavior of
> errors=remount-ro, and the patch to correct this is coming.
> 
> Additionally, ext4 now allows directory reads even after shutdown. Is this
> expected behavior?

There's no formal specification for what shutdown means, so ... it's not
unexpected.  XFS doesn't allow that.

> > Bonus Q: do you want an errors=fail variant to shut things down fast?
> > 
> > --D
> 
> In my opinion, I have not yet seen a scenario where the file system needs
> to be shut down after an error occurs. Therefore, using errors=remount-ro
> to prevent modifications after an error is sufficient. Of course, if
> customers have such needs, implementing this mode is also very simple.

IO errors, sure.  Metadata errors?  No, we want to stop the world
immediately, either so the sysadmin can go run xfs_repair, or the clod
manager can just kill the node and deploy another.

--D

> 
> 
> Thanks,
> Baokun
> 
>