[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200117162927.GB448999@mit.edu>
Date: Fri, 17 Jan 2020 11:29:27 -0500
From: "Theodore Y. Ts'o" <tytso@....edu>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org,
"Darrick J. Wong" <darrick.wong@...cle.com>
Subject: Re: ext2fs_link() corrupting a directory
On Fri, Jan 17, 2020 at 01:24:20PM +0100, Jan Kara wrote:
>
> I was tracking down a filesystem corruption issue with one proposed
> xfstests testcase. After some debugging I've found out that the problem
> actually is not in the kernel but rather in e2fsck (libext2fs
> respectively). The testcase deletes lost+found, e2fsck recreates it. But
> after the testcase / is h-tree directory. So ext2fs_link() creates
> lost+found in / and clears EXT4_INODE_INDEX flag. Now because the
> filesystem has metadata checksums, clearing the index flag needs to also
> rewrite all directory blocks with h-tree index blocks because the layout
> now needs to be different. ext2fs_link() actually tries to do this in its
> link_proc() but if the space for new directory entry is found before we
> walk all the h-tree index blocks, we terminate the iteration and some index
> blocks remain unconverted resulting in checksum errors and other weirdness
> later on.
Ouch. Nice catch.
I suspect we have a similar problem when the kernel discovers that the
htree is corrupted; it will clear EXT4_INODE_INDEX flag, since before
how we added metadata checksum, it was safe to do that. Given our
choice of where the stash the checksum, in order to not decrease the
fanout of htree directories, this is no no longer safe to do.
I think what we will need to do is to set an in-memory "fallback"
flag, which simply ignores the index blocks, but doesn't actually clear
the EXT4_INODE_INDEX flag in the case where metadata_csum is enabled.
> The question is how to best fix this. The easiest fix is to just make
> link_proc() iterate through all directory blocks when it needs to do the
> index blocks conversion. But this seems somewhat stupid. Also there's
> another problem with clearing EXT4_INODE_INDEX in ext2fs_link() - if the
> directory has more than 65000 subdirectories, clearing EXT4_INODE_INDEX is
> not allowed because large directory link count handling is supported only
> for EXT4_INODE_INDEX directories.
>
> So what do we do with this? For e2fsck, we could just link the new entry
> into the directory and force rehashing later. But ext2fs_link() can be
> called also from other tools and it should be a self-contained API... Any
> ideas? Should we just bite the bullet and implement ext2fs_link() for
> h-tree dirs properly?
Yes, I think that's what we are going to need to do; we had cheated
and not bothered to implement htree support in ext2fs library, but if
we're going to implement it for link_proc, we might as well also
implement it for lookups as well.
- Ted
Powered by blists - more mailing lists