Message-ID: <20180603221426.GG1750@thunk.org>
Date: Sun, 3 Jun 2018 18:14:26 -0400
From: "Theodore Y. Ts'o" <tytso@....edu>
To: Maarten van Malland <maartenvanmalland@...il.com>
Cc: linux-ext4@...r.kernel.org
Subject: Re: Problem with external journal and LVM snapshots
On Fri, Jun 01, 2018 at 12:47:05PM +0200, Maarten van Malland wrote:
> I have a not so common setup that IMHO triggers a bug in the Ext4 journal code. I have the following setup:
>
> - A mdadm RAID10 device with Bcache backing and LVM on top. This should actually not matter at all, but perhaps still worth mentioning.
> - The Ext4 volume resides on a LVM VG, with an external journal on a NVMe drive.
> - I use LVM snapshotting for that volume
>
> Now, when I make the snapshot I do the following:
>
> lvremove /dev/bcache/root-snap
> lvcreate -c 512 -I 512 -n root-snap -L 250G -s /dev/bcache/root
> tune2fs -O ^has_journal /dev/bcache/root-snap (to get rid of the external journal)
> tune2fs -O has_journal /dev/bcache/root-snap (to create a new internal journal)
>
> When finished, I can mount /dev/bcache/root-snap just fine, with the
> internal journal working. However, when I reboot it's a different
> issue. For whatever reason the kernel still sees both
> /dev/bcache/root and /dev/bcache/root-snap with an external journal!
I suspect that's not what is going on. The problem is that external
journals predate snapshot support, and external journals aren't very
well supported in the first place, because so few people use them.
The other thing to understand about external journals is that both the
external journal and the file system each have a UUID. The file
system superblock, in addition to its own UUID, records the UUID of the
external journal which it is using. And the external journal, in
addition to its own UUID, has a list of UUIDs for the file systems that
are using it. (There is partial support to allow
multiple file systems to use the same journal, which was never
completed.)
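To look at the file system side of this linkage, the superblock can be
dumped with "dumpe2fs -h". A minimal sketch, assuming the "Filesystem UUID:"
and "Journal UUID:" labels that dumpe2fs prints; the helper name sb_field is
made up for illustration:

```shell
# sb_field LABEL: read `dumpe2fs -h` output on stdin and print the
# value that follows "LABEL:" (hypothetical helper, not part of e2fsprogs).
sb_field() {
    sed -n "s/^$1:[[:space:]]*//p"
}

# Example use against real devices (needs root):
#   dumpe2fs -h /dev/bcache/root      2>/dev/null | sb_field 'Journal UUID'
#   dumpe2fs -h /dev/bcache/root-snap 2>/dev/null | sb_field 'Filesystem UUID'
```

If both devices report the same "Filesystem UUID", the journal has no way to
tell them apart.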
So when you created the snapshot:
lvremove /dev/bcache/root-snap
lvcreate -c 512 -I 512 -n root-snap -L 250G -s /dev/bcache/root
This created a new block device which had the same file system UUID as
the original file system. When you then attempted to remove the
external journal:
tune2fs -O ^has_journal /dev/bcache/root-snap
... this cleared the external journal's UUID from
/dev/bcache/root-snap. However, this *also* removed the UUID of
/dev/bcache/root and /dev/bcache/root-snap from the external journal.
This was fine while /dev/bcache/root remained mounted. But then when
you next tried to remount /dev/bcache/root, the mount would have
failed, because while /dev/bcache/root has a pointer (via a UUID) to
the external journal, the external journal no longer has a
back-pointer (via UUID) to /dev/bcache/root.
You didn't say what the script in initrd was that fixed it, but I'm
guessing it was something like:
tune2fs -O ^has_journal /dev/bcache/root
Which would have resulted in the warning message:
tune2fs 1.44.2 (14-May-2018)
Filesystem's UUID not found on journal device. <======
Journal removed
Followed by something like:
tune2fs -J device=/dev/bcache/journal /dev/bcache/root
The fundamental problem is that there is a deep assumption that file
system UUIDs are unique. This is needed for mounting-by-uuid to
work, for example. Creating snapshots which aren't ephemeral breaks
this assumption, so it's not just external journals which have this
problem. If you have "UUID=xxxx" in your /etc/fstab, it's going to
cause confusion as well.
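One way to spot the collision before it bites is to list the UUID of each
device and look for repeats. A rough sketch, assuming blkid from util-linux
is available; the function name find_duplicate_uuids is hypothetical:

```shell
# Read lines of "DEVICE UUID" on stdin and print each UUID that is
# carried by more than one device, followed by those devices.
find_duplicate_uuids() {
    awk '{ count[$2]++; devs[$2] = devs[$2] " " $1 }
         END { for (u in count) if (count[u] > 1) print u ":" devs[u] }'
}

# Example use against real devices (needs root):
#   for d in /dev/bcache/root /dev/bcache/root-snap; do
#       printf '%s %s\n' "$d" "$(blkid -s UUID -o value "$d")"
#   done | find_duplicate_uuids
```

Any line this prints names a UUID that "UUID=xxxx" in /etc/fstab could
resolve to more than one device.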
So the quick workaround for your problem is to use this instead of
"tune2fs -O ^has_journal /dev/bcache/root-snap":
debugfs -w /dev/bcache/root-snap << EOF
features ^has_journal
set_super_value journal_uuid null
set_super_value journal_dev 0
quit
EOF
Regards,
- Ted