Message-ID: <20180603221426.GG1750@thunk.org>
Date:   Sun, 3 Jun 2018 18:14:26 -0400
From:   "Theodore Y. Ts'o" <tytso@....edu>
To:     Maarten van Malland <maartenvanmalland@...il.com>
Cc:     linux-ext4@...r.kernel.org
Subject: Re: Problem with external journal and LVM snapshots

On Fri, Jun 01, 2018 at 12:47:05PM +0200, Maarten van Malland wrote:
> I have a not so common setup that IMHO triggers a bug in the Ext4 journal code. I have the following setup:
> 
> - A mdadm RAID10 device with Bcache backing and LVM on top. This should actually not matter at all, but perhaps still worth mentioning.
> - The Ext4 volume resides on a LVM VG, with an external journal on a NVMe drive.
> - I use LVM snapshotting for that volume
> 
> Now, when I make the snapshot I do the following:
> 
> lvremove /dev/bcache/root-snap
> lvcreate -c 512 -I 512 -n root-snap -L 250G -s /dev/bcache/root
> tune2fs -O ^has_journal /dev/bcache/root-snap (to get rid of the external journal)
> tune2fs -O has_journal /dev/bcache/root-snap (to create a new internal journal)
> 
> When finished, I can mount /dev/bcache/root-snap just fine, with the
> internal journal working. However, when I reboot it's a different
> issue. For whatever reason the kernel still sees both
> /dev/bcache/root and /dev/bcache/root-snap with an external journal!

I suspect that's not what is going on.  The problem is that external
journals predate snapshot support, and external journals aren't very
well supported in the first place, because so few people use them.

The other thing to understand about external journals is that the
external journal and the file system each have a UUID.  The file system
superblock, in addition to its own UUID, stores the UUID of the external
journal it is using.  And the external journal, in addition to its own
UUID, stores a list of the UUIDs of the file systems that are using it.
(There is partial support for letting multiple file systems share the
same journal, but it was never completed.)
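The following toy Python model shows the two-way link described above. It is a sketch and not e2fsprogs or kernel code. The class and field names (`FsSuperblock`, `JournalSuperblock`, `users`, `links_ok`) are hypothetical. They loosely stand in for the file system superblock's own UUID and journal UUID, and for the journal superblock's list of user UUIDs. The rule that both directions must agree follows the email's description; it does not reproduce the exact check the kernel performs at mount time.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FsSuperblock:
    # Hypothetical stand-in for the file system superblock fields discussed here.
    uuid: str                    # the file system's own UUID
    journal_uuid: Optional[str]  # forward pointer: UUID of the external journal, or None


@dataclass
class JournalSuperblock:
    # Hypothetical stand-in for the external journal device's superblock.
    uuid: str                                       # the journal's own UUID
    users: List[str] = field(default_factory=list)  # back-pointers: UUIDs of file systems using it


def links_ok(fs: FsSuperblock, journal: JournalSuperblock) -> bool:
    # In this model the pairing is valid only if both directions agree.
    return fs.journal_uuid == journal.uuid and fs.uuid in journal.users


journal = JournalSuperblock(uuid="J-1111", users=["FS-AAAA"])
root = FsSuperblock(uuid="FS-AAAA", journal_uuid="J-1111")
print(links_ok(root, journal))  # True
```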

So when you created the snapshot:

  lvremove /dev/bcache/root-snap
  lvcreate -c 512 -I 512 -n root-snap -L 250G -s /dev/bcache/root

This created a new block device which had the same file system UUID as
the original file system.  When you then attempted to remove the
external journal:

  tune2fs -O ^has_journal /dev/bcache/root-snap

... this cleared the external journal's UUID from
/dev/bcache/root-snap.  However, it *also* removed the file system UUID
shared by /dev/bcache/root and /dev/bcache/root-snap from the external
journal's list of users.

That was fine while /dev/bcache/root remained mounted.  But the next
time you tried to mount /dev/bcache/root, the mount would have failed:
/dev/bcache/root still has a pointer (via UUID) to the external
journal, but the external journal no longer has a back-pointer (via
UUID) to /dev/bcache/root.
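The sequence above can be replayed in a small self-contained Python toy model. The names are hypothetical and are not e2fsprogs APIs. `remove_external_journal` is an approximation of what `tune2fs -O ^has_journal` does with an external journal: it clears the file system's forward pointer and drops that file system's UUID from the journal's user list. Because the snapshot is a block-level copy, it carries the same UUID as the origin. The single back-pointer therefore serves both devices, and removing it breaks the origin.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FsSuperblock:
    uuid: str                    # file system UUID (copied verbatim by a snapshot)
    journal_uuid: Optional[str]  # forward pointer to the external journal


@dataclass
class JournalSuperblock:
    uuid: str
    users: List[str] = field(default_factory=list)  # back-pointers to file system UUIDs


def links_ok(fs: FsSuperblock, journal: JournalSuperblock) -> bool:
    return fs.journal_uuid == journal.uuid and fs.uuid in journal.users


def remove_external_journal(fs: FsSuperblock, journal: JournalSuperblock) -> None:
    # Approximation of tune2fs removing an external journal: drop this file
    # system's UUID from the journal's user list, then clear the forward pointer.
    if fs.uuid in journal.users:
        journal.users.remove(fs.uuid)
    fs.journal_uuid = None


journal = JournalSuperblock(uuid="J-1111", users=["FS-AAAA"])
root = FsSuperblock(uuid="FS-AAAA", journal_uuid="J-1111")

# lvcreate -s copies every block, superblock included, so the UUIDs match.
snap = copy.deepcopy(root)

remove_external_journal(snap, journal)  # meant to affect only the snapshot...

print(snap.journal_uuid)        # None
print(journal.users)            # [] -- the back-pointer root relied on is gone
print(links_ok(root, journal))  # False: root points at the journal, not vice versa
```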

You didn't say what the initrd script that fixed it does, but I'm
guessing it was something like:

   tune2fs -O ^has_journal /dev/bcache/root

Which would have resulted in the warning message:

tune2fs 1.44.2 (14-May-2018)
Filesystem's UUID not found on journal device.  <======
Journal removed

Followed by something like:

   tune2fs -J device=/dev/bcache/journal /dev/bcache/root


The fundamental problem is that there is a deep assumption that file
system UUIDs are unique.  Mounting by UUID depends on it, for example.
Creating snapshots that aren't ephemeral breaks this assumption, so
external journals aren't the only thing affected.  If you have
"UUID=xxxx" in your /etc/fstab, that will cause confusion as well.
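You can spot this kind of collision by looking for a file system UUID that appears on more than one block device. The Python sketch below works on a hard-coded device-to-UUID mapping with hypothetical values. On a real system the mapping would come from a tool such as `blkid`, and this sketch does not attempt to parse that tool's output.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_uuids(devices: Dict[str, str]) -> Dict[str, List[str]]:
    # Group devices by UUID and keep only UUIDs claimed by more than one device.
    by_uuid: Dict[str, List[str]] = defaultdict(list)
    for dev, uuid in sorted(devices.items()):
        by_uuid[uuid].append(dev)
    return {uuid: devs for uuid, devs in by_uuid.items() if len(devs) > 1}


# Hypothetical device -> file system UUID mapping after a persistent snapshot.
devices = {
    "/dev/bcache/root": "FS-AAAA",
    "/dev/bcache/root-snap": "FS-AAAA",  # copied verbatim by the snapshot
    "/dev/nvme0n1p1": "FS-CCCC",
}

for uuid, devs in duplicate_uuids(devices).items():
    print(f"UUID={uuid} is ambiguous: {', '.join(devs)}")
```

With the sample mapping, this prints a single line naming /dev/bcache/root and /dev/bcache/root-snap. A "UUID=FS-AAAA" entry in /etc/fstab could therefore refer to either device.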

So the quick workaround for your problem is to use this instead of
"tune2fs -O ^has_journal /dev/bcache/root-snap":

debugfs -w /dev/bcache/root-snap << EOF
features ^has_journal
set_super_value journal_uuid null
set_super_value journal_dev 0
quit
EOF
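The workaround helps because debugfs edits only the snapshot's own superblock and never opens the journal device. The self-contained Python toy below illustrates that difference. Its names are hypothetical, and `detach_superblock_only` is a rough stand-in for the three superblock changes in the heredoc: clear has_journal, journal_uuid and journal_dev. The journal's user list is left alone, so /dev/bcache/root keeps its back-pointer.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FsSuperblock:
    uuid: str
    has_journal: bool
    journal_uuid: Optional[str]
    journal_dev: int  # device number of the external journal (0 = none)


@dataclass
class JournalSuperblock:
    uuid: str
    users: List[str] = field(default_factory=list)


def links_ok(fs: FsSuperblock, journal: JournalSuperblock) -> bool:
    return fs.has_journal and fs.journal_uuid == journal.uuid and fs.uuid in journal.users


def detach_superblock_only(fs: FsSuperblock) -> None:
    # Rough model of the debugfs commands: only this superblock is edited;
    # the external journal's user list is never touched.
    fs.has_journal = False
    fs.journal_uuid = None
    fs.journal_dev = 0


journal = JournalSuperblock(uuid="J-1111", users=["FS-AAAA"])
root = FsSuperblock(uuid="FS-AAAA", has_journal=True,
                    journal_uuid="J-1111", journal_dev=0xFD03)  # hypothetical dev number
snap = copy.deepcopy(root)  # the snapshot starts as a block-for-block copy

detach_superblock_only(snap)
print(journal.users)            # ['FS-AAAA'] -- root's back-pointer survives
print(links_ok(root, journal))  # True
```

After this, `tune2fs -O has_journal` on the snapshot can create an internal journal as before. In this model, the external journal's user list is left as it was.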

Regards,

					- Ted
