linux-ext4 - Re: [PATCH] jbd2: Set the right uuid for block tag

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <tencent_2388D8723F1AA7A28CD89C137C0FFCDE1806@qq.com>
Date:   Tue, 19 Jul 2022 00:08:52 +0800
From:   Wang Jianjian <wangjianjian0@...mail.com>
To:     Theodore Ts'o <tytso@....edu>
Cc:     linux-ext4@...r.kernel.org
Subject: Re: [PATCH] jbd2: Set the right uuid for block tag

Gotcha, Thanks for your detailed explanation!

On 7/16/22 01:43, Theodore Ts'o wrote:
> On Fri, Jul 15, 2022 at 11:11:23PM +0800, Wang Jianjian wrote:
>> Hi all,
>> Is this a real problem need to fix ?
>>
>> On 7/12/22 00:26, Wang Jianjian wrote:
>>> journal->j_uuid is not initialized and let us use the uuid from
>>> j_superblock. And since this is the only place where j_uuid is used
>>> so that we can remove it.
> 
> There's a really long story, and the short version is, we don't really
> need to do anything.
> 
> The longer version of the story is that journal->j_uuid is not
> supposed to be the uuid from the journal superblock, but rather the
> uuid from the file system.  Quoting from the jbd2.h header file:
> 
> 	/**
> 	 * @j_uuid:
> 	 *
> 	 * Journal uuid: identifies the object (filesystem, LVM volume etc)
> 	 * backed by this journal.  This will eventually be replaced by an array
> 	 * of uuids, allowing us to index multiple devices within a single
> 	 * journal and to perform atomic updates across them.
> 	 */
> 	__u8			j_uuid[16];
> 
> The original design goal from Stephen Tweedie, who implemented the jbd
> subsystem for ext3, was that the jbd/jbd2 layer could in theory be
> used for more than just ext3/ext4 (and indeed, it is used by
> ocfs/ocfs2), *and*, that in the case of a journal on an external
> device, that a single jbd/jbd2 journal could support multiple file
> systems.  So you might have a dozen HDD's in a NAS box, and they would
> all use a single extrenal device as a journal.
> 
> So at the beginning of each block (or revoke) tag, there is a space
> for a UUID to indicate what file system that the metadata blocks being
> journaled was for.  Those bits are skipped if the JBD2_FLAG_SAME_UUID
> is set, in which case the tag is assumed to belong to the same file
> system as the previous tag.
> 
> The on-disk space has been reserved for this design; we have
> JBD2_FLAG_SAME_UUID, and that's why we copy the j_uuid into the tag
> block for the first tag, and all other tags gave the SAME_UUID flag
> set.  In addition, in the on-disk superblock, we define an array of
> UUID's which are the file systems which use this particular external
> journal:
> 
> /* 0x0100 */
> 	__u8	s_users[16*48];		/* ids of all fs'es sharing the log */
> 
> However, full support for having multiple file systems sharing the
> long was never realized.  Part of the reason for this is there are
> complications if one of the disks is off-line at the time that the
> journal is replayed.  Suppose out of the dozen file systems sharing an
> external journal, one of the HDD's is temporarily off-line, or removed
> from the NAS box.  Now we can't replay the journal, since there is no
> place to put the journal enrties for the missing file system(s).  In
> theory we could copy the journal entries for the missing file system
> in an file, but then we would have to define some place on the root
> file system where the saved journals for the missing HDD could be
> found, and that assumes that the root file system is mounted
> read/write so it's available to save the journal information.
> 
> Another potential problem is that when we *do* need to do a commit, we
> have to wait for handles for *all* of the file systems using that
> journal to complete, and so an fsync() triggered on one file system
> would potentially hold up file system operations on a sibling device.
> 
> There might be cases where this would make sense, they would be pretty
> specialized situations, and so never has ever decided to implement the
> rest of the feature.
> 
> At the moment journal->j_uuid is all zeros after
> journal_init_common(), and so we're wasting 16 bytes in each journal
> descriptor block by filling in those bytes with all zeroes.  The
> proper way to "fix" this would be in fs/ext4/super.c, in the functions
> ext4_get_inode() and ext4_get_dev_journal(), after those functions
> call jbd2_journal_init_{dev,inode}, they should copy
> EXT4_SB(super)->s_uuid into journal->j_uuid.
> 
> That would properly "initialize" journal->j_uuid, and it would mean
> that the file system uuid would be in the tag block.  One potential
> gotcha if we were to make this change, and if we wanted to actually
> check the uuid in the jbd2 descriptor block, is that today, tune2fs,
> and the proposed EXT4_IOC_SETUUID ioctl assume that we can change the
> file system uuid without having any potential problems.
> 
> If we wanted to properly support this case in the EXT4_IOC_SETUUID
> patch, we would have to freeze the file system by calling
> ext4_freeze() and then update journal->j_uuid, as well as update
> jsb->s_users[] if we are using an external journal.
> 
> And for the existing userspace-only "tune2fs -U" code, we currently
> don't have a way of triggering an update of the journal->j_uuid field
> when the file system uuid is changed.  (We do update the journal
> superblock s_users array, but if we did that while the file system was
> mounted, it wouldn't be safe unless it called FIFREEZE/FITHAW, which
> it currently doesn't do.)
> 
> So we could initialize journal->j_uuid if we wanted to, which wouldn't
> do much harm, but it wouldn't really fix anything either --- and then
> we'd have to make sure that journal is quicesed, and journal->j_uuid
> gets updated if the file system UUID is changed while the file system
> is mounted.  Once the first tag in the jbd2 descriptor block is now
> set correctly, we would then have to find a profitable way to *use*
> that information --- either as an additional sanity check on top of
> the checksum, or to implement the full support for shared uses for the
> journal.  But since the journal checksums are good enough that we
> don't need the additional validation, and there hasn't been a real
> demand for shared external journals, it probably won't happen until
> someone wants to make a business case and fund the engineering work to
> make it happen.
> 
> Cheers,
> 
> 						- Ted
> 						
>