[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20081217031210.GA17588@mit.edu>
Date: Tue, 16 Dec 2008 22:12:10 -0500
From: Theodore Tso <tytso@....edu>
To: Frank Mayhar <fmayhar@...gle.com>
Cc: linux-ext4@...r.kernel.org
Subject: Re: [PATCH 1/1] Allow ext4 to run without a journal.
On Wed, Dec 10, 2008 at 03:54:30PM -0800, Frank Mayhar wrote:
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index b455c68..3a05ae7 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -19,6 +19,8 @@
> #include <linux/jbd2.h>
> #include "ext4.h"
>
> +#define EXT4_NOJOURNAL_MAGIC (void *)0x457874344e6a6e6c /* "Ext4Njnl" */
> +
This will cause warnings on systems where pointers are 32-bits. I'd
suggest using a 32-bit magic number. Also, the number you have picked
is divisble by 4, which means at least in theory it could be a valid
pointer. If we make EXT4_NOJOURNAL_MAGIC a 32-bit number, then a
number which is divisible by 4 definitely could be a valid pointer on
a 64-bit system. I'd suggest keeping it simple, and simply using 0x1.
This is known not be a valid pointer, since it's not 32-bit word
aligned, and there is precedent for using such a number --- for
example, SIG_IGN is ((void *) 1) on many architectures, and page 0 is
generally unmapped to catch null pointer dereferences. So I'd suggest
keeping things simple instead of using a fancy char string.
Also, why not use EXT4_NOJOURNAL_MAGIC constant directly, instead of
assigning it as a constant value to s_nojournal_flag in struct
ext4_sb_info? That bloats the data structure for no good reason that
I can tell.
> +static inline int ext4_handle_dirty_metadata(handle_t *handle,
> + struct inode *inode, struct buffer_head *bh)
> +{
> + int err;
> +
> + if (ext4_handle_valid(handle)) {
> + err = __ext4_journal_dirty_metadata(__func__, handle, bh);
> + } else {
> + err = __ext4_write_dirty_metadata(inode, bh);
> + }
> + return err;
> +}
I don't see the point in doing this as an inline function; either way,
each time the inline function is executed, it will result a function
call, either to __ext4_journal_dirty_metadata() or
__ext4_write_dirty_metadata(), both of which are called exactly once
by this inline function. So why not move the conditional and the
contents of _ext4_write_dirty_metadata() and
__ext4_journal_dirty_metadata() into __ext4_handle_dirty_metadata(),
and then do this:
#define ext4_handle_dirty_metadata(handle, inode, bh) \
__ext4_handle_dirty_metadata(__func__, handle, inode, bh)
Ext4_handle_dirty_metadata() gets called a *lot*, so this will shrink
the icache footprint of ext4 filesystem, and for CPU's that have
decent branch prediction, centralizing the condition into a single
location instead of an inline will also be a win, once again
demonstrating that with modern CPU's, optimizations that minimize code
size often also faster.
> /* super.c */
> @@ -208,6 +270,9 @@ int ext4_force_commit(struct super_block *sb);
>
> static inline int ext4_should_journal_data(struct inode *inode)
> {
> + BUG_ON(EXT4_JOURNAL(inode) == NULL &&
> + (test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
> + EXT4_I(inode)->i_flags & EXT4_JOURNAL_DATA_FL));
> if (!S_ISREG(inode->i_mode))
> return 1;
> if (test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA)
This shouldn't be a BUG_ON; this gets triggered if there is an inode
which is marked as needing data journalling. So it is not a *bug*,
but at worst either a filesystem corruption, or more charitably a
conflict between a mount option and a setting in an inode flag. In
any case, these should not cause a BUG; at worst an ext4_error(), or
ext4_warning(). However, today, if a filesystem with inodes
requesting data journalling are mounted on ext2, the request is simply
ignored. So I'd argue that a similar treatmeant should be used for
ext4 filesystem running w/o a journal. At worst perhaps there should
be a one-time ext4_warning(), but I'm not convinced even this is
warranted. If the system administrator mounts a filesystem without a
journal, it's pretty clear what was meant.
> @@ -219,6 +284,8 @@ static inline int ext4_should_journal_data(struct inode *inode)
>
> static inline int ext4_should_order_data(struct inode *inode)
> {
> + BUG_ON(EXT4_JOURNAL(inode) == NULL &&
> + EXT4_I(inode)->i_flags & EXT4_JOURNAL_DATA_FL);
See above discussion.
> static inline int ext4_should_writeback_data(struct inode *inode)
> {
> + BUG_ON(EXT4_JOURNAL(inode) == NULL &&
> + EXT4_I(inode)->i_flags & EXT4_JOURNAL_DATA_FL);
Again, see above.
> --- a/fs/ext4/ext4_sb.h
> +++ b/fs/ext4/ext4_sb.h
> @@ -28,6 +28,7 @@
> * fourth extended-fs super-block data in memory
> */
> struct ext4_sb_info {
> + int *s_nojournal_flag; /* Null to indicate "not a handle" */
> unsigned long s_desc_size; /* Size of a group descriptor in bytes */
See above discussion; I don't understand why this is necessary, and
why it is an int * here, especially given that it is assigned a
constant value which is a (void *), and it is eventually cast into a
(handle_t *).
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e4a241c..e0f433c 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2284,7 +2314,12 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
printk(KERN_ERR
"ext4: No journal on filesystem on %s\n",
sb->s_id);
- goto failed_mount3;
+ clear_opt(sbi->s_mount_opt, DATA_FLAGS);
+ set_opt(sbi->s_mount_opt, WRITEBACK_DATA);
+ sbi->s_journal = NULL;
+ es->s_state &= cpu_to_le16(~EXT4_VALID_FS);
+ needs_recovery = 0;
+ goto no_journal;
}
In the case where there is no journal, please remove the KERN_ERR
printk; that's just excess noise, and some day, Google System
Administrators will thank you for getting rid of it. :-)
@@ -2748,6 +2798,9 @@ static int ext4_create_journal(struct super_block *sb,
EXT4_SB(sb)->s_journal = journal;
+ if (journal_inum == 0)
+ return 0;
+
ext4_update_dynamic_rev(sb);
EXT4_SET_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
EXT4_SET_COMPAT_FEATURE(sb, EXT4_FEATURE_COMPAT_HAS_JOURNAL);
If journal_inum == 0, ext4_create_journal will never be called (see
line 2429 in ext4fill_super), so this check is not really necessary.
Actually, what we should probably do (as a separate patch before this
one), is to remove ext4_create_journal() and the journal_inum mount
option altogether. It's pointless code that was originally added to
ext3 by Stephen Tweedie back when he wanted to convert ext2
filesystems to ext3 before e2fsprogs supported tune2fs -j. It's been
obsolete for over eight years, in both ext3 and ext4.
In any case, these are all pretty tiny nits; on the whole I think this
patch is quite clean, so I'll add it to the ext4 patch queue for
testing. I would appreciate getting an updated patch which addresses
these suggested cleanups, though. Also, although I haven't tested it,
I suspect we need to add a check so that if there is no journal, and
the EXT4_FEATURE_INCOMPAT_RECOVERY feature is set, the mount should be
aborted with an error, instead of just clearning the recovery flag and
moving on. In actual practice, such a combination should never
happen, but if it does, failing the mount is probably a safer thing
to do.
Regards,
- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists