Message-ID: <20131202083459.GA2305@quack.suse.cz>
Date: Mon, 2 Dec 2013 09:34:59 +0100
From: Jan Kara <jack@...e.cz>
To: Ross Zwisler <ross.zwisler@...ux.intel.com>
Cc: andreas.dilger@...el.com, linux-ext4@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] ext4: Add XIP functionality
Hello,
On Mon 18-11-13 14:51:32, Ross Zwisler wrote:
> This is a port of the XIP functionality found in the current version of
> ext2. This patch set is intended to achieve feature parity with XIP in
> ext2 rather than non-XIP in ext4. In particular, it lacks support for
> splice and AIO. We'll be submitting patches in the future to add that
> functionality, but we think this is a good start.
>
> There are also a couple of bugs that also appear in ext2 around handling
> of the xip mount option; we're currently investigating and will submit
> patches to fix both in ext2 and ext4, but didn't want to delay getting
> this patch out for comment.
>
> The motivation behind this work is that we believe that the XIP feature
> will begin to find new uses as various persistent memory devices and
> technologies come on to the market. Having direct, byte-addressable
> access to persistent memory without having an additional copy in the
> page cache can be a win in terms of I/O latency and overall memory
> usage.
Yes, I believe implementing XIP in ext4 is desirable. It is the only
ext2 feature I'm aware of that is missing from ext4.
> This patch applies cleanly to v3.12, and was tested using brd as our
> block driver.
>
> Signed-off-by: Ross Zwisler <ross.zwisler@...ux.intel.com>
> Reviewed-by: Andreas Dilger <andreas.dilger@...el.com>
> ---
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index e274e9c..dea66bb 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
...
> @@ -4645,11 +4673,19 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
> } else
> ext4_wait_for_tail_page_commit(inode);
> }
> - /*
> - * Truncate pagecache after we've waited for commit
> - * in data=journal mode to make pages freeable.
> - */
> +
> + if (mapping_is_xip(inode->i_mapping)) {
> + error = xip_truncate_page(inode->i_mapping,
> + inode->i_size);
> + if (error)
> + goto err_out;
> + } else {
> + /*
> + * Truncate pagecache after we've waited for commit
> + * in data=journal mode to make pages freeable.
> + */
> truncate_pagecache(inode, inode->i_size);
> + }
> }
> /*
> * We want to call ext4_truncate() even if attr->ia_size ==
Umm, a much more logical place for this would be in ext4_truncate(), at the
place where we call ext4_block_truncate_page(), because xip_truncate_page()
does what ext4_block_truncate_page() does.
Also, having thought about it for a while: you must call truncate_pagecache()
in XIP mode as well, to unmap the PTEs for the range removed by the truncate.
In ext2 this is hidden in the truncate_setsize() call...
Also, you seem to be missing hole punching support entirely. For that you'd
need to modify xip_truncate_page() to accept not only the offset but also the
length of the truncated area (a separate patch, please). Then you will need
to call that function from ext4_punch_hole() at the place where
ext4_zero_partial_blocks() is used.
Finally, as Matthew Wilcox pointed out
(http://www.spinics.net/lists/linux-fsdevel/msg70582.html), there's a race
between truncate and mmap in the xip code because xip lacks serialization on
page locks. So I believe we should solve that now that we are growing XIP
support in another filesystem... Using mmap_sem for that might be viable, but
I have to try it.
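To make the hole punching request concrete, here is a minimal userspace
sketch of zeroing by offset and length within one block. It is only a model:
demo_xip_zero_range() and its parameters are hypothetical names, a plain
buffer stands in for the address that direct_access would return, and the
real xip_truncate_page() interface may end up looking different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model only (hypothetical helper, not a kernel function).
 * Zero the part of the byte range [from, from + length) that lies inside
 * the block containing 'from'. 'block_base' points at the start of that
 * block's memory. Returns the number of bytes actually zeroed.
 */
static size_t demo_xip_zero_range(unsigned char *block_base, size_t blocksize,
                                  size_t from, size_t length)
{
    size_t offset, room, n;

    assert(block_base != NULL && blocksize > 0);

    offset = from % blocksize;          /* offset of 'from' within the block */
    room = blocksize - offset;          /* bytes left until the block ends */
    n = length < room ? length : room;  /* never zero past the block end */

    memset(block_base + offset, 0, n);
    return n;
}
```

In this model, truncate is just the special case where the length reaches the
end of the block; a hole punch would call the helper once for the partial
block at each end of the punched range.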
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 2c2e6cb..18e70d2 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
...
> @@ -3525,11 +3532,19 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
> }
> if (test_opt(sb, DELALLOC))
> clear_opt(sb, DELALLOC);
> + if (test_opt(sb, XIP)) {
> + ext4_msg(sb, KERN_ERR, "can't mount with "
> + "both data=journal and xip");
> + goto failed_mount;
> + }
> }
>
> sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
> (test_opt(sb, POSIX_ACL) ? MS_POSIXACL : 0);
>
> + ext4_xip_verify_sb(sb); /* see if bdev supports xip, unset
> + EXT4_MOUNT_XIP if not */
> +
I don't like clearing the flag inside this function. Just open-code the
function here, please (I don't think the other call site in ext4_remount()
makes sense at all).
> if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
> (EXT4_HAS_COMPAT_FEATURE(sb, ~0U) ||
> EXT4_HAS_RO_COMPAT_FEATURE(sb, ~0U) ||
> @@ -3576,6 +3591,13 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
> goto failed_mount;
> }
>
> + if (ext4_use_xip(sb) && blocksize != PAGE_SIZE) {
> + if (!silent)
> + ext4_msg(sb, KERN_ERR,
> + "error: unsupported blocksize for xip");
> + goto failed_mount;
> + }
> +
> if (sb->s_blocksize != blocksize) {
> /* Validate the filesystem blocksize */
> if (!sb_set_blocksize(sb, blocksize)) {
> @@ -4707,6 +4729,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
> struct ext4_super_block *es;
> struct ext4_sb_info *sbi = EXT4_SB(sb);
> unsigned long old_sb_flags;
> + unsigned long old_mount_opt = sbi->s_mount_opt;
> struct ext4_mount_options old_opts;
> int enable_quota = 0;
> ext4_group_t g;
> @@ -4773,7 +4796,23 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
> sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
> (test_opt(sb, POSIX_ACL) ? MS_POSIXACL : 0);
>
> + ext4_xip_verify_sb(sb); /* see if bdev supports xip, unset
> + EXT4_MOUNT_XIP if not */
> +
> + if (ext4_use_xip(sb) && sb->s_blocksize != PAGE_SIZE) {
> + ext4_msg(sb, KERN_WARNING,
> + "warning: unsupported blocksize for xip");
> + err = -EINVAL;
> + goto restore_opts;
> + }
> +
> es = sbi->s_es;
> + if ((sbi->s_mount_opt ^ old_mount_opt) & EXT4_MOUNT_XIP) {
> + ext4_msg(sb, KERN_WARNING, "warning: refusing change of "
> + "xip flag with busy inodes while remounting");
> + sbi->s_mount_opt &= ~EXT4_MOUNT_XIP;
> + sbi->s_mount_opt |= old_mount_opt & EXT4_MOUNT_XIP;
> + }
So why do you bother with ext4_xip_verify_sb() and the other checks when you
don't allow remount to change the xip flag anyway (which I think makes sense)?
> if (sbi->s_journal) {
> ext4_init_journal_params(sb, sbi->s_journal);
> diff --git a/fs/ext4/xip.c b/fs/ext4/xip.c
> new file mode 100644
> index 0000000..e0a430a
> --- /dev/null
> +++ b/fs/ext4/xip.c
> @@ -0,0 +1,91 @@
> +/*
> + * linux/fs/ext4/xip.c
> + *
> + * Copyright (C) 2005 IBM Corporation
> + * Author: Carsten Otte (cotte@...ibm.com)
> + */
> +
> +#include <linux/mm.h>
> +#include <linux/fs.h>
> +#include <linux/genhd.h>
> +#include <linux/buffer_head.h>
> +#include <linux/blkdev.h>
> +#include "ext4.h"
> +#include "xip.h"
> +
> +static inline int
> +__inode_direct_access(struct inode *inode, sector_t block,
> + void **kaddr, unsigned long *pfn)
> +{
> + struct block_device *bdev = inode->i_sb->s_bdev;
> + const struct block_device_operations *ops = bdev->bd_disk->fops;
> + sector_t sector;
> +
> + sector = block * (PAGE_SIZE / 512); /* ext4 block to bdev sector */
> +
> + BUG_ON(!ops->direct_access);
> + return ops->direct_access(bdev, sector, kaddr, pfn);
> +}
> +
> +static inline int
> +__ext4_get_block(struct inode *inode, pgoff_t pgoff, int create,
> + sector_t *result)
> +{
> + struct buffer_head tmp;
> + int rc;
> +
> + memset(&tmp, 0, sizeof(struct buffer_head));
> + tmp.b_size = inode->i_sb->s_blocksize;
> + rc = ext4_get_block(inode, pgoff, &tmp, create);
> + *result = tmp.b_blocknr;
Please use ext4_map_blocks() directly. There's no need to go via
ext4_get_block() with its suboptimal buffer_head interface...
> + /* did we get a sparse block (hole in the file)? */
> + if (!tmp.b_blocknr && !rc) {
> + BUG_ON(create);
> + rc = -ENODATA;
> + }
> +
> + return rc;
> +}
> +
Honza
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR