[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20141128162445.GB27902@quack.suse.cz>
Date: Fri, 28 Nov 2014 17:24:45 +0100
From: Jan Kara <jack@...e.cz>
To: Theodore Ts'o <tytso@....edu>
Cc: Jan Kara <jack@...e.cz>,
Linux Filesystem Development List
<linux-fsdevel@...r.kernel.org>,
Ext4 Developers List <linux-ext4@...r.kernel.org>,
Linux btrfs Developers List <linux-btrfs@...r.kernel.org>,
XFS Developers <xfs@....sgi.com>
Subject: Re: [PATCH-v4 2/7] vfs: add support for a lazytime mount option
On Thu 27-11-14 18:00:16, Ted Tso wrote:
> On Thu, Nov 27, 2014 at 02:14:21PM +0100, Jan Kara wrote:
> > * change queue_io() to also call
> > moved += move_expired_inodes(&wb->b_dirty_time, &wb->b_io, time + 24hours)
> > For this you need to tweak move_expired_inodes() to take pointer to
> > timestamp instead of pointer to work but that's trivial. Also you want
> > probably leave time ->older_than_this value (i.e. without +24 hours) if
> > we are doing WB_SYNC_ALL writeback. With this you can remove
> > flush_sb_dirty_time() completely.
>
> Well.... it's not quite enough. The problem is that for ext3 and
> ext4, the actual work of writing the inode happens in dirty_inode(),
> not in write_inode(). Which means we need to do something like this.
Right, I didn't realize this problem.
> I'm not entirely sure whether or not this is too ugly to live;
> personally, I think my hack of handling this in update_time() might be
> preferable....
Actually handling the copying of timestamps in __writeback_single_inode()
would look fine to me. You mention in your next email, calling
mark_inode_dirty_sync() from flusher may be problematic - why? How is this
any different from calling mark_inode_dirty_sync() from
flush_sb_dirty_time()?
I will note that for a while I thought copying the full inode to on-disk
buffer may be problematic because inode may be in an intermediate state of
some transactional change. But that isn't an issue - if there's any
transactional change in progress, it has a handle open and until the
change is node, thus the buffer with the partial change cannot go to the
journal (transaction cannot commit) until mark_inode_dirty_sync() copies
the final state of the inode.
Another solution may be to convey the information that copying of timestamps
is necessary to ->write_inode method. We could do that via a
flag bit in writeback_control. Each filesystem can then copy timestamps
when this bit is set. But calling mark_inode_dirty_sync() from
__writeback_single_inode() looks simpler to me.
Honza
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index b93c529..95a42b3 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -253,7 +253,7 @@ static bool inode_dirtied_after(struct inode *inode, unsigned long t)
> */
> static int move_expired_inodes(struct list_head *delaying_queue,
> struct list_head *dispatch_queue,
> - struct wb_writeback_work *work)
> + unsigned long *older_than_this)
> {
> LIST_HEAD(tmp);
> struct list_head *pos, *node;
> @@ -264,8 +264,8 @@ static int move_expired_inodes(struct list_head *delaying_queue,
>
> while (!list_empty(delaying_queue)) {
> inode = wb_inode(delaying_queue->prev);
> - if (work->older_than_this &&
> - inode_dirtied_after(inode, *work->older_than_this))
> + if (older_than_this &&
> + inode_dirtied_after(inode, *older_than_this))
> break;
> list_move(&inode->i_wb_list, &tmp);
> moved++;
> @@ -309,9 +309,14 @@ out:
> static void queue_io(struct bdi_writeback *wb, struct wb_writeback_work *work)
> {
> int moved;
> + unsigned long one_day_later = jiffies + (HZ * 86400);
> +
> assert_spin_locked(&wb->list_lock);
> list_splice_init(&wb->b_more_io, &wb->b_io);
> - moved = move_expired_inodes(&wb->b_dirty, &wb->b_io, work);
> + moved = move_expired_inodes(&wb->b_dirty, &wb->b_io,
> + work->older_than_this);
> + moved += move_expired_inodes(&wb->b_dirty_time, &wb->b_io,
> + &one_day_later);
> trace_writeback_queue_io(wb, work, moved);
> }
>
> @@ -637,6 +642,17 @@ static long writeback_sb_inodes(struct super_block *sb,
> }
>
> /*
> + * If the inode is marked dirty time but is not dirty,
> + * then at last for ext3 and ext4 we need to call
> + * mark_inode_dirty_sync in order to get the inode
> + * timestamp transferred to the on disk inode, since
> + * write_inode is a no-op for those file systems. HACK HACK HACK
> + */
> + if ((inode->i_state & I_DIRTY_TIME) &&
> + ((inode->i_state & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) == 0))
> + mark_inode_dirty_sync(inode);
> +
> + /*
> * Don't bother with new inodes or inodes being freed, first
> * kind does not need periodic writeout yet, and for the latter
> * kind writeout is handled by the freer.
> @@ -1233,9 +1249,10 @@ void inode_requeue_dirtytime(struct inode *inode)
> spin_lock(&bdi->wb.list_lock);
> spin_lock(&inode->i_lock);
> if ((inode->i_state & I_DIRTY_WB) == 0) {
> - if (inode->i_state & I_DIRTY_TIME)
> + if (inode->i_state & I_DIRTY_TIME) {
> + inode->dirtied_when = jiffies;
> list_move(&inode->i_wb_list, &bdi->wb.b_dirty_time);
> - else
> + } else
> list_del_init(&inode->i_wb_list);
> }
> spin_unlock(&inode->i_lock);
>
> Comments?
>
> - Ted
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists