Date:	Fri, 27 Mar 2009 10:13:03 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Jeff Layton <jlayton@...hat.com>
Cc:	Ian Kent <raven@...maw.net>, Dave Chinner <david@...morbit.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"jens.axboe@...cle.com" <jens.axboe@...cle.com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"hch@...radead.org" <hch@...radead.org>,
	"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>
Subject: Re: [PATCH] writeback: reset inode dirty time when adding it back
	to empty s_dirty list

On Fri, Mar 27, 2009 at 01:03:27AM +0800, Jeff Layton wrote:
> On Wed, 25 Mar 2009 22:16:18 +0800
> Wu Fengguang <fengguang.wu@...el.com> wrote:
> 
> > > 
> > > Actually, I think you were right. We still have this check in
> > > generic_sync_sb_inodes() even with Wu's January 2008 patches:
> > > 
> > > 	/* Was this inode dirtied after sync_sb_inodes was called? */
> > > 	if (time_after(inode->dirtied_when, start))
> > > 		break;
> > 
> > Yeah, ugly code. Jens' per-bdi flush daemons should eliminate it...
> > 
> 
> I had a look over Jens' patches and they seem to be more concerned with
> how the queues and daemons are organized (per-bdi rather than per-sb).
> The actual way that inodes flow between the queues and get written out
> doesn't look like it really changes with his set.

OK, sorry that I've not carefully reviewed the per-bdi flushing patchset.

> They also don't eliminate the problematic check above. Regardless of
> whether your or Jens' patches make it in, I think we'll still need
> something like the following (untested) patch.
> 
> If this looks ok, I'll flesh out the comments some and "officially" post
> it. Thoughts?

It's good in itself. However, with the more_io_wait queue, the first two
chunks will be eliminated. Mind if I carry this patch in my patchset?

Thanks,
Fengguang


> --------------[snip]-----------------
> 
> From d10adff2d5f9a15d19c438119dbb2c410bd26e3c Mon Sep 17 00:00:00 2001
> From: Jeff Layton <jlayton@...hat.com>
> Date: Thu, 26 Mar 2009 12:54:52 -0400
> Subject: [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks
> 
> The dirtied_when value on an inode is supposed to represent the first
> time that an inode has one of its pages dirtied. This value is in units
> of jiffies. This value is used in several places in the writeback code
> to determine when to write out an inode.
> 
> The problem is that these checks assume that dirtied_when is updated
> periodically. But if an inode is continuously being used for I/O it can
> be persistently marked as dirty and will continue to age. Once the time
> difference between dirtied_when and the jiffies value it is being
> compared to is greater than (or equal to) half the maximum of the
> jiffies type, the logic of the time_*() macros inverts and the opposite
> of what is needed is returned. On 32-bit architectures that's just under
> 25 days (assuming HZ == 1000).
> 
> Since such an inode is the least-recently dirtied, it'll end up being
> the first one that pdflush tries to write out. sync_sb_inodes, however,
> does this check:
> 
> 	/* Was this inode dirtied after sync_sb_inodes was called? */
>  	if (time_after(inode->dirtied_when, start))
>  		break;
> 
> ...but now dirtied_when appears to be in the future. sync_sb_inodes
> bails out without attempting to write any dirty inodes.  When this
> occurs, pdflush will stop writing out inodes for this superblock and
> nothing will unwedge it until jiffies moves out of the problematic
> window.
> 
> This patch fixes this problem by changing the time_after checks against
> dirtied_when to also check whether dirtied_when appears to be in the
> future. If it does, then we consider the value to be in the past.
> 
> This should shrink the problematic window to a period too small to
> matter.
> 
> Signed-off-by: Jeff Layton <jlayton@...hat.com>
> ---
>  fs/fs-writeback.c |   11 +++++++----
>  1 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index e3fe991..dba69a5 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -196,8 +196,9 @@ static void redirty_tail(struct inode *inode)
>  		struct inode *tail_inode;
>  
>  		tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list);
> -		if (!time_after_eq(inode->dirtied_when,
> -				tail_inode->dirtied_when))
> +		if (time_before(inode->dirtied_when,
> +				tail_inode->dirtied_when) ||
> +		    time_after(inode->dirtied_when, jiffies))
>  			inode->dirtied_when = jiffies;
>  	}
>  	list_move(&inode->i_list, &sb->s_dirty);
> @@ -231,7 +232,8 @@ static void move_expired_inodes(struct list_head *delaying_queue,
>  		struct inode *inode = list_entry(delaying_queue->prev,
>  						struct inode, i_list);
>  		if (older_than_this &&
> -			time_after(inode->dirtied_when, *older_than_this))
> +			time_after(inode->dirtied_when, *older_than_this) &&
> +			time_before_eq(inode->dirtied_when, jiffies))
>  			break;
>  		list_move(&inode->i_list, dispatch_queue);
>  	}
> @@ -493,7 +495,8 @@ void generic_sync_sb_inodes(struct super_block *sb,
>  		}
>  
>  		/* Was this inode dirtied after sync_sb_inodes was called? */
> -		if (time_after(inode->dirtied_when, start))
> +		if (time_after(inode->dirtied_when, start) &&
> +		    time_before_eq(inode->dirtied_when, jiffies))
>  			break;
>  
>  		/* Is another pdflush already flushing this queue? */
> -- 
> 1.5.5.6
