lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 1 Apr 2009 14:56:18 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Jeff Layton <jlayton@...hat.com>, linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] writeback: guard against jiffies wraparound on
	inode->dirtied_when checks (try #2)

On Tue, Mar 31, 2009 at 06:07:30PM -0700, Andrew Morton wrote:
> On Tue, 31 Mar 2009 20:50:18 -0400 Jeff Layton <jlayton@...hat.com> wrote:
> 
> > On Tue, 31 Mar 2009 17:20:31 -0700
> > Andrew Morton <akpm@...ux-foundation.org> wrote:
> > 
> > > On Tue, 31 Mar 2009 20:03:59 -0400
> > > Jeff Layton <jlayton@...hat.com> wrote:
> > > 
> > > > +		 * It's not sufficient to just do a time_after() check on
> > > > +		 * dirtied_when. That assumes that dirtied_when will always
> > > > +		 * change within a period of jiffies that encompasses half the
> > > > +		 * machine word size (2^31 jiffies on 32-bit arch). That's not
> > > > +		 * necessarily the case if an inode is being constantly
> > > > +		 * redirtied. Since dirtied_when can never be in the future,
> > > > +		 * we can assume that if it appears to be so then it is
> > > > +		 * actually in the distant past.
> > > 
> > > so this really is a 32-bit-only thing.
> > > 
> > > I guess that isn't worth optimising for though.
> > > 
> > 
> > Yeah, it's pretty much impossible to hit this on a 64-bit machine.
> > 
> > > otoh, given that all three comparisons are the same:
> > > 
> > > +			time_after(inode->dirtied_when, *older_than_this) &&
> > > +			time_before_eq(inode->dirtied_when, jiffies))
> > > 
> > > (although one is inverted (i think?)), it might end up nicer if this was all done
> > > in a little helper function?
> > > 
> > > That way we only need to comment what's going on at a single site, and
> > > we could omit the additional test if !CONFIG_64BIT.
> > 
> > Ok, that seems reasonable.
> > 
> > At one point I had a macro similar to time_in_range(), but dropped it
> > primarily because time_after_but_before_eq() wasn't easy on the eyes.
> > Thoughts on better names?
> 
> I was thinking
> 
> 	bool inode_dirtied_after(...);
> 
> and just leave the innards using time_after() and time_before_eq()?

Andrew, here is the updated patch. Note that the first chunk for
redirty_tail() was not absolutely necessary and so removed.

Thanks,
Fengguang
---
Subject: writeback: guard against jiffies wraparound on inode->dirtied_when checks
From:	Jeff Layton <jlayton@...hat.com>

The dirtied_when value on an inode is supposed to represent the first
time that an inode has one of its pages dirtied. This value is in units
of jiffies. It's used in several places in the writeback code to
determine when to write out an inode.

The problem is that these checks assume that dirtied_when is updated
periodically. If an inode is continuously being used for I/O it can be
persistently marked as dirty and will continue to age. Once the time
difference between dirtied_when and the jiffies value it is being
compared to is greater than or equal to half the maximum of the jiffies
type, the logic of the time_*() macros inverts and the opposite of what
is needed is returned. On 32-bit architectures that's just under 25 days
(assuming HZ == 1000).

As the least-recently dirtied inode, it'll end up being the first one
that pdflush will try to write out. sync_sb_inodes() does this check:

	/* Was this inode dirtied after sync_sb_inodes was called? */
 	if (time_after(inode->dirtied_when, start))
 		break;

...but now dirtied_when appears to be in the future. sync_sb_inodes()
bails out without attempting to write any dirty inodes. When this
occurs, pdflush will stop writing out inodes for this superblock.
Nothing can unwedge it until jiffies moves out of the problematic
window.

Fix this problem by changing the checks against dirtied_when to also
check whether it appears to be in the future. If it does, then we
consider the value to be far in the past.

This should shrink the problematic window of time to such a small
period(30s) as not to matter.

Acked-by: Ian Kent <raven@...maw.net>
Signed-off-by: Jeff Layton <jlayton@...hat.com>
Signed-off-by: Wu Fengguang <fengguang.wu@...el.com>
---
 fs/fs-writeback.c |   26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

--- mm.orig/fs/fs-writeback.c
+++ mm/fs/fs-writeback.c
@@ -196,7 +196,7 @@ static void redirty_tail(struct inode *i
 		struct inode *tail_inode;
 
 		tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list);
-		if (!time_after_eq(inode->dirtied_when,
+		if (time_before(inode->dirtied_when,
 				tail_inode->dirtied_when))
 			inode->dirtied_when = jiffies;
 	}
@@ -220,6 +220,21 @@ static void inode_sync_complete(struct i
 	wake_up_bit(&inode->i_state, __I_SYNC);
 }
 
+static bool inode_dirtied_after(struct inode *inode, unsigned long t)
+{
+	bool ret = time_after(inode->dirtied_when, t);
+#ifndef CONFIG_64BIT
+	/*
+	 * For inodes being constantly redirtied, dirtied_when can get stuck.
+	 * It _appears_ to be in the future, but is actually in distant past.
+	 * This test is necessary to prevent such wrapped-around relative times
+	 * from permanently stopping the whole pdflush writeback.
+	 */
+	ret = ret && time_before_eq(inode->dirtied_when, jiffies);
+#endif
+	return ret;
+}
+
 /*
  * Move expired dirty inodes from @delaying_queue to @dispatch_queue.
  */
@@ -231,7 +246,7 @@ static void move_expired_inodes(struct l
 		struct inode *inode = list_entry(delaying_queue->prev,
 						struct inode, i_list);
 		if (older_than_this &&
-			time_after(inode->dirtied_when, *older_than_this))
+				inode_dirtied_after(inode, *older_than_this))
 			break;
 		list_move(&inode->i_list, dispatch_queue);
 	}
@@ -492,8 +507,11 @@ void generic_sync_sb_inodes(struct super
 			continue;		/* blockdev has wrong queue */
 		}
 
-		/* Was this inode dirtied after sync_sb_inodes was called? */
-		if (time_after(inode->dirtied_when, start))
+		/*
+		 * Was this inode dirtied after sync_sb_inodes was called?
+		 * This keeps sync from extra jobs and livelock.
+		 */
+		if (inode_dirtied_after(inode, start))
 			break;
 
 		/* Is another pdflush already flushing this queue? */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ