lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:	Tue, 24 May 2011 11:03:34 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Jan Kara <jack@...e.cz>
Cc:	Dave Chinner <david@...morbit.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Hellwig <hch@...radead.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 02/17] writeback: update dirtied_when for synced inode
 to prevent livelock

On Mon, May 23, 2011 at 09:14:24PM +0800, Jan Kara wrote:
> On Fri 20-05-11 05:31:19, Wu Fengguang wrote:
> > > > > @@ -419,6 +419,15 @@ writeback_single_inode(struct inode *ino
> > > > >  	spin_lock(&inode->i_lock);
> > > > >  	inode->i_state &= ~I_SYNC;
> > > > >  	if (!(inode->i_state & I_FREEING)) {
> > > > > +		/*
> > > > > +		 * Sync livelock prevention. Each inode is tagged and synced in
> > > > > +		 * one shot, so we can unconditionally update its dirty time to
> > > > > +		 * prevent syncing it again. Note that time ordering of b_dirty
> > > > > +		 * list will be kept because the following code either removes
> > > > > +		 * the inode from b_dirty or calls redirty_tail().
> > > > > +		 */
> > > > > +		if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_sync)
> > > > > +			inode->dirtied_when = jiffies;
> > > > 
> > > > Shouldn't this  update only ocur if the inode is still dirty?
> > > 
> > > Yeah, that would be better even though the current form won't lead to
> > > errors.
> > > 
> > > Let's add one more test (inode->i_state & I_DIRTY)?
> > > (I was actually aware of the trade offs and didn't bother to add it..)
> > 
> > Oops, the (inode->i_state & I_DIRTY) test is not enough because
> > when the inode is actively dirtied, I_DIRTY_PAGES won't be set when
> > the flusher is writing out the pages with I_SYNC set...
>   Well, rather on contrary I_DIRTY_PAGES won't be set when noone dirtied
> any page after we cleared I_DIRTY_PAGES.

Ah sorry! I misread the logic in __mark_inode_dirty(): when I_SYNC is
set, i_state will still be updated when some new pages are dirtied.

> > Well it would look clumsy to add another mapping_tagged(mapping,
> > PAGECACHE_TAG_DIRTY) test. So I tend to revert to the original scheme
> > of updating ->dirtied_when only on newly dirtied pages. It's the
> > simplest form that can avoid unnecessarily polluting ->dirtied_when.
>   Hmm, but won't now something like:
> while true; do touch f; done
>   livelock sync? We always manage to write something - 1 inode - and the
> inode will never be clean (before the IO completes, the other process
> manages to dirty the inode again with very high probability).

You are right. Here is the updated patch that can cover that livelock case.

Thanks,
Fengguang
---
Subject: writeback: update dirtied_when for synced inode to prevent livelock
Date: Wed Apr 27 19:05:21 CST 2011

Explicitly update .dirtied_when on synced inodes, so that they are no
longer considered for writeback in the next round.

We'll do more aggressive "keep writeback as long as we wrote something"
logic in wb_writeback(). The "use LONG_MAX .nr_to_write" trick in commit
b9543dac5bbc ("writeback: avoid livelocking WB_SYNC_ALL writeback") will
no longer be enough to stop sync livelock.

It can prevent both of the following livelock schemes:

- while true; do echo data >> f; done
- while true; do touch f;        done

Reviewed-by: Jan Kara <jack@...e.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@...el.com>
---
 fs/fs-writeback.c |    9 +++++++++
 1 file changed, 9 insertions(+)

--- linux-next.orig/fs/fs-writeback.c	2011-05-20 05:35:41.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2011-05-24 10:54:01.000000000 +0800
@@ -419,6 +419,15 @@ writeback_single_inode(struct inode *ino
 	spin_lock(&inode->i_lock);
 	inode->i_state &= ~I_SYNC;
 	if (!(inode->i_state & I_FREEING)) {
+		/*
+		 * Sync livelock prevention. Each inode is tagged and synced in
+		 * one shot. If still dirty, it will be redirty_tail()'ed below.
+		 * Update the dirty time to prevent enqueue and sync it again.
+		 */
+		if ((inode->i_state & I_DIRTY) &&
+		    (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages))
+			inode->dirtied_when = jiffies;
+
 		if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
 			/*
 			 * We didn't write back all the pages.  nfs_writepages()
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ