Date:	Thu, 11 Nov 2010 08:40:47 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Jan Kara <jack@...e.cz>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Christoph Hellwig <hch@....de>,
	Jan Engelhardt <jengelh@...ozas.de>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 3/5] writeback: stop background/kupdate works from
 livelocking other works

On Thu, Nov 11, 2010 at 07:37:29AM +0800, Andrew Morton wrote:
> On Wed, 10 Nov 2010 00:56:32 +0100
> Jan Kara <jack@...e.cz> wrote:
> 
> > On Tue 09-11-10 15:00:06, Andrew Morton wrote:
> > > On Tue, 9 Nov 2010 23:28:27 +0100
> > > Jan Kara <jack@...e.cz> wrote:
> > > >   New description which should address above questions:
> > > > Background writeback is easily livelockable in a loop in wb_writeback() by
> > > > a process continuously re-dirtying pages (or continuously appending to a
> > > > file). This is in fact intended, as the goal of background writeback is
> > > > to write any dirty pages it can find as long as we are over
> > > > dirty_background_threshold.
> > > 
> > > Well.  The objective of the kupdate function is utterly different.
> > > 
> > > > But the above behavior gets inconvenient at times because no other work
> > > > queued in the flusher thread's queue gets processed. In particular,
> > > > since e.g. sync(1) relies on the flusher thread to do all the IO for it,
> > > 
> > > That's fixable by doing the work synchronously within sync_inodes_sb(),
> > > rather than twiddling thumbs wasting a thread resource while waiting
> > > for kernel threads to do it.  As an added bonus, this even makes cpu
> > > time accounting more accurate ;)
> > > 
> > > Please remind me why we decided to hand the sync_inodes_sb() work off
> > > to other threads?
> >   Because when sync(1) does IO on its own, it competes for the device with
> > the flusher thread running in parallel, thus resulting in more seeks.
> 
> Skeptical.  Has that effect been demonstrated?  Has it been shown to be
> a significant problem?  A worse problem than livelocking the machine? ;)
> 
> If this _is_ a problem then it's also a problem for fsync/msync.  But
> see below.

Seriously, I also doubt the value of doing sync() in the flusher thread.
sync() is by definition inefficient. In the block layer, it's served
with less emphasis on throughput. In the VFS layer, it may sleep in
inode_wait_for_writeback() and filemap_fdatawait(). And in various
filesystems, pages won't be skipped, at the cost of more lock waiting.

So when a flusher thread is serving sync(), it has difficulties
saturating the storage device.

btw, it seems the current sync() does not take advantage of the flusher
threads to sync multiple disks in parallel.

And I guess (concurrent) sync/fsync/msync calls will be rare, especially
in really performance-demanding workloads (which will optimize sync away
in the first place).

And I'm still worried that the sync work (which may take a long time to
serve even without livelock) could delay other works considerably --
maybe not a problem for now, but it will become a real priority dilemma
when we start writeback works from pageout().

> OT, but: your faith in those time-ordered inode lists is touching ;)
> Put a debug function in there which checks that the lists _are_
> time-ordered, and call that function from every site in the kernel
> which modifies the lists.   I bet there are still gremlins.

I'm more confident in that time ordering ;) But there is a caveat:
redirty_tail() may touch dirtied_when, so it merely preserves the time
ordering of b_dirty on the surface.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/