linux-kernel - Re: [PATCH] writeback: permit through good bdi even when global dirty exceeded

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20111202082950.GA19148@localhost>
Date:	Fri, 2 Dec 2011 16:29:50 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Matthew Wilcox <matthew@....cx>, Jan Kara <jack@...e.cz>,
	LKML <linux-kernel@...r.kernel.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Theodore Ts'o <tytso@....edu>,
	Christoph Hellwig <hch@...radead.org>
Subject: Re: [PATCH] writeback: permit through good bdi even when global
 dirty exceeded

On Fri, Dec 02, 2011 at 03:03:59PM +0800, Andrew Morton wrote:
> On Fri, 2 Dec 2011 14:36:03 +0800 Wu Fengguang <fengguang.wu@...el.com> wrote:
> 
> > --- linux-next.orig/mm/page-writeback.c	2011-12-02 10:16:21.000000000 +0800
> > +++ linux-next/mm/page-writeback.c	2011-12-02 14:28:44.000000000 +0800
> > @@ -1182,6 +1182,14 @@ pause:
> >  		if (task_ratelimit)
> >  			break;
> >  
> > +		/*
> > +		 * In the case of an unresponding NFS server and the NFS dirty
> > +		 * pages exceeds dirty_thresh, give the other good bdi's a pipe
> > +		 * to go through, so that tasks on them still remain responsive.
> > +		 */
> > +		if (bdi_dirty < 8)
> > +			break;
> 
> What happens if the local disk has nine dirty pages?

The 9 dirty pages will be cleaned by the flusher (likely in one shot),
so after a while the dirtier task can dirty 8 pages more. This
consumer-producer work flow can keep going on as long as the magic
number chosen is >= 1.

> Also: please, no more magic numbers.  We have too many in there already.

Good point. Let's add some comment on the number chosen?

> What to do instead?  Perhaps arrange for devices which can block in
> this fashion to be identified as such in their backing_device and then
> prevent the kernel from ever permitting such devices to fully consume
> the dirty-page pool.

Yeah, that's considered too, unfortunately it's not as simple and
elegant than the proposed patch. For example, if giving all NFS mounts
the same "lowered" limit, there is still the problem that when one NFS
mount goes broken, the other NFS mounts are all impacted.

> If someone later comes along and decreases the dirty limits mid-flight,
> I guess the same problem occurs.  This can perhaps be handled by not
> permitting to limit to be set that low at that time.

Yes! Not long ago we introduced @global_dirty_limit and
update_dirty_limit() exactly for fixing that case. The comment says:

/*
 * The global dirtyable memory and dirty threshold could be suddenly knocked
 * down by a large amount (eg. on the startup of KVM in a swapless system).
 * This may throw the system into deep dirty exceeded state and throttle
 * heavy/light dirtiers alike. To retain good responsiveness, maintain
 * global_dirty_limit for tracking slowly down to the knocked down dirty
 * threshold.
 */
static void update_dirty_limit(unsigned long thresh, unsigned long dirty)
{       
...

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/