lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 16 Sep 2010 23:11:47 +0900
From:	Minchan Kim <minchan.kim@...il.com>
To:	Mel Gorman <mel@....ul.ie>
Cc:	Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
	linux-fsdevel@...r.kernel.org,
	Linux Kernel List <linux-kernel@...r.kernel.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Wu Fengguang <fengguang.wu@...el.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Subject: Re: [PATCH 8/8] writeback: Do not sleep on the congestion queue if
 there are no congested BDIs or if significant congestion is not being
 encountered in the current zone

On Thu, Sep 16, 2010 at 10:18:24AM +0100, Mel Gorman wrote:
> On Thu, Sep 16, 2010 at 05:13:38PM +0900, Minchan Kim wrote:
> > On Wed, Sep 15, 2010 at 01:27:51PM +0100, Mel Gorman wrote:
> > > If wait_iff_congested() is called with no BDI congested, the function simply
> > > calls cond_resched(). In the event there is significant writeback happening
> > > in the zone that is being reclaimed, this can be a poor decision as reclaim
> > > would succeed once writeback was completed. Without any backoff logic,
> > > younger clean pages can be reclaimed resulting in more reclaim overall and
> > > poor performance.
> > 
> > I agree. 
> > 
> > > 
> > > This patch tracks how many pages backed by a congested BDI were found during
> > > scanning. If all the dirty pages encountered on a list isolated from the
> > > LRU belong to a congested BDI, the zone is marked congested until the zone
> > 
> > I am not sure it works well. 
> 
> Check the competion times for the micro-mapped-file-stream benchmark in
> the leader mail. Backing off like this is faster overall for some
> workloads.
> 
> > We just met the condition once but we backoff it until high watermark.
> 
> Reaching the high watermark is considered to be a relieving of pressure.
> 
> > (ex, 32 isolated dirty pages == 32 pages on congestioned bdi)
> > First impression is rather _aggressive_.
> > 
> 
> Yes, it is. I intended to start with something quite aggressive that is
> close to existing behaviour and then experiment with alternatives.

Agree. 

> 
> For example, I considered clearing zone congestion when but nr_bdi_congested
> drops to 0. This would be less aggressive in terms of congestion waiting but
> it is further from todays behaviour. I felt it would be best to introduce
> wait_iff_congested() in one kernel cycle but wait to a later cycle to deviate
> a lot from congestion_wait().

Fair enough. 

> 
> > How about more checking?
> > For example, if above pattern continues repeately above some threshold,
> > we can regard "zone is congested" and then if the pattern isn't repeated 
> > during some threshold, we can regard "zone isn't congested any more.".
> > 
> 
> I also considered these options and got stuck at what the "some
> threshold" is and how to record the history. Should it be recorded on a
> per BDI basis for example? I think all these questions can be answered
> but should be in a different cycle.
> 
> > > reaches the high watermark.  wait_iff_congested() then checks both the
> > > number of congested BDIs and if the current zone is one that has encounted
> > > congestion recently, it will sleep on the congestion queue. Otherwise it
> > > will call cond_reched() to yield the processor if necessary.
> > > 
> > > The end result is that waiting on the congestion queue is avoided when
> > > necessary but when significant congestion is being encountered,
> > > reclaimers and page allocators will back off.
> > > 
> > > Signed-off-by: Mel Gorman <mel@....ul.ie>
> > > ---
> > >  include/linux/backing-dev.h |    2 +-
> > >  include/linux/mmzone.h      |    8 ++++
> > >  mm/backing-dev.c            |   23 ++++++++----
> > >  mm/page_alloc.c             |    4 +-
> > >  mm/vmscan.c                 |   83 +++++++++++++++++++++++++++++++++++++------
> > >  5 files changed, 98 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
> > > index 72bb510..f1b402a 100644
> > > --- a/include/linux/backing-dev.h
> > > +++ b/include/linux/backing-dev.h
> > > +static enum bdi_queue_status may_write_to_queue(struct backing_dev_info *bdi,
> > 
> > <snip>
> > 
> > >  			      struct scan_control *sc)
> > >  {
> > > +	enum bdi_queue_status ret = QUEUEWRITE_DENIED;
> > > +
> > >  	if (current->flags & PF_SWAPWRITE)
> > > -		return 1;
> > > +		return QUEUEWRITE_ALLOWED;
> > >  	if (!bdi_write_congested(bdi))
> > > -		return 1;
> > > +		return QUEUEWRITE_ALLOWED;
> > > +	else
> > > +		ret = QUEUEWRITE_CONGESTED;
> > >  	if (bdi == current->backing_dev_info)
> > > -		return 1;
> > > +		return QUEUEWRITE_ALLOWED;
> > >  
> > >  	/* lumpy reclaim for hugepage often need a lot of write */
> > >  	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
> > > -		return 1;
> > > -	return 0;
> > > +		return QUEUEWRITE_ALLOWED;
> > > +	return ret;
> > >  }
> > 
> > The function can't return QUEUEXXX_DENIED.
> > It can affect disable_lumpy_reclaim. 
> > 
> 
> Yes, but that change was made in "vmscan: Narrow the scenarios lumpy
> reclaim uses synchrounous reclaim". Maybe I am misunderstanding your
> objection.

I means current may_write_to_queue never returns QUEUEWRITE_DENIED.
What's the role of it?

In addition, we don't need disable_lumpy_reclaim_mode() in pageout.
That's because both PAGE_KEEP and PAGE_KEEP_CONGESTED go to keep_locked
and calls disable_lumpy_reclaim_mode at last. 

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ