linux-kernel - Re: [PATCH 03/10] writeback: Do not congestion sleep if there are no congested BDIs or significant writeback

Message-ID: <20100907152533.GB4620@barrios-desktop>
Date:	Wed, 8 Sep 2010 00:25:33 +0900
From:	Minchan Kim <minchan.kim@...il.com>
To:	Mel Gorman <mel@....ul.ie>
Cc:	linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
	Linux Kernel List <linux-kernel@...r.kernel.org>,
	Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Wu Fengguang <fengguang.wu@...el.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Dave Chinner <david@...morbit.com>,
	Chris Mason <chris.mason@...cle.com>,
	Christoph Hellwig <hch@....de>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 03/10] writeback: Do not congestion sleep if there are
 no congested BDIs or significant writeback

On Mon, Sep 06, 2010 at 11:47:26AM +0100, Mel Gorman wrote:
> If congestion_wait() is called with no BDIs congested, the caller will sleep
> for the full timeout and this may be an unnecessary sleep. This patch adds
> a wait_iff_congested() that checks congestion and only sleeps if a BDI is
> congested or if there is a significant amount of writeback going on in an
> interesting zone. Else, it calls cond_resched() to ensure the caller is
> not hogging the CPU longer than its quota but otherwise will not sleep.
> 
> This is aimed at reducing some of the major desktop stalls reported during
> IO. For example, while kswapd is operating, it calls congestion_wait()
> but it could just have been reclaiming clean page cache pages with no
> congestion. Without this patch, it would sleep for a full timeout but after
> this patch, it'll just call schedule() if it has been on the CPU too long.
> Similar logic applies to direct reclaimers that are not making enough
> progress.
> 
> Signed-off-by: Mel Gorman <mel@....ul.ie>
> ---
>  include/linux/backing-dev.h      |    2 +-
>  include/trace/events/writeback.h |    7 ++++
>  mm/backing-dev.c                 |   66 ++++++++++++++++++++++++++++++++++++-
>  mm/page_alloc.c                  |    4 +-
>  mm/vmscan.c                      |   26 ++++++++++++--
>  5 files changed, 96 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
> index 35b0074..f1b402a 100644
> --- a/include/linux/backing-dev.h
> +++ b/include/linux/backing-dev.h
> @@ -285,7 +285,7 @@ enum {
>  void clear_bdi_congested(struct backing_dev_info *bdi, int sync);
>  void set_bdi_congested(struct backing_dev_info *bdi, int sync);
>  long congestion_wait(int sync, long timeout);
> -
> +long wait_iff_congested(struct zone *zone, int sync, long timeout);
>  
>  static inline bool bdi_cap_writeback_dirty(struct backing_dev_info *bdi)
>  {
> diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
> index 275d477..eeaf1f5 100644
> --- a/include/trace/events/writeback.h
> +++ b/include/trace/events/writeback.h
> @@ -181,6 +181,13 @@ DEFINE_EVENT(writeback_congest_waited_template, writeback_congestion_wait,
>  	TP_ARGS(usec_timeout, usec_delayed)
>  );
>  
> +DEFINE_EVENT(writeback_congest_waited_template, writeback_wait_iff_congested,
> +
> +	TP_PROTO(unsigned int usec_timeout, unsigned int usec_delayed),
> +
> +	TP_ARGS(usec_timeout, usec_delayed)
> +);
> +
>  #endif /* _TRACE_WRITEBACK_H */
>  
>  /* This part must be outside protection */
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 298975a..94b5433 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -724,6 +724,7 @@ static wait_queue_head_t congestion_wqh[2] = {
>  		__WAIT_QUEUE_HEAD_INITIALIZER(congestion_wqh[0]),
>  		__WAIT_QUEUE_HEAD_INITIALIZER(congestion_wqh[1])
>  	};
> +static atomic_t nr_bdi_congested[2];
>  
>  void clear_bdi_congested(struct backing_dev_info *bdi, int sync)
>  {
> @@ -731,7 +732,8 @@ void clear_bdi_congested(struct backing_dev_info *bdi, int sync)
>  	wait_queue_head_t *wqh = &congestion_wqh[sync];
>  
>  	bit = sync ? BDI_sync_congested : BDI_async_congested;
> -	clear_bit(bit, &bdi->state);
> +	if (test_and_clear_bit(bit, &bdi->state))
> +		atomic_dec(&nr_bdi_congested[sync]);
>  	smp_mb__after_clear_bit();
>  	if (waitqueue_active(wqh))
>  		wake_up(wqh);
> @@ -743,7 +745,8 @@ void set_bdi_congested(struct backing_dev_info *bdi, int sync)
>  	enum bdi_state bit;
>  
>  	bit = sync ? BDI_sync_congested : BDI_async_congested;
> -	set_bit(bit, &bdi->state);
> +	if (!test_and_set_bit(bit, &bdi->state))
> +		atomic_inc(&nr_bdi_congested[sync]);
>  }
>  EXPORT_SYMBOL(set_bdi_congested);
>  
> @@ -774,3 +777,62 @@ long congestion_wait(int sync, long timeout)
>  }
>  EXPORT_SYMBOL(congestion_wait);
>  
> +/**
> + * congestion_wait - wait for a backing_dev to become uncongested
The kernel-doc name above should be wait_iff_congested, not congestion_wait.

> + * @zone: A zone to consider the number of being being written back from
> + * @sync: SYNC or ASYNC IO
> + * @timeout: timeout in jiffies
> + *
> + * Waits for up to @timeout jiffies for a backing_dev (any backing_dev) to exit
> + * write congestion.  If no backing_devs are congested then the number of
> + * writeback pages in the zone are checked and compared to the inactive
> + * list. If there is no sigificant writeback or congestion, there is no point
s/or/and/ in the line above: "no significant writeback and no congestion".

> + * in sleeping but cond_resched() is called in case the current process has
> + * consumed its CPU quota.
> + */
> +long wait_iff_congested(struct zone *zone, int sync, long timeout)
> +{
> +	long ret;
> +	unsigned long start = jiffies;
> +	DEFINE_WAIT(wait);
> +	wait_queue_head_t *wqh = &congestion_wqh[sync];
> +
> +	/*
> +	 * If there is no congestion, check the amount of writeback. If there
> +	 * is no significant writeback and no congestion, just cond_resched
> +	 */
> +	if (atomic_read(&nr_bdi_congested[sync]) == 0) {
> +		unsigned long inactive, writeback;
> +
> +		inactive = zone_page_state(zone, NR_INACTIVE_FILE) +
> +				zone_page_state(zone, NR_INACTIVE_ANON);
> +		writeback = zone_page_state(zone, NR_WRITEBACK);
> +
> +		/*
> +		 * If less than half the inactive list is being written back,
> +		 * reclaim might as well continue
> +		 */
> +		if (writeback < inactive / 2) {

I am not sure this is the best threshold.

1. Can we really fix it at half of the inactive list without considering
   storage of different speed classes?
2. Isn't there already writeback throttling in an upper layer? Do we need to
   care about it here?

Just out of curiosity.
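
To make the first question concrete, here is a rough userspace model of the check as I read it from the changelog and the hunk above. It is only a sketch, not the patch's code: struct zone_counts, should_congestion_sleep() and the divisor parameter are hypothetical names introduced for illustration, and divisor == 2 is assumed to correspond to the "half of inactive" test.

```c
#include <stdbool.h>

/* Userspace model only. All names here are hypothetical, not kernel API. */
struct zone_counts {
	unsigned long inactive_file;	/* stands in for NR_INACTIVE_FILE */
	unsigned long inactive_anon;	/* stands in for NR_INACTIVE_ANON */
	unsigned long writeback;	/* stands in for NR_WRITEBACK */
};

/*
 * Returns true if the caller would sleep for the timeout, false if it
 * would only yield the CPU (cond_resched() in the patch).
 *
 * divisor is an assumption added for this sketch; it must be nonzero.
 * divisor == 2 models the "writeback < inactive / 2" test in the patch.
 */
static bool should_congestion_sleep(int nr_congested,
				    const struct zone_counts *z,
				    unsigned long divisor)
{
	unsigned long inactive;

	/* Any congested BDI means we sleep, as congestion_wait() would. */
	if (nr_congested != 0)
		return true;

	inactive = z->inactive_file + z->inactive_anon;

	/* Sleep only if writeback is a significant share of inactive. */
	return z->writeback >= inactive / divisor;
}
```

With the divisor fixed at 2, the same writeback count is treated identically on fast and slow storage, which is what the first question is about.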

-- 
Kind regards,
Minchan Kim