[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110315225409.GD5740@redhat.com>
Date: Tue, 15 Mar 2011 18:54:09 -0400
From: Vivek Goyal <vgoyal@...hat.com>
To: Greg Thelen <gthelen@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
containers@...ts.osdl.org, linux-fsdevel@...r.kernel.org,
Andrea Righi <arighi@...eler.com>,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
Minchan Kim <minchan.kim@...il.com>,
Johannes Weiner <hannes@...xchg.org>,
Ciju Rajan K <ciju@...ux.vnet.ibm.com>,
David Rientjes <rientjes@...gle.com>,
Wu Fengguang <fengguang.wu@...el.com>,
Chad Talbott <ctalbott@...gle.com>,
Justin TerAvest <teravest@...gle.com>
Subject: Re: [PATCH v6 9/9] memcg: make background writeback memcg aware
On Fri, Mar 11, 2011 at 10:43:31AM -0800, Greg Thelen wrote:
> Add an memcg parameter to bdi_start_background_writeback(). If a memcg
> is specified then the resulting background writeback call to
> wb_writeback() will run until the memcg dirty memory usage drops below
> the memcg background limit. This is used when balancing memcg dirty
> memory with mem_cgroup_balance_dirty_pages().
>
> If the memcg parameter is not specified, then background writeback runs
> globally system dirty memory usage falls below the system background
> limit.
>
> Signed-off-by: Greg Thelen <gthelen@...gle.com>
> ---
[..]
> -static inline bool over_bground_thresh(void)
> +static inline bool over_bground_thresh(struct mem_cgroup *mem_cgroup)
> {
> unsigned long background_thresh, dirty_thresh;
>
> + if (mem_cgroup) {
> + struct dirty_info info;
> +
> + if (!mem_cgroup_hierarchical_dirty_info(
> + determine_dirtyable_memory(), false,
> + mem_cgroup, &info))
> + return false;
> +
> + return info.nr_file_dirty +
> + info.nr_unstable_nfs > info.background_thresh;
> + }
> +
> global_dirty_limits(&background_thresh, &dirty_thresh);
>
> return (global_page_state(NR_FILE_DIRTY) +
> @@ -683,7 +694,8 @@ static long wb_writeback(struct bdi_writeback *wb,
> * For background writeout, stop when we are below the
> * background dirty threshold
> */
> - if (work->for_background && !over_bground_thresh())
> + if (work->for_background &&
> + !over_bground_thresh(work->mem_cgroup))
> break;
>
> wbc.more_io = 0;
> @@ -761,23 +773,6 @@ static unsigned long get_nr_dirty_pages(void)
> get_nr_dirty_inodes();
> }
>
> -static long wb_check_background_flush(struct bdi_writeback *wb)
> -{
> - if (over_bground_thresh()) {
> -
> - struct wb_writeback_work work = {
> - .nr_pages = LONG_MAX,
> - .sync_mode = WB_SYNC_NONE,
> - .for_background = 1,
> - .range_cyclic = 1,
> - };
> -
> - return wb_writeback(wb, &work);
> - }
> -
> - return 0;
> -}
> -
> static long wb_check_old_data_flush(struct bdi_writeback *wb)
> {
> unsigned long expired;
> @@ -839,15 +834,17 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
> */
> if (work->done)
> complete(work->done);
> - else
> + else {
> + if (work->mem_cgroup)
> + mem_cgroup_bg_writeback_done(work->mem_cgroup);
> kfree(work);
> + }
> }
>
> /*
> * Check for periodic writeback, kupdated() style
> */
> wrote += wb_check_old_data_flush(wb);
> - wrote += wb_check_background_flush(wb);
Hi Greg,
So in the past we will leave the background work unfinished and try
to finish queued work first.
I see following line in wb_writeback().
/*
* Background writeout and kupdate-style writeback may
* run forever. Stop them if there is other work to do
* so that e.g. sync can proceed. They'll be restarted
* after the other works are all done.
*/
if ((work->for_background || work->for_kupdate) &&
!list_empty(&wb->bdi->work_list))
break;
Now you seem to have converted background writeout also as queued
work item. So it sounds wb_writebac() will finish that background
work early and never take it up and finish other queued items. So
we might finish queued items still flusher thread might exit
without bringing down the background ratio of either root or memcg
depending on the ->mem_cgroup pointer.
May be requeuing the background work at the end of list might help.
Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists