Message-ID: <20150106214426.GA24106@htj.dyndns.org>
Date: Tue, 6 Jan 2015 16:44:26 -0500
From: Tejun Heo <tj@...nel.org>
To: axboe@...nel.dk
Cc: linux-kernel@...r.kernel.org, jack@...e.cz, hch@...radead.org,
hannes@...xchg.org, linux-fsdevel@...r.kernel.org,
vgoyal@...hat.com, lizefan@...wei.com, cgroups@...r.kernel.org,
linux-mm@...ck.org, mhocko@...e.cz, clm@...com,
fengguang.wu@...el.com, david@...morbit.com
Subject: Re: [PATCHSET RFC block/for-next] writeback: cgroup writeback support

Hello again. A bit of an addition.

On Tue, Jan 06, 2015 at 04:25:37PM -0500, Tejun Heo wrote:
...
> Overall design
> --------------
What's going on in this patchset is fairly straightforward. The main
thing happening is that a bdi is being split into multiple per-cgroup
pieces. Each split bdi, represented by bdi_writeback, behaves mostly
identically to how a bdi behaved before. Complications mostly arise
from filesystems and inodes having to deal with multiple split bdis
instead of one, but those are mostly straightforward 1:N mapping
issues. It does get tedious here and there but doesn't complicate the
overall picture. This straightforwardness pays off when dealing with
interaction issues which would otherwise have been extremely hairy.
More on this while discussing balance_dirty_pages below.
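
To make the 1:N split concrete, here is a minimal userspace model. It
is a sketch only: the struct layouts and the wb_lookup() helper below
are simplified stand-ins for illustration, not the kernel's actual
definitions. It shows a bdi carrying one bdi_writeback per cgroup,
with each inode dirtying pages against the wb of its own cgroup
rather than against the device-wide bdi.

/*
 * Illustrative userspace model only -- bdi_writeback, backing_dev_info
 * and wb_lookup() here are simplified stand-ins, not kernel code.
 */
#include <stdio.h>

#define MAX_CGROUPS 4

/* One split writeback domain, roughly what bdi_writeback represents. */
struct bdi_writeback {
	int cgroup_id;		/* owning cgroup */
	long dirty_pages;	/* dirty pages attributed to this wb */
};

/* The device-wide bdi now carries one wb per active cgroup (1:N). */
struct backing_dev_info {
	struct bdi_writeback wb[MAX_CGROUPS];
	int nr_wbs;
};

/* An inode is dirtied against the wb of the cgroup that dirtied it. */
struct inode {
	struct bdi_writeback *i_wb;
};

/* Find (or create) the per-cgroup wb on a bdi -- hypothetical helper. */
static struct bdi_writeback *wb_lookup(struct backing_dev_info *bdi,
				       int cgroup_id)
{
	for (int i = 0; i < bdi->nr_wbs; i++)
		if (bdi->wb[i].cgroup_id == cgroup_id)
			return &bdi->wb[i];
	if (bdi->nr_wbs < MAX_CGROUPS) {
		struct bdi_writeback *wb = &bdi->wb[bdi->nr_wbs++];
		wb->cgroup_id = cgroup_id;
		wb->dirty_pages = 0;
		return wb;
	}
	return NULL;
}

int main(void)
{
	struct backing_dev_info bdi = { .nr_wbs = 0 };
	struct inode ino_a = { .i_wb = wb_lookup(&bdi, 1) };	/* cgroup 1 */
	struct inode ino_b = { .i_wb = wb_lookup(&bdi, 2) };	/* cgroup 2 */

	/* Dirtying accounts against the owning cgroup's wb, not the bdi. */
	ino_a.i_wb->dirty_pages += 10;
	ino_b.i_wb->dirty_pages += 3;

	for (int i = 0; i < bdi.nr_wbs; i++)
		printf("cgroup %d: %ld dirty pages\n",
		       bdi.wb[i].cgroup_id, bdi.wb[i].dirty_pages);
	return 0;
}
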
...
> Missing pieces
> --------------
...
> * balance_dirty_pages currently doesn't consider the task's memcg when
>   calculating the number of dirtyable pages. This means that tasks in a
>   memcg won't get the benefit of smooth background writeback and will
>   bump into direct reclaim all the time. It has always been like this,
>   but with cgroup writeback support it finally becomes fixable. I'll
>   work on this as the earlier part gets settled.
This has always been a really thorny issue, but now that each wb
behaves as an independent writeback domain, it can be solved nicely.
Each cgroup can carry its fraction of write bandwidth against the
whole system, and each task can carry its fraction against its memcg.
balance_dirty_pages can combine these two ratios, apply the result
against the memory that may be dirtied within the task's memcg, and
throttle the dirtier accordingly. This works out as a straightforward
extension of the global logic, which is proven to work. This really
is pieces falling into place.
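
To sketch what that combination looks like: the per-task threshold is
simply the memcg's dirtyable memory scaled by the two fractions. The
names below (task_dirty_limit() and its parameters) are hypothetical
and for illustration only, not the actual balance_dirty_pages()
interface.

/*
 * Minimal sketch of the throttling arithmetic described above -- the
 * function and variable names are illustrative, not kernel code.
 */
#include <stdio.h>

/*
 * Combine the memcg's share of system-wide write bandwidth with the
 * task's share within its memcg, then apply the result against the
 * memcg's dirtyable memory to get a per-task dirty threshold.
 */
static long task_dirty_limit(long memcg_dirtyable_pages,
			     double memcg_bw_fraction,	/* memcg vs. whole system */
			     double task_fraction)	/* task vs. its memcg */
{
	return (long)(memcg_dirtyable_pages * memcg_bw_fraction *
		      task_fraction);
}

int main(void)
{
	/*
	 * e.g. a memcg with 100000 dirtyable pages doing 40% of the
	 * system's writeback, and one task generating 50% of that
	 * memcg's dirtying: the task is throttled against 20000 pages.
	 */
	printf("per-task limit: %ld pages\n",
	       task_dirty_limit(100000, 0.4, 0.5));
	return 0;
}
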
Thanks.
--
tejun