Message-ID: <20100222173640.GG3063@balbir.in.ibm.com>
Date: Mon, 22 Feb 2010 23:06:40 +0530
From: Balbir Singh <balbir@...ux.vnet.ibm.com>
To: Vivek Goyal <vgoyal@...hat.com>
Cc: Andrea Righi <arighi@...eler.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Suleiman Souhlal <suleiman@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
containers@...ts.linux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC] [PATCH 0/2] memcg: per cgroup dirty limit
* Vivek Goyal <vgoyal@...hat.com> [2010-02-22 09:27:45]:
> On Sun, Feb 21, 2010 at 04:18:43PM +0100, Andrea Righi wrote:
> > Control the maximum amount of dirty pages a cgroup can have at any given time.
> >
> > The per-cgroup dirty limit fixes the maximum amount of dirty (hard to
> > reclaim) page cache that any cgroup can use. So, in the case of multiple
> > cgroup writers, they will not be able to consume more than their
> > designated share of dirty pages and will be forced to perform write-out
> > if they cross that limit.
> >
> > The overall design is the following:
> >
> > - account dirty pages per cgroup
> > - limit the number of dirty pages via memory.dirty_bytes in cgroupfs
> > - start to write-out in balance_dirty_pages() when the cgroup or global limit
> > is exceeded
> >
> > This feature is meant to work closely with any underlying IO controller
> > implementation, so that we can stop increasing dirty pages in the VM layer
> > and enforce a write-out before any single cgroup consumes the global amount
> > of dirty pages defined by the /proc/sys/vm/dirty_ratio|dirty_bytes limit.
> >
>
> Thanks Andrea. I had been thinking about looking into it from IO
> controller perspective so that we can control async IO (buffered writes
> also).
>
> Before I dive into patches, two quick things.
>
> - IIRC, last time you had implemented a per-memory-cgroup "dirty_ratio" and
> not "dirty_bytes". Why this change? To begin with, a configurable per-memcg
> dirty ratio also makes sense; by default it could be the global dirty ratio
> for each cgroup.
>
> - Looks like we will start writeout from the memory cgroup once we cross the
> dirty ratio, but there is still no guarantee that we will be writing pages
> belonging to the cgroup which crossed the dirty ratio and triggered the
> writeout.
>
> This behavior is not very good, at least from the IO controller perspective:
> if two dd threads are dirtying memory in two cgroups and one crosses its
> dirty ratio, it should perform writeouts of its own pages and not other
> cgroups' pages. Otherwise we will probably again introduce serialization
> between the two writers and will not see service differentiation.
I thought that the I/O controller would eventually provide hooks to do
this... no?
>
> Maybe we can modify writeback_inodes_wbc() to check the first dirty page
> of the inode, and if it does not belong to the same memcg as the task that
> is performing balance_dirty_pages(), then skip that inode.
Do you expect all pages of an inode to be paged in by the same cgroup?
--
Three Cheers,
Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/