Message-ID: <20100222173640.GG3063@balbir.in.ibm.com>
Date: Mon, 22 Feb 2010 23:06:40 +0530
From: Balbir Singh <balbir@...ux.vnet.ibm.com>
To: Vivek Goyal <vgoyal@...hat.com>
Cc: Andrea Righi <arighi@...eler.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Suleiman Souhlal <suleiman@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
containers@...ts.linux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC] [PATCH 0/2] memcg: per cgroup dirty limit
* Vivek Goyal <vgoyal@...hat.com> [2010-02-22 09:27:45]:
> On Sun, Feb 21, 2010 at 04:18:43PM +0100, Andrea Righi wrote:
> > Control the maximum amount of dirty pages a cgroup can have at any given time.
> >
> > The per-cgroup dirty limit fixes the maximum amount of dirty (hard to
> > reclaim) page cache that any cgroup can use. So, in the case of multiple
> > cgroup writers, they will not be able to consume more than their
> > designated share of dirty pages and will be forced to perform write-out
> > if they cross that limit.
> >
> > The overall design is the following:
> >
> > - account dirty pages per cgroup
> > - limit the number of dirty pages via memory.dirty_bytes in cgroupfs
> > - start to write-out in balance_dirty_pages() when the cgroup or global limit
> > is exceeded
> >
> > This feature is meant to work closely with any underlying IO controller
> > implementation, so that we can stop increasing dirty pages in the VM layer
> > and enforce a write-out before any single cgroup consumes the global amount
> > of dirty pages defined by the /proc/sys/vm/dirty_ratio|dirty_bytes limit.
> >
>
> Thanks Andrea. I had been thinking about looking into it from IO
> controller perspective so that we can control async IO (buffered writes
> also).
>
> Before I dive into patches, two quick things.
>
> - IIRC, last time you had implemented a per-memory-cgroup "dirty_ratio" and
> not "dirty_bytes". Why this change? To begin with, a configurable per-memcg
> dirty ratio also makes sense; by default it could be the global dirty ratio
> for each cgroup.
>
> - Looks like we will start writeout from the memory cgroup once we cross the
> dirty ratio, but there is still no guarantee that we will be writing pages
> belonging to the cgroup which crossed the dirty ratio and triggered the
> writeout.
>
> This behavior is not very good, at least from the IO controller perspective:
> if two dd threads are dirtying memory in two cgroups and one crosses its
> dirty ratio, it should perform writeouts of its own pages and not other
> cgroups' pages. Otherwise we will probably again introduce serialization
> between the two writers and will not see service differentiation.
I thought that the I/O controller would eventually provide hooks to do
this... no?
>
> Maybe we can modify writeback_inodes_wbc() to check the first dirty page
> of the inode, and if it does not belong to the same memcg as the task that
> is performing balance_dirty_pages(), then skip that inode.
Do you expect all pages of an inode to be paged in by the same cgroup?
--
Three Cheers,
Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/