Message-ID: <6599ad830902230009mbfe7ddkf40f183c4a61a81a@mail.gmail.com>
Date: Mon, 23 Feb 2009 00:09:54 -0800
From: Paul Menage <menage@...gle.com>
To: anqin <anqin.qin@...il.com>
Cc: Daniel Lezcano <dlezcano@...ibm.com>,
"Serge E. Hallyn" <serue@...ibm.com>,
Rolando Martins <rolando.martins@...il.com>,
linux-kernel@...r.kernel.org, containers@...ts.osdl.org
Subject: Re: [RFC] [PATCH] cgroup: accounting and limitation of disk quota
Hi An,

On Sun, Feb 22, 2009 at 4:37 AM, anqin <anqin.qin@...il.com> wrote:
> The patch presents a cgroup subsystem to control the usage of disk quota.

Thanks for sending this patch.

My overall feeling is that disk quotas aren't really something that
you want to control at a cgroup level (i.e. associating a limit with a
specific set of processes); they're something that you want to control
at the directory hierarchy level (i.e. associating a limit with a
directory and all its children).

In the case of a virtual server these may well be the same thing - a
process in the virtual server can't touch any files outside the
virtual server's filespace, and stuff outside the virtual server will
be well-behaved and won't touch files inside the virtual server's
filespace.

But for systems that are doing resource isolation without
virtualization, this is no longer necessarily the case. A process may
have access to multiple areas of the disk with independent quotas.
E.g. I work on a job control system where each job has some private
disk space, and may share a common pool of disk space with some
related jobs on the same machine, for data that's shared between
multiple jobs.

In this case, there are separate disk quotas for the per-job private
areas and the shared area, so this cgroup-based approach wouldn't be
much use there. Something like Neil Brown's "tree quota" proposal from
way back in 2001 seemed much more useful for this kind of isolation.
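
To make the private/shared distinction concrete, here is a small Python illustration. It is only a toy: the job names, area names, and byte counts are invented for the example and do not come from the patch or from any real system. It tallies the same set of writes two ways, once by the writing job (roughly what a per-cgroup limit would see) and once by the area of disk written to (what a per-directory limit would see).

```python
from collections import Counter

# Hypothetical workload: (writing job, area written to, bytes written).
writes = [
    ("job1", "private/job1", 40),
    ("job1", "shared", 25),
    ("job2", "private/job2", 10),
    ("job2", "shared", 50),
]

by_cgroup = Counter()  # usage keyed by the process group doing the write
by_area = Counter()    # usage keyed by the directory tree being written

for job, area, nbytes in writes:
    by_cgroup[job] += nbytes
    by_area[area] += nbytes

print(dict(by_cgroup))  # {'job1': 65, 'job2': 60}
print(dict(by_area))    # {'private/job1': 40, 'shared': 75, 'private/job2': 10}
```

The per-job totals mix private and shared bytes together, so a single per-cgroup limit cannot express "at most N bytes in the shared pool"; the per-area totals can.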
The proposal was that you could associate a "tree id" with an inode,
and then that inode and all its children were accounted against the
quota of that tree id. The arguments against it were (AFAIR) mostly
about the non-determinism issues that could arise if a single inode
were hard-linked into multiple trees - essentially, the first time it
was accessed from either tree it would become part of that tree, even
though it was reachable (and modifiable) from the other tree. But as
long as root doesn't do anything silly, this isn't really an issue,
and similar issues arise with this cgroup-based approach - if a
process outside a virtual server moves a file into that virtual
server's filespace without updating the usage correctly (which AFAICS
can't be done atomically?) then the quota stats will be off.
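
The hard-link concern can be shown with a toy Python model. This is a sketch under stated assumptions, not kernel code and not taken from the 2001 proposal: the TreeQuota class, its method names, and the tree names are all hypothetical. It assumes only what is described above, namely that an inode is bound to whichever tree it is first accessed through, and that all later charges go to that tree.

```python
import errno


class TreeQuota:
    """Toy model of per-tree disk accounting (hypothetical, for illustration)."""

    def __init__(self):
        self.limit = {}       # tree id -> byte limit
        self.usage = {}       # tree id -> bytes charged so far
        self.inode_tree = {}  # inode number -> tree id it was first bound to

    def add_tree(self, tree_id, limit):
        self.limit[tree_id] = limit
        self.usage[tree_id] = 0

    def charge(self, inode, via_tree, nbytes):
        # The first access binds the inode to the tree it was reached through;
        # later accesses through another tree do not change the binding.
        tree = self.inode_tree.setdefault(inode, via_tree)
        if self.usage[tree] + nbytes > self.limit[tree]:
            raise OSError(errno.EDQUOT, "quota exceeded for tree %r" % tree)
        self.usage[tree] += nbytes
        return tree


q = TreeQuota()
q.add_tree("job_a", 100)
q.add_tree("shared", 100)

# Inode 7 is hard-linked into both trees and is first touched via job_a.
q.charge(7, "job_a", 60)
# A later write that reaches it through the shared tree is still billed to job_a.
owner = q.charge(7, "shared", 30)
print(owner, q.usage)  # job_a {'job_a': 90, 'shared': 0}
```

Which tree ends up paying depends purely on access order, which is the non-determinism the objections were about.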

More specific comments on this patch:

- it would make more sense to integrate with the existing DQUOT_XXX
  macros rather than have to update every filesystem to include
  references to cgroup quotas as well as regular quotas.

- disk_cgroup_read_stats() should be a read_map() handler, and
  disk_cgroup_read_quota() should be a read_u64() handler.

- why do you have the checks and EPERM returns in
  disk_cgroup_create()? cgroupfs already does permission checking.

Paul