Message-Id: <1338389946-13711-1-git-send-email-jeff.liu@oracle.com>
Date: Wed, 30 May 2012 22:58:54 +0800
From: jeff.liu@...cle.com
To: containers@...ts.linux-foundation.org
Cc: cgroups@...r.kernel.org, jack@...e.cz, glommer@...allels.com,
daniel.lezcano@...e.fr, tytso@....edu, bpm@....com,
chris.mason@...cle.com, hch@...radead.org,
christopher.jones@...cle.com, david@...morbit.com,
tinguely@....com, tm@....ma, linux-ext4@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: container disk quota
Hello All,
Following Glauber's comments on container disk quota, it should be bound to the mount
namespace rather than to a cgroup.
In my trial, it works just fine this way in combination with the userland quota utilities.
However, IMHO some work has to be done in the user tools too.
Currently, the patchset is at a very early stage; I'd like to post it early to get more
feedback from you guys.
Hopefully I can explain my ideas clearly.
Kernel part:
* Container quota can be enabled independently of VFS quota or any particular file system quota.
Per-user/group quotas are kept in memory instead of being saved in separate files like general quota.
There is no need to remount the rootfs inside the container with the general quota options; quota can be
enabled and disabled directly through quotaon/quotaoff.
* Always honor the underlying file system quota check first, i.e., if file system quota is enabled at the
same time, the exported quota billing routines take effect only after the file system quota check
is done. Hence space allocation or inode creation inside the container will fail if the
outside quota limits are exceeded.
* Make use of the general VFS Q_XXXX quota control flags.
* Introduce a new disk quota structure, as well as its operations, into the mount namespace data structure.
It should only be allocated and initialized at the CLONE stage for a container.
* Modify quotactl(2) to examine whether the caller is inside a container.
This is implemented by checking whether the quota device name is "rootfs" (for an lxc guest) or
the current pid namespace is not the initial one. If so, do the mount namespace quotactl if
required; otherwise go to the normal quotactl procedure.
* Introduce a new quota format "QFMT_NS" for containers. It will be used by the userland tools to
identify the quota format, so that quotacheck does the container quota IO initialization and the
subsequent operations. This flag is returned when Q_GETQINFO is issued.
* Export a couple of container quota billing routines to the desired underlying
file system. They take effect if container quota is enabled in the kernel
configuration; otherwise they are just inline functions without much overhead.
* Also, there are a couple of things I have not handled yet.
. I think the container quota code should be isolated in Jan's fs/quota/ directory.
. There are dozens of helper routines in general quota, e.g.,
struct if_dqblk <-> struct fs_disk_quota conversions.
dquot space and inode billing.
They can be refactored into shared routines to some extent.
. quotastats(8) has not been taught to be aware of containers yet.
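As a rough illustration of the "underlying file system quota first" ordering described in the kernel part, here is a minimal user-space sketch in C. All names in it (struct quota_counter, charge_one(), uncharge_one(), charge_blocks()) are hypothetical and are not taken from the patchset. It also assumes that a failed container charge should roll back the file system charge, which the description above does not spell out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-owner counter; a hard limit of 0 means "no limit". */
struct quota_counter {
    bool enabled;
    uint64_t used;
    uint64_t hard;
};

/* Charge nr units to one counter, or fail with -EDQUOT. */
static int charge_one(struct quota_counter *q, uint64_t nr)
{
    if (!q->enabled)
        return 0;
    if (q->hard && q->used + nr > q->hard)
        return -EDQUOT;
    q->used += nr;
    return 0;
}

/* Undo a successful charge_one(). */
static void uncharge_one(struct quota_counter *q, uint64_t nr)
{
    if (q->enabled)
        q->used -= nr;
}

/*
 * Check the underlying file system quota (fs) first; only if that
 * succeeds is the container quota (ns) consulted. Assumption: a failed
 * container charge rolls the file system charge back.
 */
int charge_blocks(struct quota_counter *fs, struct quota_counter *ns,
                  uint64_t nr)
{
    int err = charge_one(fs, nr);

    if (err)
        return err;
    err = charge_one(ns, nr);
    if (err)
        uncharge_one(fs, nr);
    return err;
}
```

With both counters enabled, an allocation is refused as soon as either limit would be exceeded, and a refusal by the outside limit is reported before the container limit is looked at.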
Changes in quota userland utility:
* Introduce a new quota format string "lxc" to all quota control utilities, to
let each utility know that the user wants to run container quota control, e.g.:
quotacheck -cvugm -F "lxc" /
quotaon -u -F "lxc" /
....
* Currently, I manually create the underlying device (by editing the cgroup
device access list and running mknod /dev/sdaX x x) for the rootfs
inside containers, so that the mount point caching routine passes when
executing quotacheck against the "/" directory. Actually, this step can be
omitted.
* Add a new quotaio_lxc.c[.h] for container quota IO. It is basically the same as
the VFS quotaio logic; I just want to isolate the container stuff here.
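To make the -F "lxc" handling a bit more concrete, here is a minimal sketch of how a utility could map a format string to an internal format ID. The enum, the table and fmt_from_name() are hypothetical names made up for illustration; the real quota tools have their own format tables, which this does not reproduce. Only the "lxc" string comes from the proposal above; "vfsv0" and "xfs" are existing -F values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical internal format IDs; the values are illustrative only. */
enum tool_fmt {
    TOOL_FMT_UNKNOWN = -1,
    TOOL_FMT_VFSV0,
    TOOL_FMT_XFS,
    TOOL_FMT_LXC,   /* container quota, selected with -F "lxc" */
};

static const struct {
    const char *name;
    enum tool_fmt fmt;
} fmt_table[] = {
    { "vfsv0", TOOL_FMT_VFSV0 },
    { "xfs",   TOOL_FMT_XFS },
    { "lxc",   TOOL_FMT_LXC },
};

/* Translate the argument of -F into a format ID. */
enum tool_fmt fmt_from_name(const char *name)
{
    size_t i;

    if (!name)
        return TOOL_FMT_UNKNOWN;
    for (i = 0; i < sizeof(fmt_table) / sizeof(fmt_table[0]); i++)
        if (strcmp(fmt_table[i].name, name) == 0)
            return fmt_table[i].fmt;
    return TOOL_FMT_UNKNOWN;
}
```

Each utility would then branch on the returned ID to pick the container quota IO code instead of the regular one.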
Issues:
* How to detect, in a reasonable way, that quotactl(2) is called from inside a container.
* Do we need to let container quota work for cgroup combined with unshare(1)?
Now the patchset mainly works for lxc guests. IMHO, it can be used outside
a guest if the user desires. In this case, the quota limits can take effect
across different underlying file systems if they have exported the quota billing
routines.
* As the config entry for printing warning info to the TTY has been marked
obsolete, do we still need to support that?
* The format of the warning info sent through the netlink interface.
VFS quota has a device parameter filled in the warnings; how do we define the
format for containers?
* The hash table definitions (hash table size) for dquot caching for each type
follow kernel/user.c; maybe it's better to define an array separately for
performance optimization. Of course, that all depends on whether my current
implementation is on the right track. :)
* Container quota statistics: should they be calculated and exposed in /proc/fs/quota?
If the underlying file system also has quotas enabled, they will be
mixed up, so how about adding a new proc file like "ns_quota" there?
* Memory shrinking requested by kswapd.
As all dquots are cached in memory, if the user executes quotaoff, maybe
I need to handle quota being disabled but still kept in memory.
Also, add another routine to disable and remove all quotas from memory to
save memory directly.
* Project quota (i.e., tree quota) support.
Currently quota is implemented without project quota support, but it can be
supported without much complexity based on the current code; adding a new
parameter to ns_dquot_alloc_block(), etc. would be enough.
However, XFS supports project quota setup in the xfs tools, and I noticed there
is already a patchset for this feature on the EXT4 mailing list. Is it possible
to provide a unified interface and implementation in the quota tools in the
future?
AFAICS, project quota can be set up in a container, because we can
fetch the super block from the path passed in. Hence, the desired
ioctl(2) for the underlying file system can be invoked.
* Security checks for mount namespace quotactl(2).
In this version, I only do a basic security check to see if the caller
has the proper permissions for doing that. I think I must be missing many things
on this point.
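For the first issue above, the detection currently described in the kernel part (device name is "rootfs", or the caller is not in the initial pid namespace) can be modelled in user space roughly as below. This is only a sketch of the decision, not kernel code: struct caller_ctx, enum quotactl_path and pick_quotactl_path() are hypothetical names, and the boolean stands in for whatever pid namespace test the kernel side really does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Which quotactl implementation should handle the request. */
enum quotactl_path {
    QUOTACTL_VFS,    /* the normal quotactl procedure */
    QUOTACTL_MNTNS,  /* the mount namespace (container) quotactl */
};

/* Hypothetical summary of the calling task. */
struct caller_ctx {
    bool in_init_pid_ns;  /* true when running in the initial pid namespace */
};

/*
 * Route to the mount namespace quotactl when the quota device is named
 * "rootfs" (the lxc guest case) or when the caller is not in the
 * initial pid namespace; otherwise use the normal path.
 */
enum quotactl_path pick_quotactl_path(const char *special,
                                      const struct caller_ctx *ctx)
{
    if (special != NULL && strcmp(special, "rootfs") == 0)
        return QUOTACTL_MNTNS;
    if (!ctx->in_init_pid_ns)
        return QUOTACTL_MNTNS;
    return QUOTACTL_VFS;
}
```

Writing it down this way also shows why the heuristic is questionable: a host-side caller that passes "rootfs" as the device name is routed to the container path.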
Testing:
Currently the patch is lacking tests; I have only done a few checks to make sure the
basic operations work.
First of all, we need to invoke quotacheck with the "--no-remount" option,
since the rootfs inside the container guest cannot be remounted:
root@...ian:~/# quotacheck -cvugm -F "lxc" /
quotacheck: quotacheck: Scanning rootfs [/] done
quotacheck: Old user file name could not been determined. Usage will not be subtracted.
quotacheck: Old group file name could not been determined. Usage will not be subtracted.
quotacheck: Old user file name could not been determined. Usage will not be subtracted.
quotacheck: Old group file name could not been determined. Usage will not be subtracted.
quotacheck: Checked 3370 directories and 39434 files
By default, user/group quota is off:
root@...ian:~/# quotaon -u -F "lxc" -p /
user quota on / (rootfs) is off
root@...ian:~/# quotaon -g -F "lxc" -p /
group quota on / (rootfs) is off
Turn them on:
root@...ian:~/# quotaon -u -F "lxc" /
root@...ian:~/# quotaon -g -F "lxc" /
root@...ian:~/# quotaon -u -F "lxc" -p /
user quota on / (rootfs) is on
root@...ian:~/# quotaon -g -F "lxc" -p /
group quota on / (rootfs) is on
Edit quota. The soft/hard limits for both space and inodes are zero by default;
configure them to the desired values:
root@...ian:~/# edquota -u -F "lxc" /
Disk quotas for user jeff (uid 1000):
Filesystem   blocks    soft    hard  inodes    soft    hard
rootfs      2025740 2025840 2026000   42786   42790   42800
The configuration is saved properly:
root@...ian:~/# repquota -u -F "lxc" /
Block grace time: 00:00; Inode grace time: 00:00
                      Block limits               File limits
User            used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
root      --      44       0       0             20     0     0
jeff      -- 2025740 2025840 2026000          42786 42790 42800
Check the block and inode limits:
root@...ian:~/# su - jeff
jeff@...ian:/$ dd if=/dev/zero of=abc bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 1.19014 s, 8.8 MB/s
root@...ian:~/# repquota -u -F "lxc" /
Jeff *** report() type=0 handle index=0
*** Report for user quotas on device rootfs
Block grace time: 00:00; Inode grace time: 00:00
                      Block limits               File limits
User            used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
root      --      44       0       0             20     0     0
jeff      +- 2025980 2025840 2026000  7days   42786 42790 42800
root@...ian:~/# repquota -g -F "lxc" /
*** Report for group quotas on device rootfs
Block grace time: 00:00; Inode grace time: 00:00
                      Block limits               File limits
Group           used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
root      --    8564       0       0            390     0     0
adm       --     220       0       0              6     0     0
tty       --       0       0       0              1     0     0
utmp      --       4       0       0              1     0     0
jeff      -- 2021268       0       0          42716     0     0
root@...ian:~/# su - jeff
jeff@...ian:/$ dd if=/dev/zero of=test_space bs=1M count=100
dd: writing `test_space': Disk quota exceeded
11+0 records in
10+0 records out
10506240 bytes (11 MB) copied, 1.24721 s, 8.4 MB/s
root@...ian:~/# repquota -u -F "lxc" /
Jeff *** report() type=0 handle index=0
*** Report for user quotas on device rootfs
Block grace time: 00:00; Inode grace time: 00:00
                      Block limits               File limits
User            used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
root      --      44       0       0             20     0     0
jeff      +- 2026000 2025840 2026000  7days   42786 42790 42800
root@...ian:~/# su - jeff
jeff@...ian:/$ for ((i=0; i<20; i++)); do touch test_file_cnt.$i; done
touch: cannot touch `test_file_cnt.14': Disk quota exceeded
touch: cannot touch `test_file_cnt.16': Disk quota exceeded
touch: cannot touch `test_file_cnt.18': Disk quota exceeded
root@...ian:~/# repquota -u -F "lxc" /
Block grace time: 00:00; Inode grace time: 00:00
                      Block limits               File limits
User            used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
root      --      44       0       0             20     0     0
jeff      ++ 2026000 2025840 2026000  6days   42800 42790 42800  7days
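The soft/hard limit behaviour exercised in the session above follows the usual quota rule: a charge is refused once usage would go past the hard limit, and going past the soft limit only starts the grace period. A minimal sketch of that rule, assuming a limit of 0 means "unlimited" and ignoring grace time expiry (struct quota_limits and check_charge() are hypothetical names, not functions from the patchset):

```c
#include <stdint.h>

enum quota_verdict {
    QUOTA_OK,         /* below or at the soft limit */
    QUOTA_OVER_SOFT,  /* allowed, but the grace period is running */
    QUOTA_DENIED,     /* would exceed the hard limit: EDQUOT */
};

/* Usage and limits for one resource (blocks or inodes); 0 = no limit. */
struct quota_limits {
    uint64_t used;
    uint64_t soft;
    uint64_t hard;
};

/* Decide what happens if nr more units are charged. */
enum quota_verdict check_charge(const struct quota_limits *l, uint64_t nr)
{
    uint64_t want = l->used + nr;

    if (l->hard && want > l->hard)
        return QUOTA_DENIED;
    if (l->soft && want > l->soft)
        return QUOTA_OVER_SOFT;
    return QUOTA_OK;
}
```

For example, with the inode figures configured for user jeff (used 42786, soft 42790, hard 42800), 14 more inodes still fit under the hard limit and a 15th is refused.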
Any comments are appreciated, have a nice day!
-Jeff