Date:	Thu, 31 May 2012 12:54:25 +0400
From:	Glauber Costa <glommer@...allels.com>
To:	<jeff.liu@...cle.com>
CC:	<containers@...ts.linux-foundation.org>, <cgroups@...r.kernel.org>,
	<jack@...e.cz>, <daniel.lezcano@...e.fr>, <tytso@....edu>,
	<bpm@....com>, <chris.mason@...cle.com>, <hch@...radead.org>,
	<christopher.jones@...cle.com>, <david@...morbit.com>,
	<tinguely@....com>, <tm@....ma>, <linux-ext4@...r.kernel.org>,
	<linux-fsdevel@...r.kernel.org>
Subject: Re: container disk quota

On 05/30/2012 06:58 PM, jeff.liu@...cle.com wrote:
> Hello All,
>
> According to Glauber's comments regarding container disk quota, it should be bound to the mount
> namespace rather than to a cgroup.
>
> In my testing, it works just fine in combination with the userland quota utilities in this way.

That's great.

I'll take a look at the patches.


>
> * Modify quotactl(2) to check whether the caller is running inside a container.
>    This is implemented by checking whether the quota device name is "rootfs" (for an lxc guest)
>    or the current pid namespace is not the initial one; if so, do the mount namespace
>    quotactl, otherwise go to the normal quotactl procedure.

I dislike the use of the "lxc" name. There is nothing lxc-specific in this;
it is namespace-specific. lxc is just one of the container solutions
out there, so let's keep it generic.

>
> * Also, I have not handled a couple of things for now.
>    . I think the container quota should be isolated to Jan's fs/quota/ directory.
>    . There are dozens of helper routines in general quota, e.g.,
>      struct if_dqblk <-> struct fs_disk_quota conversions,
>      dquot space and inode billing.
>      They can be refactored into shared routines to some extent.
>    . quotastats(8) has not been taught to be container-aware yet.
>
> Changes in quota userland utility:
> * Introduce a new quota format string "lxc" to all quota control utilities, to
>    let each utility know that the user wants to run container quota control, e.g.:
>    quotacheck -cvugm -F "lxc" /
>    quotaon -u -F "lxc" /
>    ....
>
> * Currently, I manually created the underlying device(by editing cgroup
>    device access list and running mknod /dev/sdaX x x) for the rootfs
>    inside containers to let the cache mount points routine pass for
>    executing quotacheck against the "/" directory.  Actually, it can be
>    omitted here.
>
> * Add a new quotaio_lxc.c[.h] for container quota IO. It is basically the same as the
>    VFS quotaio logic; I just hope to isolate the container code here.
>
> Issues:
> * How to detect, in a reasonable way, that quotactl(2) is called from a container.

It's a system call. It is always called by a process. The process 
belongs to a namespace. What else is needed?
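
To illustrate the point from userspace: a minimal sketch, assuming a kernel that exposes /proc/<pid>/ns/mnt (not every kernel does). It reads the mount namespace identity of a process and compares two processes. The helper names mount_ns_id and in_same_mount_ns and the proc_root parameter are made up for this example; the kernel-side check would of course use the calling task directly.

```python
import os
import re


def mount_ns_id(pid="self", proc_root="/proc"):
    """Return the mount-namespace inode number of a process.

    <proc_root>/<pid>/ns/mnt is a symlink whose target looks like
    'mnt:[4026531840]'; the number identifies the namespace.
    """
    target = os.readlink(os.path.join(proc_root, str(pid), "ns", "mnt"))
    match = re.fullmatch(r"mnt:\[(\d+)\]", target)
    if match is None:
        raise ValueError(f"unexpected namespace link target: {target!r}")
    return int(match.group(1))


def in_same_mount_ns(pid_a, pid_b, proc_root="/proc"):
    """True if both processes live in the same mount namespace."""
    return mount_ns_id(pid_a, proc_root) == mount_ns_id(pid_b, proc_root)
```

Calling in_same_mount_ns("self", 1) answers "am I in the same mount namespace as init?" without any device-name heuristics.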

> * Do we need to let container quota work for cgroups combined with unshare(1)?
>    Now the patchset mainly works for lxc guests.  IMHO, it can be used outside
>    a guest if the user desires.  In this case, the quota limits can take effect
>    across different underlying file systems if they have exported quota billing
>    routines.

I still don't understand what business cgroups have here. If you
are attaching it to the mount namespace, you can always infer the context
from the calling process. I still need to look at your patches, but I
believe that dropping the "feature" of manipulating this from outside of
the container will save you a lot of trouble.

Please note that a process can temporarily join a namespace with 
setns(). So you can have a *utility* that does it from the outer world, 
but the kernel has no business with that. As far as we're concerned, I 
believe that you should always get your context from the current 
namespace, and forbid any usage from outside.
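
A rough sketch of what such an outside utility could look like, assuming a kernel on which setns() accepts mount namespaces and a caller with sufficient privileges (CAP_SYS_ADMIN). It calls the libc setns() wrapper through ctypes. The names mnt_ns_path and run_in_mount_ns are invented for illustration only.

```python
import ctypes
import os

CLONE_NEWNS = 0x00020000  # mount-namespace flag, as defined in <sched.h>


def mnt_ns_path(pid, proc_root="/proc"):
    """Path of the mount-namespace handle of a process."""
    return os.path.join(proc_root, str(pid), "ns", "mnt")


def run_in_mount_ns(pid, func):
    """Join the mount namespace of `pid`, then call func() inside it.

    The calling process stays in that namespace afterwards, so a real
    utility would normally fork first and do this in the child.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    fd = os.open(mnt_ns_path(pid), os.O_RDONLY)
    try:
        if libc.setns(fd, CLONE_NEWNS) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
    return func()
```

With this, the tool enters the container's namespace and then issues an ordinary quotactl(2) from inside, so the kernel only ever sees the "current namespace" case.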

> * The hash table definitions (hash table size) for dquot caching for each type are
>    based on kernel/user.c; maybe it's better to define a separate array for
>    performance optimization.  Of course, that all depends on whether my current
>    implementation is on the right track. :)
>
> * Container quota statistics: should they be calculated and exposed in /proc/fs/quota?
>    If the underlying file system also has quotas enabled, they will be
>    mixed up, so how about adding a new proc file like "ns_quota" there?

No, this should be moved to the process-specific proc directory and then
symlinked. Take a look at "/proc/self".
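
For reference, this is how the "/proc/self" indirection behaves: it is a symlink to the directory of whichever process reads it, so every reader gets its own view. A small sketch follows; resolve_proc_self and the proc_root parameter are names chosen for this example, and no per-namespace quota file exists under that directory today.

```python
import os


def resolve_proc_self(proc_root="/proc"):
    """Return the per-process directory that <proc_root>/self points at.

    On Linux the link target is the reader's own pid, e.g. '1234',
    so the result is '/proc/1234' for the calling process.
    """
    link = os.readlink(os.path.join(proc_root, "self"))
    return os.path.join(proc_root, link)
```

A statistics file placed under the per-process directory would then be reachable through a fixed path via the symlink, while its contents depend on who is asking.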

>
> * Memory shrinking triggered from kswap.
>    As all dquots are cached in memory, if the user executes quotaoff, maybe
>    I need to handle quotas that are disabled but still kept in memory.
>    Also, add another routine to disable and remove all quotas from memory to
>    free memory directly.

I didn't read your patches yet, so take it with a grain of salt here. 
But I don't understand why you make this distinction of keeping it in 
memory only.

You could keep quota files outside of the container, and then bind mount 
them to the current location in the setup-phase.
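
As a sketch of that setup step (under assumptions: the quota files use the usual aquota.user / aquota.group names, and the directory layout is purely hypothetical), a container setup script could generate the bind mounts like this. The function only builds the mount(8) command lines; it does not run them.

```python
import os


def bind_mount_commands(host_quota_dir, container_root,
                        names=("aquota.user", "aquota.group")):
    """Build 'mount --bind' commands that expose host-side quota files
    at the root of a container's file system.

    The target files must already exist, since a bind mount needs an
    existing mount point.
    """
    commands = []
    for name in names:
        src = os.path.join(host_quota_dir, name)
        dst = os.path.join(container_root, name)
        commands.append(["mount", "--bind", src, dst])
    return commands
```

The resulting lists can be passed to subprocess.run() by a privileged setup tool before the container's init is started.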
