[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150316165228.GT4934@quack.suse.cz>
Date: Mon, 16 Mar 2015 17:52:28 +0100
From: Jan Kara <jack@...e.cz>
To: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
Cc: linux-fsdevel@...r.kernel.org, Dave Chinner <david@...morbit.com>,
Jan Kara <jack@...e.cz>, linux-ext4@...r.kernel.org,
Theodore Ts'o <tytso@....edu>,
Dmitry Monakhov <dmonakhov@...nvz.org>,
Andy Lutomirski <luto@...capital.net>,
linux-kernel@...r.kernel.org, Li Xi <pkuelelixi@...il.com>
Subject: Re: [PATCH RFC v2 0/6] ext4: yet another project quota
Hello,
On Tue 10-03-15 20:22:04, Konstantin Khlebnikov wrote:
> Projects quota allows to enforce disk quota for several subtrees or even
> individual files on the filesystem. Each inode is marked with project-id
> (independently from uid and gid) and accounted into corresponding project
> quota. New files inherits project id from directory where they are created.
>
> This is must-have feature for deploying lightweight containers.
> Also project quota can tell size of subtree without doing recursive 'du'.
>
> This patchset adds project id and quota into ext4.
>
> This time I've prepared patches also for e2fsprogs and quota-tools.
>
> All patches are available at github:
> https://github.com/koct9i/linux --branch project
> https://github.com/koct9i/e2fsprogs --branch project
> https://github.com/koct9i/quota-tools --branch project
>
> Porposed behavior is similar to project quota in XFS:
>
> * all inode are marked with project id
> * new files inherits project id from parent directory
> * project quota accounts inodes and enforces limits
> * cross-project link and rename operations are restricted
Thanks for the patches and sorry for taking a long time to look into
them. I like your patches and they looks mostly fine to me but can we
*please* start with making things completely compatible with how XFS works?
I understand that may seem broken / not useful for your usecase but
consistency among filesystems is really important.
I was talking with Dave Chinner last week and we agreed that the direction
you are changing things makes sense but getting things compatible with how
they were for 15 years in XFS is an important first step (because there may
be people with other usecases depending on the old behavior and they would
get confused by ext4 behaving differently). After we have compatible code
in, we can add your fs.protected_projects thing on top of that (and also
support it in XFS to stay compatible).
> Differences:
>
> There is no flag similar to XFS_XFLAG_PROJINHERIT (which allows to disable
> project id inheritance), instead of that project which userspace sees as '0'
> (in nested user-name space that might be actually non-zero project) acts as
> default project where restrictions for link/rename are ignored.
> (also see below, in "why new ioctl" part)
>
> This implementation adds shortcut for moving files from one project into
> another: non-directory inodes with n_link == 1 are transferred without
> copying (XFS in this case returns -EXDEV and userspace have to copy file).
>
> In XFS file owner (or process with CAP_FOWNER) can set set any project id,
> XFS permits changing project id only from init user-namespace.
>
> This patchset adds sysctl fs.protected_projects. By default it's 0 and project
> id acts as XFS project. Setting it to 1 makes chaning project id priviliged
> operation which requires CAP_SYS_RESOURCE in current user-namespace, changing
> project id mapping for nested user-namespace also requires that capability.
> Thus there are two levels of control: project id mapping in user-ns defines set
> of permitted projects and capability protects operations within this set.
>
> I see no problems with supporting all this in XFS, all difference in interface.
>
> Ext4 layout
> -----------
>
> Project id introduce ro-compatible feature 'project'.
>
> Inode project id is stored in place of obsolete field 'i_faddr' (that trick was
> suggested by Darrick J. Wong in previous discussions of project quota).
> Linux never used that field and present fsck checks that it contains zero.
>
> Quota information is stored in special inode №11 (by default only 10 inodes are
> reserved for special usage, I've add option to resize2fs to reserve more).
> (see e2fsprogs patches for details) For symmetry with other quotas inode number
> is stored in superblock.
>
> Project quota supports only modern 'hidden' journaled mode.
>
> Interface
> ---------
>
> Interface for changing limits / query current usage is common vfs quotactl()
> where quotatype = PRJQUOTA = 2. User can query current state of any project
> mapped into user-ns, changing of limits requires CAP_SYS_ADMIN in init user-ns.
>
> Two new ioctls for getting / changing inode project id:
> int ioctl(fd, FS_IOC_GETPROJECT, unsigned *project);
> int ioctl(fd, FS_IOC_SETPROJECT, unsigned *project);
>
> They acts as interface for super-block methods get_project / set_project
> Generic code checks permissions, does project id translation in user-namespace
> mapping, grabs write-access to the filesystem, locks i_mutex for set opetaion.
> Filesystem method only updates inode and transfers project quota.
>
> No new mount options added. Disk usage tracking is enabled at mount.
> Limits are enabeld later by "quotaon".
>
> (BTW why journalled quota doesn't enable limits right at the time of mounting?)
Well, mostly historical reasons :) The biggest probably was that to
properly initialize quotas (or fix them if quota file gets corrupted) you
have to still run quotacheck and for that kernel mustn't touch the files.
At the time when I was writing journaled quotas I didn't feel like adding
quotacheck support into e2fsck... We eventually did that anyway for support
of quota in system files but how we did that was actually based on the
experience I had from the journaled quota lesson ;).
> Why new ioctls?
> ---------------
>
> XFS has it's own interface for that: XFS_IOC_FSGETXATTR / XFS_IOC_FSSETXATTR.
> But it has several flaws and doesn't fit for a role of generic interface.
>
> It contains a lot of xfs-specific flags and racy by design: set operation
> commits all fields at once thus it's used in sequence get-change-set without
> any lock, Concurrent updates from user space will collide.
Sure but that's the case for generic GETFLAGS / SETFLAGS interface as
well.
> Also xfs has flag XFS_XFLAG_PROJINHERIT which tells should new files inherit
> project id from parent directory or not. This flag is barely useful and only
> makes everything complicated. Even tools in xfsprogs don't use it: they always
> set it together with project id and clears when set project id back to zero.
I think this is about balance - adding the support for the PROJINHERIT flag
costs us close to nothing and we get a compatibility with XFS. So even though
the usefulness of that flag is doubtful, I think the additional
compatibility is worth it.
> And the main reason: this compatibility gives nothing. The only user of xfs
> ioctl which I've found is the xfsprogs. And these tools check filesystem name
> and don't work anywhere except 'xfs'.
The fact that you didn't find any other user doesn't mean there isn't
any. And historically if we though "nobody could be relying on this", we
were sometimes proven wrong - a few years later when it was *much* more
painful to make things compatible.
Here we speak about new feature for the filesystem so compatibility
requirements aren't that strong but still I find the case that the same
ioctl will just work on ext4 as it does on xfs more appealing than creating
a new simpler generic ioctl. That way userspace doesn't have to care on
which filesystem it is working or whether the current kernel already
supports the new ioctl for XFS. So please use the XFS interface is Li Xi
does in his patches.
Honza
> Links
> -----
>
> [1] 2014-12-09 ext4: add project quota support by Li Xi
> http://marc.info/?l=linux-fsdevel&m=141810265603565&w=2
>
> [2] 2014-01-28 A draft for making ext4 support project quota by Zheng Liu
> http://marc.info/?l=linux-ext4&m=139089109605795&w=2
>
> [3] 2012-07-09 introduce extended inode owner identifier v10 by Dmitry Monakhov
> http://thread.gmane.org/gmane.linux.file-systems/65752
>
> [4] 2010-02-08 Introduce subtree quota support by Dmitry Monakhov
> http://thread.gmane.org/gmane.comp.file-systems.ext4/17530
>
>
> ---
>
> Konstantin Khlebnikov (6):
> fs: vfs ioctls for managing project id
> fs: protected project id
> quota: generic project quota
> ext4: support project id and project quota
> ext4: add shortcut for moving files across projects
> ext4: mangle statfs results accourding to project quota usage and limits
>
>
> Documentation/filesystems/Locking | 4 +
> Documentation/filesystems/vfs.txt | 8 +++
> Documentation/sysctl/fs.txt | 16 ++++++
> fs/compat_ioctl.c | 2 +
> fs/ext4/ext4.h | 15 ++++-
> fs/ext4/ialloc.c | 3 +
> fs/ext4/inode.c | 15 +++++
> fs/ext4/namei.c | 102 ++++++++++++++++++++++++++++++++++++-
> fs/ext4/super.c | 61 ++++++++++++++++++++--
> fs/ioctl.c | 62 ++++++++++++++++++++++
> fs/quota/dquot.c | 96 +++++++++++++++++++++++++++++++++--
> fs/quota/quota.c | 8 ++-
> fs/quota/quotaio_v2.h | 6 +-
> include/linux/fs.h | 3 +
> include/linux/quota.h | 1
> include/linux/quotaops.h | 16 ++++++
> include/uapi/linux/capability.h | 1
> include/uapi/linux/fs.h | 3 +
> include/uapi/linux/quota.h | 6 +-
> kernel/sysctl.c | 9 +++
> kernel/user_namespace.c | 4 +
> 21 files changed, 416 insertions(+), 25 deletions(-)
>
> --
> Signature
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists