linux-ext4 - Re: [PATCH v2 0/4] quota: add project quota support

Message-ID: <20140811134836.GA3506@thunk.org>
Date:	Mon, 11 Aug 2014 09:48:36 -0400
From:	Theodore Ts'o <tytso@....edu>
To:	Li Xi <pkuelelixi@...il.com>
Cc:	Shuichi Ihara <sihara@....com>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	Ext4 Developers List <linux-ext4@...r.kernel.org>,
	"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
	"hch@...radead.org" <hch@...radead.org>, Jan Kara <jack@...e.cz>,
	Andreas Dilger <adilger@...ger.ca>,
	"Niu, Yawei" <yawei.niu@...el.com>
Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Mon, Aug 11, 2014 at 06:23:53PM +0800, Li Xi wrote:
> As a distributed file system, Lustre is able to use hundreds of separate
> ext4 file systems to store its data as well as metadata, yet provides a
> united global name space. Some users have started to use SSD devices for
> better performance on Lustre. However, as we can expect, they might want to
> replace only part of the drives with SSDs, since SSDs are expensive. That means part
> of the ext4 file systems are using SSD and the other part of the ext4 file
> systems are using hard disks. In the sight of Lustre, users can choose to
> locate files on SSDs or hard disks using features of Lustre, namely 'stripe'
> and 'OST pool'. Here comes the problem, how to limit the usage of SSD since
> all end users want good performance badly?

Ext4 quotas are per-disk, and storage technologies are per disk.  So
if *I* were designing a clustered file system, and we had different
cost centers, say, "mail", "maps", "social", and "search", each of
which might have different amounts of disk drive and SSD space, based
on how much SSD each product area's budget is willing to pay for and
what the requirements of each product might be, I'd simply assign
different groups to each of these cost centers.

For the purposes of usages of clustered file systems, you don't want
to do quota enforcement.  If you've spent tens or hundreds of CPU
years working on some distributed computation, you don't want to throw
it all away due to a quota failure.  Or if you are running an
international web-based service, causing even a partial downtime of
everyone's maps or e-mail due to quota failure is also considered,
well, not cool.

So let's assume that you're only doing usage tracking, but even if you
wanted to do usage control, the files will be scattered across many
different servers and file systems, and so it doesn't make sense to do
quota control, or even usage tracking, on a disk by disk basis.

Hence, the clustered file system will have to sum up the usage quotas
of each underlying file system, with different sums for the
HDD's and SSD's, by group.  Fortunately, Map Reduce is your friend.
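
A minimal sketch of that summation, written as plain Python rather than
an actual Map Reduce job.  The report layout, the group names, and all
of the numbers are made up for illustration; each tuple stands for what
one underlying file system might report as its per-group usage.

```python
from collections import defaultdict

TIB = 2**40

# Hypothetical per-file-system reports: (media type, {group: (bytes, inodes)}).
reports = [
    ("ssd", {"mail": (4 * TIB, 1_000_000), "maps": (1 * TIB, 200_000)}),
    ("hdd", {"mail": (50 * TIB, 9_000_000), "search": (80 * TIB, 30_000_000)}),
    ("ssd", {"mail": (2 * TIB, 500_000)}),
]

def sum_usage(reports):
    """Sum bytes and inodes per (group, media type) across all file systems."""
    totals = defaultdict(lambda: [0, 0])
    for media, groups in reports:
        for group, (nbytes, ninodes) in groups.items():
            totals[(group, media)][0] += nbytes
            totals[(group, media)][1] += ninodes
    return {key: tuple(val) for key, val in totals.items()}

usage = sum_usage(reports)
```

Keying the totals on (group, media type) is what keeps the HDD and SSD
sums separate for the same group.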

Then for each group the cluster file system can report usage of HDD
and SSD space and inodes, separately.  When a project gets within a
few terabytes of being filled, or the overall free space in the
cluster drops below a few petabytes, you page your SRE or devops
team so they can take care of things, perhaps by negotiating an
emergency quota increase, or moving files around, or deleting old
files, etc.
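
A rough sketch of that kind of check, assuming the per-group usage and
limits are already known.  The function name and the default margins
(2 TB per group, 3 PB cluster-wide) are arbitrary stand-ins for "a few";
note that it only reports reasons to page someone and never blocks a
write.

```python
TB = 10**12
PB = 10**15

def needs_page(group_limit, group_used, cluster_free,
               group_margin=2 * TB, cluster_margin=3 * PB):
    """Return the reasons (if any) to page a human; all sizes are in bytes."""
    reasons = []
    if group_limit - group_used < group_margin:
        reasons.append("group near limit")
    if cluster_free < cluster_margin:
        reasons.append("cluster low on free space")
    return reasons
```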

The bottom line is that you *can* run an exabyte+ cluster file system
supporting many different budget/cost centers with only group-level
quotas and nothing else.  And you can do this even supporting both
HDD's and SSD's, with separate quota tracking of the two storage
technologies.

Can you go into more detail about how Lustre would use project quotas
from a cluster file system centric perspective, such as I've
sketched out above?

> Of course, we might be able to find some workarounds using group quota.
> However, because the owners of the files can change the group attributes
> freely, it is so easy for the users to evade the group quota and steal the
> tight resources.

But all of the users will be sending chgrp requests through Lustre, or
whatever the cluster file system is.  So Lustre can enforce whatever
permissions policy it would like.

> For example, in order to steal SSD space, a user can just
> create the files using the specific group ID and then change it back.

But since you've been arguing that the project id should get preserved
across renames, they can evade quota usage by doing:

	 touch /product/mail/huge_file
	 mv  /product/mail/huge_file /product/maps

And if you allow the rename, and allow the project id to be preserved
across renames, then the quota evasion is just as easy.  And yes, you
could prevent renames at the cluster file system level.  But the
question remains what makes sense on a single disk system, and if
users can trivially subvert the project quota by creating the file in
one directory, where it inherits the quota of project A, and then be
able to move the file to another directory, they have evaded quota
enforcement just as surely as if they had used chgrp.

Hence, to prevent this, you need to restrict project id changes to
the superuser, *and* not allow renames across project hierarchies.
And surprise!  That looks exactly like what XFS has built.
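
As a toy model of those two rules (this is not kernel code and not the
XFS implementation; the Inode class, the function names, and the choice
of EXDEV and EPERM as error codes are assumptions made for
illustration):

```python
import errno

class Inode:
    """Hypothetical stand-in for a file or directory carrying a project id."""
    def __init__(self, project_id):
        self.project_id = project_id

def check_rename(src, dst_dir):
    """Refuse a rename that would carry a file into another project's tree."""
    if src.project_id != dst_dir.project_id:
        raise OSError(errno.EXDEV, "rename across project hierarchies")

def set_project(inode, new_project_id, caller_uid):
    """Only the superuser (uid 0) may change a project id."""
    if caller_uid != 0:
        raise PermissionError(errno.EPERM, "only root may change project id")
    inode.project_id = new_project_id
```

With both checks in place, the touch-then-mv sequence fails at the
rename instead of silently moving the usage into another project.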

Cheers,

       		     	       	       	       - Ted