linux-ext4 - Re: Atomic non-durable file write API

Message-ID: <AANLkTi=ULuM6fHH1V2zKGpaSjKRrbUJen5oAMKAkAaei@mail.gmail.com>
Date:	Wed, 29 Dec 2010 10:09:48 +0100
From:	Olaf van der Spek <olafvdspek@...il.com>
To:	"Ted Ts'o" <tytso@....edu>
Cc:	Neil Brown <neilb@...e.de>,
	Christian Stroetmann <stroetmann@...olinux.com>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	linux-ext4@...r.kernel.org, Nick Piggin <npiggin@...il.com>
Subject: Re: Atomic non-durable file write API

On Wed, Dec 29, 2010 at 12:42 AM, Ted Ts'o <tytso@....edu> wrote:
> On Tue, Dec 28, 2010 at 11:54:33PM +0100, Olaf van der Spek wrote:
>
>> > Very true.  But until such problems are described and understood,
>> > there is not a lot of point trying to implement a
>> > solution.  Premature implementation, like premature optimisation,
>> > is unlikely to be fruitful.  I know this from experience.
>>
>> The problems seem clear. The implications not yet.
>
> I don't think there's even agreement that it is a problem.  A problem

Maybe "problem" isn't the right word, but it does seem to be a corner case / exception.

> implies a use case where such a need is critical, and I haven't
> seen it yet.  I'd rather characterize it as a demand for a "solution"
> for a problem that hasn't been proven to exist yet.
>
>> True, I don't understand why people say it will cause a performance
>> hit but then don't want to tell why.
>
> Because I don't want to waste time doing a hypothetical design when (a)
> the specification space hasn't even been fully spec'ed out, and (b) no
> compelling use case has been demonstrated, and (c) no one is paying
> me.
>
> The last point is a critical one; who's going to do the work?  If you
> are going to do the work, then implement it and send us the patches.
> If you expect a technology expert to do the work, it's dirty pool to
> try to force him or her to do a design to "prove" that it's not trivial.
>
> If you're going to pay me $50,000 or $100,000, then it's on the golden
> rule principle (the customer with the gold, makes the rules), and I'll
> happily work on a design even if in my best judgment it's ill-advised,
> and probably will be a waste of money, because, hey, it's the
> customer's money.  But if you're going to ask me to spend my time
> working on something which in my professional opinion is a waste of
> time, and do it pro bono, you must be smoking something really good,
> and probably really illegal.

I don't want you to work on something you do not support.
I want to understand why you think it's a bad idea.

> Here are some of the hints though about trouble spots.
>
> 1) What happens in disk full cases?  Remember, we can't free the old
> inode until writeback has happened.  And if we haven't allocated space
> yet for the file, and space is needed for the new file, what happens?
> What if some other disk write needs the space?

I would expect a no space error.

> 2) How big are the files that you imagine should be supported with
> such a scheme?  If the file system is 1 GB, and the file is 600MB, and
> you want to replace it with new contents which is 750MB long, what
> happens?  How does the system degrade gracefully in the case of larger
> files?  Does the user get any notification that maybe the magic
> O_PONIES semantics might be changing?

No semantics will change; you'll get a no space error.
Just like you would if you use the temp file approach.
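
For readers unfamiliar with it, the temp file approach mentioned here is
commonly sketched along the following lines. This is only an illustration
under assumptions, not something proposed in this thread: `atomic_write`
and its `durable` parameter are hypothetical names, and the fsync() before
the rename is the usual way to make sure the data reaches disk first.

```python
import os
import tempfile


def atomic_write(path, data, durable=True):
    """Replace `path` with `data` via a temp file and rename (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same file system as the target,
    # otherwise rename() cannot replace the target in one step.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)              # raises OSError (e.g. ENOSPC) on failure
            tmp.flush()
            if durable:
                os.fsync(tmp.fileno())   # push the data to disk before the rename
        os.replace(tmp_path, path)       # rename(2): old name now points at new data
    except BaseException:
        # On any failure, including no space, the old file is left untouched
        # and the partially written temp file is removed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A no space condition surfaces as an ordinary OSError from the write, and
the original file keeps its old contents. Note that the replacement file
carries the temp file's owner and mode, not those of the file it replaces.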

> 3) What if the rename is still pending, but in the meantime, some
> other process modifies the file?  Do those writes also have to be
> atomic vis-a-vis the rename?

So the rename has been executed already (but has not yet been committed
to disk) and then the file is modified? Those writes would apply to the
new file.

> 4) What if the rename is still pending, but in the meantime, some
> other process creates another new file and renames it over the same
> file name?

The last update would win, if by pending you mean the rename has been
executed already but hasn't been written to disk yet.
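
A small sketch of the "last update wins" behaviour as seen by running
processes. The file names are made up for illustration, and this says
nothing about what survives a crash, which is the open question here.

```python
import os
import tempfile

directory = tempfile.mkdtemp()
target = os.path.join(directory, "config")

# Two writers each prepare their own temp file next to the target.
for name, contents in (("tmp-a", b"from writer A"), ("tmp-b", b"from writer B")):
    with open(os.path.join(directory, name), "wb") as f:
        f.write(contents)

# Both rename over the same name; each rename replaces whatever is there.
os.replace(os.path.join(directory, "tmp-a"), target)
os.replace(os.path.join(directory, "tmp-b"), target)

with open(target, "rb") as f:
    result = f.read()   # contents of the file that was renamed last
```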

> etc.
>
>> >> Where losing meta-data is bad? That should be obvious.
>>
>> In that case meta-data shouldn't be supported in the first place.
>
> Well, hold on a minute.  It depends on what the meta-data means.  If
> the meta-data is supposed to be a secure indication of who created the
> file, or more importantly if quotas are enforced, to whom the disk
> usage quota should be charged, then it might not be allowable to
> "preserve the metadata in some cases".

I understand you can't just allow chown, but ...

> In general, you can always save the meta data, and restore the meta
> data to the new file --- except when there are security reasons why
> this isn't allowed.  For example, file ownership is special, because
> of (a) setuid bit considerations, and (b) file quota considerations.
> If you don't have those issues, then allowing a non-privileged user to
> use chown() is perfectly acceptable.  But it's because of these issues
> that chown() is special.
>
> And if quota is enabled, replacing a 10MB file with a 6TB file, while
> preserving the same file "owner", and therefore charging the 6TB to
> the old owner, would be a total evasion of the quota system.

Isn't that already a problem if you have write access to a file you don't own?
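
The save-and-restore of metadata discussed above might look roughly like
this in user space. `copy_metadata` is a hypothetical helper; the sketch
assumes that an unprivileged chown() to another owner fails with EPERM,
and it drops the setuid/setgid bits whenever ownership cannot be kept.

```python
import os
import stat


def copy_metadata(old_path, new_path):
    """Copy ownership (if permitted) and mode from old_path to new_path.

    Returns True if ownership was preserved, False otherwise.
    """
    st = os.stat(old_path)
    mode = stat.S_IMODE(st.st_mode)
    try:
        # Changing the owner usually requires privilege.
        os.chown(new_path, st.st_uid, st.st_gid)
        owner_kept = True
    except PermissionError:
        owner_kept = False
        # Never carry setuid/setgid over to a file with a different owner.
        mode &= ~(stat.S_ISUID | stat.S_ISGID)
    # Set the mode after chown, since chown may clear setuid/setgid bits.
    os.chmod(new_path, mode)
    return owner_kept
```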

Still waiting on an answer to:
> What is the recommended way for atomic (complete) file writes?

Given that (you say) so many get it wrong, it would be nice to know
the right way.

Olaf