Message-ID: <CA+55aFzuTspDyyLaOA-g-dTWydaUeeWo9uVGR+rZ=ZJzPW_Ocw@mail.gmail.com>
Date:	Fri, 20 Apr 2012 10:21:35 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Al Viro <viro@...iv.linux.org.uk>
Cc:	linux-fsdevel@...r.kernel.org, James Morris <jmorris@...ei.org>,
	linux-security-module@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	David Safford <safford@...ux.vnet.ibm.com>,
	Dmitry Kasatkin <dmitry.kasatkin@...el.com>,
	Mimi Zohar <zohar@...ux.vnet.ibm.com>,
	David Miller <davem@...emloft.net>
Subject: Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] :
 ima-appraisal patches)

On Fri, Apr 20, 2012 at 9:42 AM, Al Viro <viro@...iv.linux.org.uk> wrote:
>
> Actually, I like the per-CPU spinlock variant better; the thing is,
> with that scheme we get normal fput() (i.e. non-nodefer variant)
> non-blocking.  How about this:

What's the advantage of a per-cpu lock?

If you make the work be per-cpu, then you're better off with no locking
at all: just disable interrupts (which you do anyway).

And if you want to use a spinlock, don't bother with the percpu side.

The thing I do not like about the schedule_work approach is that it
(a) totally hides the real cost - which is the scheduling - and (b)
it is so asynchronous that it can potentially happen long after the
task dropped the reference.

And seriously - that is user-visible behavior.

For example, think about this *common* pattern:

  open+mmap+close+unlink+munmap

which would trigger the whole deferred fput, but also triggers the
actual real unlink() at fput time.

Right now, you can have that kind of thing in a program and
immediately unmount the filesystem afterwards (replace "unmount" with
"cannot see silly-renamed files" etc).
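
For illustration only (this is not part of the original message), here
is a minimal user-space sketch of that pattern. The function name and
the 4096-byte length are made up for the example. The comments describe
today's synchronous behavior; with a fully asynchronous deferral, the
final release at munmap() time could complete some time after the call
has returned.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: runs open+mmap+close+unlink+munmap on a new file
 * at 'path'. Returns 0 if every step succeeded and the name is gone
 * afterwards, -1 otherwise.
 */
int open_mmap_close_unlink_munmap(const char *path)
{
	const size_t len = 4096;	/* arbitrary size for the demo */
	struct stat st;
	char *map;
	int fd;

	fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -1;

	if (ftruncate(fd, len) < 0) {
		close(fd);
		unlink(path);
		return -1;
	}

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		unlink(path);
		return -1;
	}

	/* The mapping holds its own reference, so this is not the last put. */
	close(fd);

	/* Removes the name; the file itself lives on while it is mapped. */
	if (unlink(path) < 0) {
		munmap(map, len);
		return -1;
	}

	/* The file is still usable through the mapping. */
	memcpy(map, "still mapped", 13);

	/* Drops the last reference: the final fput happens on this path. */
	if (munmap(map, len) < 0)
		return -1;

	/* The name must no longer exist. */
	if (stat(path, &st) == 0 || errno != ENOENT)
		return -1;

	return 0;
}
```

A caller that does this and then immediately unmounts the filesystem is
relying on the release work being finished by the time munmap()
returns.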

The "totally asynchronous deferral" literally *breaks*semantics*.

Sure, it won't be noticeable in 99.99% of all cases, and I doubt you
can trigger much of a test for it. But it's potential real breakage,
and it's going to be hard to ever see. And then when it *does* happen,
it's going to be totally impossible to debug.

It's not just the "last unlink" thing that gets delayed. It's things
like file locking. It's "drop_file_write_access()". It's whatever
random thing that file does at "release()". It's a ton of things like
that. Delaying them has user-visible effects.

That's a whole can of complexities and worries outside of the kernel
interface that you are completely ignoring - just because you are
trying to solve the *simple* complexity of locking interaction
entirely within the kernel.

I think that's a bit myopic. We don't even *know* what the problems
with the async approach might be. Your "simple" solution is simple
only inside the kernel.

This is why I suggested you look at Oleg's patches. If we guarantee
that things won't be delayed past re-entering user mode, all those
issues go away.

                     Linus