linux-kernel - Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

Message-ID: <CA+55aFzzTPBhg__P597uh70fezshqS5JscGwkw21anRg7_bwwA@mail.gmail.com>
Date:	Sun, 17 May 2015 19:56:26 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	NeilBrown <neilb@...e.de>
Cc:	Al Viro <viro@...iv.linux.org.uk>,
	Andreas Dilger <adilger@...ger.ca>,
	Dave Chinner <david@...morbit.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Christoph Hellwig <hch@...radead.org>
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Sun, May 17, 2015 at 4:16 PM, NeilBrown <neilb@...e.de> wrote:
>
> Just to be crystal clear about what I want:
>   I want the filesystem to be in control

Yeah, no. Not going to happen.

You seem to think that the dcache is "just" a cache. It's not. It's a
cache, but that is absolutely not all that it is. It's very much a
cache with strong semantics.

And no, we're not handing those semantics over to the filesystem.
The dcache is not just a cache, it's the *primary* data structure that
we use for pathname validation, local security checking, and for doing
things like "getcwd()" and handling ".." etc.

So there's no way the filesystem is "in control". You as a filesystem
are not really even doing the actual pathname lookup. The *only* thing
you're doing is filling in the dcache. The actual real pathname lookup
is done by the VFS layer using the dcache data.

That's how it very fundamentally works.  It's *so* much more than a
cache - it really *is* the primary path lookup. The filesystem is the
slave in this relationship.

> The filesystem then uses generic helpers (or not) to find the answers and adds
> more current information to the cache.

You can do that already. There *are* those generic helpers to add data
to the cache. That's what "d_instantiate()" and friends _are_ for.

But no, you do *not* control name lookup. You get notified when
there's not enough data in the cache, and then you can fill it up any
which way you want.
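
As a rough illustration of that division of labour, here is a minimal
user-space sketch in Python. It is a toy model, not kernel code: the
names ToyFS, ToyVFS, lookup() and walk() are hypothetical, and the
dictionary stands in for the dcache only very loosely. The point it
tries to show is that the walk loop and the cache belong to the VFS,
and the filesystem is only consulted on a cache miss.

```python
class ToyFS:
    """Hypothetical filesystem: it can only answer 'what is NAME in DIR?'."""

    def __init__(self, tree):
        self.tree = tree      # {(dir_path, entry_name): kind}
        self.lookups = 0      # how many times the VFS had to ask us

    def lookup(self, parent, name):
        self.lookups += 1
        return self.tree.get((parent, name))   # None means "no such entry"


class ToyVFS:
    """Hypothetical VFS: owns the cache and performs the actual path walk."""

    def __init__(self, fs):
        self.fs = fs
        self.dcache = {}      # (dir_path, entry_name) -> kind, or None if absent

    def walk(self, path):
        cur = "/"
        for name in [c for c in path.split("/") if c]:
            key = (cur, name)
            if key not in self.dcache:
                # Cache miss: notify the filesystem and store its answer.
                # A None answer is cached too, as a negative entry.
                self.dcache[key] = self.fs.lookup(cur, name)
            if self.dcache[key] is None:
                return None
            cur = cur.rstrip("/") + "/" + name
        return cur


fs = ToyFS({("/", "a"): "dir", ("/a", "b"): "file"})
vfs = ToyVFS(fs)
first = vfs.walk("/a/b")     # two misses, so the filesystem is asked twice
second = vfs.walk("/a/b")    # served from the cache, filesystem not involved
```

In this sketch the filesystem never sees the whole path and never
decides what happens next; it only fills in the entry it was asked
about.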

You can populate the dcache with other entries than the one we asked
for, and you can ask the dcache to revalidate and throw dentries out.

But no, you do *not* get access to things like do_last() or to the
decision to follow symlinks or namespace rules, or mountpoints or
things like that.

> So for Al's example of revalidating multiple components at once, once the VFS
> gets to a point in the path where  d_revalidate says "I need more time",
> the VFS just passes the rest of the path to the filesystem.

That's bullshit, for a very simple and basic reason: "the rest of the
path" is not necessarily at all for your filesystem!

Really. There might be mount-points, there might be symlinks, there
might be tons of stuff like that.

You're not getting control, for the very simple reason that IT IS NOT
YOUR DATA. And it really never ever will be.

Now, this is why I said we can do a "hint" style thing. Part of that
"hint" issue is very very much that it has no semantic meaning. You
can't screw it up, because if it turns out that the path component
we're looking up is a symlink and we actually end up in some other
filesystem, if you end up looking up the hint part, it just would
never actually get used.

So it's kind of like a prefetch for names. It's semantically much
weaker than saying "look up this name". The hint would be "this is
likely the next part of the name that the VFS layer will look up".

And the key part of that statement is
 (a) "likely" (it might not happen, and even if it does happen, it
might not be for your filesystem)
and
 (b) "the VFS layer will look up" because it won't be the low-level
filesystem doing it.

So it would be the low-level filesystem pre-populating the dcache - if
the low-level filesystem decides the hint is worth using for that -
and the VFS layer then uses the data in the dcache without further
bothering the filesystem.
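
The hint idea can be sketched the same way. This is again a user-space
toy model in Python under stated assumptions, not an existing kernel
interface: HintingFS, ToyVFS, the hint= parameter and the mounts table
are all hypothetical names made up for illustration. The filesystem may
use the hint to pre-populate the cache, but the VFS still takes every
step itself, so a prefetched entry that ends up behind a mount point is
simply never used.

```python
class HintingFS:
    """Hypothetical filesystem whose lookup accepts an advisory hint."""

    def __init__(self, name, tree):
        self.name = name
        self.tree = tree      # {(dir_path, entry_name): kind}
        self.lookups = 0

    def lookup(self, dcache, parent, name, hint=()):
        self.lookups += 1
        # The entry that was actually asked for (None = negative entry).
        dcache[(self.name, parent, name)] = self.tree.get((parent, name))
        # Optional prefetch: speculatively add entries for the hinted names.
        cur = parent.rstrip("/") + "/" + name
        for nxt in hint:
            kind = self.tree.get((cur, nxt))
            if kind is None:
                break
            dcache[(self.name, cur, nxt)] = kind
            cur = cur + "/" + nxt


class ToyVFS:
    """Hypothetical VFS: owns the cache, the mount table and the walk."""

    def __init__(self, root_fs):
        self.root_fs = root_fs
        self.dcache = {}      # (fs_name, dir_path, entry_name) -> kind
        self.mounts = {}      # (fs_name, path) -> filesystem mounted there

    def walk(self, path):
        fs, cur = self.root_fs, "/"
        names = [c for c in path.split("/") if c]
        for i, name in enumerate(names):
            key = (fs.name, cur, name)
            if key not in self.dcache:
                # Miss: ask the filesystem, passing the rest as a hint only.
                fs.lookup(self.dcache, cur, name, hint=tuple(names[i + 1:]))
            if self.dcache[key] is None:
                return None
            cur = cur.rstrip("/") + "/" + name
            mounted = self.mounts.get((fs.name, cur))
            if mounted is not None:
                # Crossing a mount point is decided here, not by the filesystem.
                fs, cur = mounted, "/"
        return (fs.name, cur)


rootfs = HintingFS("rootfs", {
    ("/", "a"): "dir", ("/a", "b"): "dir", ("/a/b", "c"): "file",
    ("/", "mnt"): "dir", ("/mnt", "x"): "file",
})
other = HintingFS("other", {("/", "x"): "file"})
vfs = ToyVFS(rootfs)
vfs.mounts[("rootfs", "/mnt")] = other

deep = vfs.walk("/a/b/c")      # one rootfs lookup; "b" and "c" were prefetched
crossed = vfs.walk("/mnt/x")   # rootfs prefetches its own "x", but the walk
                               # crosses into "other", so that entry is unused
```

The prefetched rootfs entry for "/mnt/x" stays in the cache without
affecting the result, which is the sense in which a hint has no
semantic meaning in this model.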

Exactly because the dcache is *so* much more than "just a cache".

                    Linus