linux-kernel - Re: [RFC 0/7] [RFC] cramfs: fake write support

Message-Id: <200806021315.41211.arnd@arndb.de>
Date:	Mon, 2 Jun 2008 13:15:40 +0200
From:	Arnd Bergmann <arnd@...db.de>
To:	hooanon05@...oo.co.jp
Cc:	Jamie Lokier <jamie@...reable.org>,
	Phillip Lougher <phillip@...gher.demon.co.uk>,
	David Newall <davidn@...idnewall.com>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	hch@....de
Subject: Re: [RFC 0/7] [RFC] cramfs: fake write support

On Monday 02 June 2008, hooanon05@...oo.co.jp wrote:
> > * data inconsistency problems when simultaneously accessing the underlying
> >   fs and the union.
> Aufs has three levels of detection for direct access to the lower
> (branch) filesystems (i.e. bypassing aufs). I guess the strictest level
> is a good answer to your question. It is based on the inotify
> feature. Aufs sets an inotify watch on every accessed directory on the
> lower fs. While those inodes are cached, aufs receives the inotify
> events for their children/files and marks the aufs data for the file as
> obsolete. When the file is accessed later, aufs retrieves the latest
> inode (or dentry) again.
> The inotify watch is removed when the aufs dir inode is discarded
> from the cache.

This is a very complicated approach, and I'm not sure if it even addresses
the case where you have a shared mmap on both files. With VFS-based union
mounts, they share one inode, so you don't need to use inotify in the first
place, and it automatically works on shared mmaps.
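
The invalidation scheme quoted above can be modeled roughly as follows.
This is a toy sketch in Python, not aufs code: the class and method
names are hypothetical, a dict stands in for the lower filesystem, and
on_lower_event() stands in for an inotify event delivered for a child
of a watched directory.

```python
class UnionDirCache:
    """Toy model of a union-side cache invalidated by lower-fs events."""

    def __init__(self, lower):
        self.lower = lower      # name -> content, stands in for the lower fs
        self.cached = {}        # name -> content as last seen via the union
        self.obsolete = set()   # names whose cached data is marked stale

    def on_lower_event(self, name):
        # Stand-in for the inotify event: only mark, do not refetch yet.
        if name in self.cached:
            self.obsolete.add(name)

    def read(self, name):
        # Refetch from the lower fs on first access or after invalidation.
        if name not in self.cached or name in self.obsolete:
            self.cached[name] = self.lower[name]
            self.obsolete.discard(name)
        return self.cached[name]
```

In this model, a direct change to the lower dict stays invisible
through read() until on_lower_event() is called for that name. With a
single shared inode there is no second copy to invalidate.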

> > * duplication of dentry and inode data structures in the union wastes
> >   memory and cpu cycles.
> 
> Aufs has its own dentry and inode objects, as a normal fs has. And they
> have pointers to the corresponding ones on the lower fs. If you make a
> union from two real filesystems, then an aufs inode will have (at most)
> two pointers as its private data.
> Do you mean having pointers is a duplication?

I mean having your own dentry and inode object is duplication. The
underlying file system already has them, so if you have your own,
you need to keep them synchronized. I guess that in order to do
a lookup on a file, you need the steps of

1. lookup in aufs dentry cache -> fail
2. lookup in underlying dentry cache -> fail
3. try to read dentry from disk -> fail
4. repeat 2-3 until found, or arrive at lowest level 
5. create an inode in memory for the lower file system
6. create dentry in memory on lower file system, pointing
   to that
7. create an aufs specific inode pointing to the underlying
   inode
8. create an aufs specific dentry object to point to that
9. create a struct inode representing the aufs inode
10. create another VFS dentry to point to that

when you really should just return the dentry found by the
lower file system.
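
To make the object count in the steps above concrete, here is a toy
Python model. It is not kernel code: the function names and the string
labels are hypothetical, and branches are plain sets of names searched
top first. It only counts which in-memory objects a cold lookup would
create in each design, following my guess at the steps above.

```python
def stacked_lookup(branches, name):
    """Stacking union: the object found on a branch is wrapped twice."""
    for index, branch in enumerate(branches):  # top branch first
        if name in branch:
            return [
                f"branch{index} inode",    # step 5
                f"branch{index} dentry",   # step 6
                "union private inode",     # step 7
                "union private dentry",    # step 8
                "union VFS inode",         # step 9
                "union VFS dentry",        # step 10
            ]
    return []  # not found on any branch


def vfs_union_lookup(branches, name):
    """VFS-level union: return what the lower file system found."""
    for index, branch in enumerate(branches):
        if name in branch:
            return [f"branch{index} inode", f"branch{index} dentry"]
    return []


branches = [{"etc"}, {"etc", "bin"}]
stacked = stacked_lookup(branches, "bin")
shared = vfs_union_lookup(branches, "bin")
```

Here stacked holds six objects that must be kept in sync, while shared
holds only the two that the lower file system needs anyway.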

> > * whiteouts are in the same namespace as regular files, so conflicts are
> >   possible.
> 
> Yes, that's right.
> Aufs reserves ".wh." as a whiteout prefix, and prohibits users from
> handling such filenames inside aufs. It might be a problem as you wrote,
> but users can create/remove them directly on the lower fs and I have
> never received a request about this reserved prefix.

It's not so much a practical limitation as an exploitable feature.
E.g. an unprivileged user may use this to get an application into
an error condition by asking for an invalid file name.

POSIX reserves a well-defined set of invalid file names, and
deviation from this means that you are not compliant, and that
in a potentially unexpected way.
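
A small Python sketch of the namespace problem, assuming a two-branch
union modeled as sets of names. The ".wh." prefix is the one named
above; the function names are hypothetical, and PermissionError is an
assumption, since the thread does not say which error is returned.

```python
WHITEOUT_PREFIX = ".wh."  # prefix named in the thread


def visible_names(upper, lower):
    """Merge two branch listings, hiding whiteouts and what they mask."""
    masked = {n[len(WHITEOUT_PREFIX):] for n in upper
              if n.startswith(WHITEOUT_PREFIX)}
    shown = {n for n in upper if not n.startswith(WHITEOUT_PREFIX)}
    shown |= {n for n in lower if n not in masked}
    return sorted(shown)


def create(upper, name):
    """Create a file through the union; reserved names are refused."""
    if name.startswith(WHITEOUT_PREFIX):
        # Assumption: the real error code is not stated in the thread.
        raise PermissionError(f"{name!r} uses the reserved whiteout prefix")
    upper.add(name)
```

An application that accepts a user-supplied name such as ".wh.notes"
would fail in create() on the union, although the same name is
perfectly valid on any other file system.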

> > * mounting a large number of aufs on top of each other eventually
> >   overflows the kernel stack, e.g. in readdir.
> 
> The aufs readdir operation consumes memory, but not stack. If it were
> implemented as a recursive function, it might cause a stack
> overflow. But actually it is a loop.
> The memory is used for storing entry names and eliminating whited-out
> ones, and the result is cached for a specified time. So memory
> (other than stack) will be consumed.

How does aufs know that one of its branches is an aufs itself?
If you detect this, do you fold it into a single aufs instance with
more branches?
In case you don't do it, I don't see how you get around the stack
overflow, but if you do it, you have again added a whole lot of
complexity for something that should be trivial when done right.
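
The nesting concern can be illustrated with a toy Python model. The
class names are hypothetical and this only tracks call depth, it does
not measure kernel stack usage. Each union merges its branches in a
loop, yet a branch that is itself a union still adds one nested call
per level of stacking.

```python
class Dir:
    """A plain branch: a flat set of names."""

    def __init__(self, names):
        self.names = set(names)

    def readdir(self, depth=1):
        return set(self.names), depth


class Union:
    """A union whose branches may be plain directories or other unions."""

    def __init__(self, branches):
        self.branches = branches

    def readdir(self, depth=1):
        merged, deepest = set(), depth
        for branch in self.branches:  # a loop within one union level
            names, reached = branch.readdir(depth + 1)  # nested call
            merged |= names
            deepest = max(deepest, reached)
        return merged, deepest


# Stack five unions on top of each other.
fs = Dir({"a"})
for level in range(5):
    fs = Union([Dir({f"f{level}"}), fs])
names, depth = fs.readdir()
```

The depth returned grows by one for every union mounted on top, no
matter how the merge inside a single union is written.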

> > * allowing multiple writable branches (instead of just stacking
> >   one rw copy on a number of ro file systems) is confusing to the user
> >   and complicates the implementation a lot.
> 
> Probably you are right. Initially aufs had only one policy to select the
> writable branch. But several users requested other policies such as
> round-robin or most-free-space, and aufs has implemented them.
> I don't think users will be confused by these policies. While I tried
> to keep it simple, I guess some people will say it is complex.

I personally think that a policy other than writing to the top is crazy
enough, but randomly writing to multiple places is much worse, as it
becomes unpredictable what the file system does, not just unexpected.
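
A minimal Python sketch of the three policies mentioned above, to show
where the unpredictability comes from. The function names and the
branch records are hypothetical; only the policy names (write to the
top, round-robin, most free space) come from the thread.

```python
from itertools import cycle


def top_only(branches):
    """Always create new files on the topmost writable branch."""
    return lambda: branches[0]["name"]


def round_robin(branches):
    """Rotate through the writable branches on each create."""
    order = cycle([b["name"] for b in branches])
    return lambda: next(order)


def most_free_space(branches):
    """Pick whichever branch currently reports the most free space."""
    return lambda: max(branches, key=lambda b: b["free"])["name"]


branches = [{"name": "rw0", "free": 10}, {"name": "rw1", "free": 50}]
pick = round_robin(branches)
targets = [pick() for _ in range(3)]
```

With top_only the target of a create is known in advance. With the
other two it depends on how many files were created before, or on how
full each branch happens to be at that moment.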

	Arnd <><