linux-kernel - Re: [rfc] new stat*fs-like syscall?

Message-ID: <4C2366F7.5010200@mit.edu>
Date:	Thu, 24 Jun 2010 10:08:55 -0400
From:	Andy Lutomirski <luto@....edu>
To:	Nick Piggin <npiggin@...e.de>
CC:	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	Al Viro <viro@...IV.linux.org.uk>,
	Ulrich Drepper <drepper@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [rfc] new stat*fs-like syscall?

Nick Piggin wrote:
> This has come up a few times in the past, and I'd like to try to get
> an agreement on it. statvfs(2) importantly contains f_flag (mount
> flags), and its use is encouraged over statfs(2). The kernel
> provides a statfs syscall only.
> 
> This means glibc has to provide f_flag support by parsing /proc/mounts
> and stat(2)ing mount points. This is really slow, and /proc/mounts is
> hard for the kernel to provide. It's actually the last scalability
> bottleneck in the core vfs for dbench (samba) after my patches.
> 
> Not only that, but it's racy.
> 
> Other than types, other differences are:
> - statvfs(2) has f_frsize, which seems fairly useless.
> - statvfs(2) has f_favail.
> - statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
>   block size. The latter could be useful for disk space algorithms.
>   Both can be ill-defined.
> - statvfs(2) lacks f_type.
> 
> Is there anything more we should add here? Samba wants a capabilities
> field, with things like sparse files, quotas, compression, encryption,
> case preserving/sensitive.
> 
> Any thoughts?

Something like fsid but actually specified to uniquely identify a 
superblock.  (Currently, fsid seems to be set by the filesystem, and 
nothing in particular ensures that two different filesystems couldn't 
have collisions.)  We could guarantee (or have a flag guaranteeing) that 
(fsid, st_ino) actually uniquely identifies an inode.
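
To make the (fsid, st_ino) idea concrete, here is a minimal sketch of how userspace might build such a key today from statfs(2) and stat(2). The names inode_key, inode_key_get and inode_key_equal are hypothetical helpers invented for illustration, not an existing API. The sketch inherits the problem described above: nothing guarantees that f_fsid is unique across filesystems, and the two calls are not atomic with respect to each other.

```c
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/vfs.h>

/* Hypothetical key type: raw f_fsid bytes plus the inode number. */
struct inode_key {
    unsigned char fsid[sizeof(((struct statfs *)0)->f_fsid)];
    uint64_t ino;
};

/* Fill *key for path. Returns 0 on success, -1 on error (errno is set
 * by whichever call failed). The two syscalls can race with a mount
 * or rename happening in between. */
int inode_key_get(const char *path, struct inode_key *key)
{
    struct statfs sfs;
    struct stat st;

    if (statfs(path, &sfs) != 0 || stat(path, &st) != 0)
        return -1;

    memset(key, 0, sizeof(*key));
    memcpy(key->fsid, &sfs.f_fsid, sizeof(key->fsid));
    key->ino = (uint64_t)st.st_ino;
    return 0;
}

/* Nonzero if both keys carry the same fsid bytes and inode number.
 * Equality is only as trustworthy as f_fsid itself. */
int inode_key_equal(const struct inode_key *a, const struct inode_key *b)
{
    return a->ino == b->ino &&
           memcmp(a->fsid, b->fsid, sizeof(a->fsid)) == 0;
}
```

Two lookups of the same path should compare equal; the converse (different files never comparing equal) is exactly what a properly specified identifier would have to guarantee.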

Similarly, something like fsid that uniquely identifies the vfsmount 
could be useful, although I don't know how easy that would be to provide 
for fstat?fs.

If we could expose the complete set of filesystem mount options so that 
mount(1) didn't have to look at /proc/self/mounts or /etc/mtab, then 
playing with chroots would be that much easier.

Should we expose superblock and vfsmount options separately?  We have 
read-only bind mounts now, but the way they work is rather inscrutable, 
and if stat?fs could say "superblock is read-write but vfsmount is 
readonly" then people might be able to make more sense of what's going on.
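
As an illustration of the limitation, a minimal sketch of the read-only check available through statvfs(3) follows. The helper name path_is_readonly is made up for this example. The interface offers a single ST_RDONLY bit in f_flag, so the caller cannot tell from it whether the superblock or only the vfsmount is read-only.

```c
#include <sys/statvfs.h>

/* Returns 1 if the mount containing `path` is reported read-only,
 * 0 if it is not, and -1 on error (errno is set by statvfs).
 * One bit only: no distinction between superblock and vfsmount. */
int path_is_readonly(const char *path)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0)
        return -1;
    return (sv.f_flag & ST_RDONLY) ? 1 : 0;
}
```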

--Andy