linux-kernel - Re: [PATCH] eventfs: Have inodes have unique inode numbers

Message-ID: <8547159a-0b28-4d75-af02-47fc450785fa@efficios.com>
Date: Fri, 26 Jan 2024 17:14:12 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
 Steven Rostedt <rostedt@...dmis.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
 Linux Trace Devel <linux-trace-devel@...r.kernel.org>,
 Masami Hiramatsu <mhiramat@...nel.org>,
 Christian Brauner <brauner@...nel.org>, Ajay Kaher
 <ajay.kaher@...adcom.com>, Geert Uytterhoeven <geert@...ux-m68k.org>,
 linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH] eventfs: Have inodes have unique inode numbers

On 2024-01-26 16:49, Linus Torvalds wrote:
> On Fri, 26 Jan 2024 at 13:36, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
[...]
> So please try to look at things to *fix* and simplify, not at things
> to mess around with and make more complicated.

Hi Linus,

I'm all aboard with making things as simple as possible and
making sure no complexity is added for the sake of micro-optimization
of slow-paths.

I do however have a concern with the approach of using the same
inode number for various files on the same filesystem: AFAIU it
breaks userspace ABI expectations. See inode(7) for instance:

        Inode number
               stat.st_ino; statx.stx_ino

               Each  file in a filesystem has a unique inode number.  Inode numbers
               are guaranteed to be unique only within a filesystem (i.e., the same
               inode  numbers  may  be  used by different filesystems, which is the
               reason that hard links may not cross filesystem  boundaries).   This
               field contains the file's inode number.

So user-space expecting inode numbers to be unique within a filesystem
is not "legacy" in any way. Userspace is allowed to expect this from the
ABI.
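
As an illustration of the kind of userspace code that leans on this
guarantee, here is a minimal sketch of the usual "are these two paths the
same file?" check, which compares the (st_dev, st_ino) pair returned by
stat(2). The helper name same_file() is made up for this example; it is
not taken from any particular tool.

```c
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if both paths refer to the same file,
 * 0 if they do not, and -1 if either stat() call fails.
 *
 * Two paths are treated as the same file when they share both the
 * device number and the inode number. If a filesystem reports one
 * inode number for several distinct files, this check wrongly
 * reports them as identical.
 */
static int same_file(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;

	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Calling same_file() on two different files that share an inode number
within one filesystem would return 1, which is the kind of false positive
the inode(7) wording lets userspace assume cannot happen.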

I think that a safe approach to prevent ABI regressions, and to avoid
adding more ABI corner cases that userspace will have to work around, would
be to issue unique numbers to files within eventfs, but in the
simplest/obviously correct implementation possible. It is, after all, a slow
path.

The issue with using atomic_add_return without any kind of check is the
scenario where a userspace loop creates and deletes directories endlessly:
the counter eventually wraps, and numbers still held by live inodes get
handed out again. This approach is simple, but it is not obviously correct.
Because eventfs allows userspace to do mkdir/rmdir, this scenario is
possible. It would be OK if only the kernel had control over directory
creation/removal, but that is not the case here.
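
To make the re-use scenario concrete, here is a small userspace sketch
of an unchecked incrementing counter. The counter is deliberately
narrowed to 8 bits so that the wraparound is quick to reach; a 32-bit
atomic counter needs far more iterations, but behaves the same way. All
names are hypothetical and this is not the eventfs code.

```c
#include <stdint.h>

/* Hypothetical stand-in for an unchecked counter, narrowed to 8 bits. */
static uint8_t next_ino;

/* Hand out the next number with no check for numbers still in use. */
static uint8_t counter_alloc(void)
{
	return ++next_ino;	/* wraps from 255 back to 0 */
}

/*
 * Simulate a create/delete loop while one inode numbered `held` stays
 * alive. Returns how many allocations it takes before the counter
 * hands out `held` a second time.
 */
static unsigned int allocs_until_reuse(uint8_t held)
{
	unsigned int n = 0;

	do {
		n++;
	} while (counter_alloc() != held);

	return n;
}
```

With an 8-bit counter the number held by the long-lived inode comes back
after 256 further allocations, no matter how many of the intermediate
numbers were freed in between.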

I would suggest this straightforward solution to this:

a) define an EVENTFS_MAX_INODES (e.g. 4096 * 8),

b) keep track of inode allocation in a bitmap (within a single page),

c) disallow allocating more than "EVENTFS_MAX_INODES" in eventfs.

This way even the mkdir/rmdir loop will work fine, but it will prevent
keeping too many inodes alive at any given time. The cost is a single
page (4K) per eventfs instance.
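
A rough userspace sketch of steps a) to c), assuming one bit per inode
number in a 4K bitmap. The struct and the ino_*() helpers are
hypothetical names for illustration; an in-kernel version would
presumably rely on the existing bitmap helpers and on locking, neither
of which is shown here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names: none of these come from the kernel tree. */
#define EVENTFS_MAX_INODES	(4096 * 8)	/* one bit each: 4K of bitmap */
#define BITS_PER_WORD		(sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS		(EVENTFS_MAX_INODES / BITS_PER_WORD)

struct ino_bitmap {
	unsigned long words[BITMAP_WORDS];
};

static void ino_bitmap_init(struct ino_bitmap *bm)
{
	memset(bm->words, 0, sizeof(bm->words));
}

/*
 * Returns the lowest free index in [0, EVENTFS_MAX_INODES) and marks
 * it as used, or returns -1 when every index is in use (step c).
 */
static long ino_alloc(struct ino_bitmap *bm)
{
	for (size_t w = 0; w < BITMAP_WORDS; w++) {
		if (bm->words[w] == ULONG_MAX)
			continue;	/* word is full, skip it */
		for (size_t b = 0; b < BITS_PER_WORD; b++) {
			unsigned long mask = 1UL << b;

			if (!(bm->words[w] & mask)) {
				bm->words[w] |= mask;
				return (long)(w * BITS_PER_WORD + b);
			}
		}
	}
	return -1;
}

/* Marks an index as free so that a later allocation can reuse it. */
static void ino_free(struct ino_bitmap *bm, long ino)
{
	assert(ino >= 0 && ino < EVENTFS_MAX_INODES);
	bm->words[ino / BITS_PER_WORD] &= ~(1UL << (ino % BITS_PER_WORD));
}
```

A number is only handed out again after ino_free() has released it, so
two live inodes never share a number, and an endless mkdir/rmdir loop
simply keeps recycling the same few bits.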

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com