linux-kernel - Re: [PATCH 5/6] eventfs: get rid of dentry pointers without refcounts

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=whSse54d+X361K2Uw6jO2SvrO-U0LHLBTLnqHaA+406Fw@mail.gmail.com>
Date: Tue, 30 Jan 2024 22:20:24 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Steven Rostedt <rostedt@...dmis.org>, Al Viro <viro@...iv.linux.org.uk>
Cc: Masami Hiramatsu <mhiramat@...nel.org>, linux-kernel@...r.kernel.org, 
	linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH 5/6] eventfs: get rid of dentry pointers without refcounts

On Tue, 30 Jan 2024 at 21:57, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> Ugh.

Oh, and double-ugh on that tracefs_syscall_mkdir/rmdir(). I hate how
it does that "unlock and re-lock inode" thing.

It's a disease, and no, it's not an acceptable response to "lockdep
shows there's a locking problem".

The comment says "It is up to the individual mkdir routine to handle
races" but that's *completely* bogus and shows a fundamental
misunderstanding of locking.

Dropping the lock in the middle of a locked section does NOT affect
just the section that you dropped the lock around.

It affects the *caller*, who took the lock in the first place, and who
may well assume that the lock protects things for the whole duration
of the lock.

And so dropping the lock in the middle of an operation is a bad BAD
*BAD* thing to do.

Honestly, I had only been looking at the eventfs_inode.c lookup code.
But now that I started looking at mkdir/rmdir, I'm seeing the same
signs of horrible mistakes there too.

And yes, it might be a real problem. For example, for the rmdir case,
the actual lock was taken by 'vfs_rmdir()', and it does *not* protect
only the ->rmdir() call itself.

It also, for example, is supposed to make the ->rmdir be atomic wrt things like

        dentry->d_inode->i_flags |= S_DEAD;

and

        dont_mount(dentry);
        detach_mounts(dentry);

but now because the inode lock was dropped in the middle, for all we
know a "mkdir()" could come in on the same name, and make a mockery of
the whole inode locking.

The inode lock on that directory that is getting removed is also what
is supposed to make it impossible to add new files to the directory
while the rmdir is going on.

Again, dropping the lock violates those kinds of guarantees.

"git blame" actually fingers Al for that "unlock and relock", but
that's because Al did a search-and-replace on
"mutex_[un]lock(&inode->i_mutex)" and replaced it with
"inode_[un]lock(inode)" back in 2016.

The real culprit is you, and that sh*t-show goes back to commit
277ba04461c2 ("tracing: Add interface to allow multiple trace
buffers").

Christ. Now I guess I need to start looking if there is anything else
horrid lurking in that mkdir/rmdir path.

I doubt the inode locking problem ends up mattering in this situation.
Mostly since this is only tracefs, and normal users shouldn't be able
to access these things anyway. And I think (hope?) that you only
accept mkdir/rmdir in specific places.

But this really is another case of "This code smells *fundamentally* wrong".

Adding Al, in the hopes that he will take a look at this too.

            Linus