lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20260204062614.GK3183987@ZenIV>
Date: Wed, 4 Feb 2026 06:26:14 +0000
From: Al Viro <viro@...iv.linux.org.uk>
To: Waiman Long <llong@...hat.com>
Cc: Paul Moore <paul@...l-moore.com>, Eric Paris <eparis@...hat.com>,
	Christian Brauner <brauner@...nel.org>,
	linux-kernel@...r.kernel.org, audit@...r.kernel.org,
	Richard Guy Briggs <rgb@...hat.com>,
	Ricardo Robaina <rrobaina@...hat.com>
Subject: Re: [PATCH v2] audit: Avoid excessive dput/dget in audit_context
 setup and reset paths

On Tue, Feb 03, 2026 at 11:21:23PM -0500, Waiman Long wrote:

> Interesting. So are you thinking about a reference on the pwd inside the
> fs_struct?

No.  fs_struct has pwd pinned - both dentry and mount.

Each thread has a reference to fs_struct instance in its task_struct (task->fs).

It is possible for several threads to have their task->fs pointing to the
same instance.

A thread may modify its fs_struct pointer, making it point to a different
fs_struct instance; that happens on unshare(2).  In that case new instance
is an identical copy of the original one; reference counts of mounts and dentries
(both for pwd and root) are incremented to account for additional references
in the copy.  That assignment to current->fs happens under task_lock(current).

A child gets either a reference to identical copy of parent's fs_struct or
an extra reference to parent's fs_struct.  In either case child->fs is
set before the child gets to run; as the matter of fact, that happens before
anyone besides the parent might observe the task_struct of child.

That's it - no other stores to task_struct.fs are ever allowed; no other
thread can change your reference under you.


Other than initializing a new copy, *all* stores to fs_struct.pwd are done
to current->fs.pwd; access to any other fs_struct instances is read-only.
The only way another thread might change your pwd is if that thread shares
fs_struct with you.  They can observe its contents, as long as they take
care to hold task_lock(your_thread), but no more than that.

What's more, if current->fs is the sole reference to fs_struct instance,
it will remain such until you spawn a child and set it to share your
instance (CLONE_FS).  No other thread can gain extra references to it,
etc.


What it means is that if you are holding the sole reference to current->fs,
current->fs->pwd contents will remain unchanged (and pinning the same mount
and dentry) through the entire syscall with very few exceptions:
	1) you spawn a child with CLONE_FS - that has to be either
clone(2) or io_uring_setup(2) (the latter - via worker threads).
	2) you explicitly modify your pwd - in chdir(2), fchdir(2),
pivot_root(2), setns(2) (with CLONE_NEWNS; it switches you to the root of
namespace you've joined)

As long as these exceptions are accounted for, audit could check
current->fs->users *and* skip incrementing refcounts if it's equal to 1.
It would need to remember whether these reference had been taken -
rechecking condition won't work, since other threads might by gone
by the time we leave the syscall turning "shared" to "not shared".

That way we can avoid grabbing any references in a fairly common
case.

> I am thinking about instead of getting a reference to pwd.dentry
> and pwd.mnt, we can just get a reference to the pwd itself. We have to grab
> the fs lock in get_fs_pwd() anyway.

HUH?  pwd is a path_struct; it has no refcount of its own.  And an
extra reference to fs_struct will not prevent modifications by other
threads that happens to share that fs_struct.

> This spinlock doesn't show up as being
> contended in the customer bug report as each process can has its own
> fs_struct instead of many processing sharing the same working directory.

<wry> I sincerely hope that you don't mean to hold current->fs->lock through the
entire syscall. </wry>

> In the case of audit, fs->pwd is copied out at the beginning of the open*
> system call and then put back at the end of that system call. So it should
> last a pretty short time, i.e. the reference will be released
> shortly. However, that will make the set_fs_pwd() call a bit tricky to
> implement, but I think it is still doable.

Huh?  You can't make fs_struct CoW, simply because if you clone a child with
CLONE_FS and do chdir() in it, it will change *your* current directory as
well (and vice versa).

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ