linux-kernel - Re: [PATCH v2] fs: try an opportunistic lookup for O

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d0677c60eb1f47eb186f3e5493ba5aa7e0eaa445.camel@kernel.org>
Date: Thu, 08 Aug 2024 19:43:26 -0400
From: Jeff Layton <jlayton@...nel.org>
To: Paul Moore <paul@...l-moore.com>, Jan Kara <jack@...e.cz>
Cc: Christian Brauner <brauner@...nel.org>, Alexander Viro
 <viro@...iv.linux.org.uk>, Andrew Morton <akpm@...ux-foundation.org>, 
 Mateusz Guzik <mjguzik@...il.com>, Josef Bacik <josef@...icpanda.com>,
 linux-fsdevel@...r.kernel.org,  linux-kernel@...r.kernel.org,
 audit@...r.kernel.org
Subject: Re: [PATCH v2] fs: try an opportunistic lookup for O_CREAT opens too

On Thu, 2024-08-08 at 17:12 -0400, Paul Moore wrote:
> On Thu, Aug 8, 2024 at 1:11 PM Jan Kara <jack@...e.cz> wrote:
> > On Thu 08-08-24 12:36:07, Christian Brauner wrote:
> > > On Wed, Aug 07, 2024 at 10:36:58AM GMT, Jeff Layton wrote:
> > > > On Wed, 2024-08-07 at 16:26 +0200, Christian Brauner wrote:
> > > > > > +static struct dentry *lookup_fast_for_open(struct nameidata *nd, int open_flag)
> > > > > > +{
> > > > > > +       struct dentry *dentry;
> > > > > > +
> > > > > > +       if (open_flag & O_CREAT) {
> > > > > > +               /* Don't bother on an O_EXCL create */
> > > > > > +               if (open_flag & O_EXCL)
> > > > > > +                       return NULL;
> > > > > > +
> > > > > > +               /*
> > > > > > +                * FIXME: If auditing is enabled, then we'll have to unlazy to
> > > > > > +                * use the dentry. For now, don't do this, since it shifts
> > > > > > +                * contention from parent's i_rwsem to its d_lockref spinlock.
> > > > > > +                * Reconsider this once dentry refcounting handles heavy
> > > > > > +                * contention better.
> > > > > > +                */
> > > > > > +               if ((nd->flags & LOOKUP_RCU) && !audit_dummy_context())
> > > > > > +                       return NULL;
> > > > > 
> > > > > Hm, the audit_inode() on the parent is done independent of whether the
> > > > > file was actually created or not. But the audit_inode() on the file
> > > > > itself is only done when it was actually created. Imho, there's no need
> > > > > to do audit_inode() on the parent when we immediately find that file
> > > > > already existed. If we accept that then this makes the change a lot
> > > > > simpler.
> > > > > 
> > > > > The inconsistency would partially remain though. When the file doesn't
> > > > > exist audit_inode() on the parent is called but by the time we've
> > > > > grabbed the inode lock someone else might already have created the file
> > > > > and then again we wouldn't audit_inode() on the file but we would have
> > > > > on the parent.
> > > > > 
> > > > > I think that's fine. But if that's bothersome the more aggressive thing
> > > > > to do would be to pull that audit_inode() on the parent further down
> > > > > after we created the file. Imho, that should be fine?...
> > > > > 
> > > > > See https://gitlab.com/brauner/linux/-/commits/vfs.misc.jeff/?ref_type=heads
> > > > > for a completely untested draft of what I mean.
> > > > 
> > > > Yeah, that's a lot simpler. That said, my experience when I've worked
> > > > with audit in the past is that people who are using it are _very_
> > > > sensitive to changes of when records get emitted or not. I don't like
> > > > this, because I think the rules here are ad-hoc and somewhat arbitrary,
> > > > but keeping everything working exactly the same has been my MO whenever
> > > > I have to work in there.
> > > > 
> > > > If a certain access pattern suddenly generates a different set of
> > > > records (or some are missing, as would be in this case), we might get
> > > > bug reports about this. I'm ok with simplifying this code in the way
> > > > you suggest, but we may want to do it in a patch on top of mine, to
> > > > make it simple to revert later if that becomes necessary.
> > > 
> > > Fwiw, even with the rearranged checks in v3 of the patch audit records
> > > will be dropped because we may find a positive dentry but the path may
> > > have trailing slashes. At that point we just return without audit
> > > whereas before we always would've done that audit.
> > > 
> > > Honestly, we should move that audit event as right now it's just really
> > > weird and see if that works. Otherwise the change is somewhat horrible
> > > complicating the already convoluted logic even more.
> > > 
> > > So I'm appending the patches that I have on top of your patch in
> > > vfs.misc. Can you (other as well ofc) take a look and tell me whether
> > > that's not breaking anything completely other than later audit events?
> > 
> > The changes look good as far as I'm concerned but let me CC audit guys if
> > they have some thoughts regarding the change in generating audit event for
> > the parent. Paul, does it matter if open(O_CREAT) doesn't generate audit
> > event for the parent when we are failing open due to trailing slashes in
> > the pathname? Essentially we are speaking about moving:
> > 
> >         audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
> > 
> > from open_last_lookups() into lookup_open().
> 
> Thanks for adding the audit mailing list to the CC, Jan.  I would ask
> for others to do the same when discussing changes that could impact
> audit (similar requests for the LSM framework, SELinux, etc.).
> 
> The inode/path logging in audit is ... something.  I have a
> longstanding todo item to go revisit the audit inode logging, both to
> fix some known bugs, and see what we can improve (I'm guessing quite a
> bit).  Unfortunately, there is always something else which is burning
> a little bit hotter and I haven't been able to get to it yet.
> 

It is "something" alright. The audit logging just happens at strange
and inconvenient times vs. what else we're trying to do wrt pathwalking
and such. In particular here, the fact __audit_inode can block is what
really sucks.

Since we're discussing it...

ISTM that the inode/path logging here is something like a tracepoint.
In particular, we're looking to record a specific set of information at
specific points in the code. One of the big differences between them
however is that tracepoints don't block.  The catch is that we can't
just drop messages if we run out of audit logging space, so that would
have to be handled reasonably.

I wonder if we could leverage the tracepoint infrastructure to help us
record the necessary info somehow? Copy the records into a specific
ring buffer, and then copy them out to the audit infrastructure in
task_work?

I don't have any concrete ideas here, but the path/inode audit code has
been a burden for a while now and it'd be good to think about how we
could do this better.

> The general idea with audit is that you want to record the information
> both on success and failure.  It's easy to understand the success
> case, as it is a record of what actually happened on the system, but
> you also want to record the failure case as it can provide some
> insight on what a process/user is attempting to do, and that can be
> very important for certain classes of users.  I haven't dug into the
> patches in Christian's tree, but in general I think Jeff's guidance
> about not changing what is recorded in the audit log is probably good
> advice (there will surely be exceptions to that, but it's still good
> guidance).
> 

In this particular case, the question is:

Do we need to emit a AUDIT_INODE_PARENT record when opening an existing
file, just because O_CREAT was set? We don't emit such a record when
opening without O_CREAT set.

-- 
Jeff Layton <jlayton@...nel.org>