linux-kernel - Re: [PATCH v3 03/10] eventfs: adding eventfs dir add functions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230703110857.2d051af5@rorschach.local.home>
Date:   Mon, 3 Jul 2023 11:08:57 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Ajay Kaher <akaher@...are.com>
Cc:     "mhiramat@...nel.org" <mhiramat@...nel.org>,
        "shuah@...nel.org" <shuah@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-trace-kernel@...r.kernel.org" 
        <linux-trace-kernel@...r.kernel.org>,
        "linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
        Ching-lin Yu <chinglinyu@...gle.com>,
        Nadav Amit <namit@...are.com>,
        "srivatsa@...il.mit.edu" <srivatsa@...il.mit.edu>,
        Alexey Makhalov <amakhalov@...are.com>,
        Vasavi Sirnapalli <vsirnapalli@...are.com>,
        Tapas Kundu <tkundu@...are.com>,
        "er.ajay.kaher@...il.com" <er.ajay.kaher@...il.com>
Subject: Re: [PATCH v3 03/10] eventfs: adding eventfs dir add functions

On Mon, 3 Jul 2023 10:13:22 +0000
Ajay Kaher <akaher@...are.com> wrote:

> >> +/**
> >> + * eventfs_down_write - acquire write lock function
> >> + * @eventfs_rwsem: a pointer to rw_semaphore
> >> + *
> >> + * helper function to perform write lock on eventfs_rwsem
> >> + */
> >> +static void eventfs_down_write(struct rw_semaphore *eventfs_rwsem)
> >> +{
> >> +     while (!down_write_trylock(eventfs_rwsem))
> >> +             msleep(10);  
> >
> > What's this loop for? Something like that needs a very good explanation
> > in a comment. Loops like these are usually a sign of a workaround for a
> > bug in the design, or worse, simply hides an existing bug.
> >  
> 
> Yes correct, this logic is to solve deadlock:
> 
> Thread 1                             Thread 2
> down_read_nested()                                 - read lock acquired
>                                          down_write()     - waiting for write lock to acquire
> down_read_nested()                                  - deadlock
> 
> Deadlock is because rwlock wouldn’t allow read lock to be acquired if write lock is waiting.
> down_write_trylock() wouldn’t add the write lock in waiting queue, hence helps to prevent
> deadlock scenario.
> 
> I was stuck with this Deadlock, tried few methods and finally borrowed from cifs, as it’s
> upstreamed, tested and working in cifs, please refer:
> https://elixir.bootlin.com/linux/v6.3.1/source/fs/cifs/file.c#L438

I just looked at that code and the commit, and I honestly believe that
is a horrible hack, and very fragile. It's in the smb code, so it was
unlikely reviewed by anyone outside that subsystem. I really do not
want to prolificate that solution around the kernel. We need to come up
with something else.

I also think it's buggy (yes the cifs code is buggy!) because in the
comment above the down_read_nested() it says:

/*
 * nested locking. NOTE: rwsems are not allowed to recurse
 * (which occurs if the same task tries to acquire the same
 * lock instance multiple times), but multiple locks of the
 * same lock class might be taken, if the order of the locks
 * is always the same. This ordering rule can be expressed
 * to lockdep via the _nested() APIs, but enumerating the
 * subclasses that are used. (If the nesting relationship is
 * static then another method for expressing nested locking is
 * the explicit definition of lock class keys and the use of
 * lockdep_set_class() at lock initialization time.
 * See Documentation/locking/lockdep-design.rst for more details.)
 */

So this is NOT a solution (and the cifs code should be fixed too!)

Can you show me the exact backtrace where the reader lock gets taken
again? We will have to come up with a way to not take the same lock
twice.

We can also look to see if we can implement this with RCU. What exactly
is this rwsem protecting?


> 
> Looking further for your input. I will add explanation in v4.
> 
> 
> >> +}
> >> +

[..]

> >> + *
> >> + * This function creates the top of the trace event directory.
> >> + */
> >> +struct dentry *eventfs_create_events_dir(const char *name,
> >> +                                      struct dentry *parent,
> >> +                                      struct rw_semaphore *eventfs_rwsem)  
> >
> > OK, I'm going to have to really look at this. Passing in a lock to the
> > API is just broken. We need to find a way to solve this another way.  
> 
> eventfs_rwsem is a member of struct trace_array, I guess we should
> pass pointer to trace_array.

No, it should not be part of the trace_array. If we can't do this with
RCU, then we need to add a descriptor that contains the dentry that is
returned above, and have the lock held there. The caller of the
eventfs_create_events_dir() should not care about locking. That's an
implementation detail that should *not* be part of the API.

That is, if you need a lock:

struct eventfs_dentry {
	struct dentry		*dentry;
	struct rwsem		*rwsem;
};

And then get to that lock by using the container_of() macro. All
created eventfs dentry's could have this structure, where the rwsem
points to the top one. Again, that's only if we can't do this with RCU.

-- Steve


> 
> 
> > I'm about to board a plane to JFK shortly, I'm hoping to play with this
> > while flying back.
> >  
> 
> I have replied for major concerns. All other minor I will take care in v4.
> 
> Thanks a lot for giving time to eventfs patches.
> 
> - Ajay
>