Message-ID: <20230703110857.2d051af5@rorschach.local.home>
Date: Mon, 3 Jul 2023 11:08:57 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Ajay Kaher <akaher@...are.com>
Cc: "mhiramat@...nel.org" <mhiramat@...nel.org>,
"shuah@...nel.org" <shuah@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-trace-kernel@...r.kernel.org"
<linux-trace-kernel@...r.kernel.org>,
"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
Ching-lin Yu <chinglinyu@...gle.com>,
Nadav Amit <namit@...are.com>,
"srivatsa@...il.mit.edu" <srivatsa@...il.mit.edu>,
Alexey Makhalov <amakhalov@...are.com>,
Vasavi Sirnapalli <vsirnapalli@...are.com>,
Tapas Kundu <tkundu@...are.com>,
"er.ajay.kaher@...il.com" <er.ajay.kaher@...il.com>
Subject: Re: [PATCH v3 03/10] eventfs: adding eventfs dir add functions
On Mon, 3 Jul 2023 10:13:22 +0000
Ajay Kaher <akaher@...are.com> wrote:
> >> +/**
> >> + * eventfs_down_write - acquire write lock function
> >> + * @eventfs_rwsem: a pointer to rw_semaphore
> >> + *
> >> + * helper function to perform write lock on eventfs_rwsem
> >> + */
> >> +static void eventfs_down_write(struct rw_semaphore *eventfs_rwsem)
> >> +{
> >> + while (!down_write_trylock(eventfs_rwsem))
> >> + msleep(10);
> >
> > What's this loop for? Something like that needs a very good explanation
> > in a comment. Loops like these are usually a sign of a workaround for a
> > bug in the design, or worse, simply hides an existing bug.
> >
>
> Yes correct, this logic is to solve deadlock:
>
> Thread 1                  Thread 2
> down_read_nested()                          - read lock acquired
>                           down_write()      - waiting for write lock to acquire
> down_read_nested()                          - deadlock
>
> The deadlock occurs because the rwsem won't grant a new read lock while a writer is waiting.
> down_write_trylock() does not add the writer to the wait queue, which helps to prevent the
> deadlock scenario.
>
> I was stuck on this deadlock, tried a few methods, and finally borrowed this from cifs, as it is
> upstream, tested, and working in cifs. Please refer to:
> https://elixir.bootlin.com/linux/v6.3.1/source/fs/cifs/file.c#L438
I just looked at that code and the commit, and I honestly believe that
is a horrible hack, and very fragile. It's in the smb code, so it is
unlikely to have been reviewed by anyone outside that subsystem. I really
do not want to proliferate that solution around the kernel. We need to
come up with something else.
I also think it's buggy (yes the cifs code is buggy!) because in the
comment above the down_read_nested() it says:
/*
* nested locking. NOTE: rwsems are not allowed to recurse
* (which occurs if the same task tries to acquire the same
* lock instance multiple times), but multiple locks of the
* same lock class might be taken, if the order of the locks
* is always the same. This ordering rule can be expressed
* to lockdep via the _nested() APIs, but enumerating the
* subclasses that are used. (If the nesting relationship is
* static then another method for expressing nested locking is
* the explicit definition of lock class keys and the use of
* lockdep_set_class() at lock initialization time.
* See Documentation/locking/lockdep-design.rst for more details.)
*/
So this is NOT a solution (and the cifs code should be fixed too!)
Can you show me the exact backtrace where the reader lock gets taken
again? We will have to come up with a way to not take the same lock
twice.
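To make the failure mode concrete, here is a toy, single-threaded model of
the sequence described above. It is only a sketch: struct toy_rwsem and the
toy_*() helpers are made-up names, not kernel APIs, and the model assumes
(as stated in the quoted explanation) that a queued writer blocks new
readers. It shows why recursing on the read side gets stuck behind a queued
writer, and why a writer that only uses trylock never queues and so never
blocks the inner reader.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a writer-fair rwsem. NOT the kernel implementation. */
struct toy_rwsem {
	int readers;          /* readers currently holding the lock */
	int writers_waiting;  /* writers queued behind those readers */
	bool writer_held;     /* true while a writer owns the lock */
};

/* A new reader is admitted only if no writer holds or waits for the lock. */
static bool toy_read_trylock(struct toy_rwsem *s)
{
	if (s->writer_held || s->writers_waiting)
		return false;
	s->readers++;
	return true;
}

static void toy_read_unlock(struct toy_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

/* Models a blocking down_write(): the writer joins the wait queue. */
static void toy_write_enqueue(struct toy_rwsem *s)
{
	s->writers_waiting++;
}

/* Models down_write_trylock(): take the lock or fail, never queue. */
static bool toy_write_trylock(struct toy_rwsem *s)
{
	if (s->readers || s->writer_held)
		return false;
	s->writer_held = true;
	return true;
}

/* Task A reads, task B queues a write, task A reads again: A is refused. */
static bool toy_recursive_read_deadlocks(void)
{
	struct toy_rwsem s = { 0, 0, false };
	bool outer = toy_read_trylock(&s);
	toy_write_enqueue(&s);
	bool inner = toy_read_trylock(&s);

	/* A still holds the outer lock, so B can never make progress. */
	return outer && !inner && s.readers == 1;
}

/* Same sequence, but the writer only tries: the inner read is admitted. */
static bool toy_trylock_writer_avoids_deadlock(void)
{
	struct toy_rwsem s = { 0, 0, false };
	bool outer = toy_read_trylock(&s);
	bool writer = toy_write_trylock(&s);
	bool inner = toy_read_trylock(&s);

	toy_read_unlock(&s);
	toy_read_unlock(&s);
	return outer && !writer && inner && s.readers == 0;
}
```

Both helper scenarios return true in this model. Note that the second one
only hides the recursion: the writer has to keep retrying and can be
starved, which is why the real fix is to not take the same lock twice.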
We can also look to see if we can implement this with RCU. What exactly
is this rwsem protecting?
>
> Looking forward to your input. I will add an explanation in v4.
>
>
> >> +}
> >> +
[..]
> >> + *
> >> + * This function creates the top of the trace event directory.
> >> + */
> >> +struct dentry *eventfs_create_events_dir(const char *name,
> >> + struct dentry *parent,
> >> + struct rw_semaphore *eventfs_rwsem)
> >
> > OK, I'm going to have to really look at this. Passing in a lock to the
> > API is just broken. We need to find a way to solve this another way.
>
> eventfs_rwsem is a member of struct trace_array, I guess we should
> pass pointer to trace_array.
No, it should not be part of the trace_array. If we can't do this with
RCU, then we need to add a descriptor that contains the dentry that is
returned above, and have the lock held there. The caller of the
eventfs_create_events_dir() should not care about locking. That's an
implementation detail that should *not* be part of the API.
That is, if you need a lock:
struct eventfs_dentry {
struct dentry *dentry;
struct rw_semaphore *rwsem;
};
And then get to that lock by using the container_of() macro. All
created eventfs dentries could have this structure, where the rwsem
points to the top one. Again, that's only if we can't do this with RCU.
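A rough userspace sketch of that idea, just to show the shape of it. The
names here (toy_dentry, toy_rwsem, toy_eventfs_dentry,
toy_dentry_to_rwsem()) are hypothetical stand-ins, not kernel types. It
also assumes the dentry is embedded in the descriptor rather than pointed
to, because container_of() needs the address of a member that lives inside
the wrapper.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_dentry { const char *name; };  /* stand-in for struct dentry */
struct toy_rwsem  { int id; };            /* stand-in for struct rw_semaphore */

/* Hypothetical descriptor: embeds the dentry, points at the shared lock. */
struct toy_eventfs_dentry {
	struct toy_dentry dentry;
	struct toy_rwsem *rwsem;  /* points to the top-level directory's lock */
};

/* Recover the lock from a bare dentry pointer; callers never pass it in. */
static struct toy_rwsem *toy_dentry_to_rwsem(struct toy_dentry *d)
{
	assert(d != NULL);
	return container_of(d, struct toy_eventfs_dentry, dentry)->rwsem;
}
```

With this layout the create function only needs the name and the parent,
and everything below the top directory finds the same lock through its own
descriptor.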
-- Steve
>
>
> > I'm about to board a plane to JFK shortly, I'm hoping to play with this
> > while flying back.
> >
>
> I have replied to the major concerns. I will take care of all the minor ones in v4.
>
> Thanks a lot for giving time to the eventfs patches.
>
> - Ajay
>