Message-ID: <CAGETcx-tmyJA30GtdU_dO9tWFoK+rO5tm-On4tPR7oQotnMkqQ@mail.gmail.com>
Date: Mon, 4 Mar 2024 22:47:09 -0800
From: Saravana Kannan <saravanak@...gle.com>
To: Herve Codina <herve.codina@...tlin.com>
Cc: Rob Herring <robh@...nel.org>, Nuno Sá <noname.nuno@...il.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>, "Rafael J. Wysocki" <rafael@...nel.org>,
Frank Rowand <frowand.list@...il.com>, Lizhi Hou <lizhi.hou@....com>, Max Zhen <max.zhen@....com>,
Sonal Santan <sonal.santan@....com>, Stefano Stabellini <stefano.stabellini@...inx.com>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>, linux-kernel@...r.kernel.org,
devicetree@...r.kernel.org, Allan Nielsen <allan.nielsen@...rochip.com>,
Horatiu Vultur <horatiu.vultur@...rochip.com>,
Steen Hegelund <steen.hegelund@...rochip.com>, Luca Ceresoli <luca.ceresoli@...tlin.com>,
Nuno Sa <nuno.sa@...log.com>, Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
stable@...r.kernel.org
Subject: Re: [PATCH v3 2/2] of: overlay: Synchronize of_overlay_remove() with
the devlink removals
On Mon, Mar 4, 2024 at 8:49 AM Herve Codina <herve.codina@...tlin.com> wrote:
>
> Hi Rob,
>
> On Mon, 4 Mar 2024 09:22:02 -0600
> Rob Herring <robh@...nel.org> wrote:
>
> ...
>
> > > > @@ -853,6 +854,14 @@ static void free_overlay_changeset(struct
> > > > overlay_changeset *ovcs)
> > > > {
> > > > int i;
> > > >
> > > > + /*
> > > > + * Wait for any ongoing device link removals before removing some of
> > > > + * nodes. Drop the global lock while waiting
> > > > + */
> > > > + mutex_unlock(&of_mutex);
> > > > + device_link_wait_removal();
> > > > + mutex_lock(&of_mutex);
> > >
> > > I'm still not convinced we need to drop the lock. What happens if someone else
> > > grabs the lock while we are in device_link_wait_removal()? Can we guarantee that
> > > we can't screw things badly?
> >
> > It is also just ugly because it's the callers of
> > free_overlay_changeset() that hold the lock and now we're releasing it
> > behind their back.
> >
> > As device_link_wait_removal() is called before we touch anything, can't
> > it be called before we take the lock? And do we need to call it if
> > applying the overlay fails?
Rob,
This[1] scenario Luca reported seems like a reason for the
device_link_wait_removal() to be where Herve put it. That example
seems reasonable.
[1] - https://lore.kernel.org/all/20231220181627.341e8789@booty/
> >
>
> Indeed, having device_link_wait_removal() is not needed when applying the
> overlay fails.
>
> I can call device_link_wait_removal() from the caller of_overlay_remove()
> but not before the lock is taken.
> We need to call it between __of_changeset_revert_notify() and
> free_overlay_changeset() and so, the lock is taken.
>
> This leads to the following sequence:
> --- 8< ---
> int of_overlay_remove(int *ovcs_id)
> {
> ...
> mutex_lock(&of_mutex);
> ...
>
> ret = __of_changeset_revert_notify(&ovcs->cset);
> ...
>
> ret_tmp = overlay_notify(ovcs, OF_OVERLAY_POST_REMOVE);
> ...
>
> mutex_unlock(&of_mutex);
> device_link_wait_removal();
> mutex_lock(&of_mutex);
>
> free_overlay_changeset(ovcs);
> ...
> mutex_unlock(&of_mutex);
> ...
> }
> --- 8< ---
>
> In this sequence, the question is:
> Do we need to release the mutex lock while device_link_wait_removal() is
> called ?
In general I hate these kinds of sequences that release a lock and
then grab it again quickly. It's not always a bug, but my personal
take is that 90% of these introduce a bug.
Drop the unlock/lock and we'll deal with a deadlock if we actually hit one.
I'm also fairly certain that device_link_wait_removal() can't trigger
something else that can cause an OF overlay change while we are in the
middle of one. And like Rob said, I'm not sure this unlock/lock is a
good solution for that anyway.
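To illustrate the general concern with unlock/wait/lock, here is a
minimal userspace sketch using pthreads. It is not kernel code and not
the overlay implementation: struct guarded_state, other_caller_modify(),
wait_with_lock_dropped() and demo_lock_gap() are all hypothetical names
made up for this example, and the "other caller" is invoked directly
inside the gap so the outcome is deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for state protected by a global lock. */
struct guarded_state {
	pthread_mutex_t lock;
	int generation;	/* bumped on every modification */
};

/* Hypothetical "other caller": modifies the state under the lock. */
static void other_caller_modify(struct guarded_state *st)
{
	pthread_mutex_lock(&st->lock);
	st->generation++;
	pthread_mutex_unlock(&st->lock);
}

/*
 * Mimics the unlock / wait / lock shape. Must be called with st->lock
 * held. during_wait stands in for anything that may run while the lock
 * is dropped (NULL means nothing else runs).
 * Returns true if the state changed behind the caller's back.
 */
static bool wait_with_lock_dropped(struct guarded_state *st,
				   void (*during_wait)(struct guarded_state *))
{
	int seen;

	assert(st != NULL);
	seen = st->generation;

	pthread_mutex_unlock(&st->lock);
	if (during_wait)
		during_wait(st);	/* someone else gets in here */
	pthread_mutex_lock(&st->lock);

	return st->generation != seen;
}

/* Runs the sequence once; returns true if the state changed in the gap. */
static bool demo_lock_gap(bool interfere)
{
	struct guarded_state st = { .generation = 0 };
	bool changed;

	pthread_mutex_init(&st.lock, NULL);
	pthread_mutex_lock(&st.lock);
	changed = wait_with_lock_dropped(&st,
					 interfere ? other_caller_modify : NULL);
	pthread_mutex_unlock(&st.lock);
	pthread_mutex_destroy(&st.lock);

	return changed;
}
```

The point of the sketch is only that anything the caller validated
before dropping the lock has to be re-validated after re-taking it,
which is what makes this pattern easy to get wrong.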
Please CC me on the next series. And I'm glad folks convinced you to
use flush_workqueue(). As I said in the older series, I think
drain_workqueue() will actually break device links.
-Saravana