[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGETcx9Oo3F8oAOOS9e9RTCdWHvigx5On0phXrVfJqap2VcN2g@mail.gmail.com>
Date: Wed, 6 Mar 2024 13:14:55 -0800
From: Saravana Kannan <saravanak@...gle.com>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Herve Codina <herve.codina@...tlin.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Rob Herring <robh+dt@...nel.org>,
Frank Rowand <frowand.list@...il.com>, Lizhi Hou <lizhi.hou@....com>, Max Zhen <max.zhen@....com>,
Sonal Santan <sonal.santan@....com>, Stefano Stabellini <stefano.stabellini@...inx.com>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>, linux-kernel@...r.kernel.org,
devicetree@...r.kernel.org, Allan Nielsen <allan.nielsen@...rochip.com>,
Horatiu Vultur <horatiu.vultur@...rochip.com>,
Steen Hegelund <steen.hegelund@...rochip.com>, Luca Ceresoli <luca.ceresoli@...tlin.com>,
Nuno Sa <nuno.sa@...log.com>, Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
stable@...r.kernel.org
Subject: Re: [PATCH v4 1/2] driver core: Introduce device_link_wait_removal()
On Wed, Mar 6, 2024 at 7:56 AM Rafael J. Wysocki <rafael@...nel.org> wrote:
>
> On Wed, Mar 6, 2024 at 4:24 PM Herve Codina <herve.codina@...tlincom> wrote:
> >
> > Hi Rafael,
> >
> > On Wed, 6 Mar 2024 13:48:37 +0100
> > "Rafael J. Wysocki" <rafael@...nel.org> wrote:
> >
> > > On Wed, Mar 6, 2024 at 9:51 AM Herve Codina <herve.codina@...tlin.com> wrote:
> > > >
> > > > The commit 80dd33cf72d1 ("drivers: base: Fix device link removal")
> > > > introduces a workqueue to release the consumer and supplier devices used
> > > > in the devlink.
> > > > In the job queued, devices are release and in turn, when all the
> > > > references to these devices are dropped, the release function of the
> > > > device itself is called.
> > > >
> > > > Nothing is present to provide some synchronisation with this workqueue
> > > > in order to ensure that all ongoing releasing operations are done and
> > > > so, some other operations can be started safely.
> > > >
> > > > For instance, in the following sequence:
> > > > 1) of_platform_depopulate()
> > > > 2) of_overlay_remove()
> > > >
> > > > During the step 1, devices are released and related devlinks are removed
> > > > (jobs pushed in the workqueue).
> > > > During the step 2, OF nodes are destroyed but, without any
> > > > synchronisation with devlink removal jobs, of_overlay_remove() can raise
> > > > warnings related to missing of_node_put():
> > > > ERROR: memory leak, expected refcount 1 instead of 2
> > > >
> > > > Indeed, the missing of_node_put() call is going to be done, too late,
> > > > from the workqueue job execution.
> > > >
> > > > Introduce device_link_wait_removal() to offer a way to synchronize
> > > > operations waiting for the end of devlink removals (i.e. end of
> > > > workqueue jobs).
> > > > Also, as a flushing operation is done on the workqueue, the workqueue
> > > > used is moved from a system-wide workqueue to a local one.
> > > >
> > > > Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
> > >
> > > No, it is not fixed by this patch.
> >
> > Was explicitly asked by Saravana on v1 review:
> > https://lore.kernel.org/linux-kernel/CAGETcx9uP86EHyKJNifBMd23oCsA+KpMa+e36wJEEnHDve+Avg@mail.gmail.com/
>
> Well, I don't think this is a valid request, sorry.
>
> > The commit 80dd33cf72d1 introduces the workqueue and so some asynchronous tasks
> > on removal.
> > This patch and the next one allows to re-sync execution waiting for jobs in
> > the workqueue when it is needed.
>
> I get this, but still, this particular individual patch by itself
> doesn't fix anything. Do you agree with this?
>
> If somebody applies this patch without the next one in the series,
> they will not get any change in behavior, so the tag is at least
> misleading.
>
> You can claim that the next patch on top of this one fixes things, so
> adding a Fixes tag to the other patch would be fine.
>
> There is an explicit dependency between them (the second patch is not
> even applicable without the first one, or if it is, the resulting code
> won't compile anyway), but you can make a note to the maintainer that
> they need to go to -stable together.
>
> > >
> > > In fact, the only possibly observable effect of this patch is the
> > > failure when the allocation of device_link_wq fails AFAICS.
> > >
> > > > Cc: stable@...r.kernel.org
> > >
> > > So why?
> >
> > Cc:stable is needed as this patch is a prerequisite of patch 2 (needed
> > to fix the asynchronous workqueue task issue).
>
> Dependencies like this can be expressed in "Cc: stable" tags.
> Personally, I'd do it like this:
>
> Cc: stable@...r.kernel.org # 5.13: Depends on the first patch in the series
I'm okay with this too. I personally think it's better to list "Fixes:
xyz" in all the patches that are needed to fix xyz (especially when
there's no compile time dependency on earlier patches), but it's not a
hill I'll die on. And if Rafael's suggestion is the expected norm,
then I'll remember to follow that in the future.
-Saravana
Powered by blists - more mailing lists