[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z6nKOz97Neb1zZOa@hovoldconsulting.com>
Date: Mon, 10 Feb 2025 10:43:23 +0100
From: Johan Hovold <johan@...nel.org>
To: Mukesh Ojha <mukesh.ojha@....qualcomm.com>
Cc: Bjorn Andersson <andersson@...nel.org>, konradybcio@...nel.org,
linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org, Saranya R <quic_sarar@...cinc.com>,
Frank Oltmanns <frank@...manns.dev>
Subject: Re: [PATCH v2] soc: qcom: pdr: Fix the potential deadlock
On Mon, Feb 10, 2025 at 02:50:18PM +0530, Mukesh Ojha wrote:
> On Thu, Feb 06, 2025 at 04:13:25PM -0600, Bjorn Andersson wrote:
>
> > On Wed, Jan 29, 2025 at 09:25:44PM +0530, Mukesh Ojha wrote:
> > > When some client process A call pdr_add_lookup() to add the look up for
> > > the service and does schedule locator work, later a process B got a new
> > > server packet indicating locator is up and call pdr_locator_new_server()
> > > which eventually sets pdr->locator_init_complete to true which process A
> > > sees and takes list lock and queries domain list but it will timeout due
> > > to deadlock as the response will queued to the same qmi->wq and it is
> > > ordered workqueue and process B is not able to complete new server
> > > request work due to deadlock on list lock.
> > >
> > > Process A Process B
> > >
> > > process_scheduled_works()
> > > pdr_add_lookup() qmi_data_ready_work()
> > > process_scheduled_works() pdr_locator_new_server()
> > > pdr->locator_init_complete=true;
> > > pdr_locator_work()
> > > mutex_lock(&pdr->list_lock);
> > >
> > > pdr_locate_service() mutex_lock(&pdr->list_lock);
> > >
> > > pdr_get_domain_list()
> > > pr_err("PDR: %s get domain list
> > > txn wait failed: %d\n",
> > > req->service_name,
> > > ret);
> > >
> > > Fix it by removing the unnecessary list iteration as the list iteration
> > > is already being done inside locator work, so avoid it here and just
> > > call schedule_work() here.
> > >
> >
> > I came to the same patch while looking into the issue related to
> > in-kernel pd-mapper reported here:
> > https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/
> >
> > So:
> > Reviewed-by: Bjorn Andersson <bjorn.andersson@....qualcomm.com>
> > Tested-by: Bjorn Andersson <bjorn.andersson@....qualcomm.com>
I was gonna ask if you have confirmed that this indeed fixes the audio
regression with the in-kernel pd-mapper?
Is this how you discovered the issue as well, Mukesh and Saranya?
If so, please mention that in the commit message, but in any case also
include the corresponding error messages directly so that people running
into this can find the fix more easily. (I see the pr_err now, but it's
not as greppable).
A Link tag to my report would be good to have as well if this fixes the
audio regression.
Johan
Powered by blists - more mailing lists