lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z6nKOz97Neb1zZOa@hovoldconsulting.com>
Date: Mon, 10 Feb 2025 10:43:23 +0100
From: Johan Hovold <johan@...nel.org>
To: Mukesh Ojha <mukesh.ojha@....qualcomm.com>
Cc: Bjorn Andersson <andersson@...nel.org>, konradybcio@...nel.org,
	linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
	stable@...r.kernel.org, Saranya R <quic_sarar@...cinc.com>,
	Frank Oltmanns <frank@...manns.dev>
Subject: Re: [PATCH v2] soc: qcom: pdr: Fix the potential deadlock

On Mon, Feb 10, 2025 at 02:50:18PM +0530, Mukesh Ojha wrote:
> On Thu, Feb 06, 2025 at 04:13:25PM -0600, Bjorn Andersson wrote:
> 
> > On Wed, Jan 29, 2025 at 09:25:44PM +0530, Mukesh Ojha wrote:
> > > When some client process A call pdr_add_lookup() to add the look up for
> > > the service and does schedule locator work, later a process B got a new
> > > server packet indicating locator is up and call pdr_locator_new_server()
> > > which eventually sets pdr->locator_init_complete to true which process A
> > > sees and takes list lock and queries domain list but it will timeout due
> > > to deadlock as the response will queued to the same qmi->wq and it is
> > > ordered workqueue and process B is not able to complete new server
> > > request work due to deadlock on list lock.
> > > 
> > >        Process A                        Process B
> > > 
> > >                                      process_scheduled_works()
> > > pdr_add_lookup()                      qmi_data_ready_work()
> > >  process_scheduled_works()             pdr_locator_new_server()
> > >                                          pdr->locator_init_complete=true;
> > >    pdr_locator_work()
> > >     mutex_lock(&pdr->list_lock);
> > > 
> > >      pdr_locate_service()                  mutex_lock(&pdr->list_lock);
> > > 
> > >       pdr_get_domain_list()
> > >        pr_err("PDR: %s get domain list
> > >                txn wait failed: %d\n",
> > >                req->service_name,
> > >                ret);
> > > 
> > > Fix it by removing the unnecessary list iteration as the list iteration
> > > is already being done inside locator work, so avoid it here and just
> > > call schedule_work() here.
> > > 
> > 
> > I came to the same patch while looking into the issue related to
> > in-kernel pd-mapper reported here:
> > https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/
> > 
> > So:
> > Reviewed-by: Bjorn Andersson <bjorn.andersson@....qualcomm.com>
> > Tested-by: Bjorn Andersson <bjorn.andersson@....qualcomm.com>

I was gonna ask if you have confirmed that this indeed fixes the audio
regression with the in-kernel pd-mapper?

Is this how you discovered the issue as well, Mukesh and Saranya?

If so, please mention that in the commit message, but in any case also
include the corresponding error messages directly so that people running
into this can find the fix more easily. (I see the pr_err now, but it's
not as greppable).

A Link tag to my report would be good to have as well if this fixes the
audio regression.

Johan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ