Message-ID: <ZthVTC8dt1kSdjMb@hovoldconsulting.com>
Date: Wed, 4 Sep 2024 14:40:44 +0200
From: Johan Hovold <johan@...nel.org>
To: Chris Lew <quic_clew@...cinc.com>,
	Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
	Bjorn Andersson <andersson@...nel.org>
Cc: Stephan Gerhold <stephan.gerhold@...aro.org>,
	Konrad Dybcio <konradybcio@...nel.org>,
	linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
	Abel Vesa <abel.vesa@...aro.org>
Subject: Re: [PATCH 1/2] soc: qcom: pd_mapper: Add X1E80100

On Thu, Aug 22, 2024 at 09:28:26AM +0200, Johan Hovold wrote:
> On Tue, Jul 30, 2024 at 11:04:58PM -0700, Chris Lew wrote:
> 
> > GLINK has a concept that is called "intents". An intent is an object 
> > that signifies that a remote channel is ready to receive a packet 
> > through GLINK. Intents can be pre-emptively queued, or they can be 
> > requested by the sending entity. GLINK will not try to send or it will 
> > block until there is an intent available.
> > 
> > Intents are exchanged with GLINK_CMD_INTENT packets. When Linux receives 
> > one of these packets we add it to an idr "riids".
> > 
> > Example sending call:
> >      pmic_glink_send() --> rpmsg_send() --> qcom_glink_send() --> 
> > __qcom_glink_send() --> qcom_glink_request_intent()
> > 
> > In __qcom_glink_send(), we check if there are any available intents in 
> > "riids", if there aren't any intents we request an intent through 
> > qcom_glink_request_intent(). This sends a GLINK_CMD_RX_INTENT_REQ packet 
> > to the remote and waits for a GLINK_CMD_RX_INTENT_REQ_ACK packet in 
> > return. This ack packet will have a field that says whether the intent 
> > has been granted or not. When linux gets this ack packet, we will wake 
> > up the thread waiting in qcom_glink_request_intent().
> > 
> > The ECANCELED comes from qcom_glink_request_intent() when we receive a 
> > GLINK_CMD_RX_INTENT_REQ_ACK that has granted == false.
> > 
> > On the firmware, when a glink channel is registered they can optionally 
> > fill in a handler for GLINK_CMD_RX_INTENT_REQ packets. If this handler 
> > is not configured, then a default one will be used where all 
> > GLINK_CMD_RX_INTENT_REQ packets will be responded with 
> > GLINK_CMD_RX_INTENT_REQ_ACK and granted == false. If a channel is 
> > implemented this way, then the only thing Linux can do is wait and retry 
> > until the remote queues the intents on its own accord.
> > 
> > This would be my current guess as to what's happening based on this not 
> > being consistent and only seen every couple of reboots. A stop path 
> > problem sounds like it should happen every time, and we should also see 
> > the remoteproc prints related to powering down the adsp. The above race 
> > should be applicable to all platforms but depends on the speed of the 
> > ADSP vs the CPU.
> 
> Thanks for the above. This indeed seems to match what I'm seeing as I
> also reported here [1]:
> 
> [    9.539415]  30000000.remoteproc:glink-edge: qcom_glink_handle_intent_req_ack - cid = 9, granted = 0
> [    9.561750] qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
> 
> [    9.448945]  30000000.remoteproc:glink-edge: qcom_glink_handle_intent_req_ack - cid = 9, granted = 0
> [    9.461267] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
> [    9.469241] qcom,apr 30000000.remoteproc:glink-edge.adsp_apps.-1.-1: Adding APR/GPR dev: gprsvc:service:2:1
> [    9.478968] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
> 
> I assume we do not want to have every client driver implement a retry
> loop for the first communication with the remote end, so can this be
> handled by the pmic_glink driver somehow? For example, by not forwarding
> state changes until some generic request has gone through?
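
To make the "wait and retry" behaviour described in the quoted explanation concrete, here is a minimal user-space sketch. It is not kernel code and not the GLINK implementation: `sim_remote`, `sim_request_intent()` and `sim_send_with_retry()` are hypothetical names, and the remote is assumed to refuse a fixed number of requests before it has queued its intents. Only the `-ECANCELED` return for `granted == false` is taken from the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the remote (firmware) end of one channel. */
struct sim_remote {
	int ready_after;   /* requests refused before intents are available */
	int requests_seen; /* requests received so far */
};

/*
 * Models one intent request round trip: the ack carries a "granted"
 * flag, and a refused request is reported as -ECANCELED.
 */
static int sim_request_intent(struct sim_remote *r)
{
	bool granted = r->requests_seen >= r->ready_after;

	r->requests_seen++;
	return granted ? 0 : -ECANCELED;
}

/*
 * Bounded retry around the request. Returns 0 once an intent is
 * granted, or the last error if every attempt was refused.
 */
static int sim_send_with_retry(struct sim_remote *r, int max_attempts)
{
	int ret = -ECANCELED;

	assert(max_attempts > 0);

	for (int i = 0; i < max_attempts; i++) {
		ret = sim_request_intent(r);
		if (ret != -ECANCELED)
			break;
		/* A real client would sleep or wait for a queued intent here. */
	}

	return ret;
}
```

With `ready_after = 3`, a call with `max_attempts = 2` fails with `-ECANCELED`, while a later call with more attempts succeeds. Doing this once in a common layer, rather than in every client, is the point of the question above.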

Has there been any progress on this issue? It's already been five weeks
since my initial report and we're closing in on the merge window for
6.12. If this isn't resolved soon, I'll send a patch to disable the
in-kernel pd-mapper by marking it as broken.

> And what about the audio service errors:
> 
> 	[   14.565059] PDR: avs/audio get domain list txn wait failed: -110
> 	[   14.571943] PDR: service lookup for avs/audio failed: -110
> 
> Does this seem to be a separate (but related) issue or just a different
> symptom?

I can confirm that the audio breakage is also related to the in-kernel
pd-mapper. I hit it after 30 reboots with the in-kernel pd-mapper, but
have not seen it with the user space service (e.g. after 100+ reboots).

> [1] https://lore.kernel.org/lkml/ZsRGV4hplvidpYji@hovoldconsulting.com/

Johan
