linux-kernel - Re: [PATCH] clk: qcom: Park shared RCGs upon registration

Message-ID: <CAMi1Hd2_a7TjA7J9ShrAbNOd_CoZ3D87twmO5t+nZxC9sX18tA@mail.gmail.com>
Date: Mon, 5 Aug 2024 16:13:14 +0530
From: Amit Pundir <amit.pundir@...aro.org>
To: Stephen Boyd <swboyd@...omium.org>
Cc: Bjorn Andersson <andersson@...nel.org>, Michael Turquette <mturquette@...libre.com>, 
	Stephen Boyd <sboyd@...nel.org>, linux-kernel@...r.kernel.org, linux-clk@...r.kernel.org, 
	patches@...ts.linux.dev, linux-arm-msm@...r.kernel.org, 
	Laura Nao <laura.nao@...labora.com>, Dmitry Baryshkov <dmitry.baryshkov@...aro.org>, 
	Douglas Anderson <dianders@...omium.org>, Taniya Das <quic_tdas@...cinc.com>
Subject: Re: [PATCH] clk: qcom: Park shared RCGs upon registration

On Sat, 3 Aug 2024 at 06:29, Stephen Boyd <swboyd@...omium.org> wrote:
>
> Quoting Amit Pundir (2024-08-01 04:59:28)
> > Hi Stephen,
> >
> > This patch caused a few deferred probes on sm8550-hdk breaking the
> > audio codec and usb-c host mode support. This breakage is not 100%
> > reproducible but can be fairly easily reproduced though.
> > I have attached the relevant logs and defconfig here
> > https://bugs.linaro.org/show_bug.cgi?id=6053 for reference. Let me
> > know if you need more information or if I can assist you in testing a
> > debug patch to diagnose it further.
> >
>
> Thanks for the report! I'm not sure why probe would defer because of
> this patch though. Maybe there's a slowness to probe that isn't there
> when we don't park all the RCGs using the shared clk ops. Can you try
> this patch and see if it makes things better? I'd like to narrow it down
> to the clk that's the problem instead of changing every sm8550 clk
> that's using the shared clk ops. To do that, undo the change for some
> set of RCGs until the problem comes back (assuming the patch fixes it at
> all).
>
> What the patch does is calculate the cached cfg register value to fix
> one problem, but skips parking the clk at registration time because I
> suspect that's causing deferred probes. Of course, deferred probe in
> itself shouldn't be a problem, so if simply having drivers defer probe
> causes an issue then the problem isn't in the clk driver.
>
> Also please send back the dmesg so we can see what clks are configured
> for at boot time. If they're using TCXO source at boot then they're not
> going to be broken. In which case those clks can keep using the old clk
> ops and we can focus on the ones that aren't sourcing from TCXO.

Thank you for this debug patch. I thought I had narrowed down the
breakage to the clks in drivers/clk/qcom/gcc-sm8550.c, until I ran
into the following kernel panic in the ucsi_glink driver in later test
runs.

[    7.882923][    T1] init: Loading module /lib/modules/ucsi_glink.ko with args ''
[    7.892929][   T92] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
[    7.894935][    T1] init: Loaded kernel module /lib/modules/ucsi_glink.ko
[    7.902670][   T92] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000886218000
[    7.902674][   T92] Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
[    7.993995][   T64] qcom_pmic_glink pmic-glink: Failed to create device link (0x180) with a600000.usb
[    8.078673][   T92] CPU: 7 UID: 0 PID: 92 Comm: kworker/7:2 Tainted: G S          E      6.11.0-rc2-mainline-00001-g4153d980358d #6
[    8.078676][   T92] Tainted: [S]=CPU_OUT_OF_SPEC, [E]=UNSIGNED_MODULE
[    8.078677][   T92] Hardware name: Qualcomm Technologies, Inc. SM8550 HDK (DT)
[    8.078679][   T92] Workqueue: events pmic_glink_ucsi_register [ucsi_glink]
[    8.078682][   T92] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[    8.078684][   T92] pc : pmic_glink_send+0x10/0x2c [pmic_glink]
[    8.078685][   T92] lr : pmic_glink_ucsi_read+0x84/0x14c [ucsi_glink]
[    8.078704][   T92] Call trace:
[    8.078705][   T92]  pmic_glink_send+0x10/0x2c [pmic_glink]
[    8.078706][   T92]  pmic_glink_ucsi_read+0x84/0x14c [ucsi_glink]
[    8.078707][   T92]  pmic_glink_ucsi_read_version+0x20/0x30 [ucsi_glink]
[    8.078708][   T92]  ucsi_register+0x28/0x70
[    8.078717][   T92]  pmic_glink_ucsi_register+0x18/0x28 [ucsi_glink]
[    8.078718][   T92]  process_one_work+0x184/0x2e8
[    8.078723][   T92]  worker_thread+0x2f0/0x404
[    8.078725][   T92]  kthread+0x114/0x118
[    8.078728][   T92]  ret_from_fork+0x10/0x20
[    8.078732][   T92] ---[ end trace 0000000000000000 ]---
[    8.078734][   T92] Kernel panic - not syncing: Oops: Fatal exception
[    8.078735][   T92] SMP: stopping secondary CPUs
[    8.279136][   T92] Kernel Offset: 0x14d9480000 from 0xffffffc080000000
[    8.279141][   T92] PHYS_OFFSET: 0x80000000
[    8.279143][   T92] CPU features: 0x18,004e0003,80113128,564676af
[    8.279148][   T92] Memory Limit: none

I couldn't reproduce this kernel panic on vanilla v6.11-rc2 in 50+
test runs after that, so I'm assuming that this debug patch may have
triggered it.

I'm attaching the crashing and working dmesg logs with the debug patch
applied.
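
For anyone comparing the two logs, here is a minimal sketch (not part of
the debug patch) of how one might isolate the records of a single task
such as T92. It assumes each record starts with a "[timestamp][Tnnn]"
prefix, as in the excerpt above, and treats any line without that prefix
as a mail-wrapped continuation of the previous record. The function
names and the sample text are only illustrative.

```python
import re

# Assumed record prefix, e.g. "[    7.892929][   T92] message".
RECORD_PREFIX = re.compile(r"^\[\s*(\d+\.\d+)\]\[\s*(T\d+)\]\s?(.*)$")


def join_wrapped_records(text):
    """Return (timestamp, task, message) tuples, rejoining wrapped lines."""
    records = []
    for line in text.splitlines():
        match = RECORD_PREFIX.match(line)
        if match:
            timestamp, task, message = match.groups()
            records.append([float(timestamp), task, message])
        elif records and line.strip():
            # No prefix: treat the line as a continuation of the last record.
            records[-1][2] += " " + line.strip()
    return [tuple(record) for record in records]


def records_for_task(records, task):
    """Keep only the records emitted by one task, e.g. "T92"."""
    return [record for record in records if record[1] == task]


# Short sample taken from the oops above, with one mail-wrapped record.
sample = """\
[    7.892929][   T92] Unable to handle kernel NULL pointer
dereference at virtual address 0000000000000010
[    7.894935][    T1] init: Loaded kernel module /lib/modules/ucsi_glink.ko
[    8.078684][   T92] pc : pmic_glink_send+0x10/0x2c [pmic_glink]
"""

if __name__ == "__main__":
    for timestamp, task, message in records_for_task(
        join_wrapped_records(sample), "T92"
    ):
        print(f"{timestamp:.6f} {task} {message}")
```

Filtering on the task column keeps the oops readable when other tasks
(T1 and T64 here) log in between the T92 lines.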

Regards,
Amit Pundir

View attachment "dmesg_glink_panic.txt" of type "text/plain" (99335 bytes)

View attachment "dmesg_working.txt" of type "text/plain" (107028 bytes)