Message-ID: <bb4a8978-790a-46c5-94bd-9f97ffa15b64@mainlining.org>
Date: Tue, 4 Nov 2025 02:30:59 +0100
From: Jens Reidel <adrian@...nlining.org>
To: Neil Armstrong <neil.armstrong@...aro.org>,
 Rob Clark <robin.clark@....qualcomm.com>, Sean Paul <sean@...rly.run>,
 Konrad Dybcio <konradybcio@...nel.org>, Dmitry Baryshkov <lumag@...nel.org>,
 Abhinav Kumar <abhinav.kumar@...ux.dev>,
 Jessica Zhang <jessica.zhang@....qualcomm.com>,
 Marijn Suijten <marijn.suijten@...ainline.org>,
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>
Cc: linux-arm-msm@...r.kernel.org, dri-devel@...ts.freedesktop.org,
 freedreno@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC RFT] drm/msm: adreno: attach the GMU device to a
 driver

Hi Neil,

On 10/29/25 11:25 AM, Neil Armstrong wrote:
> Hi,
> 
> On 10/26/25 02:31, Jens Reidel wrote:
>> On 10/22/25 14:44, Neil Armstrong wrote:
>>> Because sync_state is enabled by default in pmdomain & CCF since v6.17,
>>> the GCC and GPUCC sync_state stays pending, leaving the resources at
>>> full performance:
>>> gcc-x1e80100 100000.clock-controller: sync_state() pending due to 3d6a000.gmu
>>> gpucc-x1e80100 3d90000.clock-controller: sync_state() pending due to 3d6a000.gmu
>>>
>>> In order to fix this state and allow the GMU to be properly
>>> probed, let's add a proper driver for the GMU and add it to
>>> the MSM driver components.
>>>
>>> Only the proper GMU has been tested since I don't have
>>> access to hardware with a GMU wrapper.
>>>
>>> Signed-off-by: Neil Armstrong <neil.armstrong@...aro.org>
>>> ---
>>>   drivers/gpu/drm/msm/adreno/a6xx_gmu.c      | 354 ++++++++++++++---------------
>>>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c      |   6 -
>>>   drivers/gpu/drm/msm/adreno/a6xx_gpu.h      |   3 -
>>>   drivers/gpu/drm/msm/adreno/adreno_device.c |   4 +
>>>   drivers/gpu/drm/msm/adreno/adreno_gpu.h    |   4 +
>>>   drivers/gpu/drm/msm/msm_drv.c              |  16 +-
>>>   6 files changed, 192 insertions(+), 195 deletions(-)
>>>
> 
> <snip>
> 
>>>
>>> ---
>>> base-commit: 211ddde0823f1442e4ad052a2f30f050145ccada
>>> change-id: 20251022-topic-adreno-attach-gmu-to-driver-e47025fd7ebb
>>>
>>> Best regards,
>>
>> Hi Neil,
>>
>> thanks for the patch. With it applied, my GPU fails to initialize.
>> Here's the related dmesg section:
>>
>> [    1.733062] [drm:dpu_kms_hw_init:1173] dpu hardware revision:0x50020000
>> [    1.735229] [drm] Initialized msm 1.13.0 for ae01000.display-controller on minor 0
>> [    1.735403] msm_dpu ae01000.display-controller: [drm:adreno_request_fw] loaded qcom/a630_sqe.fw from new location
>> [    1.735513] msm_dpu ae01000.display-controller: [drm:adreno_request_fw] loaded qcom/a630_gmu.bin from new location
>> [    1.746710] a6xx_gmu 506a000.gmu: [drm:a6xx_gmu_set_oob] *ERROR* Timeout waiting for GMU OOB set BOOT_SLUMBER: 0x800000
>> [    1.746766] msm_dpu ae01000.display-controller: [drm:adreno_load_gpu] *ERROR* Couldn't power up the GPU: -110
>>
>> This could be because I have an Adreno 630-family GPU, which is marked
>> as legacy in a6xx_gmu_init / a6xx_gmu_bind. Previously, the rest of the
>> init code always ran, while now some parts are conditionally disabled
>> for legacy GPUs. Is that intentional? However, unconditionally enabling
>> those parts also fails: the GPU doesn't initialize and the device resets
>> shortly after, so there's probably more to this.
>>
>> Please let me know if there's anything I can do to help debug this.
> 
> Thanks for the report, it's sdm845-based, right?

Almost, it's SM7150 with Adreno 618.

> 
> I may have mismatched the role of the legacy parameter...
> 
> Could you try this on top:

<snip>

> ===========================><=====================================

This is roughly what I had already tried earlier. Back then I wasn't able
to grab a UART log to see what exactly was still wrong, but I finally got
around to it today.

A short excerpt from the decoded stack trace:

[    4.838573] Unable to handle kernel paging request at virtual address 0000000000023010
[    4.846726] Mem abort info:
[    4.857916]   ESR = 0x0000000096000044
[    4.870865]   EC = 0x25: DABT (current EL), IL = 32 bits
[    4.883897]   SET = 0, FnV = 0
[    4.895344]   EA = 0, S1PTW = 0
[    4.898584]   FSC = 0x04: level 0 translation fault
[    4.898586] Data abort info:
[    4.898587]   ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
[    4.898589]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[    4.898591]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    4.898593] [0000000000023010] user address but active_mm is swapper
[    4.898597] Internal error: Oops: 0000000096000044 [#1]  SMP
[    4.898600] Modules linked in:
[    4.898612] Tainted: [W]=WARN
[    4.898613] Hardware name: xiaomi Xiaomi POCO X3 NFC (Huaxing)/Xiaomi POCO X3 NFC (Huaxing), BIOS 2025.10-gcb980be18336 10/01/2025
[    4.898616] Workqueue: events_unbound deferred_probe_work_func
[    4.911316]
[    4.911318] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    4.911321] pc : a6xx_gmu_rpmh_init (arch/arm64/include/asm/io.h:43 include/asm-generic/io.h:293 drivers/gpu/drm/msm/adreno/a6xx_gmu.h:183 drivers/gpu/drm/msm/adreno/a6xx_gmu.c:621)
[    4.911327] lr : a6xx_gmu_rpmh_init (drivers/gpu/drm/msm/adreno/a6xx_gmu.c:1811)
[    4.911331] sp : ffff8000809f3560
[    4.911332] x29: ffff8000809f3560 x28: 0000000000000001
[    4.919938]  x27: ffff800081e50000
[    4.919940] x26: 0000000000000300 x25: 0068000000000413 x24: ffffc51d5cca9000
[    4.919944] x23: 0000000000030090 x22: ffff000080aec3b0 x21: ffff00008162c010
[    4.919947] x20: ffff000080aec578 x19: ffff800081f90000 x18: 000000000aa663d1
[    4.919950] x17: ffffc51d5cefc000 x16: ffffc51d5cca9d80 x15: 0078000000000f13
[    4.930595]
[    4.930596] x14: 0000000000000000 x13: ffff800081f9ffff x12: ffff800081f9ffff
[    4.930600] x11: 0000000001000000 x10: 0000000000023010 x9 : 0000000000000000
[    4.930603] x8 : 0000000000000000 x7 : ffff00008155a960 x6 : 0000000000000000
[    4.930606] x5 : 0000000000000cc0 x4 : 0000000000001000 x3 : 007800000b49ff13
[    4.930610] x2 : 000000000b4a0000
[    4.942943]  x1 : ffff800081fa0000 x0 : ffff800081e50000
[    4.942947] Call trace:
[    4.942948]  a6xx_gmu_rpmh_init (arch/arm64/include/asm/io.h:43 include/asm-generic/io.h:293 drivers/gpu/drm/msm/adreno/a6xx_gmu.h:183 drivers/gpu/drm/msm/adreno/a6xx_gmu.c:621) (P)
[    4.942954]  a6xx_gmu_bind (drivers/gpu/drm/msm/adreno/a6xx_gmu.c:2102)
[    4.942957]  component_bind_all (drivers/base/component.c:660)
[    4.956709]  msm_drm_init (drivers/gpu/drm/msm/msm_drv.c:159)
[    4.956714]  msm_drm_bind (drivers/gpu/drm/msm/msm_drv.c:1032)

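It turns out that gmu->mmio used to be assigned before gmu->rscc was
derived from it:
gmu->rscc = gmu->mmio + 0x23000;
With your changes the order is reversed, so gmu->rscc is computed from a
gmu->mmio that hasn't been mapped yet. That is consistent with the
faulting address: 0x23010 is a NULL base plus 0x23000 plus a 0x10
register offset.
Moving the assignment back up (and applying the diff you shared for
proper handling of the legacy parameter) fixes it: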

==========================================
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -2027,6 +2027,13 @@ static int a6xx_gmu_bind(struct device *dev, struct device *master, void *data)
                 if (ret)
                         goto err_memory;

+               /* Map the GMU registers */
+               gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
+               if (IS_ERR(gmu->mmio)) {
+                       ret = PTR_ERR(gmu->mmio);
+                       goto err_memory;
+               }
+
                 if (adreno_is_a650_family(adreno_gpu) ||
                     adreno_is_a7xx(adreno_gpu)) {
                         gmu->rscc = a6xx_gmu_get_mmio(pdev, "rscc");
@@ -2048,13 +2055,6 @@ static int a6xx_gmu_bind(struct device *dev, struct device *master, void *data)
                 }
         }

-       /* Map the GMU registers */
-       gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
-       if (IS_ERR(gmu->mmio)) {
-               ret = PTR_ERR(gmu->mmio);
-               goto err_memory;
-       }
-
         gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
         if (IS_ERR(gmu->cxpd)) {
                 ret = PTR_ERR(gmu->cxpd);
==========================================

This is almost certainly not correct either, because the GMU wrapper
needs its registers mapped too. Moving the mapping above the if block
instead makes more sense to me.

With the legacy parameter changes and the GMU register mapping moved
before the RSCC offset calculation:

Tested-by: Jens Reidel <adrian@...nlining.org> # SM7150

Best regards,
Jens
> 
> Thanks,
> Neil
> 
>>
>> Best regards,
>> Jens
> 

