Message-ID: <17063984.geO5KgaWL5@diego>
Date: Wed, 19 Feb 2025 17:56:17 +0100
From: Heiko Stübner <heiko@...ech.de>
To: Quentin Schulz <quentin.schulz@...rry.de>
Cc: linux-arm-kernel@...ts.infradead.org, linux-rockchip@...ts.infradead.org,
linux-kernel@...r.kernel.org, lukasz.czechowski@...umatec.com,
Heiko Stuebner <heiko.stuebner@...rry.de>
Subject: Re: [PATCH 1/2] arm64: dts: rockchip: remove supports-cqe from rk3588 jaguar

On Wednesday, 19 February 2025, 17:06:52 CET, Quentin Schulz wrote:
> Hi Heiko,
>
> On 2/19/25 10:33 AM, Heiko Stuebner wrote:
> > From: Heiko Stuebner <heiko.stuebner@...rry.de>
> >
> > The sdhci controller supports cqe it seems and necessary code also is in
> > place - in theory.
> >
> > At this point Jaguar and Tiger are the only boards enabling cqe support
> > on the rk3588 and we are seeing reliability issues under load.
> >
> > This can be caused by either a controller, hw or driver issue and
> > definitely needs more investigation to work properly, it seems.
> >
> > So disable cqe support on Jaguar for now.
> >
>
> Seems more reasonable to me for the time being.
>
> Aside from the reliability issues, I could also trigger a stack trace with:
>
> $ mmc rpmb read-counter /dev/mmcblk0rpmb
> [ 1119.647435] mmc0: Timeout waiting for hardware interrupt.
> [ 1119.653480] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> [ 1119.660676] mmc0: sdhci: Sys addr: 0x00000001 | Version: 0x00000005
> [ 1119.667871] mmc0: sdhci: Blk size: 0x00007200 | Blk cnt: 0x00000000
> [ 1119.675066] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x0000002b
> [ 1119.682261] mmc0: sdhci: Present: 0x03f701f6 | Host ctl: 0x00000035
> [ 1119.689455] mmc0: sdhci: Power: 0x00000001 | Blk gap: 0x00000000
> [ 1119.696649] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000407
> [ 1119.703845] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00000000
> [ 1119.711039] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab: 0x03ff000b
> [ 1119.718235] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
> [ 1119.725429] mmc0: sdhci: Caps: 0x226dc881 | Caps_1: 0x08000007
> [ 1119.732624] mmc0: sdhci: Cmd: 0x0000193a | Max curr: 0x00000000
> [ 1119.739819] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0x00000000
> [ 1119.747014] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x000007d9
> [ 1119.754209] mmc0: sdhci: Host ctl2: 0x0000000f
> [ 1119.759169] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0057b200
> [ 1119.766363] mmc0: sdhci: ============================================
> [ 1119.773595] sdhci-dwcmshc fe2e0000.mmc: __mmc_blk_ioctl_cmd: data error -110

I can reproduce this timeout with CQE enabled. After disabling CQE, the
timeout goes away and the command returns the expected response.

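For anyone reading the dump above, here is a minimal sketch (not taken from
the driver) that decodes the "Cmd" register value. It assumes the standard
SDHCI command register layout, with the command index in bits 13:8 and the
flags in the low byte. The function name decode_sdhci_cmd is made up for
illustration.

```python
# Flag bits of the SDHCI Command register (standard SDHCI layout, assumed here).
RESP_MASK = 0x03     # response type select
CRC_CHECK = 0x08     # command CRC check enable
INDEX_CHECK = 0x10   # command index check enable
DATA_PRESENT = 0x20  # command comes with a data transfer


def decode_sdhci_cmd(reg):
    """Split a raw Command register value into index and flag fields."""
    return {
        "index": (reg >> 8) & 0x3F,
        "resp_type": reg & RESP_MASK,
        "crc_check": bool(reg & CRC_CHECK),
        "index_check": bool(reg & INDEX_CHECK),
        "data_present": bool(reg & DATA_PRESENT),
    }


# "Cmd: 0x0000193a" from the register dump in the quoted log.
cmd = decode_sdhci_cmd(0x0000193A)
print(cmd)
```

With the value from the dump this yields command index 25
(WRITE_MULTIPLE_BLOCK) with the data-present flag set, so the controller was
waiting on a data transfer. The -110 in the last log line is ETIMEDOUT on
Linux, which matches the "Timeout waiting for hardware interrupt" message.
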
> FWIW, the changes that Rockchip seems to have done on top of that driver
> in their 6.1 vendor fork are the following commits:
>
> https://git.theobroma-systems.com/jaguar-linux.git/commit/drivers/mmc/host/sdhci-of-dwcmshc.c?id=2ef0767967138d333360ec0f399f1d68646741c3&h=linux-6.1-stan-rkr3.2-jaguar
> https://git.theobroma-systems.com/jaguar-linux.git/commit/drivers/mmc/host/sdhci-of-dwcmshc.c?id=75dfde714bbe81e938190142d07307fa864fda34&h=linux-6.1-stan-rkr3.2-jaguar
>
> Maybe something worth having a look at some time in the future.
>
> Reviewed-by: Quentin Schulz <quentin.schulz@...rry.de>
>
> Thanks!
> Quentin
>