[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1b323d6e9ef873bfc770e9d54b7a3a64@manjaro.org>
Date: Tue, 10 Dec 2024 09:18:25 +0100
From: Dragan Simic <dsimic@...jaro.org>
To: Peter Geis <pgwipeout@...il.com>
Cc: Heiko Stuebner <heiko@...ech.de>, Caesar Wang <wxt@...k-chips.com>,
Detlev Casanova <detlev.casanova@...labora.com>, Finley Xiao
<finley.xiao@...k-chips.com>, Jonathan Cameron
<Jonathan.Cameron@...wei.com>, Kevin Hilman <khilman@...aro.org>, Krzysztof
Kozlowski <krzysztof.kozlowski@...aro.org>, Ulf Hansson
<ulf.hansson@...aro.org>, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
linux-rockchip@...ts.infradead.org
Subject: Re: [PATCH 1/6] pmdomain: rockchip: fix rockchip_pd_power error
handling
Hello Peter,
On 2024-12-10 02:30, Peter Geis wrote:
> The calls rockchip_pd_power makes to rockchip_pmu_set_idle_request lack
> any return error handling, causing device drivers to incorrectly
> believe
> the hardware idle requests succeed when they may have failed. This
> leads
> to software possibly accessing hardware that is powered off and the
> subsequent SError panic that follows.
>
> Add error checking and return errors to the calling function to prevent
> such crashes.
>
> gst-launch-1.0 videotestsrc num-buffers=2000 ! v4l2jpegenc ! fakesink
> Setting pipeline to PAUSED ...er-x64
> Pipeline is PREROLLING ...
> Redistribute latency...
> rockchip-pm-domain ff100000.syscon:power-controller: failed to get ack
> on
> domain 'hevc', val=0x98260, idle = 0
> SError Interrupt on CPU2, code 0x00000000bf000002 -- SError
> CPU: 2 UID: 0 PID: 804 Comm: videotestsrc0:s Not tainted 6.12.0-rc5+
> #54
> Hardware name: Firefly roc-rk3328-cc (DT)
> pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : rockchip_vpu2_jpeg_enc_run+0x168/0xbc8
> lr : device_run+0xb0/0x128
> sp : ffff800082143a20
> x29: ffff800082143a20 x28: 0000000000000140 x27: 0000000000000000
> x26: ffff582c47a313e8 x25: ffff582c53e95000 x24: ffff582c53e92800
> x23: ffff582c5bbe0000 x22: 0000000000000000 x21: ffff582c47a31080
> x20: ffffa0d78cfa4168 x19: ffffa0d78cfa4168 x18: ffffb755b0519000
> x17: 000000040044ffff x16: 00500072b5503510 x15: a7a6a5a4a3a29a99
> x14: 989796959493928a x13: 0000000051eb851f x12: 00000000000000ff
> x11: ffffa0d78d812880 x10: ffffa0d78d7fbca0 x9 : 000000000000003f
> x8 : 0000000000000063 x7 : 000000000000003f x6 : 0000000000000040
> x5 : ffff80008010d000 x4 : ffffa0d78cfa4168 x3 : ffffa0d78cfbfdd8
> x2 : ffff80008010d0f4 x1 : 0000000000000020 x0 : 0000000000000140
> Kernel panic - not syncing: Asynchronous SError Interrupt
> CPU: 2 UID: 0 PID: 804 Comm: videotestsrc0:s Not tainted 6.12.0-rc5+
> #54
> Hardware name: Firefly roc-rk3328-cc (DT)
> Call trace:
> dump_backtrace+0xa0/0x128
> show_stack+0x20/0x38
> dump_stack_lvl+0xc8/0xf8
> dump_stack+0x18/0x28
> panic+0x3ec/0x428
> nmi_panic+0x48/0xa0
> arm64_serror_panic+0x6c/0x88
> do_serror+0x30/0x70
> el1h_64_error_handler+0x38/0x60
> el1h_64_error+0x7c/0x80
> rockchip_vpu2_jpeg_enc_run+0x168/0xbc8
> device_run+0xb0/0x128
> v4l2_m2m_try_run+0xac/0x230
> v4l2_m2m_ioctl_streamon+0x70/0x90
> v4l_streamon+0x2c/0x40
> __video_do_ioctl+0x194/0x400
> video_usercopy+0x10c/0x808
> video_ioctl2+0x20/0x80
> v4l2_ioctl+0x48/0x70
> __arm64_sys_ioctl+0xb0/0x100
> invoke_syscall+0x50/0x120
> el0_svc_common.constprop.0+0x48/0xf0
> do_el0_svc+0x24/0x38
> el0_svc+0x38/0x100
> el0t_64_sync_handler+0xc0/0xc8
> el0t_64_sync+0x1a8/0x1b0
> SMP: stopping secondary CPUs
> Kernel Offset: 0x20d70c000000 from 0xffff800080000000
> PHYS_OFFSET: 0xffffa7d3c0000000
> CPU features: 0x00,00000090,00200000,0200421b
> Memory Limit: none
> ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
>
> Fixes: 7c696693a4f5 ("soc: rockchip: power-domain: Add power domain
> driver")
> Signed-off-by: Peter Geis <pgwipeout@...il.com>
This patch is obviously correct, because not checking what
rockchip_pmu_set_idle_request() returns was simply wrong.
Thanks for the patch!
Though, shouldn't we improve further the way proper error
handling is performed in rockchip_do_pmu_set_power_domain(),
by "rolling back" what rockchip_do_pmu_set_power_domain()
did after powering up fails?
> ---
>
> drivers/pmdomain/rockchip/pm-domains.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pmdomain/rockchip/pm-domains.c
> b/drivers/pmdomain/rockchip/pm-domains.c
> index cb0f93800138..57e8fa25d2bd 100644
> --- a/drivers/pmdomain/rockchip/pm-domains.c
> +++ b/drivers/pmdomain/rockchip/pm-domains.c
> @@ -590,14 +590,18 @@ static int rockchip_pd_power(struct
> rockchip_pm_domain *pd, bool power_on)
> rockchip_pmu_save_qos(pd);
>
> /* if powering down, idle request to NIU first */
> - rockchip_pmu_set_idle_request(pd, true);
> + ret = rockchip_pmu_set_idle_request(pd, true);
> + if (ret < 0)
> + return ret;
> }
>
> rockchip_do_pmu_set_power_domain(pd, power_on);
>
> if (power_on) {
> /* if powering up, leave idle mode */
> - rockchip_pmu_set_idle_request(pd, false);
> + ret = rockchip_pmu_set_idle_request(pd, false);
> + if (ret < 0)
> + return ret;
>
> rockchip_pmu_restore_qos(pd);
> }
Powered by blists - more mailing lists