lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <92478932-4fd4-43c3-9ee0-c4ce19026984@arm.com>
Date:   Wed, 8 Nov 2023 15:44:40 +0000
From:   Steven Price <steven.price@....com>
To:     AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>,
        boris.brezillon@...labora.com
Cc:     robh@...nel.org, maarten.lankhorst@...ux.intel.com,
        mripard@...nel.org, tzimmermann@...e.de, airlied@...il.com,
        daniel@...ll.ch, dri-devel@...ts.freedesktop.org,
        linux-kernel@...r.kernel.org, wenst@...omium.org,
        kernel@...labora.com
Subject: Re: [PATCH v2 1/6] drm/panfrost: Perform hard reset to recover GPU if
 soft reset fails

On 02/11/2023 14:26, AngeloGioacchino Del Regno wrote:
> Even though soft reset should ideally never fail, during development of
> some power management features I managed to get some bits wrong: this
> resulted in GPU soft reset failures, where the GPU was never able to
> recover, not even after suspend/resume cycles, meaning that the only
> way to get functionality back was to reboot the machine.
> 
> Perform a hard reset after a soft reset failure to be able to recover
> the GPU during runtime (so, without any machine reboot).
> 
> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>
> ---
>  drivers/gpu/drm/panfrost/panfrost_gpu.c  | 14 ++++++++++----
>  drivers/gpu/drm/panfrost/panfrost_regs.h |  1 +
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> index fad75b6e543e..7e9e2cf26e4d 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> @@ -60,14 +60,20 @@ int panfrost_gpu_soft_reset(struct panfrost_device *pfdev)
>  
>  	gpu_write(pfdev, GPU_INT_MASK, 0);
>  	gpu_write(pfdev, GPU_INT_CLEAR, GPU_IRQ_RESET_COMPLETED);
> -	gpu_write(pfdev, GPU_CMD, GPU_CMD_SOFT_RESET);
>  
> +	gpu_write(pfdev, GPU_CMD, GPU_CMD_SOFT_RESET);
>  	ret = readl_relaxed_poll_timeout(pfdev->iomem + GPU_INT_RAWSTAT,
>  		val, val & GPU_IRQ_RESET_COMPLETED, 100, 10000);
> -

I'm not sure what's going on with blank lines above - AFAICT there's no
actual change just a blank line being moved. It's best to avoid blank
line changes to keep the diff readable.

>  	if (ret) {
> -		dev_err(pfdev->dev, "gpu soft reset timed out\n");
> -		return ret;
> +		dev_err(pfdev->dev, "gpu soft reset timed out, attempting hard reset\n");
> +
> +		gpu_write(pfdev, GPU_CMD, GPU_CMD_HARD_RESET);
> +		ret = readl_relaxed_poll_timeout(pfdev->iomem + GPU_INT_RAWSTAT,
> +			val, val & GPU_IRQ_RESET_COMPLETED, 100, 10000);

NIT: checkpatch complains about the alignment here.

Other than the minor comments this looks fine. Hard reset isn't
something we want to use (there's a possibility of locking up the system
if it occurs during a bus transaction) but it can sometimes recover an
otherwise completely locked up GPU.

Steve

> +		if (ret) {
> +			dev_err(pfdev->dev, "gpu hard reset timed out\n");
> +			return ret;
> +		}
>  	}
>  
>  	gpu_write(pfdev, GPU_INT_CLEAR, GPU_IRQ_MASK_ALL);
> diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h
> index 55ec807550b3..c25743b05c55 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_regs.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
> @@ -44,6 +44,7 @@
>  	 GPU_IRQ_MULTIPLE_FAULT)
>  #define GPU_CMD				0x30
>  #define   GPU_CMD_SOFT_RESET		0x01
> +#define   GPU_CMD_HARD_RESET		0x02
>  #define   GPU_CMD_PERFCNT_CLEAR		0x03
>  #define   GPU_CMD_PERFCNT_SAMPLE	0x04
>  #define   GPU_CMD_CYCLE_COUNT_START	0x05

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ