linux-kernel - Re: [PATCH v2] drm/msm: Check for powered down HW in the devfreq callbacks


Message-ID: <CADaigPXJJoEgWK6nx8yc_DVsDAv1VdzuA5NYZO63K=hHVvT2JQ@mail.gmail.com>
Date:   Fri, 1 May 2020 12:26:40 -0700
From:   Eric Anholt <eric@...olt.net>
To:     Jordan Crouse <jcrouse@...eaurora.org>
Cc:     linux-arm-msm@...r.kernel.org, stable@...r.kernel.org,
        Akhil P Oommen <akhilpo@...eaurora.org>,
        AngeloGioacchino Del Regno <kholk11@...il.com>,
        Ben Dooks <ben.dooks@...ethink.co.uk>,
        Daniel Vetter <daniel@...ll.ch>,
        David Airlie <airlied@...ux.ie>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Jeffrey Hugo <jeffrey.l.hugo@...il.com>,
        "Michael J. Ruhl" <michael.j.ruhl@...el.com>,
        Rob Clark <robdclark@...il.com>, Sean Paul <sean@...rly.run>,
        Sharat Masetty <smasetty@...eaurora.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        DRI Development <dri-devel@...ts.freedesktop.org>,
        freedreno@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] drm/msm: Check for powered down HW in the devfreq callbacks

On Fri, May 1, 2020 at 12:03 PM Jordan Crouse <jcrouse@...eaurora.org> wrote:
>
> Writing to the devfreq sysfs nodes while the GPU is powered down can
> result in a system crash (on a5xx) or a nasty GMU error (on a6xx):
>
>  $ /sys/class/devfreq/5000000.gpu# echo 500000000 > min_freq
>   [  104.841625] platform 506a000.gmu: [drm:a6xx_gmu_set_oob]
>         *ERROR* Timeout waiting for GMU OOB set GPU_DCVS: 0x0
>
> Despite the fact that we carefully try to suspend the devfreq device when
> the hardware is powered down, there are lots of holes in the governors that
> don't check for the suspend state and blindly call into the devfreq
> callbacks that end up triggering hardware reads in the GPU driver.
>
> Call pm_runtime_get_if_in_use() in the gpu_busy() and gpu_set_freq()
> callbacks to skip the hardware access if it isn't active.
>
> v2: Use pm_runtime_get_if_in_use() per Eric Anholt
>
> Cc: stable@...r.kernel.org
> Signed-off-by: Jordan Crouse <jcrouse@...eaurora.org>
> ---
>
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 6 ++++++
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 8 ++++++++
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 7 +++++++
>  3 files changed, 21 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index 724024a2243a..4d7f269edfcc 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -1404,6 +1404,10 @@ static unsigned long a5xx_gpu_busy(struct msm_gpu *gpu)
>  {
>         u64 busy_cycles, busy_time;
>
> +       /* Only read the gpu busy if the hardware is already active */
> +       if (pm_runtime_get_if_in_use(&gpu->pdev->dev) <= 0)
> +               return 0;
> +

RPM's get APIs are a bit of a trap: they return a negative errno if
runtime PM is disabled in Kconfig, even though that usually means the
power domain is never powered down by RPM, so the hardware is still
accessible.  I think in these checks you want
"if (pm_runtime_get_if_in_use() == 0)", which seems to be a common
pattern in other drivers.  With that,

Reviewed-by: Eric Anholt <eric@...olt.net>

(and tested, too)
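
To make the difference between the two checks concrete, here is a small userspace sketch. It is not driver code: the mock_* names and the skips_hw_* helpers are hypothetical and only model the three kinds of return value from pm_runtime_get_if_in_use() (a negative errno when runtime PM is disabled, 0 when the device is not in use, 1 when it is in use). It assumes, as described above, that the hardware stays powered when runtime PM is disabled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the states pm_runtime_get_if_in_use() can report. */
enum mock_rpm_state {
	MOCK_RPM_DISABLED,	/* runtime PM disabled: negative errno */
	MOCK_RPM_SUSPENDED,	/* device not in use: 0, no reference taken */
	MOCK_RPM_ACTIVE,	/* device in use: 1, reference taken */
};

/* Mock only; the real helper takes a struct device pointer. */
static int mock_get_if_in_use(enum mock_rpm_state state)
{
	switch (state) {
	case MOCK_RPM_DISABLED:
		return -EINVAL;
	case MOCK_RPM_SUSPENDED:
		return 0;
	default:
		return 1;
	}
}

/* Check as written in v2: any value <= 0 skips the hardware access. */
static bool skips_hw_v2(enum mock_rpm_state state)
{
	return mock_get_if_in_use(state) <= 0;
}

/* Suggested check: only an explicit "not in use" (0) skips the access. */
static bool skips_hw_suggested(enum mock_rpm_state state)
{
	return mock_get_if_in_use(state) == 0;
}

/* Both checks agree except when runtime PM is disabled. */
static void check_policies(void)
{
	assert(skips_hw_v2(MOCK_RPM_SUSPENDED));
	assert(skips_hw_suggested(MOCK_RPM_SUSPENDED));
	assert(!skips_hw_v2(MOCK_RPM_ACTIVE));
	assert(!skips_hw_suggested(MOCK_RPM_ACTIVE));
	assert(skips_hw_v2(MOCK_RPM_DISABLED));
	assert(!skips_hw_suggested(MOCK_RPM_DISABLED));
}
```

With the "<= 0" form, a kernel built without runtime PM would always report the GPU as idle and never apply a frequency change; with "== 0" the callbacks still reach the hardware in that configuration.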