lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <1660749261-7602-1-git-send-email-quic_akhilpo@quicinc.com>
Date:   Wed, 17 Aug 2022 20:44:13 +0530
From:   Akhil P Oommen <quic_akhilpo@...cinc.com>
To:     freedreno <freedreno@...ts.freedesktop.org>,
        <dri-devel@...ts.freedesktop.org>, <linux-arm-msm@...r.kernel.org>,
        Rob Clark <robdclark@...il.com>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        "Dmitry Baryshkov" <dmitry.baryshkov@...aro.org>
CC:     Matthias Kaehlcke <mka@...omium.org>,
        Jonathan Marek <jonathan@...ek.ca>,
        Jordan Crouse <jordan@...micpenguin.net>,
        Douglas Anderson <dianders@...omium.org>,
        Akhil P Oommen <quic_akhilpo@...cinc.com>,
        "Abhinav Kumar" <quic_abhinavk@...cinc.com>,
        Chia-I Wu <olvaffe@...il.com>,
        "Dan Carpenter" <dan.carpenter@...cle.com>,
        Daniel Vetter <daniel@...ll.ch>,
        David Airlie <airlied@...ux.ie>,
        Philipp Zabel <p.zabel@...gutronix.de>,
        Sean Paul <sean@...rly.run>, Wang Qing <wangqing@...o.com>,
        <linux-kernel@...r.kernel.org>
Subject: [PATCH v4 0/7] Improve GPU Recovery


Recently, I debugged a few device crashes which occured during recovery
after a hangcheck timeout. It looks like there are a few things we can
do to improve our chance at a successful gpu recovery.

First one is to ensure that CX GDSC collapses which clears the internal
states in gpu's CX domain. First 5 patches tries to handle this.

Rest of the patches are to ensure that few internal blocks like CP, GMU
and GBIF are halted properly before proceeding for a snapshot followed by
recovery. Also, handle 'prepare slumber' hfi failure correctly. These
are A6x specific improvements.

This series is rebased on top of v2 version of [1] which is based on
linus's master branch.

[1] https://patchwork.freedesktop.org/series/106860/

Changes in v4:
- Keep active_submit lock across the suspend & resume (Rob)
- Clear gpu->active_submits to silence a WARN() during runpm suspend (Rob)

Changes in v3:
- Use reset interface from gpucc driver to poll for cx gdsc collapse
  https://patchwork.freedesktop.org/series/106860/
- Use single pm refcount for all active submits

Changes in v2:
- Rebased on msm-next tip

Akhil P Oommen (7):
  drm/msm: Remove unnecessary pm_runtime_get/put
  drm/msm: Take single rpm refcount on behalf of all submits
  drm/msm: Correct pm_runtime votes in recover worker
  drm/msm: Fix cx collapse issue during recovery
  drm/msm/a6xx: Ensure CX collapse during gpu recovery
  drm/msm/a6xx: Improve gpu recovery sequence
  drm/msm/a6xx: Handle GMU prepare-slumber hfi failure

 drivers/gpu/drm/msm/adreno/a6xx.xml.h |  4 ++
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 83 ++++++++++++++++++++++-------------
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 43 ++++++++++++++++--
 drivers/gpu/drm/msm/msm_gpu.c         | 21 ++++++---
 drivers/gpu/drm/msm/msm_gpu.h         |  4 ++
 drivers/gpu/drm/msm/msm_ringbuffer.c  |  4 --
 6 files changed, 114 insertions(+), 45 deletions(-)

-- 
2.7.4

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ