lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20201101144244.10086-1-rwright@hpe.com>
Date:   Sun,  1 Nov 2020 07:42:41 -0700
From:   rwright@....com
To:     jani.nikula@...ux.intel.com, joonas.lahtinen@...ux.intel.com,
        rodrigo.vivi@...el.com, airlied@...ux.ie, daniel@...ll.ch,
        sumit.semwal@...aro.org, christian.koenig@....com,
        hdegoede@...hat.com, wambui.karugax@...il.com,
        chris@...is-wilson.co.uk, matthew.auld@...el.com,
        akeem.g.abodunrin@...el.com, prathap.kumar.valsan@...el.com,
        mika.kuoppala@...ux.intel.com, rwright@....com
Cc:     intel-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
        linux-kernel@...r.kernel.org, linux-media@...r.kernel.org
Subject: [PATCH v3 0/3] Reduce context clear batch size to avoid gpu hang

From: Randy Wright <rwright@....com>

For several months, I've been experiencing GPU hangs when  starting
Cinnamon on an HP Pavilion Mini 300-020 if I try to run an upstream
kernel.  I reported this recently in
https://gitlab.freedesktop.org/drm/intel/-/issues/2413 where I have
attached the requested evidence including the state collected from
/sys/class/drm/card0/error and debug output from dmesg.

I ran a bisect to find the problem, which indicates this is the
troublesome commit:

  [47f8253d2b8947d79fd3196bf96c1959c0f25f20] drm/i915/gen7: Clear all EU/L3 residual contexts

The nature of that commit suggested to me that reducing the
batch size used in the context clear operation might help this
relatively low-powered system to avoid the hang.... and it did!
I simply forced this system to take the smaller batch length that is
already used for non-Haswell systems.

The first two versions of this patch were posted as RFC
patches to the Intel-gfx list, implementing the same
algorithmic change in function batch_get_defaults,
but without employing a properly constructed quirk.

I've now cleaned up the patch to employ a new QUIRK_RENDERCLEAR_REDUCED.
The quirk is presently set only for the aforementioned HP Pavilion Mini
300-020.  The patch now touches three files to define the quirk, set it,
and then check for it in function batch_get_defaults.

Randy Wright (3):
  drm/i915: Introduce quirk QUIRK_RENDERCLEAR_REDUCED
  drm/i915/display: Add function quirk_renderclear_reduced
  drm/i915/gt: Force reduced batch size if new QUIRK_RENDERCLEAR_REDUCED
    is set.

 drivers/gpu/drm/i915/display/intel_quirks.c | 13 +++++++++++++
 drivers/gpu/drm/i915/gt/gen7_renderclear.c  |  2 +-
 drivers/gpu/drm/i915/i915_drv.h             |  1 +
 3 files changed, 15 insertions(+), 1 deletion(-)

-- 
2.25.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ