Message-ID: <20220527135022.0dd0891d@maurocar-mobl2>
Date:   Fri, 27 May 2022 13:50:22 +0200
From:   Mauro Carvalho Chehab <mauro.chehab@...ux.intel.com>
To:     Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com>
Cc:     Mauro Carvalho Chehab <mchehab@...nel.org>,
        Andi Shyti <andi.shyti@...ux.intel.com>,
        Daniel Vetter <daniel@...ll.ch>,
        Daniele Ceraolo Spurio <daniele.ceraolospurio@...el.com>,
        David Airlie <airlied@...ux.ie>,
        Jani Nikula <jani.nikula@...ux.intel.com>,
        John Harrison <John.C.Harrison@...el.com>,
        Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
        Lucas De Marchi <lucas.demarchi@...el.com>,
        Matt Roper <matthew.d.roper@...el.com>,
        Matthew Auld <matthew.auld@...el.com>,
        Rodrigo Vivi <rodrigo.vivi@...el.com>,
        dri-devel@...ts.freedesktop.org, intel-gfx@...ts.freedesktop.org,
        linux-kernel@...r.kernel.org,
        Tvrtko Ursulin <tvrtko.ursulin@...el.com>,
        Sushma Venkatesh Reddy <sushma.venkatesh.reddy@...el.com>,
        Daniel Vetter <daniel.vetter@...ll.ch>,
        Dave Airlie <airlied@...hat.com>,
        Jon Bloomfield <jon.bloomfield@...el.com>,
        Jani Nikula <jani.nikula@...el.com>, stable@...r.kernel.org
Subject: Re: [PATCH] drm/i915: don't flush TLB on GEN8

On Fri, 27 May 2022 11:55:42 +0100
Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com> wrote:

> On 27/05/2022 10:09, Mauro Carvalho Chehab wrote:
> > i915 selftest hangcheck is causing the i915 driver timeouts, as
> > reported by Intel CI:
> > 
> > 	http://gfx-ci.fi.intel.com/cibuglog-ng/issuefilterassoc/24297?query_key=42a999f48fa6ecce068bc8126c069be7c31153b4
> > 
> > When such test runs, the only output is:
> > 
> > 	[   68.811639] i915: Performing live selftests with st_random_seed=0xe138eac7 st_timeout=500
> > 	[   68.811792] i915: Running hangcheck
> > 	[   68.811859] i915: Running intel_hangcheck_live_selftests/igt_hang_sanitycheck
> > 	[   68.816910] i915 0000:00:02.0: [drm] Cannot find any crtc or sizes
> > 	[   68.841597] i915: Running intel_hangcheck_live_selftests/igt_reset_nop
> > 	[   69.346347] igt_reset_nop: 80 resets
> > 	[   69.362695] i915: Running intel_hangcheck_live_selftests/igt_reset_nop_engine
> > 	[   69.863559] igt_reset_nop_engine(rcs0): 709 resets
> > 	[   70.364924] igt_reset_nop_engine(bcs0): 903 resets
> > 	[   70.866005] igt_reset_nop_engine(vcs0): 659 resets
> > 	[   71.367934] igt_reset_nop_engine(vcs1): 549 resets
> > 	[   71.869259] igt_reset_nop_engine(vecs0): 553 resets
> > 	[   71.882592] i915: Running intel_hangcheck_live_selftests/igt_reset_idle_engine
> > 	[   72.383554] rcs0: Completed 16605 idle resets
> > 	[   72.884599] bcs0: Completed 18641 idle resets
> > 	[   73.385592] vcs0: Completed 17517 idle resets
> > 	[   73.886658] vcs1: Completed 15474 idle resets
> > 	[   74.387600] vecs0: Completed 17983 idle resets
> > 	[   74.387667] i915: Running intel_hangcheck_live_selftests/igt_reset_active_engine
> > 	[   74.889017] rcs0: Completed 747 active resets
> > 	[   75.174240] intel_engine_reset(bcs0) failed, err:-110
> > 	[   75.174301] bcs0: Completed 525 active resets
> > 
> > After that, the machine just silently hangs.
> > 
> > The root cause is that the flush TLB logic is not working as
> > expected on GEN8.
> > 
> > Tested on an Intel NUC5i7RYB with an i7-5557U Broadwell CPU.
> > 
> > This patch partially reverts the logic by skipping GEN8 from
> > the TLB cache flush.  
> 
> Since I am pretty sure no such failures were spotted when merging the 
> feature I assume the failure is sporadic and/or limited to some 
> configurations? Do you have any details there? Because it is an 
> important security issue we should not revert it lightly.

It occurs every time here:
	https://intel-gfx-ci.01.org/tree/drm-tip/fi-bdw-5557u.html

It also happens on my own NUC5i7RYB every time the TLB patch is
applied. Reverting it (or applying this fix) is enough for hangcheck
to pass.

I suspect that the TLB flush never happens there, causing the
-ETIMEDOUT (err:-110) at hangcheck.
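
For reference, the "err:-110" in the quoted log is a negated errno value.
A minimal sketch for decoding it, assuming Linux errno numbering (other
platforms number errno differently); describe_kernel_error is a
hypothetical helper for illustration, not part of i915 or the kernel:

```python
import errno
import os


def describe_kernel_error(err: int) -> str:
    """Map a negative kernel-style error code to its symbolic name and message.

    Assumes the errno table of the host running this script matches the
    kernel that produced the log (true on Linux).
    """
    code = abs(err)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # err:-110 is the value reported by intel_engine_reset(bcs0) in the log.
    print(describe_kernel_error(-110))
```

On a Linux host this should report ETIMEDOUT, which is consistent with
a reset that waits on something that never completes.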

It could indeed be limited to some specific setups. I dunno.
The only Gen8 machine I have access to is my own NUC, so I can't
test it elsewhere.

Regards,
Mauro

