Message-Id: <1389628849-1614-33-git-send-email-luis.henriques@canonical.com>
Date:	Mon, 13 Jan 2014 15:57:53 +0000
From:	Luis Henriques <luis.henriques@...onical.com>
To:	linux-kernel@...r.kernel.org, stable@...r.kernel.org,
	kernel-team@...ts.ubuntu.com
Cc:	Ian Lister <ian.lister@...el.com>,
	Ben Widawsky <benjamin.widawsky@...el.com>,
	Stéphane Marchesin <marcheu@...omium.org>,
	"Bloomfield, Jon" <jon.bloomfield@...el.com>,
	Daniel Vetter <daniel.vetter@...ll.ch>,
	Luis Henriques <luis.henriques@...onical.com>
Subject: [PATCH 3.11 032/208] drm/i915: Fix use-after-free in do_switch

3.11.10.3 -stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Vetter <daniel.vetter@...ll.ch>

commit acc240d41ea1ab9c488a79219fb313b5b46265ae upstream.

So apparently under ridiculous amounts of memory pressure we can get
into trouble in do_switch when we try to move the old hw context
backing storage object onto the active lists.

With list debugging enabled that usually results in us chasing a
poisoned pointer - which means we've hit upon a vma that has been
removed from all LRUs with list_del (and then deallocated, so it's a
real use-after-free).

Ian Lister has done some great callchain chasing and noticed that we
can reenter do_switch:

i915_gem_do_execbuffer()

i915_switch_context()

do_switch()
   from = ring->last_context;
   i915_gem_object_pin()

      i915_gem_object_bind_to_gtt()
         ret = drm_mm_insert_node_in_range_generic();
         // If the above call fails then it will try i915_gem_evict_something()
         // If that fails it will call i915_gem_evict_everything() ...
	 i915_gem_evict_everything()
	    i915_gpu_idle()
	       i915_switch_context(DEFAULT_CONTEXT)

Like with everything else where the shrinker or eviction code can
invalidate pointers we need to reload relevant state.
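
To illustrate the reentrancy hazard in the callchain above, here is a
minimal userspace sketch. It is not driver code: struct ctx, struct ring,
pin(), evict_everything() and switch_from() are hypothetical stand-ins,
and a "freed" flag replaces a real deallocation so the example stays
well-defined. It only shows why a pointer cached before the pin call has
to be reloaded afterwards.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the real i915 types. */
struct ctx {
	int id;
	bool freed;	/* set instead of really freeing, to keep the sketch safe */
};

struct ring {
	struct ctx *last_context;
};

static struct ctx default_ctx = { .id = 0, .freed = false };

/* Models evict_everything -> gpu_idle -> switch to the default context. */
static void evict_everything(struct ring *ring)
{
	struct ctx *old = ring->last_context;

	ring->last_context = &default_ctx;
	if (old && old != &default_ctx)
		old->freed = true;	/* the old context's storage goes away */
}

/* Models pinning: under memory pressure it falls back to eviction. */
static int pin(struct ring *ring, bool memory_pressure)
{
	if (memory_pressure)
		evict_everything(ring);
	return 0;
}

/* Returns the context the rest of the switch would treat as "from". */
static struct ctx *switch_from(struct ring *ring, bool memory_pressure,
			       bool reload)
{
	struct ctx *from = ring->last_context;	/* cached before pinning */

	if (pin(ring, memory_pressure))
		return NULL;

	if (reload)
		from = ring->last_context;	/* pin may have switched contexts */

	return from;
}
```

Without the reload, switch_from() hands back the context that the nested
switch already retired; with it, the caller sees the default context that
is actually current on the ring.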

Note that there's no need to recheck whether a context switch is still
required because:

- Doing a switch to the same context is harmless (besides wasting a
  bit of energy).

- This can only happen with the default context. But since that one's
  pinned we'll never call down into evict_everything under normal
  circumstances. Note that there's a little driver bringup fun
  involved, namely that we could recurse into do_switch for the
  initial switch. Atm we're fine since we assign the context pointer
  only after the call to do_switch at driver load or resume time. And
  in the gpu reset case we skip the entire setup sequence (which might
  be a bug on its own, but definitely not this one here).

Cc'ing stable since apparently ChromeOS guys are seeing this in the
wild (and not just on artificial stress tests), see the reference.

Note that upstream code doesn't call evict_everything directly
from evict_something, that's an extension in this product branch. But
we can still hit upon this bug (and apparently we do, see the linked
backtraces). I've noticed this while trying to construct a testcase
for this bug and utterly failed to provoke it. It looks like we need
to drive the system squarely into the lowmem wall and provoke the
shrinker to evict the context object by doing the last-ditch
evict_everything call.

Aside: There's currently no means to get a badly-fragmenting hw
context object away from a bad spot in the upstream code. We should
fix this by at least adding some code to evict_something to handle hw
contexts.

References: https://code.google.com/p/chromium/issues/detail?id=248191
Reported-by: Ian Lister <ian.lister@...el.com>
Cc: Ian Lister <ian.lister@...el.com>
Cc: Ben Widawsky <benjamin.widawsky@...el.com>
Cc: Stéphane Marchesin <marcheu@...omium.org>
Cc: Bloomfield, Jon <jon.bloomfield@...el.com>
Tested-by: Rafael Barbalho <rafael.barbalho@...el.com>
Reviewed-by: Ian Lister <ian.lister@...el.com>
Signed-off-by: Daniel Vetter <daniel.vetter@...ll.ch>
Signed-off-by: Luis Henriques <luis.henriques@...onical.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 90b0491..5b3087f 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -409,11 +409,21 @@ static int do_switch(struct i915_hw_context *to)
 	if (ret)
 		return ret;
 
-	/* Clear this page out of any CPU caches for coherent swap-in/out. Note
+	/*
+	 * Pin can switch back to the default context if we end up calling into
+	 * evict_everything - as a last ditch gtt defrag effort that also
+	 * switches to the default context. Hence we need to reload from here.
+	 */
+	from = ring->last_context;
+
+	/*
+	 * Clear this page out of any CPU caches for coherent swap-in/out. Note
 	 * that thanks to write = false in this call and us not setting any gpu
 	 * write domains when putting a context object onto the active list
 	 * (when switching away from it), this won't block.
-	 * XXX: We need a real interface to do this instead of trickery. */
+	 *
+	 * XXX: We need a real interface to do this instead of trickery.
+	 */
 	ret = i915_gem_object_set_to_gtt_domain(to->obj, false);
 	if (ret) {
 		i915_gem_object_unpin(to->obj);
-- 
1.8.3.2
