Message-ID: <20111123153154.GF3864@phenom.ffwll.local>
Date:	Wed, 23 Nov 2011 16:31:54 +0100
From:	Daniel Vetter <daniel@...ll.ch>
To:	David Woodhouse <dwmw2@...radead.org>
Cc:	Daniel Vetter <daniel@...ll.ch>, rajesh.sankaran@...el.com,
	Keith Packard <keithp@...thp.com>,
	Matthew Garrett <mjg@...hat.com>,
	intel-gfx@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
	dri-devel@...ts.freedesktop.org
Subject: Re: [PATCH] drm/i915: By default, enable RC6 on IVB and SNB when
 reasonable

On Wed, Nov 23, 2011 at 03:03:43PM +0000, David Woodhouse wrote:
> On Wed, 2011-11-23 at 15:39 +0100, Daniel Vetter wrote:
> > At least for the dmar+gfx+semaphores hang I can reproduce, just disabling
> > dmar with intel_iommu=igfx_off is not good enough and iirc the same holds
> > for the dmar+rc6 hangs reported. 
> 
> Um... let me restate that for clarity (and partly for Rajesh's benefit).
> 
> The DMAR associated with the integrated graphics is *disabled*.
> Turned off. Not active. Ever.
> 
> You have a problem when you enable the *other* DMAR units in the system,
> which should not be affecting the graphics device in any way.
> 
> When you do this, you see 'hangs' with semaphores and RC6. Is there a
> better description of these 'hangs' somewhere? Is the hardware
> completely locked?
> 
> These hangs go away when you disable the DMAR units. Again, that is the
> *other* DMAR units in the system that have nothing to do with graphics.
> 
> While I'm getting quite used to DMAR-related errata, this one does make
> me stop and think 'wtf?'. It just seems so incongruous that disabling an
> *unrelated* IOMMU would make the problem go away, and it makes me wonder
> if it's actually a timing-related issue which is always there, but
> something about the use of DMAR for network/disk/etc. makes it more
> likely to trigger?
> 
> We definitely need the hardware folks to get to the bottom of this one.

Ok, let me document the recipe I use to hang my box here. It's for the
dmar+semaphores hang I can reproduce, so the actual cause might differ
slightly from that of the dmar+rc6 bug (for that one we only have bug
reports describing hard freezes that require power cycling).

- Grab a GT2+ mobile snb (both mine and the only other reporter's machine
  fit this, so maybe it matters). pci rev 09 (i.e. first production
  silicon).
- Install fc15 with the kde4 spin. I can't reproduce it with any other
  userspace than kde4.
- Grab latest d-i-f from Keith and latest userspace graphics code (to
  avoid hitting any other snb hangs we've tracked down meanwhile).
- Compile kernel with dmar and enable VT-d in the bios.
- Log into the system with gdm; the machine usually dies within a few
  seconds (while kde4 loads). If that's not good enough, a few minutes of
  light desktop usage will kill it.
- Wait 2 minutes for the stuck-in-atomic detection logic to kick in and
  grab the backtrace over netconsole. Notice that the kernel is stuck
  trying to flush the dmar tlb (that's how I managed to track it down to
  a dmar interaction). The backtrace is almost identical to the one from
  the dmar issue on ilk. I've lost the backtrace; if you want, I can
  regrab it.

Things I've tried that don't work around the issue:
- Disable dmar for the igfx with intel_iommu=igfx_off
- Apply the ilk workaround (i.e. synchronous dmar tlb flushes + gpu idling
  while flushing).

Things that work:
- Disabling semaphores.
- Disabling dmar in either the bios or on the cmdline with intel_iommu=off

All reporters who tried it confirmed that igfx_off is not good enough;
only fully disabling dmar helps (for both the semaphore-related and the
rc6-related hangs).
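
To make the distinction between the two intel_iommu settings concrete,
here is a minimal Python sketch that classifies a kernel command line.
The function names are hypothetical, and the parsing is a simplification
that only looks at the "on", "off" and "igfx_off" options. It cannot see
whether VT-d is disabled in the bios, and "default" means the kernel
config decides.

```python
def classify_intel_iommu(cmdline: str) -> str:
    """Classify a kernel command line as 'off', 'igfx_off' or 'default'.

    Hypothetical helper: only the on/off/igfx_off options of
    intel_iommu= are considered; every other option is ignored.
    """
    disabled = False
    igfx_off = False
    for token in cmdline.split():
        if not token.startswith("intel_iommu="):
            continue
        # intel_iommu= takes a comma-separated option list.
        for opt in token[len("intel_iommu="):].split(","):
            if opt == "off":
                disabled = True
            elif opt == "on":
                disabled = False
            elif opt == "igfx_off":
                igfx_off = True
    if disabled:
        return "off"
    if igfx_off:
        return "igfx_off"
    return "default"


def avoids_reported_hang(cmdline: str) -> bool:
    """True only for the one cmdline setting reported above to help."""
    return classify_intel_iommu(cmdline) == "off"
```

On a running system the input would be the contents of /proc/cmdline.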

Things that look interesting:
- ppgtt support (i.e. using per-process pagetables on the gfx instead of
  the global gtt) seems to paper over the issue for the original reporter
  of the semaphore-related hangs. Unfortunately not for me: the gpu still
  hangs (but doesn't take down the entire system with it). I've not yet
  investigated this one closely. Fyi, the windows driver uses ppgtt
  unconditionally on snb. Also, ppgtt seems to have no effect for at least
  one report of dmar-related rc6 hangs.

Cheers, Daniel
-- 
Daniel Vetter
Mail: daniel@...ll.ch
Mobile: +41 (0)79 365 57 48