linux-kernel - Re: [Bug #13819] system freeze when switching to console


Message-ID: <alpine.LFD.2.01.0909081219130.7458@localhost.localdomain>
Date:	Tue, 8 Sep 2009 12:26:45 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Jesse Barnes <jbarnes@...tuousgeek.org>
cc:	reinette chatre <reinette.chatre@...el.com>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Kernel Testers List <kernel-testers@...r.kernel.org>,
	Eric Anholt <eric@...olt.net>, "Ma, Ling" <ling.ma@...el.com>,
	"bugzilla-daemon@...zilla.kernel.org" 
	<bugzilla-daemon@...zilla.kernel.org>
Subject: Re: [Bug #13819] system freeze when switching to console

On Tue, 8 Sep 2009, Jesse Barnes wrote:
> 
> Theoretically i915_gem_idle should prevent any user interrupts from
> coming in.

That is _entirely_ immaterial.

The thing is, interrupts can be shared. So it does not matter ONE WHIT 
that you are trying to idle the hardware - there may be _other_ hardware 
in the machine that is not idle, and that raises the same shared 
interrupt. End result: the irq handler will be called, whether your 
particular hardware is idle or not.
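
The shared-interrupt point can be made concrete with a small userspace model. This is only a sketch: `shared_line`, `shared_dev` and the functions below are hypothetical names invented for illustration, not kernel APIs. It shows that when any device on the line raises an interrupt, every registered handler runs, including the handler of a device that is completely idle.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SHARED 4

/* Hypothetical model of a device on a shared interrupt line. */
struct shared_dev {
    int pending;   /* 1 if this device actually raised the interrupt */
    int calls;     /* how many times its handler was invoked */
    int handled;   /* how many interrupts it actually claimed */
};

/* Hypothetical model of the shared line itself. */
struct shared_line {
    struct shared_dev *devs[MAX_SHARED];
    size_t ndevs;
};

/* Register a device on the line; returns 0 on success, -1 if full. */
static int shared_line_add(struct shared_line *line, struct shared_dev *dev)
{
    if (line->ndevs == MAX_SHARED)
        return -1;
    line->devs[line->ndevs++] = dev;
    return 0;
}

/* The handler is always invoked; it claims the interrupt only if its
 * own device is the one that raised it. */
static int shared_dev_handler(struct shared_dev *dev)
{
    assert(dev != NULL);
    dev->calls++;
    if (!dev->pending)
        return 0;          /* not ours, but we were still called */
    dev->pending = 0;
    dev->handled++;
    return 1;
}

/* The line fires: every registered handler runs.
 * Returns the number of handlers that claimed the interrupt. */
static int shared_line_fire(struct shared_line *line)
{
    int claimed = 0;
    size_t i;

    for (i = 0; i < line->ndevs; i++)
        claimed += shared_dev_handler(line->devs[i]);
    return claimed;
}
```

Registering an idle device and a busy one on the same line and firing it once leaves the idle device with `calls == 1` and `handled == 0`: its handler ran even though its hardware did nothing.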

So if you tear down data structures that the interrupt handler needs, you 
_ABSOLUTELY_ must first unregister the whole interrupt.

Also, even if there are no shared interrupts or any other devices, there 
can easily be old pending interrupts still queued up on IO-APICs etc. So 
even though you quiesce the hardware, there is no guarantee that there 
aren't some pending interrupts that happened just before you turned off 
the interrupt from the hardware side, and are still "en route" to the CPU.

Which gets us exactly the same rule as if there were shared interrupts: if 
your interrupt handler depends on some data structure, you must tear down 
the interrupt handler _before_ you tear down the data structures it 
depends on (and in the reverse order when setting things up, of course).
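
The ordering rule can be sketched as follows. This is a userspace model under stated assumptions, not driver code: `irq_line`, `toy_dev`, `toy_setup`, `toy_teardown` and `deliver_irq` are all hypothetical names. Setup allocates the data first and installs the handler last; teardown removes the handler first and frees the data last.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
typedef int (*irq_handler_fn)(void *data);

struct irq_line {
    irq_handler_fn handler;   /* NULL means no handler is installed */
    void *data;
};

struct toy_dev {
    int *status_page;         /* data the handler dereferences */
    int irqs_handled;
};

/* The handler depends on status_page being valid. */
static int toy_irq_handler(void *data)
{
    struct toy_dev *dev = data;

    assert(dev->status_page != NULL);
    dev->irqs_handled++;
    return *dev->status_page;
}

/* Setup: build the data structures first, install the handler last. */
static int toy_setup(struct irq_line *line, struct toy_dev *dev)
{
    dev->status_page = calloc(1, sizeof(*dev->status_page));
    if (dev->status_page == NULL)
        return -1;
    dev->irqs_handled = 0;
    line->data = dev;
    line->handler = toy_irq_handler;
    return 0;
}

/* Teardown: remove the handler first, free the data last. */
static void toy_teardown(struct irq_line *line, struct toy_dev *dev)
{
    line->handler = NULL;
    line->data = NULL;
    free(dev->status_page);
    dev->status_page = NULL;
}

/* Models an interrupt arriving on the line, whether from another
 * device or one that was already in flight. Returns 1 if a handler ran. */
static int deliver_irq(struct irq_line *line)
{
    if (line->handler == NULL)
        return 0;
    line->handler(line->data);
    return 1;
}
```

With this ordering, an interrupt delivered after `toy_teardown()` finds no handler and never touches the freed status page. A real driver has additional concerns this model ignores, such as waiting for a handler that is already running on another CPU.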

> If we uninstall the IRQ first, i915_gem_idle probably
> won't work anymore, since it queues an interrupt and waits for it.

So then you'd better fix that. Because the code, as is, is very 
fundamentally buggy.

> Eric, any thoughts on this?  We shouldn't be racing to queue new work
> after the idle call since we suspend GEM at that point, so we must be
> failing to manage our active lists properly somehow?

See my previous email. The bug is that you do

  i915_gem_cleanup_ringbuffer ->
    i915_gem_cleanup_hws ->
      dev_priv->hw_status_page = NULL;

while interrupts are still enabled and coming in. And the interrupt path 
wants to access that hw_status_page. Which you just destroyed.
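
The window described above can be modelled in a few lines. This is a hedged sketch, not the i915 code: `model_priv` and the `model_*` functions are hypothetical, and only the field name `hw_status_page` is taken from the call chain quoted above. It reports whether an interrupt arriving at a given moment would make a handler dereference a NULL status page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's private state. */
struct model_priv {
    unsigned int *hw_status_page;   /* what the interrupt path reads */
    int irq_installed;              /* 1 while the handler is registered */
};

static unsigned int model_status_storage;

/* Start in the normal running state: page valid, handler installed. */
static void model_init(struct model_priv *p)
{
    assert(p != NULL);
    p->hw_status_page = &model_status_storage;
    p->irq_installed = 1;
}

/* Models the cleanup step that clears the status page pointer. */
static void model_cleanup_page(struct model_priv *p)
{
    p->hw_status_page = NULL;
}

/* Models unregistering the interrupt handler. */
static void model_uninstall_irq(struct model_priv *p)
{
    p->irq_installed = 0;
}

/* Returns 1 if an interrupt arriving right now would reach a handler
 * that dereferences a NULL hw_status_page. */
static int model_irq_would_fault(const struct model_priv *p)
{
    return p->irq_installed && p->hw_status_page == NULL;
}
```

Calling `model_cleanup_page()` before `model_uninstall_irq()` leaves a window in which `model_irq_would_fault()` returns 1. Calling them in the opposite order never does.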

			Linus