linux-kernel - Re: [PATCH] drm/i915: Fix lock order reversal in GTT pwrite path.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-Id: <1236706838.10190.72.camel@gaiman>
Date:	Tue, 10 Mar 2009 10:40:38 -0700
From:	Eric Anholt <eric@...olt.net>
To:	linux-kernel@...r.kernel.org
Cc:	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] drm/i915: Fix lock order reversal in GTT pwrite path.

On Mon, 2009-03-09 at 18:33 -0700, Eric Anholt wrote:
> Since the pagefault path determines that the lock order we use has to be
> mmap_sem -> struct_mutex, we can't allow page faults to occur while the
> struct_mutex is held.  To fix this in pwrite, we first try optimistically to
> see if we can copy from user without faulting.  If it fails, fall back to
> allocating memory, copying, then taking the lock and copying it to the GPU.
> 
> Thanks to Linus for suggesting this (in retrospect) obvious way of doing it.

Simple, obvious, and wrong, I guess:

[  114.016013] BUG: scheduling while atomic: oglconform/4555/0x10000001
[  114.016015] 1 lock held by oglconform/4555:
[  114.016017]  #0:  (&dev->struct_mutex){--..}, at: [<f8a99cad>] i915_gem_pwrite_
ioctl+0x3be/0x5f5 [i915]
[  114.016033] Modules linked in: i915 drm i2c_algo_bit cfbcopyarea cfbimgblt cfbfillrect 8250_pnp
[  114.016042] Pid: 4555, comm: oglconform Not tainted 2.6.29-rc6drm-intel-next #175
[  114.016044] Call Trace:
[  114.016048]  [<c0128e02>] __schedule_bug+0x5e/0x65
[  114.016051]  [<c07ccbe8>] schedule+0x9f/0x9e4
[  114.016055]  [<c012904f>] __cond_resched+0x25/0x3b
[  114.016058]  [<c07cd5bf>] _cond_resched+0x24/0x2f
[  114.016062]  [<c07cdfa9>] mutex_lock_nested+0x22/0x254
[  114.016065]  [<c014a631>] ? mark_held_locks+0x53/0x6a
[  114.016068]  [<c07cf66d>] ? _spin_unlock_irq+0x22/0x26
[  114.016072]  [<c014a7c8>] ? trace_hardirqs_on_caller+0xf3/0x12d
[  114.016075]  [<c0167ced>] generic_file_aio_write+0x54/0xbd
[  114.016079]  [<c0184cc5>] do_sync_write+0xab/0xe9
[  114.016082]  [<c0130ce9>] ? __do_softirq+0x135/0x13d
[  114.016086]  [<c013d1b3>] ? autoremove_wake_function+0x0/0x33
[  114.016089]  [<c0102f1c>] ? restore_nocheck_notrace+0x0/0xe
[  114.016093]  [<c014007b>] ? hrtimer_interrupt+0x53/0x146
[  114.016097]  [<c02a5662>] ? security_file_permission+0xf/0x11
[  114.016100]  [<c0184c1a>] ? do_sync_write+0x0/0xe9
[  114.016103]  [<c018549b>] vfs_write+0x8a/0x104
[  114.016116]  [<f8a99cfb>] i915_gem_pwrite_ioctl+0x40c/0x5f5 [i915]

Peter, it seems that pagefault_disable() implies not just disabling
pagefaults, but also scheduling.  I'm just trying to avoid taking
mmap_sem through the use of pagefault_disable(), while I'd like to still
be able to schedule if that's what it takes to get the work done.
Should I be able to do this, or should I just rewrite things to not use
vfs_write() and do my sleeping outside the lock?  I'm not opposed to
that, I just want to know if you needed an excuse to fix this.

-- 
Eric Anholt
eric@...olt.net                         eric.anholt@...el.com



Download attachment "signature.asc" of type "application/pgp-signature" (198 bytes)