linux-kernel - Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT

Message-ID: <5231EF5A.7010901@vmware.com>
Date:	Thu, 12 Sep 2013 18:44:10 +0200
From:	Thomas Hellstrom <thellstrom@...are.com>
To:	Maarten Lankhorst <maarten.lankhorst@...onical.com>
CC:	Daniel Vetter <daniel.vetter@...ll.ch>,
	Peter Zijlstra <peterz@...radead.org>,
	Dave Airlie <airlied@...ux.ie>,
	intel-gfx <intel-gfx@...ts.freedesktop.org>,
	dri-devel <dri-devel@...ts.freedesktop.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE

On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:
> On 12-09-13 17:36, Daniel Vetter wrote:
>> On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra <peterz@...radead.org> wrote:
>>> So I'm poking around the preemption code and stumbled upon:
>>>
>>> drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>> drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();
>>>
>>> All these sites basically do:
>>>
>>>    while (!trylock())
>>>          yield();
>>>
>>> which is a horrible and broken locking pattern.
>>>
>>> Firstly, it's deadlock prone: suppose the faulting process is a FIFOn+1
>>> task that preempted the lock holder at FIFOn.
>>>
>>> Secondly the implementation is worse than usual by abusing
>>> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
>>> doesn't retry, but you're using it as a get out of fault path. And
>>> you're using set_need_resched() which is not something a driver should
>>> _ever_ touch.
>>>
>>> Now I'm going to take away set_need_resched() -- and while you can
>>> 'reimplement' it using set_thread_flag() you're not going to do that
>>> because it will be broken due to changes to the preempt code.
>>>
>>> So please fix this ASAP and don't allow anybody to trick you into
>>> merging silly things like that again ;-)
>> The set_need_resched in i915_gem.c:i915_gem_fault can actually be
>> removed. It was there to give the error handler a chance to sneak in
>> and reset the hw/sw tracking when the gpu is dead. That hack goes back
>> to the days when the locking around our error handler was somewhere
>> between nonexistent and totally broken, nowadays we keep things from
>> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
>> whip up a patch to rip this out. I'll also check that our testsuite
>> properly exercises this path (needs a bit of work on a quick look for
>> better coverage).
>>
>> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
>> into its own pagefault handler and then deadlock, the trylock just
>> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
>> fun userspace did and now have testcases for them. The right solution
>> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
>> holds locks and have slowpaths which drop locks, copy stuff into a
>> temp allocation and then continue. At least that's how we've fixed
>> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
> Ah the case where a mmap'd address is passed to the execbuf ioctl? :P
>
> Fine, I'll look into it a bit, hopefully before Tuesday. Otherwise it
> might take a bit longer, since I'll be on my way to Plumbers.

I think a possible fix would be to allow fault() to return an error
after dropping mmap_sem.

Otherwise we need to track down every copy_to_user / copy_from_user call
that happens with bo::reserve held.
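
For reference, the fast-path/slow-path shape Daniel describes above (try a
non-faulting copy under the lock, and on failure drop the lock, copy into a
temporary allocation, then retake the lock) could look roughly like the
userspace sketch below. This is only an analogy under stated assumptions, not
ttm or i915 code: demo_bo, demo_copy_atomic(), demo_bo_write() and the
would_fault flag are hypothetical names made up for illustration, and a
pthread mutex stands in for bo::reserve.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a buffer object guarded by a reservation lock. */
struct demo_bo {
	pthread_mutex_t reserve;
	char data[64];
};

/*
 * Hypothetical stand-in for an atomic user copy: it is not allowed to
 * fault, so it may refuse to copy. The would_fault flag simulates a
 * source page that is not resident. Returns the number of bytes NOT copied.
 */
static size_t demo_copy_atomic(void *dst, const void *src, size_t len,
			       bool would_fault)
{
	if (would_fault)
		return len;
	memcpy(dst, src, len);
	return 0;
}

/* Returns 0 if the fast path worked, 1 if the slow path ran, -1 on error. */
static int demo_bo_write(struct demo_bo *bo, const void *src, size_t len,
			 bool would_fault)
{
	void *tmp;

	if (len > sizeof(bo->data))
		return -1;

	/* Fast path: copy under the lock without ever faulting. */
	pthread_mutex_lock(&bo->reserve);
	if (demo_copy_atomic(bo->data, src, len, would_fault) == 0) {
		pthread_mutex_unlock(&bo->reserve);
		return 0;
	}

	/* Slow path: drop the lock before doing anything that may fault. */
	pthread_mutex_unlock(&bo->reserve);

	tmp = malloc(len);
	if (!tmp)
		return -1;
	memcpy(tmp, src, len);		/* free to fault here: no lock held */

	/* Retake the lock; the temporary copy is resident, so no fault. */
	pthread_mutex_lock(&bo->reserve);
	memcpy(bo->data, tmp, len);
	pthread_mutex_unlock(&bo->reserve);

	free(tmp);
	return 1;
}
```

The point of the shape is that nothing that can fault ever runs while the
lock is held, so the fault handler can take the same lock with a plain
blocking lock instead of a trylock-and-retry loop.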

/Thomas