[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <499DC8EC.3000806@shipmail.org>
Date: Thu, 19 Feb 2009 22:02:36 +0100
From: Thomas Hellstrom <thomas@...pmail.org>
To: Peter Zijlstra <peterz@...radead.org>
CC: Eric Anholt <eric@...olt.net>, Wang Chen <wangchen@...fujitsu.com>,
Nick Piggin <nickpiggin@...oo.com.au>,
Ingo Molnar <mingo@...e.hu>, dri-devel@...ts.sourceforge.net,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] drm: Fix lock order reversal between mmap_sem and struct_mutex.
Peter Zijlstra wrote:
> On Tue, 2009-02-17 at 16:59 -0800, Eric Anholt wrote:
>
>> The basic problem was
>> mmap_sem (do_mmap()) -> struct_mutex (drm_gem_mmap(), i915_gem_fault())
>> struct_mutex (i915_gem_execbuffer()) -> mmap_sem (copy_from/to_user())
>>
>
> That's not the only problem, there's also:
>
> dup_mmap()
> down_write(mmap_sem)
> vm_ops->open() -> drm_vm_open()
> mutex_lock(struct_mutex);
>
>
>> We have plenty of places where we want to hold device state the same
>> (struct_mutex) while we move a non-trivial amount of data
>> (copy_from/to_user()), such as i915_gem_pwrite(). Solve this by moving the
>> easy things that needed struct_mutex with mmap_sem held to using a lock to
>> cover just those data structures (offset hash and offset manager), and do
>> trylock and reschedule in fault.
>>
>
> So we establish,
>
> mmap_sem
> offset_mutex
>
> i915_gem_mmap_gtt_ioctl()
> mutex_lock(struct_mutex)
> i915_gem_create_mmap_offset()
> mutex_lock(offset_mutex)
>
> However we still have
>
> struct_mutex
> mmap_sem
>
> in basically every copy_*_user() case
>
> But you cannot seem to switch ->fault() to use offset_mutex, which would
> work out the inversion because you then have:
>
> struct_mutex
> mmap_sem
> offset_mutex
>
> So why bother with the offset_mutex? Instead you make your ->fault()
> fail randomly.
>
> I'm not sure what Wang Chen sees after this patch, but I should not be
> the exact same splat, still it would not at all surprise me if there's
> plenty left.
>
> The locking looks very fragile and I don't think this patch is helping
> anything, sorry.
>
>
It looks to me like the driver preferred locking order is
object_mutex (which happens to be the device global struct_mutex)
mmap_sem
offset_mutex.
So if one could avoid using the struct_mutex for object bookkeeping (A
separate lock) then
vm_open() and vm_close() would adhere to that locking order as well,
simply by not taking the struct_mutex at all.
So only fault() remains, in which that locking order is reversed.
Personally I think the trylock ->reschedule->retry method with proper
commenting is a good solution. It will be the _only_ place where locking
order is reversed and it is done in a deadlock-safe manner. Note that
fault() doesn't really fail, but requests a retry from user-space with
rescheduling to give the process holding the struct_mutex time to
release it.
/Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists