linux-kernel - Re: [Intel-gfx] [PATCH] kthread: finer-grained lockdep/cross-release completion

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 7 Dec 2017 21:56:57 +0100
From:   Daniel Vetter <daniel@...ll.ch>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        DRI Development <dri-devel@...ts.freedesktop.org>,
        Intel Graphics Development <intel-gfx@...ts.freedesktop.org>,
        Tvrtko Ursulin <tvrtko.ursulin@...el.com>,
        Marta Lofstedt <marta.lofstedt@...el.com>,
        Byungchul Park <byungchul.park@....com>,
        Ingo Molnar <mingo@...nel.org>, Tejun Heo <tj@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Thomas Gleixner <tglx@...utronix.de>, Shaohua Li <shli@...com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Jens Axboe <axboe@...nel.dk>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Jonathan Corbet <corbet@....net>,
        Oleg Nesterov <oleg@...hat.com>,
        Daniel Vetter <daniel.vetter@...el.com>
Subject: Re: [Intel-gfx] [PATCH] kthread: finer-grained lockdep/cross-release
 completion

On Thu, Dec 07, 2017 at 08:57:09PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 07, 2017 at 03:58:28PM +0100, Daniel Vetter wrote:
> 
> > [   85.069417] gem_exec_captur/2810 is trying to acquire lock:
> > [   85.069419]  ((completion)&self->parked){+.+.}, at: [<ffffffff8109d69d>] kthread_park+0x3d/0x50
> > [   85.069426] 
> >                but task is already holding lock:
> > [   85.069428]  (&dev->struct_mutex){+.+.}, at: [<ffffffffa039b13d>] i915_reset_device+0x1bd/0x230 [i915]
> > [   85.069448] 
> >                which lock already depends on the new lock.
> > 
> > [   85.069451] 
> >                the existing dependency chain (in reverse order) is:
> > [   85.069454] 
> >                -> #3 (&dev->struct_mutex){+.+.}:
> > [   85.069460]        __mutex_lock+0x81/0x9b0
> > [   85.069481]        i915_mutex_lock_interruptible+0x47/0x130 [i915]
> > [   85.069502]        i915_gem_fault+0x201/0x760 [i915]
> > [   85.069507]        __do_fault+0x15/0x70
> > [   85.069509]        __handle_mm_fault+0x7bf/0xda0
> > [   85.069512]        handle_mm_fault+0x14f/0x2f0
> > [   85.069515]        __do_page_fault+0x2d1/0x560
> > [   85.069518]        page_fault+0x22/0x30
> > [   85.069520] 
> >                -> #2 (&mm->mmap_sem){++++}:
> > [   85.069525]        __might_fault+0x63/0x90
> > [   85.069529]        _copy_to_user+0x1e/0x70
> > [   85.069532]        perf_read+0x21d/0x290
> > [   85.069534]        __vfs_read+0x1e/0x120
> > [   85.069536]        vfs_read+0xa1/0x150
> > [   85.069539]        SyS_read+0x40/0xa0
> > [   85.069541]        entry_SYSCALL_64_fastpath+0x1c/0x89
> 
> >                -> #0 ((completion)&self->parked){+.+.}:
> 
> > [   85.069692] Chain exists of:
> >                  (completion)&self->parked --> &mm->mmap_sem --> &dev->struct_mutex
> 
> > [   85.069718] 3 locks held by gem_exec_captur/2810:
> 
> > [   85.069732]  #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa039b13d>] i915_reset_device+0x1bd/0x230 [i915]
> 
> >                stack backtrace:
> 
> > [   85.069788]  lock_acquire+0xaf/0x200
> > [   85.069793]  wait_for_common+0x54/0x210
> > [   85.069807]  kthread_park+0x3d/0x50
> > [   85.069827]  i915_gem_reset_prepare_engine+0x1d/0x90 [i915]
> > [   85.069849]  i915_gem_reset_prepare+0x2c/0x60 [i915]
> > [   85.069865]  i915_reset+0x66/0x230 [i915]
> > [   85.069881]  i915_reset_device+0x1cb/0x230 [i915]
> > [   85.069919]  i915_handle_error+0x2d3/0x430 [i915]
> > [   85.069951]  i915_wedged_set+0x79/0xc0 [i915]
> > [   85.069955]  simple_attr_write+0xab/0xc0
> > [   85.069959]  full_proxy_write+0x4b/0x70
> > [   85.069961]  __vfs_write+0x1e/0x130
> > [   85.069976]  vfs_write+0xc0/0x1b0
> > [   85.069979]  SyS_write+0x40/0xa0
> > [   85.069982]  entry_SYSCALL_64_fastpath+0x1c/0x89
> 
> 
> 	Thread-A				k-Thread
> 
> 	i915_reset_device
> #3	  mutex_lock(&dev->struct_mutex)
> 	  i915_reset()
> 	    i915_gem_reset_prepare()
> 	      i915_gem_reset_prepare_engine()
> 	        kthread_park()
> 
> 						__do_page_fault()
> #2						  down_read(&mm->mmap_sem);
> 						  handle_mm_fault()
> 						    __handle_mm_fault()
> 						      __do_fault()
> 						        i915_gem_fault()
> 							  i915_mutex_lock_interruptible()
> #3							    mutex_lock(&dev->struct_mutex)
> 
> 							/* twiddles thumbs forever more */
> 
> #0		  wait_for_common()
> 
> #0						complete()
> 
> 
> Is what it says I suppose. Now I don't know enough about that i915 code
> to say if that breadcrumbs_signal thread can ever trigger a fault or
> not. I got properly lost in that dma_fence callback maze.
> 
> You're saying not?

Our own kthread, no. At least a tons of run on our CI with the kthread
patch applied shut up lockdep splats for good. And since we have all the
i915 kthreads still with the same lockdep_map even with the patch applied,
since they are all created in the same function, I think that's pretty
solid evidence.

[There's also really no reasonable reason for it to fault, but I trust
automated tools more to check this stuff than my own brain. The test suite
we're running is fairly nasty and does all kinds of corner case
thrashing. Note that the dma_fence callbacks can be provideded by any
other driver (think multi-gpu desktops and stuff), but the contract is
that they must be able to handle hardirq context. Faulting's definitely
not on the table.]

The problem lockdep seems to complain about is that some random other
kthread could fault, end up in the i915 fault handler, and get stuck until
i915_reset_device is done doing what it needs to do. But as long as that
kthread is in turn not providing a service that i915_reset_device needs, I
don't see how that can deadlock. And if we have that case (there was
definitely plenty of that stuff that cross-release uncovered in our code,
we had to shuffle a bunch of allocations and things out from under
dev->struct_mutex), then there should be another lock or completion that
closes the loop again.

Or I'm not understanding what your asking, this stuff is a bit complicated
:-)

> (also, that comment near need_resched() doesn't make sense to me)

I assume you mean the one in intel_breadcrumbs_signaler(). The hw design
is somewhat screwed up and depends upon ridiculously low interrupt
servicing time. We get there by essentially implementing something like
netdev polled mode, from irq context. Like net polling if the scheduler
gets pissed at us we stop and dump it all into a kthread. From a latency
and ops/sec pov a gpu is pretty close to networking sometimes.

[Note: I just have a rough idea what the code is supposed to do, I didn't
write/review/debug that one.]

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch