linux-kernel - Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set

Message-ID: <20171009091235.ay4yoy6ffbczduqp@phenom.ffwll.local>
Date:   Mon, 9 Oct 2017 11:12:35 +0200
From:   Daniel Vetter <daniel@...ll.ch>
To:     Chris Wilson <chris@...is-wilson.co.uk>
Cc:     Daniel Vetter <daniel@...ll.ch>,
        Daniel Vetter <daniel.vetter@...ll.ch>,
        Intel Graphics Development <intel-gfx@...ts.freedesktop.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Mika Kuoppala <mika.kuoppala@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Marta Lofstedt <marta.lofstedt@...el.com>,
        Daniel Vetter <daniel.vetter@...el.com>
Subject: Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in
 set_wedged

On Fri, Oct 06, 2017 at 06:29:08PM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-10-06 15:20:09)
> > On Fri, Oct 06, 2017 at 12:03:49PM +0100, Chris Wilson wrote:
> > > Quoting Daniel Vetter (2017-10-06 10:06:37)
> > > > stop_machine is not really a locking primitive we should use, except
> > > > when the hw folks tell us the hw is broken and that's the only way to
> > > > work around it.
> > > > 
> > > > This patch tries to address the locking abuse of stop_machine() from
> > > > 
> > > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > > > Author: Chris Wilson <chris@...is-wilson.co.uk>
> > > > Date:   Tue Nov 22 14:41:21 2016 +0000
> > > > 
> > > >     drm/i915: Stop the machine as we install the wedged submit_request handler
> > > > 
> > > > > Chris said part of the reason for going with stop_machine() was that
> > > > > it adds no overhead to the fast-path. But these callbacks use irqsave
> > > > > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
> > > > 
> > > > To stay as close as possible to the stop_machine semantics we first
> > > > update all the submit function pointers to the nop handler, then call
> > > > synchronize_rcu() to make sure no new requests can be submitted. This
> > > > should give us exactly the huge barrier we want.
> > > > 
> > > > I pondered whether we should annotate engine->submit_request as __rcu
> > > > and use rcu_assign_pointer and rcu_dereference on it. But the reason
> > > > behind those is to make sure the compiler/cpu barriers are there for
> > > > when you have an actual data structure you point at, to make sure all
> > > > the writes are seen correctly on the read side. But we just have a
> > > > function pointer, and .text isn't changed, so no need for these
> > > > barriers and hence no need for annotations.
> > > > 
> > > > > This should fix the following lockdep splat:
> > > > 
> > > > ======================================================
> > > > WARNING: possible circular locking dependency detected
> > > > 4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G     U
> > > > ------------------------------------------------------
> > > > kworker/3:4/562 is trying to acquire lock:
> > > >  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40
> > > > 
> > > > but task is already holding lock:
> > > >  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> > > > 
> > > > which lock already depends on the new lock.
> > > > 
> > > > the existing dependency chain (in reverse order) is:
> > > > 
> > > > -> #6 (&dev->struct_mutex){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        __mutex_lock+0x86/0x9b0
> > > >        mutex_lock_interruptible_nested+0x1b/0x20
> > > >        i915_mutex_lock_interruptible+0x51/0x130 [i915]
> > > >        i915_gem_fault+0x209/0x650 [i915]
> > > >        __do_fault+0x1e/0x80
> > > >        __handle_mm_fault+0xa08/0xed0
> > > >        handle_mm_fault+0x156/0x300
> > > >        __do_page_fault+0x2c5/0x570
> > > >        do_page_fault+0x28/0x250
> > > >        page_fault+0x22/0x30
> > > > 
> > > > -> #5 (&mm->mmap_sem){++++}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        __might_fault+0x68/0x90
> > > >        _copy_to_user+0x23/0x70
> > > >        filldir+0xa5/0x120
> > > >        dcache_readdir+0xf9/0x170
> > > >        iterate_dir+0x69/0x1a0
> > > >        SyS_getdents+0xa5/0x140
> > > >        entry_SYSCALL_64_fastpath+0x1c/0xb1
> > > > 
> > > > -> #4 (&sb->s_type->i_mutex_key#5){++++}:
> > > >        down_write+0x3b/0x70
> > > >        handle_create+0xcb/0x1e0
> > > >        devtmpfsd+0x139/0x180
> > > >        kthread+0x152/0x190
> > > >        ret_from_fork+0x27/0x40
> > > > 
> > > > -> #3 ((complete)&req.done){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        wait_for_common+0x58/0x210
> > > >        wait_for_completion+0x1d/0x20
> > > >        devtmpfs_create_node+0x13d/0x160
> > > >        device_add+0x5eb/0x620
> > > >        device_create_groups_vargs+0xe0/0xf0
> > > >        device_create+0x3a/0x40
> > > >        msr_device_create+0x2b/0x40
> > > >        cpuhp_invoke_callback+0xc9/0xbf0
> > > >        cpuhp_thread_fun+0x17b/0x240
> > > >        smpboot_thread_fn+0x18a/0x280
> > > >        kthread+0x152/0x190
> > > >        ret_from_fork+0x27/0x40
> > > > 
> > > > -> #2 (cpuhp_state-up){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        cpuhp_issue_call+0x133/0x1c0
> > > >        __cpuhp_setup_state_cpuslocked+0x139/0x2a0
> > > >        __cpuhp_setup_state+0x46/0x60
> > > >        page_writeback_init+0x43/0x67
> > > >        pagecache_init+0x3d/0x42
> > > >        start_kernel+0x3a8/0x3fc
> > > >        x86_64_start_reservations+0x2a/0x2c
> > > >        x86_64_start_kernel+0x6d/0x70
> > > >        verify_cpu+0x0/0xfb
> > > > 
> > > > -> #1 (cpuhp_state_mutex){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        __mutex_lock+0x86/0x9b0
> > > >        mutex_lock_nested+0x1b/0x20
> > > >        __cpuhp_setup_state_cpuslocked+0x53/0x2a0
> > > >        __cpuhp_setup_state+0x46/0x60
> > > >        page_alloc_init+0x28/0x30
> > > >        start_kernel+0x145/0x3fc
> > > >        x86_64_start_reservations+0x2a/0x2c
> > > >        x86_64_start_kernel+0x6d/0x70
> > > >        verify_cpu+0x0/0xfb
> > > > 
> > > > -> #0 (cpu_hotplug_lock.rw_sem){++++}:
> > > >        check_prev_add+0x430/0x840
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        cpus_read_lock+0x3d/0xb0
> > > >        stop_machine+0x1c/0x40
> > > >        i915_gem_set_wedged+0x1a/0x20 [i915]
> > > >        i915_reset+0xb9/0x230 [i915]
> > > >        i915_reset_device+0x1f6/0x260 [i915]
> > > >        i915_handle_error+0x2d8/0x430 [i915]
> > > >        hangcheck_declare_hang+0xd3/0xf0 [i915]
> > > >        i915_hangcheck_elapsed+0x262/0x2d0 [i915]
> > > >        process_one_work+0x233/0x660
> > > >        worker_thread+0x4e/0x3b0
> > > >        kthread+0x152/0x190
> > > >        ret_from_fork+0x27/0x40
> > > > 
> > > > other info that might help us debug this:
> > > > 
> > > > Chain exists of:
> > > >   cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex
> > > > 
> > > >  Possible unsafe locking scenario:
> > > > 
> > > >        CPU0                    CPU1
> > > >        ----                    ----
> > > >   lock(&dev->struct_mutex);
> > > >                                lock(&mm->mmap_sem);
> > > >                                lock(&dev->struct_mutex);
> > > >   lock(cpu_hotplug_lock.rw_sem);
> > > > 
> > > >  *** DEADLOCK ***
> > > > 
> > > > 3 locks held by kworker/3:4/562:
> > > >  #0:  ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
> > > >  #1:  ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
> > > >  #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> > > > 
> > > > stack backtrace:
> > > > CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G     U          4.14.0-rc3-CI-CI_DRM_3179+ #1
> > > > Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
> > > > Workqueue: events_long i915_hangcheck_elapsed [i915]
> > > > Call Trace:
> > > >  dump_stack+0x68/0x9f
> > > >  print_circular_bug+0x235/0x3c0
> > > >  ? lockdep_init_map_crosslock+0x20/0x20
> > > >  check_prev_add+0x430/0x840
> > > >  ? irq_work_queue+0x86/0xe0
> > > >  ? wake_up_klogd+0x53/0x70
> > > >  __lock_acquire+0x1420/0x15e0
> > > >  ? __lock_acquire+0x1420/0x15e0
> > > >  ? lockdep_init_map_crosslock+0x20/0x20
> > > >  lock_acquire+0xb0/0x200
> > > >  ? stop_machine+0x1c/0x40
> > > >  ? i915_gem_object_truncate+0x50/0x50 [i915]
> > > >  cpus_read_lock+0x3d/0xb0
> > > >  ? stop_machine+0x1c/0x40
> > > >  stop_machine+0x1c/0x40
> > > >  i915_gem_set_wedged+0x1a/0x20 [i915]
> > > >  i915_reset+0xb9/0x230 [i915]
> > > >  i915_reset_device+0x1f6/0x260 [i915]
> > > >  ? gen8_gt_irq_ack+0x170/0x170 [i915]
> > > >  ? work_on_cpu_safe+0x60/0x60
> > > >  i915_handle_error+0x2d8/0x430 [i915]
> > > >  ? vsnprintf+0xd1/0x4b0
> > > >  ? scnprintf+0x3a/0x70
> > > >  hangcheck_declare_hang+0xd3/0xf0 [i915]
> > > >  ? intel_runtime_pm_put+0x56/0xa0 [i915]
> > > >  i915_hangcheck_elapsed+0x262/0x2d0 [i915]
> > > >  process_one_work+0x233/0x660
> > > >  worker_thread+0x4e/0x3b0
> > > >  kthread+0x152/0x190
> > > >  ? process_one_work+0x660/0x660
> > > >  ? kthread_create_on_node+0x40/0x40
> > > >  ret_from_fork+0x27/0x40
> > > > Setting dangerous option reset - tainting kernel
> > > > i915 0000:00:02.0: Resetting chip after gpu hang
> > > > Setting dangerous option reset - tainting kernel
> > > > i915 0000:00:02.0: Resetting chip after gpu hang
> > > > 
> > > > v2: Have 1 global synchronize_rcu() barrier across all engines, and
> > > > improve commit message.
> > > > 
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
> > > > Cc: Chris Wilson <chris@...is-wilson.co.uk>
> > > > Cc: Mika Kuoppala <mika.kuoppala@...el.com>
> > > > Cc: Thomas Gleixner <tglx@...utronix.de>
> > > > Cc: Marta Lofstedt <marta.lofstedt@...el.com>
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@...el.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_gem.c                   | 31 +++++++++--------------
> > > >  drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
> > > >  drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
> > > >  3 files changed, 16 insertions(+), 19 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > index ab8c6946fea4..e79a6ca60265 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > @@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
> > > >         intel_engine_init_global_seqno(request->engine, request->global_seqno);
> > > >  }
> > > >  
> > > > -static void engine_set_wedged(struct intel_engine_cs *engine)
> > > > +static void engine_complete_requests(struct intel_engine_cs *engine)
> > > >  {
> > > > -       /* We need to be sure that no thread is running the old callback as
> > > > -        * we install the nop handler (otherwise we would submit a request
> > > > -        * to hardware that will never complete). In order to prevent this
> > > > -        * race, we wait until the machine is idle before making the swap
> > > > -        * (using stop_machine()).
> > > > -        */
> > > > -       engine->submit_request = nop_submit_request;
> > > > -
> > > >         /* Mark all executing requests as skipped */
> > > >         engine->cancel_requests(engine);
> > > 
> > > How are we planning to serialise the intel_engine_init_global_seqno()
> > > here with the in-flight nop_submit? With sufficient thrust we will get a
> > > stale breadcrumb and an incomplete request.
> > 
> > Yeah that part looks indeed fishy. Well the entire "let the nop handler
> > fake-complete requests" logic is something I don't really understand. I
> > guess there's an exclusive relationship between requests handled directly
> > (and cancelled in engine->cancel_request) and requests with external
> > dma_fence dependencies.
> > 
> > But then I'm not really seeing what I'm changing, since even with the stop
> > machine you might end up with a bunch of requests depending upon external
> > fences, which then all complete at roughly the same time and race multiple
> > > calls to intel_engine_init_global_seqno with one another.
> 
> The stop_machine serialised the update here with the nop_handlers,
> that's the bit that changes.
>  
> > With the fake submission, do we really need to call intel_engine_init_global_seqno?
> 
> Yes. Completion is still determined by i915_seqno_passed() comparing the
> rq against the engine.
> 
> You need this
> 
> @@ -3246,6 +3246,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
>  
>  static void engine_set_wedged(struct intel_engine_cs *engine)
>  {
> +       unsigned long flags;
> +
>         /* We need to be sure that no thread is running the old callback as
>          * we install the nop handler (otherwise we would submit a request
>          * to hardware that will never complete). In order to prevent this
> @@ -3261,8 +3263,10 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
>          * (lockless) lookup doesn't try and wait upon the request as we
>          * reset it.
>          */
> +       spin_lock_irqsave(&engine->timeline->lock, flags);
>         intel_engine_init_global_seqno(engine,
>                                        intel_engine_last_submit(engine));
> +       spin_unlock_irqrestore(&engine->timeline->lock, flags);
>  }
>  
> So that the seqno written is ordered with the same spinlock used inside
> the nop submission.

Makes sense, I entirely missed the spinlock on Fri evening. Call me blind
:-)
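
To spell out the ordering argument, here is a rough userspace sketch, not
the i915 code. It assumes, as the quoted reply says, that the nop submission
does its seqno write under the timeline lock. struct fake_engine,
fake_nop_submit(), fake_set_wedged() and run_demo() are hypothetical names,
and a pthread mutex stands in for the irqsave spinlock. Only the comparison
in seqno_passed() mirrors i915_seqno_passed().

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for an engine; not the real i915 structures. */
struct fake_engine {
	pthread_mutex_t timeline_lock;	/* plays the role of engine->timeline->lock */
	uint32_t last_submit;		/* last seqno handed to the engine */
	uint32_t global_seqno;		/* breadcrumb: last seqno marked complete */
};

static void fake_engine_init(struct fake_engine *e)
{
	pthread_mutex_init(&e->timeline_lock, NULL);
	e->last_submit = 0;
	e->global_seqno = 0;
}

/* Wraparound-safe "seq1 is at or after seq2", as in i915_seqno_passed(). */
static int seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

/* Analogue of the nop submission: fake-complete a request under the lock. */
static uint32_t fake_nop_submit(struct fake_engine *e)
{
	uint32_t seqno;

	pthread_mutex_lock(&e->timeline_lock);
	seqno = ++e->last_submit;
	e->global_seqno = seqno;
	pthread_mutex_unlock(&e->timeline_lock);
	return seqno;
}

/*
 * Analogue of the tail of engine_set_wedged(). Without the lock, the read of
 * last_submit and the write of global_seqno could straddle a concurrent
 * fake_nop_submit() and leave a stale breadcrumb behind.
 */
static void fake_set_wedged(struct fake_engine *e)
{
	pthread_mutex_lock(&e->timeline_lock);
	e->global_seqno = e->last_submit;
	pthread_mutex_unlock(&e->timeline_lock);
}

static void *submit_worker(void *arg)
{
	for (int i = 0; i < 1000; i++)
		fake_nop_submit(arg);
	return NULL;
}

/* Race four submitters against repeated wedging; returns 1 if nothing is stale. */
static int run_demo(void)
{
	struct fake_engine e;
	pthread_t t[4];

	fake_engine_init(&e);
	for (int i = 0; i < 4; i++)
		assert(pthread_create(&t[i], NULL, submit_worker, &e) == 0);
	for (int i = 0; i < 1000; i++)
		fake_set_wedged(&e);
	for (int i = 0; i < 4; i++)
		pthread_join(t[i], NULL);

	return e.last_submit == 4000 && seqno_passed(e.global_seqno, e.last_submit);
}
```

Because both writers update the breadcrumb inside the same critical section,
every unlock leaves global_seqno equal to last_submit, so a completion check
against the final request cannot observe a stale value.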

I'll amend the patch.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch