linux-kernel - Re: [PATCH 2/5] i915: flush gem obj freeing workqueues to add accuracy to the i915 shrinker

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170410093933.GB6834@nuc-i3427.alporthouse.com>
Date:   Mon, 10 Apr 2017 10:39:33 +0100
From:   Chris Wilson <chris@...is-wilson.co.uk>
To:     Andrea Arcangeli <aarcange@...hat.com>
Cc:     Martin Kepplinger <martink@...teo.de>,
        Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
        Thorsten Leemhuis <regressions@...mhuis.info>,
        daniel.vetter@...el.com, Dave Airlie <airlied@...il.com>,
        intel-gfx@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
        dri-devel@...ts.freedesktop.org
Subject: Re: [PATCH 2/5] i915: flush gem obj freeing workqueues to add
 accuracy to the i915 shrinker

On Fri, Apr 07, 2017 at 06:48:58PM +0200, Andrea Arcangeli wrote:
> On Fri, Apr 07, 2017 at 04:30:11PM +0100, Chris Wilson wrote:
> > Not getting hangs is a good sign, but lockdep doesn't like it:
> > 
> > [  460.684901] WARNING: CPU: 1 PID: 172 at kernel/workqueue.c:2418 check_flush_dependency+0x92/0x130
> > [  460.684924] workqueue: PF_MEMALLOC task 172(kworker/1:1H) is flushing !WQ_MEM_RECLAIM events:__i915_gem_free_work [i915]
> > 
> > If I allocated the workqueue with WQ_MEM_RELCAIM, it complains bitterly
> > as well.
> 
> So in PF_MEMALLOC context we can't flush a workqueue with
> !WQ_MEM_RECLAIM.
> 
> 	system_wq = alloc_workqueue("events", 0, 0);
> 
> My point is synchronize_rcu_expedited will still push its work in
> the same system_wq workqueue...
> 
> 		/* Marshall arguments & schedule the expedited grace period. */
> 		rew.rew_func = func;
> 		rew.rew_rsp = rsp;
> 		rew.rew_s = s;
> 		INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
> 		schedule_work(&rew.rew_work);
> 
> It's also using schedule_work, so either the above is a false
> positive, or we've still a problem with synchronize_rcu_expedited,
> just a reproducible issue anymore after we stop running it under the
> struct_mutex.

We still do have a problem with using synchronize_rcu_expedited() from
the shrinker as we maybe under someone else's mutex is that involved in
its own RCU dance.

> Even synchronize_sched will wait on the system_wq if
> synchronize_rcu_expedited has been issued in parallel by some other
> code, it's just there's no check for it because it's not invoking
> flush_work to wait.

Right.
 
> The deadlock happens if we flush_work() while holding any lock that
> may be taken by any of the workqueues that could be queued there. i915
> makes sure to flush_work only if the struct_mutex was released (not
> my initial version) but we're not checking if any of the other
> system_wq workqueues could possibly taking a lock that may have been
> hold during the allocation that invoked reclaim, I suppose that is the
> problem left, but I don't see how flush_work is special about it,
> synchronize_rcu_expedited would still have the same issue no? (despite
> no lockdep warning)
> 
> I suspect this means synchronize_rcu_expedited() is not usable in
> reclaim context and lockdep should warn if PF_MEMALLOC is set when
> synchronize_rcu_expedited/synchronize_sched/synchronize_rcu are
> called.

Yes.

> Probably to fix this we should create a private workqueue for both RCU
> and i915 and stop sharing the system_wq with the rest of the system
> (and of course set WQ_MEM_RECLAIM in both workqueues). This makes sure
> when we call synchronize_rcu_expedited; flush_work from the shrinker,
> we won't risk waiting on other random work that may be taking locks
> that are hold by the code that invoked reclaim during an allocation.

We simply do not need to do our own synchronize_rcu* -- it's only used
to flush our slab frees on the off chance that (a) we have any and (b)
we do manage to free a whole slab. It is not the bulk of the memory that
we return to the system from the shrinker.

In the other thread, I stated that we should simply remove it. The
kswapd reclaim path should try to reclaim RCU slabs (by doing a
synhronize_sched or equivalent).

> The macro bug of waiting on system_wq 100% of the time while always
> holding the struct_mutex is gone, but we need to perfect this further
> and stop using the system_wq for RCU and i915 shrinker work.

Agreed. My preference is to simply not do it and leave the dangling RCU
to the core reclaim paths.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre