Message-ID: <20170510104354.GB5011@redhat.com>
Date:   Wed, 10 May 2017 12:43:54 +0200
From:   Andrea Arcangeli <aarcange@...hat.com>
To:     Hugh Dickins <hughd@...gle.com>
Cc:     Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
        "J. R. Okajima" <hooanon05g@...il.com>, chris@...is-wilson.co.uk,
        daniel.vetter@...ll.ch, jani.nikula@...el.com,
        linux-kernel@...r.kernel.org
Subject: Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

Hello,

On Tue, May 09, 2017 at 08:04:24PM -0700, Hugh Dickins wrote:
> On Mon, 8 May 2017, Joonas Lahtinen wrote:
> > On pe, 2017-05-05 at 14:57 -0700, Hugh Dickins wrote:
> > > On Fri, 5 May 2017, Joonas Lahtinen wrote:
> > > > On ma, 2017-05-01 at 11:05 +0900, J. R. Okajima wrote:
> > > > > Thanx for the reply.
> > > > > 
> > > > > Andrea Arcangeli:
> > > > > > 
> > > > > > Yes I already reported this, my original fix was way more efficient
> > > > > > (and also safer considering the above) than what landed upstream. My
> > > > > > feedback was ignored though.
> > > > > > 
> > > > > > https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html
> > > > > 
> > > > > I see.
> > > > > Actually on my test system for v4.11-rc8, kthreadd, kworker, kswapd and
> > > > > others all stopped working due to the synchronize_rcu_expedited call
> > > > > from i915_gem_shrinker_count. It is definitely a show stopper for me as
> > > > > an i915 user.
> > > > 
> > > > Filing a bug in freedesktop.org with all the details is the fastest way
> > > > of getting help. Without the bug (and with such little information as
> > > > the previous e-mail) it's hard to estimate the extent and nature of the
> > > > bug.
> > > > 
> > > > I've anyway gone and prepared a patch to drop the RCU sync completely
> > > > from shrinker phase, as discussed originally with Chris.
> > > 
> > > Is that a patch that will be suitable for 4.11-stable?  Please do post
> > > it here.  I had not experienced this i915-induced hang at all when
> > > Andrea first mentioned it, nor even on 4.11-rc8; but now with 4.11
> > > final I can get it fairly easily (I haven't tried Andrea's fix yet).
> > 
> > Please try:
> > 
> > https://patchwork.freedesktop.org/patch/154713/
> > 
> > If it works, a Tested-by: would be appreciated.
> 
> Yes, that works for me, thank you.
> 
> Tested-by: Hugh Dickins <hughd@...gle.com>
> 
> But the linked patch seems to be lacking a Reported-by (not me) tag,
> a Fixes tag, a Cc stable tag, and any indication in the Subject or
> commit message that this patch is something needed to fix hangs
> observed by several people - it just sounds like a minor cleanup.

It works for me too. On my workstation I'm also running with
synchronize_rcu() removed from i915_gem_shrink_all(), in addition to the
above patch. Isn't the oom method invoked from reclaim context too? As
far as I can tell, synchronize_rcu() can end up throttling on a
background synchronize_rcu_expedited(), so it could run into the same
issue unless it is removed as well.

Tested-by: Andrea Arcangeli <aarcange@...hat.com>

(I can't reproduce the lockups reliably, but they have not recurred
since applying this patch, even though I have already run the load
that triggers them a couple of times on v4.11 with the patch applied.)

Removing the synchronize_rcu_expedited() from the _count methods
certainly also improves performance, since it was both useless and
unsafe there.
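The argument that the _count methods are the wrong place for a grace-period wait rests on the split in a shrinker between a count step, which reclaim calls only to get an estimate of freeable objects, and a scan step, which actually frees them. The following is a hypothetical userspace model in C11 of that split, assuming atomics from <stdatomic.h>; the names demo_cache, demo_init, demo_count and demo_scan are illustrative only and are not kernel or i915 symbols. It is a sketch of the pattern (a count step that never blocks), not the patch discussed in this thread.

```c
/*
 * Hypothetical userspace model of a shrinker's count/scan split.
 * Not the kernel's struct shrinker API and not the i915 code.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical cache tracking how many objects could be freed. */
struct demo_cache {
    atomic_ulong nr_reclaimable;
};

void demo_init(struct demo_cache *c, unsigned long n)
{
    atomic_init(&c->nr_reclaimable, n);
}

/*
 * Count step: only reports an estimate. It takes no locks and waits
 * for nothing, so calling it from reclaim context cannot block.
 */
unsigned long demo_count(struct demo_cache *c)
{
    return atomic_load_explicit(&c->nr_reclaimable, memory_order_relaxed);
}

/*
 * Scan step: frees up to nr_to_scan objects and returns how many were
 * freed. Any real work is confined here, not in the count step.
 */
unsigned long demo_scan(struct demo_cache *c, unsigned long nr_to_scan)
{
    unsigned long cur = atomic_load(&c->nr_reclaimable);
    unsigned long freed;

    assert(nr_to_scan > 0);
    do {
        /* Never free more than is currently reclaimable. */
        freed = cur < nr_to_scan ? cur : nr_to_scan;
        /* On CAS failure, cur is reloaded and freed is recomputed. */
    } while (!atomic_compare_exchange_weak(&c->nr_reclaimable, &cur,
                                           cur - freed));
    return freed;
}
```

Keeping the count step to a single relaxed atomic load means it stays cheap even if reclaim calls it frequently.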

Thanks,
Andrea
