[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20161013151553.6cb79106d0d347187d105467@gmail.com>
Date: Thu, 13 Oct 2016 15:15:53 +0200
From: Vitaly Wool <vitalywool@...il.com>
To: Dave Chinner <david@...morbit.com>
Cc: Linux-MM <linux-mm@...ck.org>, linux-kernel@...r.kernel.org,
Seth Jennings <sjenning@...hat.com>,
Dan Streetman <ddstreet@...e.org>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH v2] z3fold: add shrinker
On Thu, 13 Oct 2016 11:20:06 +1100
Dave Chinner <david@...morbit.com> wrote:
<snip>
>
> That's an incorrect assumption. Long spinlock holds prevent
> scheduling on that CPU, and so we still get latency problems.
Fair enough. The problem is, some of the z3fold code that need mutual
exclusion runs with preemption disabled so spinlock is still the way
to go. I'll try to avoid taking it while actual shrinking is in progress
though.
> > Please also note that the time
> > spent in the loop is deterministic since we take not more than one entry
> > from every unbuddied list.
>
> So the loop is:
>
> for_each_unbuddied_list_down(i, NCHUNKS - 3) {
>
> NCHUNKS = (PAGE_SIZE - (1 << (PAGE_SHIFT - 6)) >> (PAGE_SHIFT - 6)
>
> So for 4k page, NCHUNKS = (4096 - (1<<6)) >> 6, which is 63. So,
> potentially 60 memmoves under a single spinlock on a 4k page
> machine. That's a lot of work, especially as some of those memmoves
> are going to move a large amount of data in the page.
>
> And if we consider 64k pages, we've now got NCHUNKS = 1023, which
> means your shrinker is not, by default, going to scan all your
> unbuddied lists because it will expire nr_to_scan (usually
> SHRINK_BATCH = 128) before it's got through all of them. So not only
> will the shrinker do too much under a spinlock, it won't even do
> what you want it to do correctly on such setups.
Thanks for the pointer, I'll address that in the new patch.
> Further, the way nr_to_scan is decremented and the shrinker return
> value are incorrect. nr_to_scan is not the /number of objects to
> free/, but the number of objects to /check for reclaim/. The
> shrinker is then supposed to return the number it frees (or
> compacts) to give feedback to the shrinker infrastructure about how
> much reclaim work is being done (i.e. scanned vs freed ratio). This
> code always returns 0, which tells the shrinker infrastructure that
> it's not making progress...
Will fix.
>
> > What I could do though is add the following piece of code at the end of
> > the loop, right after the /break/:
> > spin_unlock(&pool->lock);
> > cond_resched();
> > spin_lock(&pool->lock);
> >
> > Would that make sense for you?
>
> Not really, because it ignores the fact that shrinkers can (and
> often do) run concurrently on multiple CPUs, and so serialising them
> all on a spinlock just causes contention, even if you do this.
>
> Memory reclaim is only as good as the worst shrinker it runs. I
> don't care what your subsystem does, but if you're implementing a
> shrinker then it needs to play by memory reclaim and shrinker
> context rules.....
>
Ok, see above.
Best regards,
Vitaly
Powered by blists - more mailing lists