[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160929162808.745c869b@xhacker>
Date: Thu, 29 Sep 2016 16:28:08 +0800
From: Jisheng Zhang <jszhang@...vell.com>
To: Chris Wilson <chris@...is-wilson.co.uk>
CC: <akpm@...ux-foundation.org>, <mgorman@...hsingularity.net>,
<rientjes@...gle.com>, <iamjoonsoo.kim@....com>,
<agnel.joel@...il.com>, <linux-mm@...ck.org>,
<linux-kernel@...r.kernel.org>,
<linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH] mm/vmalloc: reduce the number of lazy_max_pages to
reduce latency
On Thu, 29 Sep 2016 09:18:18 +0100 Chris Wilson wrote:
> On Thu, Sep 29, 2016 at 03:34:11PM +0800, Jisheng Zhang wrote:
> > On Marvell berlin arm64 platforms, I see the preemptoff tracer report
> > a max 26543 us latency at __purge_vmap_area_lazy, this latency is an
> > awfully bad for STB. And the ftrace log also shows __free_vmap_area
> > contributes most latency now. I noticed that Joel mentioned the same
> > issue[1] on x86 platform and gave two solutions, but it seems no patch
> > is sent out for this purpose.
> >
> > This patch adopts Joel's first solution, but I use 16MB per core
> > rather than 8MB per core for the number of lazy_max_pages. After this
> > patch, the preemptoff tracer reports a max 6455us latency, reduced to
> > 1/4 of original result.
>
> My understanding is that
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 91f44e78c516..3f7c6d6969ac 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -626,7 +626,6 @@ void set_iounmap_nonlazy(void)
> static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
> int sync, int force_flush)
> {
> - static DEFINE_SPINLOCK(purge_lock);
> struct llist_node *valist;
> struct vmap_area *va;
> struct vmap_area *n_va;
> @@ -637,12 +636,6 @@ static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
> * should not expect such behaviour. This just simplifies locking for
> * the case that isn't actually used at the moment anyway.
> */
> - if (!sync && !force_flush) {
> - if (!spin_trylock(&purge_lock))
> - return;
> - } else
> - spin_lock(&purge_lock);
> -
> if (sync)
> purge_fragmented_blocks_allcpus();
>
> @@ -667,7 +660,6 @@ static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
> __free_vmap_area(va);
> spin_unlock(&vmap_area_lock);
Hi Chris,
Per my test, the bottleneck now is __free_vmap_area() over the valist, the
iteration is protected with spinlock vmap_area_lock. So the larger lazy max
pages, the longer valist, the bigger the latency.
So besides above patch, we still need to remove vmap_are_lock or replace with
mutex.
Thanks,
Jisheng
> }
> - spin_unlock(&purge_lock);
> }
>
> /*
>
>
> should now be safe. That should significantly reduce the preempt-disabled
> section, I think.
> -Chris
>
Powered by blists - more mailing lists