Message-ID: <alpine.DEB.2.00.1208200930510.26904@cobra.newdream.net>
Date: Mon, 20 Aug 2012 09:54:59 -0700 (PDT)
From: Sage Weil <sage@...tank.com>
To: Mel Gorman <mgorman@...e.de>
cc: davem@...emloft.net, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, ceph-devel@...r.kernel.org,
neilb@...e.de, a.p.zijlstra@...llo.nl, michaelc@...wisc.edu,
emunson@...bm.net, eric.dumazet@...il.com, sebastian@...akpoint.cc,
cl@...ux.com, akpm@...ux-foundation.org,
torvalds@...ux-foundation.org
Subject: Re: regression with poll(2)
On Mon, 20 Aug 2012, Mel Gorman wrote:
> On Sun, Aug 19, 2012 at 11:49:31AM -0700, Sage Weil wrote:
> > I've bisected and identified this commit:
> >
> > netvm: propagate page->pfmemalloc to skb
> >
> > The skb->pfmemalloc flag gets set to true iff during the slab allocation
> > of data in __alloc_skb that the PFMEMALLOC reserves were used. If the
> > packet is fragmented, it is possible that pages will be allocated from the
> > PFMEMALLOC reserve without propagating this information to the skb. This
> > patch propagates page->pfmemalloc from pages allocated for fragments to
> > the skb.
> >
> > Signed-off-by: Mel Gorman <mgorman@...e.de>
> > Acked-by: David S. Miller <davem@...emloft.net>
> > Cc: Neil Brown <neilb@...e.de>
> > Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> > Cc: Mike Christie <michaelc@...wisc.edu>
> > Cc: Eric B Munson <emunson@...bm.net>
> > Cc: Eric Dumazet <eric.dumazet@...il.com>
> > Cc: Sebastian Andrzej Siewior <sebastian@...akpoint.cc>
> > Cc: Mel Gorman <mgorman@...e.de>
> > Cc: Christoph Lameter <cl@...ux.com>
> > Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> > Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
> >
>
> Ok, thanks.
>
> > I've retested several times and confirmed that this change leads to the
> > breakage, and also confirmed that reverting it on top of -rc1 also fixes
> > the problem.
> >
> > I've also added some additional instrumentation to my code and confirmed
> > that the process is blocking on poll(2) while netstat is reporting
> > data available on the socket.
> >
> > What can I do to help track this down?
> >
>
> Can the following patch be tested please? It is reported to fix an fio
> regression that may be similar to what you are experiencing but has not
> been picked up yet.
This patch appears to resolve things for me as well, at least after a
couple of passes. I'll let you know if I see any further problems come up
with more testing.
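
For anyone who wants to check for the same symptom (poll(2) blocking while
data is already queued on the socket), here is a minimal sketch. It is not
the instrumentation from the original report. The names poll_missed_data()
and check_socketpair() are made up for illustration, and it assumes FIONREAD
is supported on the descriptor being tested.

```c
#include <assert.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the original report.
 * Returns 1 if poll() timed out although FIONREAD reports queued bytes
 * (the symptom described above), 0 if the two agree, -1 on error.
 */
int poll_missed_data(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int queued = 0;
    int ready;

    assert(fd >= 0);

    ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;

    /* Ask the kernel how many bytes are waiting to be read. */
    if (ioctl(fd, FIONREAD, &queued) < 0)
        return -1;

    return ready == 0 && queued > 0;
}

/*
 * Hypothetical self-check on a local AF_UNIX socket pair: write one byte
 * on one end, then look for the symptom on the other end.
 */
int check_socketpair(void)
{
    int sv[2];
    int result;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (write(sv[0], "x", 1) != 1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    result = poll_missed_data(sv[1], 100);
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

A local socket pair does not go through the network receive path where the
regression was seen, so check_socketpair() only shows how the helper is
called. Data that arrives between the poll() timeout and the ioctl() can
also produce a false positive, so a single hit on a live socket is a hint
rather than proof.
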
Thanks!
sage
>
> ---8<---
> From: Alex Shi <alex.shi@...el.com>
> Subject: [PATCH] mm: correct page->pfmemalloc to fix deactivate_slab regression
>
> commit cfd19c5a9ec (mm: only set page->pfmemalloc when
> ALLOC_NO_WATERMARKS was used) tried to narrow down where page->pfmemalloc
> is set, but it missed some places where pfmemalloc should be set.
>
> So, in __slab_alloc, the mismatch between pfmemalloc and
> ALLOC_NO_WATERMARKS causes incorrect deactivate_slab() calls on our
> core2 server:
>
> 64.73% fio [kernel.kallsyms] [k] _raw_spin_lock
> |
> --- _raw_spin_lock
> |
> |---0.34%-- deactivate_slab
> | __slab_alloc
> | kmem_cache_alloc
> | |
>
> That causes a 40% regression in our fio sync write performance.
>
> This patch moves the check into get_page_from_freelist, which resolves
> this issue.
>
> Signed-off-by: Alex Shi <alex.shi@...el.com>
> ---
> mm/page_alloc.c | 21 +++++++++++----------
> 1 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 009ac28..07f1924 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1928,6 +1928,17 @@ this_zone_full:
> zlc_active = 0;
> goto zonelist_scan;
> }
> +
> + if (page)
> + /*
> + * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was
> + * necessary to allocate the page. The expectation is
> + * that the caller is taking steps that will free more
> + * memory. The caller should avoid the page being used
> + * for !PFMEMALLOC purposes.
> + */
> + page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS);
> +
> return page;
> }
>
> @@ -2389,14 +2400,6 @@ rebalance:
> zonelist, high_zoneidx, nodemask,
> preferred_zone, migratetype);
> if (page) {
> - /*
> - * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was
> - * necessary to allocate the page. The expectation is
> - * that the caller is taking steps that will free more
> - * memory. The caller should avoid the page being used
> - * for !PFMEMALLOC purposes.
> - */
> - page->pfmemalloc = true;
> goto got_pg;
> }
> }
> @@ -2569,8 +2572,6 @@ retry_cpuset:
> page = __alloc_pages_slowpath(gfp_mask, order,
> zonelist, high_zoneidx, nodemask,
> preferred_zone, migratetype);
> - else
> - page->pfmemalloc = false;
>
> trace_mm_page_alloc(page, order, gfp_mask, migratetype);
>
> --
> 1.7.5.4
>
>
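
To spell out what the relocated assignment in the quoted patch does, here is
a minimal userspace model in C. It is not kernel code: struct page_model,
finish_alloc() and demo() are made-up names, and the 0x04 value given to
ALLOC_NO_WATERMARKS is an assumption for illustration only. The point it
shows is that the flag is recomputed from alloc_flags on every successful
allocation, so a stale value cannot survive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value, for illustration only. */
#define ALLOC_NO_WATERMARKS 0x04

/* Hypothetical stand-in for the one struct page field that matters here. */
struct page_model {
    bool pfmemalloc;
};

/*
 * Models the tail of get_page_from_freelist() after the patch: whenever a
 * page was obtained, derive pfmemalloc from the flags used for this
 * allocation attempt. The !! collapses the masked bit to 0 or 1.
 */
struct page_model *finish_alloc(struct page_model *page, int alloc_flags)
{
    if (page)
        page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS);

    return page;
}

/* Usage: a stale "true" is cleared, and the reserve case sets it again. */
void demo(void)
{
    struct page_model p = { .pfmemalloc = true };

    finish_alloc(&p, 0);
    assert(!p.pfmemalloc);

    finish_alloc(&p, ALLOC_NO_WATERMARKS);
    assert(p.pfmemalloc);

    /* A failed allocation (NULL page) is passed through untouched. */
    assert(finish_alloc(NULL, ALLOC_NO_WATERMARKS) == NULL);
}
```

As far as I can tell from the diff, the old code only wrote the flag on two
of the return paths, which is why doing it in one common place closes the
gap.
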