Message-ID: <alpine.DEB.2.00.1208200930510.26904@cobra.newdream.net>
Date:	Mon, 20 Aug 2012 09:54:59 -0700 (PDT)
From:	Sage Weil <sage@...tank.com>
To:	Mel Gorman <mgorman@...e.de>
cc:	davem@...emloft.net, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, ceph-devel@...r.kernel.org,
	neilb@...e.de, a.p.zijlstra@...llo.nl, michaelc@...wisc.edu,
	emunson@...bm.net, eric.dumazet@...il.com, sebastian@...akpoint.cc,
	cl@...ux.com, akpm@...ux-foundation.org,
	torvalds@...ux-foundation.org
Subject: Re: regression with poll(2)

On Mon, 20 Aug 2012, Mel Gorman wrote:
> On Sun, Aug 19, 2012 at 11:49:31AM -0700, Sage Weil wrote:
> > I've bisected and identified this commit:
> > 
> >     netvm: propagate page->pfmemalloc to skb
> >     
> >     The skb->pfmemalloc flag gets set to true iff during the slab allocation
> >     of data in __alloc_skb that the PFMEMALLOC reserves were used.  If the
> >     packet is fragmented, it is possible that pages will be allocated from the
> >     PFMEMALLOC reserve without propagating this information to the skb.  This
> >     patch propagates page->pfmemalloc from pages allocated for fragments to
> >     the skb.
> >     
> >     Signed-off-by: Mel Gorman <mgorman@...e.de>
> >     Acked-by: David S. Miller <davem@...emloft.net>
> >     Cc: Neil Brown <neilb@...e.de>
> >     Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> >     Cc: Mike Christie <michaelc@...wisc.edu>
> >     Cc: Eric B Munson <emunson@...bm.net>
> >     Cc: Eric Dumazet <eric.dumazet@...il.com>
> >     Cc: Sebastian Andrzej Siewior <sebastian@...akpoint.cc>
> >     Cc: Mel Gorman <mgorman@...e.de>
> >     Cc: Christoph Lameter <cl@...ux.com>
> >     Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> >     Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
> > 
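To make the commit's intent concrete, here is a minimal userspace model of the propagation it describes. demo_page, demo_skb, demo_skb_add_frag() and demo_may_deliver() are hypothetical stand-ins, not kernel API. Once any fragment page came from the reserves, the skb's flag is set and later ordinary pages never clear it. The delivery check encodes an assumption about the netvm series as we understand it: reserve-backed skbs should only reach sockets that are themselves allowed to use the reserves. Under that assumption, a page wrongly marked pfmemalloc could cause ordinary traffic to be refused. This thread does not establish that this is how the poll(2) hang arose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the reserve flag matters here. */
struct demo_page {
	bool pfmemalloc;	/* page was allocated from the PFMEMALLOC reserves */
};

#define DEMO_MAX_FRAGS 4

/* Hypothetical stand-in for struct sk_buff with paged fragments. */
struct demo_skb {
	bool pfmemalloc;	/* some backing memory came from the reserves */
	struct demo_page *frags[DEMO_MAX_FRAGS];
	size_t nr_frags;
};

/*
 * Attach a fragment page and propagate its reserve flag to the skb.
 * The flag is sticky: a later ordinary page never clears it.
 */
static bool demo_skb_add_frag(struct demo_skb *skb, struct demo_page *page)
{
	assert(skb != NULL && page != NULL);
	if (skb->nr_frags >= DEMO_MAX_FRAGS)
		return false;
	skb->frags[skb->nr_frags++] = page;
	if (page->pfmemalloc)
		skb->pfmemalloc = true;
	return true;
}

/*
 * Assumed receive-side policy: reserve-backed skbs reach only sockets
 * that are themselves allowed to use the reserves.
 */
static bool demo_may_deliver(const struct demo_skb *skb, bool sock_memalloc)
{
	return !skb->pfmemalloc || sock_memalloc;
}
```

The flag is only ever set, never cleared, when a fragment is attached, because one reserve page anywhere in the skb is enough to make the whole buffer reserve-backed.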
> 
> Ok, thanks.
> 
> > I've retested several times and confirmed that this change leads to the 
> > breakage, and also confirmed that reverting it on top of -rc1 also fixes 
> > the problem.
> > 
> > I've also added some additional instrumentation to my code and confirmed 
> > that the process is blocking on poll(2) while netstat is reporting 
> > data available on the socket.
> > 
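The symptom described above can be checked mechanically from userspace. This is a minimal sketch that assumes Linux, where FIONREAD on a socket returns the number of bytes waiting in the receive queue (roughly what netstat shows as Recv-Q). poll_stall_detected() reports a stall when poll() returns without POLLIN while FIONREAD says bytes are queued. The function names are illustrative and are not taken from the instrumentation mentioned above.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns true if poll() gave no POLLIN within timeout_ms even though
 * FIONREAD reports bytes queued on fd, i.e. the symptom in question.
 */
static bool poll_stall_detected(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int queued = 0;

	int ready = poll(&pfd, 1, timeout_ms);
	if (ready < 0)
		return false;		/* poll error: not the symptom we look for */
	if (ready > 0 && (pfd.revents & POLLIN))
		return false;		/* readable: poll behaved as expected */
	if (ioctl(fd, FIONREAD, &queued) < 0)
		return false;
	return queued > 0;		/* timed out while data sits in the queue */
}

/*
 * Usage sketch: a local socket pair with one queued byte should not stall.
 * Returns 0 for "no stall", 1 for "stall", -1 on setup failure.
 */
static int demo_socketpair_no_stall(void)
{
	int sv[2];
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	int rc = -1;
	if (write(sv[0], "x", 1) == 1)
		rc = poll_stall_detected(sv[1], 100) ? 1 : 0;
	close(sv[0]);
	close(sv[1]);
	return rc;
}
```

demo_socketpair_no_stall() is expected to return 0, since the queued byte makes the socket readable. For the real case, poll_stall_detected() would be called on the affected TCP socket with a timeout longer than the expected delivery delay.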
> > What can I do to help track this down?
> > 
> 
> Can the following patch be tested please? It is reported to fix an fio
> regression that may be similar to what you are experiencing but has not
> been picked up yet.

This patch appears to resolve things for me as well, at least after a 
couple of passes.  I'll let you know if I see any further problems come up 
with more testing.

Thanks!
sage


> 
> ---8<---
> From: Alex Shi <alex.shi@...el.com>
> Subject: [PATCH] mm: correct page->pfmemalloc to fix deactivate_slab regression
> 
> commit cfd19c5a9ec (mm: only set page->pfmemalloc when
> ALLOC_NO_WATERMARKS was used) tried to narrow down where page->pfmemalloc
> is set, but it missed some places where pfmemalloc should be set.
> 
> As a result, in __slab_alloc, the mismatch between page->pfmemalloc and
> ALLOC_NO_WATERMARKS causes incorrect deactivate_slab() calls on our Core 2 server:
> 
>     64.73%           fio  [kernel.kallsyms]     [k] _raw_spin_lock
>                      |
>                      --- _raw_spin_lock
>                         |
>                         |---0.34%-- deactivate_slab
>                         |          __slab_alloc
>                         |          kmem_cache_alloc
>                         |          |
> 
> This causes a 40% regression in our fio sync write performance.
> 
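This is a rough model of why a wrong page->pfmemalloc could show up as deactivate_slab() and lock contention in the profile above. It assumes SLUB behaves as we understand it: a slab whose backing page is marked pfmemalloc serves only callers entitled to the reserves. On a mismatch, the per-CPU slab is deactivated, which takes a node list lock, and then replaced. All names are hypothetical. If every freshly allocated slab still carries a stale reserve marking, every ordinary allocation hits the mismatch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slab: remembers whether its backing page was marked pfmemalloc. */
struct sketch_slab {
	bool pfmemalloc;
};

/* Hypothetical per-CPU cache state. */
struct sketch_cpu_cache {
	struct sketch_slab *cur;	/* current per-CPU slab, or NULL */
	unsigned long deactivations;	/* each models a deactivate_slab() call */
};

/* Reserve-marked slabs serve only callers entitled to the reserves. */
static bool sketch_pfmemalloc_match(const struct sketch_slab *slab,
				    bool caller_may_use_reserves)
{
	return !slab->pfmemalloc || caller_may_use_reserves;
}

/* One allocation: on a mismatch, drop the per-CPU slab and refill it. */
static void sketch_slab_alloc(struct sketch_cpu_cache *c,
			      bool caller_may_use_reserves,
			      struct sketch_slab *(*new_slab)(void))
{
	if (c->cur != NULL &&
	    !sketch_pfmemalloc_match(c->cur, caller_may_use_reserves)) {
		c->deactivations++;
		c->cur = NULL;
	}
	if (c->cur == NULL)
		c->cur = new_slab();
	assert(c->cur != NULL);
}

/* Two hypothetical page sources: one leaves a stale reserve marking. */
static struct sketch_slab stale_slab = { .pfmemalloc = true };
static struct sketch_slab clean_slab = { .pfmemalloc = false };

static struct sketch_slab *new_slab_stale(void) { return &stale_slab; }
static struct sketch_slab *new_slab_clean(void) { return &clean_slab; }
```

With new_slab_stale, 1000 ordinary allocations produce 999 deactivations, because the first call only fills the empty per-CPU slot. With new_slab_clean there are none, and a caller entitled to the reserves never mismatches either way.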
> This patch moves the check into get_page_from_freelist, which resolves
> this issue.
> 
> Signed-off-by: Alex Shi <alex.shi@...el.com>
> ---
>  mm/page_alloc.c |   21 +++++++++++----------
>  1 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 009ac28..07f1924 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1928,6 +1928,17 @@ this_zone_full:
>  		zlc_active = 0;
>  		goto zonelist_scan;
>  	}
> +
> +	if (page)
> +		/*
> +		 * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was
> +		 * necessary to allocate the page. The expectation is
> +		 * that the caller is taking steps that will free more
> +		 * memory. The caller should avoid the page being used
> +		 * for !PFMEMALLOC purposes.
> +		 */
> +		page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS);
> +
>  	return page;
>  }
>  
> @@ -2389,14 +2400,6 @@ rebalance:
>  				zonelist, high_zoneidx, nodemask,
>  				preferred_zone, migratetype);
>  		if (page) {
> -			/*
> -			 * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was
> -			 * necessary to allocate the page. The expectation is
> -			 * that the caller is taking steps that will free more
> -			 * memory. The caller should avoid the page being used
> -			 * for !PFMEMALLOC purposes.
> -			 */
> -			page->pfmemalloc = true;
>  			goto got_pg;
>  		}
>  	}
> @@ -2569,8 +2572,6 @@ retry_cpuset:
>  		page = __alloc_pages_slowpath(gfp_mask, order,
>  				zonelist, high_zoneidx, nodemask,
>  				preferred_zone, migratetype);
> -	else
> -		page->pfmemalloc = false;
>  
>  	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
>  
> -- 
> 1.7.5.4
> 
> 
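The patch changes where page->pfmemalloc gets written, and a small model makes the difference easier to see. sketch_old_mark() follows the paths touched by the diff. The fast path cleared the flag and the ALLOC_NO_WATERMARKS path set it. Other slow-path retries (for example after waking kswapd or after direct reclaim) left the flag untouched, so whatever value was already in the field survived. sketch_new_mark() follows the patched get_page_from_freelist(), which derives the flag from alloc_flags on every successful allocation. The flag value SKETCH_ALLOC_NO_WATERMARKS and all other names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed flag value for illustration only; the kernel defines its own. */
#define SKETCH_ALLOC_NO_WATERMARKS 0x04

struct sketch_page {
	bool pfmemalloc;
};

enum sketch_path {
	SKETCH_FAST,		/* first get_page_from_freelist() attempt */
	SKETCH_SLOW_RECLAIM,	/* slow-path retry with normal watermarks */
	SKETCH_SLOW_NO_WMARK,	/* slow-path attempt with ALLOC_NO_WATERMARKS */
};

/* Before the patch: the flag was written on only two of the three paths. */
static void sketch_old_mark(struct sketch_page *page, enum sketch_path path)
{
	assert(page != NULL);
	if (path == SKETCH_FAST)
		page->pfmemalloc = false;
	else if (path == SKETCH_SLOW_NO_WMARK)
		page->pfmemalloc = true;
	/* SKETCH_SLOW_RECLAIM: flag left untouched, so it may be stale. */
}

/* After the patch: every page handed out is marked from alloc_flags. */
static void sketch_new_mark(struct sketch_page *page, int alloc_flags)
{
	assert(page != NULL);
	page->pfmemalloc = !!(alloc_flags & SKETCH_ALLOC_NO_WATERMARKS);
}
```

Placing the assignment just before get_page_from_freelist() returns means no caller path can skip it. The `!!` makes the conversion of the masked bit to true or false explicit.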