Message-ID: <20101124191749.GA29511@hostway.ca>
Date:	Wed, 24 Nov 2010 11:17:49 -0800
From:	Simon Kirby <sim@...tway.ca>
To:	Mel Gorman <mel@....ul.ie>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel <linux-kernel@...r.kernel.org>, linux-mm@...ck.org
Subject: Re: Free memory never fully used, swapping

On Wed, Nov 24, 2010 at 09:27:53AM +0000, Mel Gorman wrote:

> On Tue, Nov 23, 2010 at 10:43:29PM -0800, Simon Kirby wrote:
> > On Tue, Nov 23, 2010 at 10:04:03AM +0000, Mel Gorman wrote:
> > 
> > > On Mon, Nov 22, 2010 at 03:44:19PM -0800, Andrew Morton wrote:
> > > > On Mon, 15 Nov 2010 11:52:46 -0800
> > > > Simon Kirby <sim@...tway.ca> wrote:
> > > > 
> > > > > I noticed that CONFIG_NUMA seems to enable some more complicated
> > > > > reclaiming bits and figured it might help since most stock kernels seem
> > > > > to ship with it now.  This seems to have helped, but it may just be
> > > > > wishful thinking.  We still see this happening, though maybe to a lesser
> > > > > degree.  (The following observations are with CONFIG_NUMA enabled.)
> > > > > 
> > > 
> > > Hi,
> > > 
> > > As this is a NUMA machine, what is the value of
> > > /proc/sys/vm/zone_reclaim_mode ? When enabled, this reclaims memory
> > > local to the node in preference to using remote nodes. For certain
> > > workloads this performs better but for users that expect all of memory
> > > to be used, it has surprising results.
> > > 
> > > If set to 1, try testing with it set to 0 and see if it makes a
> > > difference. Thanks
> > 
> > Hi Mel,
> > 
> > It is set to 0.  It's an Intel EM64T...I only enabled CONFIG_NUMA since
> > it seemed to enable some more complicated handling, and I figured it
> > might help, but it didn't seem to.  It's also required for
> > CONFIG_COMPACTION, but that is still marked experimental.
> > 
> 
> I'm a little surprised that you are bringing compaction up because, unless
> there are high-order allocations involved, it wouldn't make a difference.
> Is there a constant source of high-order allocations in the system, e.g. a
> network card configured to use jumbo frames? A possible consequence of that
> is that reclaim is kicking in early to free order-[2-4] pages, which would
> prevent 100% of memory being used.

We /were/ using jumbo frames, but only over a local cross-over connection
to another node (for DRBD), so I disabled jumbo frames on this interface
and reconnected DRBD.  Even with MTUs set to 1500, we saw GFP_ATOMIC
order=3 allocations coming from __alloc_skb:

perf record --event kmem:mm_page_alloc --filter 'order>=3' -a --call-graph sleep 10
perf trace

	imap-20599 [002] 1287672.803567: mm_page_alloc: page=0xffffea00004536c0 pfn=4536000 order=3 migratetype=0 gfp_flags=GFP_ATOMIC|GFP_NOWARN|GFP_NORETRY|GFP_COMP
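
To make the size of such an allocation concrete, here is a minimal Python
sketch that pulls the fields out of a trace line like the one above and
converts the order into pages and bytes. It is not part of perf; the
parse_alloc helper and the regular expression are hypothetical, written only
against the sample line, and PAGE_SIZE assumes the usual 4 KiB x86-64 base
page.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB base pages, the x86-64 default

# Matches only the fields visible in the sample mm_page_alloc line above;
# other kernel versions may print a different set of fields.
TRACE_RE = re.compile(
    r"mm_page_alloc: page=(?P<page>\S+) pfn=(?P<pfn>\d+) "
    r"order=(?P<order>\d+) migratetype=(?P<migratetype>\d+) "
    r"gfp_flags=(?P<gfp_flags>\S+)"
)


def parse_alloc(line):
    """Return order, page count, byte size and flags, or None if no match."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    order = int(m.group("order"))
    return {
        "order": order,
        "pages": 1 << order,          # an order-n allocation is 2**n pages
        "bytes": PAGE_SIZE << order,  # contiguous bytes requested
        "gfp_flags": m.group("gfp_flags").split("|"),
    }


sample = ("imap-20599 [002] 1287672.803567: mm_page_alloc: "
          "page=0xffffea00004536c0 pfn=4536000 order=3 migratetype=0 "
          "gfp_flags=GFP_ATOMIC|GFP_NOWARN|GFP_NORETRY|GFP_COMP")
info = parse_alloc(sample)
print(info["pages"], info["bytes"], info["gfp_flags"][0])
# prints: 8 32768 GFP_ATOMIC
```

Under that page-size assumption, each order=3 event is a request for 8
physically contiguous pages (32 KiB).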

perf report shows:

__alloc_pages_nodemask
alloc_pages_current
new_slab
__slab_alloc
__kmalloc_node_track_caller
__alloc_skb
__netdev_alloc_skb
bnx2_poll_work

Dave was seeing these on his laptop with an Intel NIC as well.  Ralf
noted that the slab cache grows in higher order blocks, so this is
normal.  The GFP_ATOMIC bubbles up from *alloc_skb, I guess.

Simon-