Message-ID: <Pine.LNX.4.64.0710041917020.14135@schroedinger.engr.sgi.com>
Date: Thu, 4 Oct 2007 19:43:58 -0700 (PDT)
From: Christoph Lameter <clameter@....com>
To: Matthew Wilcox <matthew@....cx>
cc: David Miller <davem@...emloft.net>, willy@...ux.intel.com,
nickpiggin@...oo.com.au, hch@....de, mel@...net.ie,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
dgc@....com, jens.axboe@...cle.com, suresh.b.siddha@...el.com
Subject: Re: SLUB performance regression vs SLAB

I just spent some time looking at the functions that you see high in
the list. The trouble is that I have to speculate, since I have nothing
to verify my thoughts against. If you could give me the hitlist for
each of the three runs, that would help to check my thinking. I could
be totally off here.

It seems that slab_free() frequently misses the per-cpu slab, which
leads to a call to __slab_free(), which in turn needs to take the slab
lock embedded in the page struct. Typically that lock is uncontended,
but that does not seem to be the case here, otherwise __slab_free()
would not be that high up the list.

The per-cpu patch in mm should reduce the contention on the page struct
by not touching the page struct on alloc and on free. It does not seem
to work all the way, though: slab_free() still has to touch the page
struct if the object being freed does not belong to the currently
active per-cpu slab.

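Roughly, the two paths look like this (a simplified user-space model
for illustration only; the types, fields and mutex here are stand-ins
for the real mm/slub.c structures and for the slab bit lock in struct
page):

#include <pthread.h>

struct page {
	pthread_mutex_t slab_lock;	/* stands in for the slab bit lock */
	void *freelist;			/* free objects in this slab */
};

struct kmem_cache_cpu {
	struct page *page;		/* the active per-cpu slab */
	void *freelist;			/* fast path list, no locking */
};

/* Slow path: has to take the lock embedded in the page struct. */
static void __slab_free(struct page *page, void *object)
{
	pthread_mutex_lock(&page->slab_lock);
	*(void **)object = page->freelist;	/* link into the freelist */
	page->freelist = object;
	pthread_mutex_unlock(&page->slab_lock);
}

/* Fast path applies only if the object is in this cpu's active slab. */
static void slab_free(struct kmem_cache_cpu *c, struct page *page,
		      void *object)
{
	if (page == c->page) {		/* hit: page struct untouched */
		*(void **)object = c->freelist;
		c->freelist = object;
	} else {			/* miss: contended slow path */
		__slab_free(page, object);
	}
}
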
So there could still be page struct contention left if multiple
processors frequently and simultaneously free objects to the same slab
while that slab is not the per-cpu slab of any cpu. That could be
addressed by optimizing the object free handling further so that the
page struct is not touched even when we miss the per-cpu slab.

That get_partial* is so far up the list indicates contention on the
list lock. That should be addressable either by increasing the slab
size (fewer, larger slabs mean fewer trips to the partial lists) or by
changing the object free handling to batch frees in some form.

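Slab size can already be pushed up with the slub_min_objects and
slub_max_order boot parameters. As for batching, one possible shape
(purely a sketch building on the model above; nothing like this exists
in mm/slub.c, and FREE_BATCH and the queue are made up) would be a
small per-cpu queue of remote frees that is flushed in one go:

#define FREE_BATCH 16

struct free_queue {
	struct page *pages[FREE_BATCH];
	void *objects[FREE_BATCH];
	int count;
};

static void flush_free_queue(struct free_queue *q)
{
	int i;

	/* A real version would group the entries by page so that each
	 * slab lock is taken once per batch, not once per object. */
	for (i = 0; i < q->count; i++)
		__slab_free(q->pages[i], q->objects[i]);
	q->count = 0;
}

static void queue_remote_free(struct free_queue *q, struct page *page,
			      void *object)
{
	q->pages[q->count] = page;
	q->objects[q->count] = object;
	if (++q->count == FREE_BATCH)
		flush_free_queue(q);
}
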
This is an SMP system, right? 2 cores with 4 cpus each? Is the main
loop always hitting on the same slabs? Which slabs would those be? Am I
right in thinking that one process allocates objects, then lets
multiple other processors do work on them, and the objects are finally
freed from cpus that did not allocate them? If neighboring objects in
one slab are allocated on one cpu and then freed almost simultaneously
from a set of different cpus, that may explain the situation.
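
If that is the pattern, it should be reproducible in user space along
these lines (a hypothetical test: malloc/free stand in for
kmalloc/kfree, and the object and thread counts are arbitrary):

#include <pthread.h>
#include <stdlib.h>

#define NOBJ	4096
#define NFREE	4	/* freeing threads, i.e. the "other" cpus */

static void *objs[NOBJ];

static void *free_stripe(void *arg)
{
	long id = (long)arg, i;

	/* Interleaved stride: neighboring objects in a slab get freed
	 * by different threads at about the same time. */
	for (i = id; i < NOBJ; i += NFREE)
		free(objs[i]);
	return NULL;
}

int main(void)
{
	pthread_t t[NFREE];
	long i;

	for (i = 0; i < NOBJ; i++)	/* one "cpu" allocates everything */
		objs[i] = malloc(64);	/* one size class, shared slabs */

	for (i = 0; i < NFREE; i++)
		pthread_create(&t[i], NULL, free_stripe, (void *)i);
	for (i = 0; i < NFREE; i++)
		pthread_join(t[i], NULL);
	return 0;
}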