linux-kernel - Re: [GIT PULL] Lockless SLUB slowpaths for v3.1-rc1


Message-ID: <CAOJsxLGyC4=WwGu7kUTwVKF3AxhfWjBg2sZu=W08RtVMHKk8eQ@mail.gmail.com>
Date:	Mon, 1 Aug 2011 15:45:04 +0300
From:	Pekka Enberg <penberg@...nel.org>
To:	David Rientjes <rientjes@...gle.com>
Cc:	Christoph Lameter <cl@...ux.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>, hughd@...gle.com,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [GIT PULL] Lockless SLUB slowpaths for v3.1-rc1

Hi David,

On Mon, Aug 1, 2011 at 1:02 PM, David Rientjes <rientjes@...gle.com> wrote:
> Here's the same testing environment with CONFIG_SLUB_STATS for 16 threads
> instead of 160:

[snip]

Looking at the data (in slightly reorganized form):

  alloc
  =====

    16 threads:

      cache           alloc_fastpath          alloc_slowpath
      kmalloc-256     4263275 (91.1%)         417445   (8.9%)
      kmalloc-1024    4636360 (99.1%)         42091    (0.9%)
      kmalloc-4096    2570312 (54.4%)         2155946  (45.6%)

    160 threads:

      cache           alloc_fastpath          alloc_slowpath
      kmalloc-256     10937512 (62.8%)        6490753  (37.2%)
      kmalloc-1024    17121172 (98.3%)        303547   (1.7%)
      kmalloc-4096    5526281  (31.7%)        11910454 (68.3%)

  free
  ====

    16 threads:

      cache           free_fastpath           free_slowpath
      kmalloc-256     210115   (4.5%)         4470604  (95.5%)
      kmalloc-1024    3579699  (76.5%)        1098764  (23.5%)
      kmalloc-4096    67616    (1.4%)         4658678  (98.6%)

    160 threads:

      cache           free_fastpath           free_slowpath
      kmalloc-256     15469    (0.1%)         17412798 (99.9%)
      kmalloc-1024    11604742 (66.6%)        5819973  (33.4%)
      kmalloc-4096    14848    (0.1%)         17421902 (99.9%)

It's pretty sad to see how dramatically SLUB alloc fastpath utilization
drops when going from 16 to 160 threads (91.1% to 62.8% for kmalloc-256,
54.4% to 31.7% for kmalloc-4096). Free fastpath utilization isn't all
that great with 160 threads either, but it seems to me that most of the
performance regression compared to SLAB still comes from the alloc paths.
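
For anyone who wants to re-derive the percentages, here is a minimal
sketch that recomputes the alloc fastpath share from the raw counters in
the alloc table above. The helper names (fastpath_share, fastpath_drop)
are made up for illustration; only the counter values come from the
reported stats.

```python
# Counters copied from the alloc table above: (alloc_fastpath, alloc_slowpath).
alloc_counters = {
    16: {
        "kmalloc-256": (4263275, 417445),
        "kmalloc-1024": (4636360, 42091),
        "kmalloc-4096": (2570312, 2155946),
    },
    160: {
        "kmalloc-256": (10937512, 6490753),
        "kmalloc-1024": (17121172, 303547),
        "kmalloc-4096": (5526281, 11910454),
    },
}


def fastpath_share(fast, slow):
    """Return the fastpath share of all operations, in percent."""
    return 100.0 * fast / (fast + slow)


def fastpath_drop(counters, cache, low=16, high=160):
    """Percentage-point drop in fastpath share between two thread counts."""
    return (fastpath_share(*counters[low][cache])
            - fastpath_share(*counters[high][cache]))


# Hypothetical report: one line per cache with the drop in percentage points.
for cache in alloc_counters[16]:
    print(f"{cache:14s} {fastpath_drop(alloc_counters, cache):5.1f} points")
```

The same two helpers work unchanged on the free counters if they are
entered in the same (fastpath, slowpath) layout.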

I guess the problem here is that __slab_free() happens on a remote CPU,
which puts the object on the 'struct page' freelist, which effectively
means we're unable to recycle free'd objects. As the number of concurrent
threads increases, we simply drain the fastpath freelists more
quickly. Did I understand the problem correctly?

If that's really happening, I'm still a bit puzzled why we're hitting the
slowpath so much. I'd assume that __slab_alloc() would simply reload the
'struct page' freelist once the per-cpu freelist is empty. Why is that
not happening? I see that __slab_alloc() calls deactivate_slab() when
node_match() fails. What kind of ALLOC_NODE_MISMATCH stats are you
seeing?
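
To make the expectation above concrete, here is a toy single-slab model
of the behaviour being assumed. It is only a sketch, not the kernel's
data structures or code: ToySlab, run() and the counter dictionary are
hypothetical, there is no locking, no NUMA node matching, and only one
slab. Local frees go back to the per-cpu freelist, remote frees go to the
page freelist, and the alloc slowpath reloads the page freelist when the
per-cpu freelist is empty.

```python
from collections import deque


class ToySlab:
    """Hypothetical one-slab model; not the kernel's implementation."""

    def __init__(self, objects):
        # Stands in for the per-cpu freelist used by the fastpaths.
        self.cpu_freelist = deque(range(objects))
        # Stands in for the 'struct page' freelist filled by remote frees.
        self.page_freelist = deque()
        self.stats = {"alloc_fastpath": 0, "alloc_slowpath": 0,
                      "free_fastpath": 0, "free_slowpath": 0}

    def alloc(self):
        if self.cpu_freelist:
            self.stats["alloc_fastpath"] += 1
            return self.cpu_freelist.popleft()
        # Slowpath: take over everything queued on the page freelist.
        self.stats["alloc_slowpath"] += 1
        if not self.page_freelist:
            return None  # a real allocator would go looking for another slab
        self.cpu_freelist, self.page_freelist = self.page_freelist, deque()
        return self.cpu_freelist.popleft()

    def free(self, obj, remote):
        if remote:
            # Freed from another CPU: the object lands on the page freelist.
            self.stats["free_slowpath"] += 1
            self.page_freelist.append(obj)
        else:
            # Freed on the owning CPU: immediately reusable by the fastpath.
            self.stats["free_fastpath"] += 1
            self.cpu_freelist.appendleft(obj)


def run(objects, rounds, remote):
    """Allocate and free one object per round; return the counters."""
    slab = ToySlab(objects)
    for _ in range(rounds):
        obj = slab.alloc()
        if obj is not None:
            slab.free(obj, remote)
    return slab.stats


print("local frees: ", run(objects=4, rounds=100, remote=False))
print("remote frees:", run(objects=4, rounds=100, remote=True))
```

In this model, even when every free is remote, the alloc slowpath is
taken only about once per `objects` allocations, because each reload
refills the per-cpu freelist. Slowpath shares well above that would
point at something the model leaves out, such as the deactivate_slab()
on node_match() failure mentioned above.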

                        Pekka