Message-ID: <4A27C7F8.3010207@itwm.fraunhofer.de>
Date: Thu, 04 Jun 2009 15:11:20 +0200
From: Martin Vogt <vogt@...m.fraunhofer.de>
To: linux-kernel@...r.kernel.org
Subject: NUMA regression(?) on 32core shanghai
Hello,
I am seeing strange/unexpected benchmark results on my NUMA machine,
a 32-core Shanghai system with 512GB RAM.
My benchmark shows runtimes that vary by up to a factor of 12(!) for
identical tests, and I think this is a bug somewhere.
I have tested the following kernels:
- 2.6.30-rc8, 2.6.29.4 and the SLES10-SP1 kernel
All show the same problem for 16/32 threads in the first run
(but not always!).
For example, with 2.6.30-rc8:
16-1: 33.403038s 28.906326s <<-- strange values
16-2: 5.444921s 5.072422s
16-3: 6.266797s 6.152743s
This is why I think this is a bug:
----------------------------------
My understanding of the NUMA memory bandwidth test is:
- if I attach 8 threads, one to each NUMA node,
- and allocate 512MB of local memory for each thread,
THEN:
- the runtime should be nearly constant over all nodes and all runs
  (for example: every thread runs 3 seconds).
If I now double the threads (16 threads, 2 on each NUMA node),
then:
- the runtime should double too, because two threads now share one
  node's local memory bandwidth
  (for example: 6 seconds instead of three),
and so on; for 32 threads, 12 seconds, etc.
(A rough sketch of the setup I mean is below.)
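Roughly, each thread does something like the following (a simplified
sketch, not the actual attached mbind2.cpp; it assumes libnuma plus
std::thread, and names like kBufSize/worker are only for illustration):

#include <numa.h>
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <thread>
#include <vector>

static const size_t kBufSize = 512UL << 20;  // 512MB per thread

static void worker(int node, double *read_s, double *write_s)
{
    numa_run_on_node(node);  // pin this thread to the CPUs of "node"
    volatile char *buf = (volatile char *)numa_alloc_onnode(kBufSize, node);
    memset((char *)buf, 1, kBufSize);  // fault all pages in on the local node

    auto t0 = std::chrono::steady_clock::now();
    unsigned long sum = 0;
    for (size_t i = 0; i < kBufSize; i += 64)  // read pass, one cache line apart
        sum += buf[i];
    auto t1 = std::chrono::steady_clock::now();
    for (size_t i = 0; i < kBufSize; i += 64)  // write pass
        buf[i] = (char)sum;
    auto t2 = std::chrono::steady_clock::now();

    *read_s  = std::chrono::duration<double>(t1 - t0).count();
    *write_s = std::chrono::duration<double>(t2 - t1).count();
    numa_free((void *)buf, kBufSize);
}

int main(int argc, char **argv)
{
    if (numa_available() < 0) return 1;
    int nthreads = argc > 1 ? atoi(argv[1]) : 8;
    int nodes = numa_max_node() + 1;      // 8 nodes on this machine
    std::vector<std::thread> threads;
    std::vector<double> r(nthreads), w(nthreads);
    for (int i = 0; i < nthreads; i++)    // 8 -> 1/node, 16 -> 2/node, ...
        threads.emplace_back(worker, i % nodes, &r[i], &w[i]);
    for (auto &t : threads) t.join();
    // report the slowest thread, as in the tables below
    printf("%02d: %f %f\n", nthreads,
           *std::max_element(r.begin(), r.end()),
           *std::max_element(w.begin(), w.end()));
    return 0;
}

(build with: g++ -O2 -pthread sketch.cpp -lnuma)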
The machine sometimes behaves as expected, but in the 16/32-thread case
it usually shows these strange runtimes in the first run.
(But this can happen for the 8-thread test too.)
What is wrong here? The first run is a factor of ~12 slower on the old
kernel and a factor of ~4 on the newer ones; there must be something
wrong with this.
How can I debug it?
regards,
Martin
PS: on a smaller Opteron NUMA system (4 nodes with 2 cores each and
8GB per node) the test program works as expected.
PPS: the "bug" does not happens always, but very often with 16/32 threads
and: the behaviour is the same if I replace numa_alloc_onnode with malloc
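For clarity, that swap is only in the allocation call; since each thread
is already pinned to its node, the kernel's first-touch policy should
place the malloc'd pages locally too (alloc_buf is just an illustrative
name, not from mbind2.cpp):

#include <numa.h>
#include <cstdlib>
#include <cstring>

// Illustrative helper: both variants should end up with node-local pages,
// because the memset (first touch) runs on the thread already pinned to "node".
static char *alloc_buf(size_t size, int node, bool use_malloc)
{
    char *buf = use_malloc
        ? (char *)malloc(size)                    // placement by first-touch
        : (char *)numa_alloc_onnode(size, node);  // explicit node binding
    memset(buf, 1, size);
    return buf;
}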
Benchmark:
- cron is off/HZ is 100/libc 2.4-31.43.7 from SLES10
- Format example:
08-1: 3.405676 3.023264
means: 8 threads, first run; the read pass took 3.4 seconds and the
write pass 3.0 seconds.
2.6.30-rc8
=====================
04-1: 3.591044 3.295444
04-2: 3.588437 3.280143
04-3: 3.448116 2.995627
08-1: 4.122432 3.566830
08-2: 4.119241 3.548015
08-3: 3.819517 3.349197
16-1: 33.403038 28.906326 <<-- strange values
16-2: 5.444921 5.072422
16-3: 6.266797 6.152743
32-1: 49.885150 76.500259 <<-- strange values
32-2: 19.114738 12.170802
32-3: 14.807441 11.064564
2.6.29.4
==================
04-1: 3.375012 3.057332
04-2: 3.401835 3.039497
04-3: 3.359395 2.980974
08-1: 3.405676 3.023264
08-2: 3.257743 3.000751
08-3: 3.129684 2.886261
16-1: 22.417126 11.807065 <<-- strange values
16-2: 6.031583 5.098305
16-3: 5.088144 5.457238
32-1: 45.829553 24.225427 <<-- strange values
32-2: 13.165044 12.290732
32-3: 8.908012 11.622502
2.6.16 (SuSE SLES10-SP1)+perfctr
================================
(Seconds: the value reported is that of the slowest thread)
#Threads-run  read in secs  write in secs
04-1: 3.375012 3.057332
04-2: 3.401835 3.039497
04-3: 3.359395 2.980974
08-1: 3.405676 3.023264
08-2: 3.257743 3.000751
08-3: 3.129684 2.886261
16-1: 74.399871 12.747340 <<-- strange values
16-2: 7.449596 4.401576
16-3: 6.123250 5.518968
32-1: 150.927981 55.032012 <<-- strange values
32-2: 12.119996 12.203303
32-3: 11.601377 12.485716
[Attachment: mbind2.cpp.gz (application/x-gzip, 1779 bytes)]