Message-ID: <20071105194521.5fc0ec71@werewolf>
Date:	Mon, 5 Nov 2007 19:45:21 +0100
From:	"J.A. Magallón" <jamagallon@....com>
To:	linux-kernel@...r.kernel.org
Subject: Re: Opteron box and 4Gb memory

On Mon, 5 Nov 2007 13:10:46 -0500, lsorense@...lub.uwaterloo.ca (Lennart Sorensen) wrote:

> On Mon, Nov 05, 2007 at 12:18:47AM +0100, J.A. Magallón wrote:
> > Well, I was able to get about 3 Gb with MTRR=discrete in the BIOS,
> > but I'm still trying to find the 'software hole' option to get
> > the rest of the 4 Gb...
> > 
> > But now another (perhaps related) question has arisen...
> > I like all those 5-line programs to test system performance... ;)
> > I just wrote a simple program that sums/multiplies int/float vectors with
> > scalar/SSE operations, and my Opteron box looks terribly slow.
> > 
> > This is my MacPro, Xeon 5130:
> > 
> > belly:~/bn> bn  
> > 	proc: 4 x MacPro1,1 @ 2000 MHz
> > 	ram:  2048 Mb
> > 	os:   unx, Darwin, 9.0.0
> > 	cc:   gcc-4.0.1
> > vector size   : 8 x 1024 x 1024
> > allocation:     0.01 ms
> > int scl add: ..........   36.78 ms,  228.07 Mips   |  114.03 Mips  /GHz
> > int scl mul: ..........   34.30 ms,  244.60 Mips   |  122.30 Mips  /GHz
> > flt scl add: ..........   34.28 ms,  244.73 Mflops |  122.37 Mflops/GHz
> > flt vec add: ..........    7.89 ms, 1063.15 Mflops |  531.58 Mflops/GHz
> > flt scl mul: ..........   34.20 ms,  245.28 Mflops |  122.64 Mflops/GHz
> > flt vec mul: ..........    7.90 ms, 1061.77 Mflops |  530.89 Mflops/GHz
> > total:       3322.19 ms
> > 
> > This is a normal (I think) opteron box (Opteron 846):
> > 
> > selene:~/bn> g  
> > 	proc: 4 x x86_64 @ 2004 MHz
> > 	ram:  3496 Mb
> > 	os:   unx, Linux, 2.6.9-42.0.10.ELsmp
> > 	cc:   gcc-4.0.2
> > vector size   : 8 x 1024 x 1024
> > allocation:     0.05 ms
> > int scl add: ..........   45.98 ms,  182.42 Mips   |   91.03 Mips  /GHz
> > int scl mul: ..........   44.31 ms,  189.30 Mips   |   94.46 Mips  /GHz
> > flt scl add: ..........   44.52 ms,  188.41 Mflops |   94.02 Mflops/GHz
> > flt vec add: ..........   10.03 ms,  836.70 Mflops |  417.52 Mflops/GHz
> > flt scl mul: ..........   43.32 ms,  193.63 Mflops |   96.62 Mflops/GHz
> > flt vec mul: ..........   10.02 ms,  836.98 Mflops |  417.65 Mflops/GHz
> > total:       4705.07 ms
> > 
> > And this is my opteron (Opteron 275)
> > 
> > cicely:~/bn> g  
> > 	proc: 4 x x86_64 @ 2200 MHz
> > 	ram:  2914 Mb
> > 	os:   unx, Linux, 2.6.23.1-desktop-1mdv
> > 	cc:   gcc-4.0.2
> > vector size   : 8 x 1024 x 1024
> > allocation:     0.03 ms
> > int scl add: ..........   87.67 ms,   95.68 Mips   |   43.49 Mips  /GHz
> > int scl mul: ..........   85.48 ms,   98.13 Mips   |   44.61 Mips  /GHz
> > flt scl add: ..........   85.90 ms,   97.66 Mflops |   44.39 Mflops/GHz
> > flt vec add: ..........   19.51 ms,  429.96 Mflops |  195.44 Mflops/GHz
> > flt scl mul: ..........   85.86 ms,   97.70 Mflops |   44.41 Mflops/GHz
> > flt vec mul: ..........   19.50 ms,  430.11 Mflops |  195.50 Mflops/GHz
> > total:       6334.96 ms
> > 
> > As I read on the AMD site, the only difference between models that matters
> > is xx5 vs. xx6, which is related to frequency; otherwise the processors
> > should be just the same.
> > 
> > As this only does intensive memory/FP operations, I'm not going to blame
> > either the gcc or the kernel version here (I have compared gcc 3.4, 4.0, 4.1,
> > and 4.2 on one of the boxes and the results are very similar; the code is
> > really stupid and not very amenable to compiler smartness...).
> > I suspect it is a memory problem. It could be hardware, or it could be
> > caused by an incorrect BIOS/kernel MTRR setup:
> > 
> > selene:~> cat /proc/mtrr
> > reg00: base=0x00000000 (   0MB), size=16384MB: write-back, count=1
> > reg01: base=0xf0000000 (3840MB), size= 256MB: uncachable, count=1
> > 
> > cicely:~> cat /proc/mtrr
> > reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
> > reg01: base=0x80000000 (2048MB), size= 512MB: write-back, count=1
> > reg02: base=0xa0000000 (2560MB), size= 256MB: write-back, count=1
> > reg03: base=0xb0000000 (2816MB), size= 128MB: write-back, count=1
> > reg04: base=0xb8000000 (2944MB), size=  16MB: write-back, count=1
> > 
> > 
> > Any idea what could be going on here? I have asked the 'good Opteron'
> > admin for info about the motherboard and memory of that box.
> > 
> > Any help will be _very_ appreciated.
> 
> Well what revisions are the two opterons?  Is one running dual channel
> memory while the other isn't perhaps?  What speed and type is the ram on
> the two opterons?
> 
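The MTRR suspicion in the quoted mail can be checked mechanically against the two /proc/mtrr excerpts. A minimal sketch, assuming every line is in the "size=NNNMB" format shown in this thread (other kernels may print sizes in other units): it parses the registers, sums the write-back ranges, and follows the write-back chain starting at address 0. On cicely the five ranges appear to chain from 0 to 2960 MB, which covers the 2914 Mb that bn reported, so the MTRR layout does not look like the obvious culprit there. The function names are hypothetical, written only for this example.

```python
import re

# Matches lines such as
# "reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1"
# Assumption: sizes are printed in MB, as in the excerpts quoted above.
MTRR_LINE = re.compile(r"base=(0x[0-9a-fA-F]+).*?size=\s*(\d+)MB:\s*([\w-]+)")


def parse_mtrr(text):
    """Return a list of (base_mb, size_mb, memory_type) tuples."""
    regs = []
    for line in text.splitlines():
        m = MTRR_LINE.search(line)
        if m:
            base_mb = int(m.group(1), 16) // (1024 * 1024)
            regs.append((base_mb, int(m.group(2)), m.group(3)))
    return regs


def writeback_mb(regs):
    """Total MB in write-back ranges (overlaps and UC carve-outs ignored)."""
    return sum(size for _, size, kind in regs if kind == "write-back")


def contiguous_wb_end_mb(regs):
    """End (in MB) of the write-back region starting at 0, if the ranges chain."""
    end = 0
    for base, size, kind in sorted(regs):
        if kind == "write-back" and base == end:
            end += size
    return end


if __name__ == "__main__":
    cicely = """\
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size= 512MB: write-back, count=1
reg02: base=0xa0000000 (2560MB), size= 256MB: write-back, count=1
reg03: base=0xb0000000 (2816MB), size= 128MB: write-back, count=1
reg04: base=0xb8000000 (2944MB), size=  16MB: write-back, count=1"""
    regs = parse_mtrr(cicely)
    print("write-back total:", writeback_mb(regs), "MB")
    print("contiguous write-back up to:", contiguous_wb_end_mb(regs), "MB")
```

On a live system the same functions can be fed the contents of /proc/mtrr directly.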

Well, problem solved...

I'm going to kill all PC assemblers in the world... Someone should teach them
to read manuals before assembling anything but a power cord.

The memory was not paired, so the motherboard was not interleaving accesses.
With node interleaving off but inter-module (bank) interleaving on, and a couple
of 1 Gb sticks for each processor, I now get something like:

cicely:~/bn> bn
	name: cicely.cps.unizar.es
	arch: x86-64
	proc: 4 x x86_64 @ 2200 MHz
	ram:  3555 Mb
	os:   unx, Linux, 2.6.23.1-desktop-1mdv
	cc:   gcc-4.3.0
vector size   : 8 x 1024 x 1024
allocation:     0.02 ms
int scl add: ..........   60.56 ms,  138.52 Mips   |   62.96 Mips  /GHz
int scl mul: ..........   59.34 ms,  141.36 Mips   |   64.26 Mips  /GHz
flt scl add: ..........   59.01 ms,  142.16 Mflops |   64.62 Mflops/GHz
flt vec add: ..........   14.79 ms,  567.06 Mflops |  257.75 Mflops/GHz
flt scl mul: ..........   59.02 ms,  142.12 Mflops |   64.60 Mflops/GHz
flt vec mul: ..........   14.82 ms,  566.19 Mflops |  257.36 Mflops/GHz
total:       5019.86 ms

Much better, but still not as fast as the other Opteron box.
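The bn source is not posted, but its columns seem to follow from the vector size and the elapsed time: 8 x 1024 x 1024 element operations divided by the time gives Mips/Mflops (for example 8388608 / 36.78 ms is about 228 Mips), and dividing by the clock in GHz gives the right-hand column. The sketch below is a reconstruction under that inferred assumption, not bn itself. It compares cicely's "flt vec add" before and after pairing the DIMMs. Note that the two cicely runs used different compilers (gcc 4.0.2 vs. 4.3.0), so the ratio is not a pure memory comparison.

```python
# Hypothetical reconstruction of bn's report columns, inferred from its output.
VECTOR_SIZE = 8 * 1024 * 1024  # elements, from "vector size : 8 x 1024 x 1024"


def mops(elapsed_ms, n=VECTOR_SIZE):
    """Millions of element operations per second (one op per element assumed)."""
    return n / (elapsed_ms / 1000.0) / 1e6


def per_ghz(rate, clock_mhz):
    """Normalize a rate by clock speed, as in the right-hand column."""
    return rate / (clock_mhz / 1000.0)


def speedup(before_ms, after_ms):
    """How many times faster the second run is than the first."""
    return before_ms / after_ms


if __name__ == "__main__":
    # cicely "flt vec add": unpaired DIMMs vs. paired and interleaved DIMMs
    before, after = 19.51, 14.79
    for label, ms in (("before", before), ("after", after)):
        rate = mops(ms)
        print(f"{label:6s} {rate:8.2f} Mflops {per_ghz(rate, 2200):8.2f} Mflops/GHz")
    print(f"speedup: {speedup(before, after):.2f}x")
    # selene, the "good" Opteron, for reference (2004 MHz)
    print(f"selene {per_ghz(mops(10.03), 2004):8.2f} Mflops/GHz")
```

Small differences from the printed figures (for example 567.18 vs. 567.06) come from bn rounding the times to two decimals.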

My processors must be at least Rev E0, because the BIOS does not let me choose
the 'software' hole. If I activate the 'hardware hole', I see all the memory
I can:

cicely:~/bn> free
             total       used       free     shared    buffers     cached
Mem:       3640628     214496    3426132          0      21240      84184
-/+ buffers/cache:     109072    3531556
Swap:      4200988          0    4200988

That is 3640628 kB, about 3.5 GiB. The rest is eaten by the graphics card, as I
read on the AMD site. I don't know whether booting the kernel with mem=4096M
would help, or whether that is even possible (I don't think so, as it looks like
a BIOS misfeature). The RAM is DDR400.
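By default free reports sizes in kibibytes, so the "total" of 3640628 is about 3.47 GiB. With four 1 GB sticks installed (the four 1024 Mb entries dmidecode reports below), roughly 540 MiB does not show up in "total". A small sketch of that arithmetic; the helper names are hypothetical, and part of the gap is memory the kernel reserves at boot rather than the hole itself, so it is an upper bound on what the hole costs.

```python
KIB_PER_MIB = 1024
KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib):
    """Convert free's default kibibyte figures to GiB."""
    return kib / KIB_PER_GIB


def missing_mib(installed_gib, visible_kib):
    """MiB of installed RAM that is not counted in free's 'total' column."""
    return (installed_gib * KIB_PER_GIB - visible_kib) / KIB_PER_MIB


if __name__ == "__main__":
    total_kib = 3640628  # "Mem: total" from the free output above
    print(f"visible: {kib_to_gib(total_kib):.2f} GiB")
    print(f"missing: {missing_mib(4, total_kib):.1f} MiB")
```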

Anyway, can I trust what dmidecode says? I installed the RAM, as the board
manual said, in banks 1A+1B (not 2A+2B) for each processor, but dmidecode
reports this:

BANK0   64Mb            BANK4   64Mb
BANK1   64Mb            BANK5   64Mb
BANK2 1024Mb            BANK6 1024Mb
BANK3 1024Mb            BANK7 1024Mb

I would have thought that BANK0 would be slot 1A of the first processor,
but apparently not...
And where do the 64 Mb blocks come from?

Really strange...
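dmidecode decodes the SMBIOS tables that the BIOS fills in, so its bank labels and sizes are only as accurate as the BIOS. Tallying the summary table helps separate the entries: the four 1024 Mb banks add up to the 4096 MB actually installed, while the four 64 Mb entries add up to another 256 MB that matches no stick mentioned in the message. A minimal sketch that parses the two-column summary as typed above (not dmidecode's native output format); the 512 MB threshold used to split "DIMM-sized" entries from small ones is an arbitrary assumption for this example.

```python
import re

# Matches "BANKn  <size>Mb" pairs in the two-column summary above.
BANK_ENTRY = re.compile(r"BANK(\d+)\s+(\d+)Mb")


def bank_sizes(text):
    """Map bank number -> size in MB."""
    return {int(n): int(mb) for n, mb in BANK_ENTRY.findall(text)}


def split_totals(sizes, threshold_mb=512):
    """Return (MB in entries >= threshold, MB in entries below it)."""
    big = sum(mb for mb in sizes.values() if mb >= threshold_mb)
    small = sum(mb for mb in sizes.values() if mb < threshold_mb)
    return big, small


if __name__ == "__main__":
    table = """\
BANK0   64Mb            BANK4   64Mb
BANK1   64Mb            BANK5   64Mb
BANK2 1024Mb            BANK6 1024Mb
BANK3 1024Mb            BANK7 1024Mb"""
    sizes = bank_sizes(table)
    big, small = split_totals(sizes)
    print(f"{len(sizes)} entries: {big} MB in large banks, {small} MB in small ones")
```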

--
J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
                                         \         It's better when it's free
Mandriva Linux release 2008.1 (Cooker) for i586
Linux 2.6.23-jam01 (gcc 4.2.2 20070909 (4.2.2-0.RC.1mdv2008.0)) SMP PREEMPT
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0