linux-kernel - Re: Memory issues with Opteron 6220


Date:	Thu, 9 Feb 2012 09:33:15 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	linux-kernel@...r.kernel.org, jk@...ozymes.com
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Yinghai Lu <yinghai@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>, Tejun Heo <tj@...nel.org>
Subject: Re: Memory issues with Opteron 6220


* Anders Ossowicki <aowi@...ozymes.com> wrote:

> Hey,
> 
> We're seeing unexpected slowdowns and other memory issues with a new system.
> Enough to render it unusable. For example:
> 
> Error: open3: fork failed: Cannot allocate memory
> 
> at times where there's no real memory pressure:
>                    total       used       free     shared    buffers     cached
>       Mem:     132270720  131942388     328332          0     299768  103334420
>       -/+ buffers/cache:   28308200  103962520
>       Swap:      7811068      13760    7797308
>
> [...]

> The system is a Dell Poweredge R715, with two eight-core 
> Opteron 6220 processors and 128G of memory. We have several 
> similar systems, such as the one this should replace: R715, 
> 2x8 core Opteron 6140, 128G memory, and they do not exhibit 
> any similar symptoms.

130 MB of RAM visible to Linux isn't the expected bootup default 
indeed. Around 130 *GB* would be expected ...

> We have tried with 2.6.37, 2.6.38, 3.2.5 and 3.3-rc1 with no luck. The
> microcode updates from AMD have not helped either.

Nasty.

No smoking gun in the dmesg:

> dmesg is available at http://dev.exherbo.org/~arkanoid/atlas-dmesg-3.2.5.txt

[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000df679000 (usable)
[    0.000000]  BIOS-e820: 00000000df679000 - 00000000df68f000 (reserved)
[    0.000000]  BIOS-e820: 00000000df68f000 - 00000000df6ce000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000df6ce000 - 00000000e0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000f0000000 - 00000000f4000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fe000000 - 00000000fec90000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec94000 - 00000000fecd0000 (reserved)
[    0.000000]  BIOS-e820: 00000000fecd4000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 000000201f000000 (usable)

that 0x201f000000 is slightly above 128 GB.
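
As a quick check of that figure, here is a minimal sketch in Python. The helper name is made up for illustration, and it assumes "GB" here means 2^30 bytes:

```python
GIB = 1 << 30  # assumption: "GB" in this mail means 2**30 bytes

def bytes_to_gib(n):
    """Convert a byte count (or a physical end address) to GiB."""
    return n / GIB

# End address of the last usable BIOS-e820 range in the dmesg above.
e820_top = 0x201f000000

print(f"{bytes_to_gib(e820_top):.2f} GiB")  # prints "128.48 GiB"
```

So the top of usable RAM sits about 496 MB above the 128 GB mark.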

The lowlevel x86 RAM init code seems to be fine:

[    0.000000] last_pfn = 0x201f000 max_arch_pfn = 0x400000000

that 0x201f000 correctly points to slightly above 128 GB 
physical.
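
The page frame number maps back to the e820 end address as sketched below, assuming the usual 4 KB x86-64 base page size (PAGE_SHIFT = 12); the variable names are illustrative only:

```python
PAGE_SHIFT = 12  # assumption: 4 KB base pages, as on x86-64

last_pfn = 0x201f000               # from the "last_pfn =" dmesg line
top_addr = last_pfn << PAGE_SHIFT  # first byte past the last page frame

print(hex(top_addr))  # prints "0x201f000000"
```

That is the same address as the end of the last usable e820 range.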

[    0.000000] init_memory_mapping: 0000000100000000-000000201f000000

that too shows that the lowlevel x86 platform memory init code 
still sees 128 GB.

it's spread out amongst 4 nodes, 32 GB each:

[    0.000000] Initmem setup node 0 0000000000000000-0000000820000000
[    0.000000]   NODE_DATA [000000081fffb000 - 000000081fffffff]
[    0.000000] Initmem setup node 1 0000000820000000-0000001020000000
[    0.000000]   NODE_DATA [000000101fffb000 - 000000101fffffff]
[    0.000000] Initmem setup node 2 0000001020000000-0000001820000000
[    0.000000]   NODE_DATA [000000181fffb000 - 000000181fffffff]
[    0.000000] Initmem setup node 3 0000001820000000-000000201f000000
[    0.000000]   NODE_DATA [000000201effa000 - 000000201effefff]
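
The per-node figure can be read off the address ranges. This is a rough sketch only: the spans below are plain end-minus-start differences, so they include any holes inside a node's range (such as the reserved areas below 4 GB in node 0), not just usable RAM:

```python
GIB = 1 << 30  # assumption: "GB" means 2**30 bytes

# (start, end) of each node, copied from the "Initmem setup node" lines.
nodes = [
    (0x0000000000000000, 0x0000000820000000),
    (0x0000000820000000, 0x0000001020000000),
    (0x0000001020000000, 0x0000001820000000),
    (0x0000001820000000, 0x000000201f000000),
]

# Address span of each node in GiB (holes included).
spans = [(end - start) / GIB for start, end in nodes]

for node_id, span in enumerate(spans):
    print(f"node {node_id}: {span:.2f} GiB")
# prints 32.50, 32.00, 32.00 and 31.98 GiB for nodes 0..3
```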

the NORMAL zone gets set up properly:

[    0.000000]   Normal   0x00100000 -> 0x0201f000

and each node zone got 32 GB of RAM:

[    0.000000]   Normal zone: 7354368 pages, LIFO batch:31
[    0.000000]   Normal zone: 8257536 pages, LIFO batch:31
[    0.000000]   Normal zone: 8257536 pages, LIFO batch:31
[    0.000000]   Normal zone: 8253504 pages, LIFO batch:31
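
For reference, the page counts convert to sizes as in this sketch, again assuming 4 KB pages and 2^30-byte GB. The first entry comes out lower than the others; presumably that is because the low 4 GB of node 0 belongs to the DMA/DMA32 zones rather than Normal, but that reading is an assumption, not something these log lines state:

```python
PAGE_SIZE = 4096  # assumption: 4 KB base pages
GIB = 1 << 30     # assumption: "GB" means 2**30 bytes

# Page counts from the four "Normal zone:" lines, in log order.
normal_zone_pages = [7354368, 8257536, 8257536, 8253504]

sizes_gib = [pages * PAGE_SIZE / GIB for pages in normal_zone_pages]

for pages, size in zip(normal_zone_pages, sizes_gib):
    print(f"{pages} pages = {size:.2f} GiB")
# prints 28.05, 31.50, 31.50 and 31.48 GiB
```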


and it's all visible in the end to the MM:

[    0.000000] Built 4 zonelists in Zone order, mobility grouping on.  Total pages: 33021506

that's still 125 GB. (cgroup_page appears to pick up 1GB of RAM 
btw.)
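
The conversion behind that number, as a small sketch assuming 4 KB pages and 2^30-byte GB:

```python
PAGE_SIZE = 4096  # assumption: 4 KB base pages
GIB = 1 << 30     # assumption: "GB" means 2**30 bytes

total_pages = 33021506  # from the "Built 4 zonelists" line
total_gib = total_pages * PAGE_SIZE / GIB

print(f"{total_gib:.2f} GiB")  # prints "125.97 GiB"
```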

So where has the rest of the RAM gone? What does /proc/meminfo 
look like?

Thanks,

	Ingo