Message-ID: <20110629123409.GL3386@htj.dyndns.org>
Date:	Wed, 29 Jun 2011 14:34:09 +0200
From:	Tejun Heo <tj@...nel.org>
To:	Hans Rosenfeld <hans.rosenfeld@....com>
Cc:	Conny Seidel <conny.seidel@....com>, x86@...nel.org,
	linux-kernel@...r.kernel.org
Subject: Re: 32bit NUMA and fakeNUMA broken for AMD CPUs

Hello, again.

I think I found what went wrong.

> > [    0.000000] Node 0 MemBase 0000000000000000 Limit 0000000238000000
> > [    0.000000] Node 1 MemBase 0000000238000000 Limit 0000000638000000
> > [    0.000000] Node 2 MemBase 0000000638000000 Limit 0000000838000000
> > [    0.000000] Node 3 MemBase 0000000838000000 Limit 0000000c38000000
> > [    0.000000] Node 4 MemBase 0000000c38000000 Limit 0000000e38000000
> > [    0.000000] Node 5 MemBase 0000000e38000000 Limit 0000001000000000
> > [    0.000000] Node 6 bogus settings 1238000000-1000000000.
> > [    0.000000] Node 7 bogus settings 1438000000-1000000000.

NUMA nodes are aligned to 27 bits (128MiB).  SPARSEMEM is enabled, but
on x86-32 w/ PAE SECTION_SIZE_BITS is 29 (512MiB), which means that
pages living near a node boundary get the wrong nid assigned to them.

> > [    0.000000] BUG: Int 6: CR2   (null)
> > [    0.000000]      EDI   (null)  ESI 00000002  EBP 00000002  ESP c1543ecc
> > [    0.000000]      EBX f2400000  EDX 00000006  ECX   (null)  EAX 00000001
> > [    0.000000]      err   (null)  EIP c16209aa   CS 00000060  flg 00010002
> > [    0.000000] Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000
> > [    0.000000]          (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe   (null)
> > [    0.000000]        f7200b80 c16395f0 00200a02 f7200a80   (null) 000375fe 00000002   (null)
> > [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0b #17
> > [    0.000000] Call Trace:
> > [    0.000000]  [<c136b1e5>] ? early_fault+0x2e/0x2e
> > [    0.000000]  [<c16209aa>] ? mminit_verify_page_links+0x12/0x42

So, mminit_verify_page_links() detects it while the last 512MiB
highmem chunk of node 0 is being initialized and freaks out.

We definitely need a safeguard that checks NUMA node alignment and
disables NUMA if it requires finer granularity than the memory model
supports.  If you use DISCONTIGMEM, which has 64MiB granularity,
instead, it works, right?

-- 
tejun
