lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <46D00DDD.1070101@shadowen.org>
Date:	Sat, 25 Aug 2007 12:09:17 +0100
From:	Andy Whitcroft <apw@...dowen.org>
To:	Andi Kleen <ak@...e.de>
CC:	Mel Gorman <mel@...net.ie>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86 Boot NUMA kernels on non-NUMA hardware with DISCONTIG
 memory model

Andi Kleen wrote:
> On Fri, Aug 24, 2007 at 06:44:38PM +0100, Mel Gorman wrote:
>> On (24/08/07 19:38), Andi Kleen didst pronounce:
>>>> Other than the fact that the memmap must be PMD aligned to use hugepage
>>>> entries for the memmap. 
>>> Why is that so?  mem_map should be just part of lowmem anyways.
>>>
>> Not in this case. memmap is allocated node local and mapped in the virtual
>> memory area normally occupied by the end of low memory. The objective was
>> to have memmap for the struct pages node-local. Hence, portions of
>> memmap are really in highmem.
> 
> Ok, but that still doesn't mean it has to be PMD aligned, 
> as long as illegal virtual aliases are prevent in the overlap
> (which is not very hard) 
> 
>>>> It could be mapped with small pages in corner cases
>>>> but the complexity worth it?
>>> You don't need to map it with small pages in the normal case,
>>> the only requirement is that c_p_a() is aware of it so it can
>>> split it if needed.
>>>
>>>> I can't see this type of lifting being done any time soon. As SPARSEMEM works
>>>> and there is hope with the vmemmap work that DISCONTIG will finally go away,
>>>> it may not be the best investment of time.
>>> It's a trivial change, probably less code than your original patch.
>>>
>> I'll have to take your word for it because I haven't looked closely
>> enough. I'll try and find time to look at it but the earliest I'll get around
>> to it is post kernel-summit. In the meantime, SPARSEMEM works.
> 
> Ok, so we disable DISCONTIG i386 NUMA because there's nobody willing
> to maintain it?
>
> I'll take your word SPARSEMEM works, although I was told DISCONTIG NUMA
> works too and then my testing told a quite different story.

That sounds like over kill to me.  The code unfixed works for all actual
NUMA systems I am aware of, else we would have had reports of this
problem before in the years that this code has been in the kernel.  The
fix Mel sent up fixes the code so that it works on systems with
unaligned node ends (which is what triggers the issue).  It does mean
that a little memory is wasted when this kernel is used on a non-NUMA
systems with unaligned node ends (only), but it works as designed at
that point.  To be honest it looks very much that only a very small
memory systems is going to trip this, and we have traditionally used
non-NUMA kernels on non-NUMA systems so there is almost zero exposure in
our install base.

Does this sudden interest in this combination, indicate a distro driven
change to using NUMA kernels on non-NUMA systems??

Having been involved in the development of the code originally, I think
Mel's fix is a good compromise to fix the immediate problem.  Clearly
there are bigger problems in this code that need clearing up if we are
to use this code as it is on small memory non-NUMA systems.  For one the
change merged to fix the "memmap overlapping initrd allocation" severely
wastes memory by pushing the memmap into ZONE_NORMAL even when there is
spare Kernel Virtual Address space available, and also looses the memory
under it where it used to shift to HIGHMEM.

I think that most of this can become moot if we simply pull node-0 out
of this remap scheme, as node-0's memory is already local and the
problem only occurs on node-0.  I have a todo item to look over this,
but as Mel has indicated its probabally not going to be immediate.

I think it makes sense to take Mel's fix as the smallest repair and
we'll spend some time sorting it out cleanly soon.

-apw
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ