Message-ID: <20131015131613.GD3141@htj.dyndns.org>
Date:	Tue, 15 Oct 2013 09:16:13 -0400
From:	Tejun Heo <tj@...nel.org>
To:	Yinghai Lu <yinghai@...nel.org>
Cc:	Zhang Yanfei <zhangyanfei.yes@...il.com>,
	Zhang Yanfei <zhangyanfei@...fujitsu.com>,
	"H. Peter Anvin" <hpa@...or.com>, Toshi Kani <toshi.kani@...com>,
	Ingo Molnar <mingo@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH part2 v2 0/8] Arrange hotpluggable memory as ZONE_MOVABLE

Hello, Yinghai.

On Mon, Oct 14, 2013 at 07:25:55PM -0700, Yinghai Lu wrote:
> > Wouldn't that amount be fairly static and restricted?  If you wanna
> > chunk memory init anyway, there's no reason to init more than
> > necessary until smp stage is reached.  The more you do early, the more
> > serialized you're, so wouldn't the goal naturally be initing the
> > minimum possible?
> 
> Even we try to go minimum range instead of range that whole range on boot node,
> without parsing srat at first, the minimum range could be crossed the boundary
> of nodes.

I guess it depends on how large the minimum we're talking about is, but
let's say it isn't multiple orders of magnitude larger than the kernel
image.  That shouldn't be a problem then, no?

The thing is I don't really see how SRAT would help much.  I don't
know how the existing systems are configured but it's natural to
assume that hardware-wise per-stick removal will be supported, right?
There's no reason memory sticks of the first NUMA node can't be
hot-unplugged.  Likely we'll end up with an SRAT map which splits the
first node into two pieces - the first, smaller part, which can't be
removed because firmware and such depend on it, and the larger
trailing chunk, which can be removed.  Allocating early non-migratable
stuff near the kernel image, which can't be moved without an
additional layer of indirection anyway, would be a fairly good choice
regardless, right?
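
A minimal Python sketch of that split, using a toy SRAT-like map.  The
layout, the `MemRange` type, and `overlaps_hotpluggable()` are all
hypothetical names made up for illustration; they do not correspond to
kernel code or to any real machine's firmware tables.  The sketch only
shows the check under discussion: whether an early allocation placed
near the kernel image would land in a removable range.

```python
from dataclasses import dataclass

MIB = 1 << 20
GIB = 1 << 30


@dataclass(frozen=True)
class MemRange:
    """One entry of a toy SRAT-like map (hypothetical, illustration only)."""
    start: int          # inclusive physical address
    end: int            # exclusive physical address
    node: int           # NUMA node the range belongs to
    hotpluggable: bool  # True if the range may be removed at runtime


# Assumed layout: node 0 is split into a small fixed part and a larger
# removable tail, as speculated in the text above.
TOY_SRAT = [
    MemRange(0, 4 * GIB, node=0, hotpluggable=False),
    MemRange(4 * GIB, 64 * GIB, node=0, hotpluggable=True),
    MemRange(64 * GIB, 128 * GIB, node=1, hotpluggable=True),
]


def overlaps_hotpluggable(start, size, srat):
    """Return True if [start, start + size) touches any removable range."""
    end = start + size
    return any(r.hotpluggable and start < r.end and r.start < end
               for r in srat)
```

With this assumed layout, an allocation of a few tens of megabytes just
above a kernel image loaded low in memory stays inside the fixed part,
while one that straddles the 4 GiB boundary does not.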

Even if we parse SRAT early, we can't unconditionally make the kernel
allocate early stuff from node0.  We do not know what SRAT will look
like in future configurations.  If what the hotplug people are saying
is true, the first non-hotpluggable node being relatively small seems
actually quite likely.  I don't think we want to factor all those
variables into very early bootstrap stages and it's not like we're
talking about gigabytes of memory.  E.g., bring up the first half or
one gig and go from there.  That part of memory is highly unlikely to
be unpluggable anyway.
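
The bottom-up idea can be sketched as a toy bump allocator in Python.
This is not the kernel's memblock code; `BottomUpAllocator`, its
parameters, and the addresses used are assumptions chosen only to show
allocations growing upward from the end of the kernel image and
refusing to go past a small early limit.

```python
MIB = 1 << 20
GIB = 1 << 30


def align_up(addr, align):
    """Round addr up to the next multiple of align (a power of two)."""
    return (addr + align - 1) & ~(align - 1)


class BottomUpAllocator:
    """Toy early allocator: hands out memory just above the kernel image.

    kernel_end and limit are hypothetical inputs; a real boot path would
    take them from the loader and from whatever early range it brought up.
    """

    def __init__(self, kernel_end, limit):
        self.next_free = kernel_end  # first byte after the kernel image
        self.limit = limit           # end of the early-initialized range

    def alloc(self, size, align=4096):
        start = align_up(self.next_free, align)
        end = start + size
        if end > self.limit:
            raise MemoryError("early range exhausted")
        self.next_free = end
        return start
```

For example, with the kernel image assumed to end slightly past 32 MiB
and the early range capped at 1 GiB, every early allocation stays
packed right above the image:

```python
early = BottomUpAllocator(kernel_end=32 * MIB + 123, limit=1 * GIB)
first = early.alloc(4096)                   # page-aligned, just above the image
second = early.alloc(2 * MIB, align=2 * MIB)  # next 2 MiB-aligned slot
```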

> > * 4k page mappings.  It'd be nice to keep everything working for 4k
> >   but just following SRAT isn't enough.  What if the non-hotpluggable
> >   boot node doesn't stretch high enough and page table reaches down
> >   too far?  This won't be an optional behavior, so it is actually
> >   *likely* to happen on certain setups.
> 
> no, do not assume 4k page. even we are using 1GB mapping,  we will still have
> chance to have one node to take 512G RAM, that means we can have one 4k page
> on local node ram.

Sure, the kernel image can also be located such that the last page
spills over to the next node too.  No matter what we do, without an
extra layer of indirection, this can't be a complete solution.  Think
about the usual node configuration and where the kernel image is
usually loaded.  As long as the page table is relatively small, it is
highly unlikely to increase the chance of such issues.

Again, it's all about benefit and cost.  Sure, parsing SRAT early will
definitely decrease the chance of such issues.  However, as long as
the page tables are small enough, just allocating those on top
of the kernel isn't significantly worse.  Also, following SRAT earlier
not only increases complexity in vulnerable stages of boot but also
carries higher risk with the existing and future configurations,
depending on what their SRAT looks like, if the new behavior is applied
unconditionally.  If we decide to make early SRAT usage conditional,
that's a *LOT* more conditional code than what's added by bottom-up
allocation.
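
For a rough sense of "small enough", the following Python sketch
estimates how much memory the page tables themselves need for a direct
mapping.  It assumes x86-64 4-level paging (4 KiB tables holding 512
entries each) and a mapped range that starts on a suitably aligned
boundary; `page_table_bytes()` is a hypothetical helper, not a kernel
function, and the figures are an estimate rather than what any
particular kernel build allocates.

```python
KIB = 1 << 10
MIB = 1 << 20
GIB = 1 << 30

ENTRIES_PER_TABLE = 512
TABLE_SIZE = 4 * KIB

# Number of paging levels walked for each mapping size (4-level paging).
LEVELS = {4 * KIB: 4, 2 * MIB: 3, 1 * GIB: 2}


def ceil_div(a, b):
    return -(-a // b)


def page_table_bytes(mapped_bytes, page_size):
    """Estimate bytes of page tables needed to map mapped_bytes."""
    tables = 1  # the single top-level table
    span = page_size * ENTRIES_PER_TABLE  # memory covered by one lowest table
    for _ in range(LEVELS[page_size] - 1):
        tables += ceil_div(mapped_bytes, span)
        span *= ENTRIES_PER_TABLE
    return tables * TABLE_SIZE
```

Under these assumptions, mapping the first gigabyte with 4k pages takes
515 tables, a little over 2 MiB, while 2 MiB or 1 GiB mappings need
only a handful of pages.  Mapping hundreds of gigabytes with 4k pages,
by contrast, pushes the tables toward a gigabyte, which is where the
placement question starts to matter.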

> On x86 system with intel new cpus there is memory controller built-in.,
> could have hotplug modules (with socket and memory) and those hotplug modules
> will be serviced as one single point. Just nowadays like we have pcie
> card hotplugable.
> 
> I don't see where is the " a clear performance trade-off".

Because kernel data structures have to be allocated off-node.

Thanks.

-- 
tejun