Message-ID: <4AD52E37.80209@kernel.org>
Date:	Wed, 14 Oct 2009 10:49:43 +0900
From:	Tejun Heo <tj@...nel.org>
To:	Benjamin Herrenschmidt <benh@...nel.crashing.org>
CC:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	linuxppc-dev@...ts.ozlabs.org
Subject: Re: New percpu & ppc64 perfs

Hello, Benjamin.

Benjamin Herrenschmidt wrote:
> So I found (and fixed, though the patch isn't upstream yet) the problem
> that was causing the new percpu to hang when accessing the top of our
> vmalloc space.
> 
> However, I have some concerns about that choice of location for the
> percpu datas.
> 
> Basically, our MMU divides the address space into "segments" (of 256M or
> 1T depending on your processor capabilities) and those segments are SW
> loaded into a relatively small (64 entries) SLB buffer.
> 
> Thus, by moving the per-cpu to the end of the vmalloc space, you
> essentially make it use a different segment from the rest of the vmalloc
> space, which will overall degrade performances by increasing pressure on
> the SLB.
> 
> It would be nicer if we could provide an arch function to provide a
> "preferred" location for the per-cpu data.
> 
> I can easily cook up a patch but wanted to discuss that with you first.
> Any reason why we would keep it within vmalloc space for example ? IE. I
> could move VMALLOC_END to below the per-cpu reserved areas, or are they
> subject to expansion past boot time ?
> 
> Also, how big can they be ? Ie, will the top of the first 256M segment
> good enough or that will risk blowing out of space ? In general,
> machines with 256M segments won't have more than 64 or maybe 128 CPUs I
> believe. Bigger machines will have CPUs that support 1T segments.

Hmm... I don't think a 256M segment will be enough.  The percpu area
layout follows how NUMA memory is laid out.  For example, if a machine
has 4 nodes (each one with one cpu) and memory for each node is 1G in
size and 1G apart, the first chunk will be embedded in the linear
mapping area (the normal kernel addressable area) and the units in the
chunk will be between 1G and 2G apart.  As the first chunk is
embedded in the linear mapped area, this shouldn't cause any extra
overhead.
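
A minimal sketch of the example above, in Python for brevity. It assumes every unit sits at the same offset inside its node's memory; that offset, the 64K unit size, and the helper names are hypothetical and not taken from the kernel.

```python
GIB = 1 << 30
MIB = 1 << 20

def node_bases(n_nodes, node_size, gap):
    """Base address of each node: node_size bytes of memory, then a gap."""
    return [i * (node_size + gap) for i in range(n_nodes)]

def unit_layout(bases, offset_in_node, unit_size):
    """Return (unit offsets relative to the first unit, total span of the chunk)."""
    addrs = [b + offset_in_node for b in bases]
    offsets = [a - addrs[0] for a in addrs]
    span = offsets[-1] + unit_size
    return offsets, span

# 4 nodes, 1G of memory each, 1G apart (the example from the text).
bases = node_bases(4, GIB, GIB)
# Assumed values: unit placed 16M into each node, 64K per unit.
offsets, span = unit_layout(bases, 16 * MIB, 64 * 1024)
spacing = [b - a for a, b in zip(offsets, offsets[1:])]

print([hex(o) for o in offsets])
print("span in bytes:", span)
```

Under these assumptions the chunk covers several gigabytes of address space even though each unit is small, which is the reason a single 256M segment cannot hold it.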

The vmalloc area is used when the first chunk is filled and another
chunk needs to be allocated.  From the second chunk on, the vmalloc
area is used, preserving the layout of the first chunk.  i.e. each of
them will span 8G bytes (they will overlap though, so even with many
dynamic chunks vm usage will only be slightly over 8G).
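
Why overlapping chunks cost so little address space can be shown with a simplified model. This is only a sketch: the packing policy (each new chunk shifted down by one unit size), the unit offsets, the unit size and the VMALLOC_END value are all assumptions for illustration, not the actual allocator behaviour.

```python
GIB = 1 << 30
UNIT_SIZE = 64 * 1024                          # hypothetical unit size
UNIT_OFFSETS = [0, 2 * GIB, 4 * GIB, 6 * GIB]  # assumed first-chunk layout
SPAN = UNIT_OFFSETS[-1] + UNIT_SIZE
VMALLOC_END = 1 << 40                          # illustrative value only

def place_chunks(n_chunks):
    """Top-down placement: chunk i sits one unit size below chunk i - 1."""
    return [VMALLOC_END - SPAN - i * UNIT_SIZE for i in range(n_chunks)]

def unit_ranges(chunk_base):
    """Address ranges actually occupied by one chunk's units."""
    return [(chunk_base + off, chunk_base + off + UNIT_SIZE)
            for off in UNIT_OFFSETS]

def has_overlap(ranges):
    """True if any two half-open ranges intersect."""
    ordered = sorted(ranges)
    return any(prev_end > start
               for (_, prev_end), (start, _) in zip(ordered, ordered[1:]))

chunk_bases = place_chunks(100)
all_units = [r for base in chunk_bases for r in unit_ranges(base)]
vm_usage = VMALLOC_END - min(chunk_bases)

print("units collide:", has_overlap(all_units))
print("extra usage beyond one span, in bytes:", vm_usage - SPAN)
```

Every chunk keeps the same unit offsets, so the chunks interleave in the holes between each other's units and the total grows by only one unit size per extra chunk.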

The reason why the vmalloc area is used from the top is that I didn't
want this congruent allocation to compete with normal vmalloc
allocations.  Depending on the NUMA layout, competition between linear
allocation and congruent allocation may create many unnecessary holes.

For 256M segments, I don't think much can be done but for 1T segments,
just limiting the vmalloc area size to 1T should do the trick, no?
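
A rough way to check this is to count the distinct segments touched by one ordinary vmalloc address near the bottom of the area plus the percpu units near the top. Sketch only: the base address, the unit offsets and the 8G reservation at the top are illustrative assumptions.

```python
GIB = 1 << 30
MIB = 1 << 20
TIB = 1 << 40

SHIFT_256M = 28   # 256M segments: segment number = addr >> 28
SHIFT_1T = 40     # 1T segments:   segment number = addr >> 40

VMALLOC_START = 0xD000000000000000             # illustrative, 1T-aligned base
UNIT_OFFSETS = [0, 2 * GIB, 4 * GIB, 6 * GIB]  # assumed percpu unit layout

def sample_addrs(vmalloc_size):
    """One normal vmalloc address near the bottom, percpu units near the top."""
    low = VMALLOC_START + 16 * MIB
    top = VMALLOC_START + vmalloc_size
    percpu = [top - 8 * GIB + off for off in UNIT_OFFSETS]
    return [low] + percpu

def segments_touched(addrs, seg_shift):
    """Number of distinct segments the given addresses fall into."""
    return len({a >> seg_shift for a in addrs})

for size in (1 * TIB, 16 * TIB):
    addrs = sample_addrs(size)
    print(size // TIB, "T vmalloc area:",
          segments_touched(addrs, SHIFT_1T), "1T segment(s),",
          segments_touched(addrs, SHIFT_256M), "256M segment(s)")
```

With the area capped at 1T, everything shares a single 1T segment; with 256M segments, each unit lands in its own segment whatever the cap is.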

Thanks.

-- 
tejun