linux-kernel - Re: regarding the x86_64 zero-based percpu patches

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <m1y6xfx62a.fsf@frodo.ebiederm.org>
Date:	Mon, 12 Jan 2009 19:01:33 -0800
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Tejun Heo <tj@...nel.org>
Cc:	Christoph Lameter <cl@...ux-foundation.org>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Ingo Molnar <mingo@...e.hu>, travis@....com,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	Andrew Morton <akpm@...ux-foundation.org>, steiner@....com,
	Hugh Dickins <hugh@...itas.com>
Subject: Re: regarding the x86_64 zero-based percpu patches

Tejun Heo <tj@...nel.org> writes:

> Hello, Eric.
>
> Eric W. Biederman wrote:
>>> There are 2M TLB entries on x86_64. If we really get into a high usage
>>> scenario then the 2M entry makes sense. Average server memory sizes likely
>>> already are way beyond 10G per box. The higher that goes the more
>>> reasonable the 2M TLB entry will be.
>> 
>> 2M of per cpu data doesn't make sense, and likely indicates a design
>> flaw somewhere.  It just doesn't make sense to have large amounts of
>> data allocated per cpu.
>
> Why?  On almost all large machines I've seen or heard of, memory size
> scales way better than the number of cpus.  Whether certain usage
> makes sense or not surely is debatable but I can't imagine all use
> cases where 2MB percpu TLB entry could be useful would be senseless.

Right, there are cases where you could hit 2MB but they aren't likely
to be that common.

In particular the common case is to allocate a single word of per
cpu data, with a given allocation request.  To get 2 2MB with
8byte requests requires 262144 different, which is a lot
more than I expect to be common any time soon.

So I figure reserving a 2MB tlb entry is not likely what we want,
in the common case.

>> The most common user of per cpu data I am aware of is allocating one
>> word per cpu for counters.
>> 
>> What would be better is simply to: 
>> - Require a lock to access another cpus per cpu data.
>> - Do large page allocations for the per cpu data.
>> 
>> At which point we could grow the per cpu data by simply reallocating it on
>> each cpu and updating the register that holds the base pointer.
>
> I don't think moving live objects is such a good idea for the
> following reasons.
>
> 1. Programming convenience is usually much more important than people
>    think it is.  Even in the kernel.  I think it's very likely that
>    we'll have unending stream of small feature requirements which
>    would step just outside the supported bounds and ever smart
>    workaround until the restriction is finally removed years later.

> 2. Moving live objects is inherently dangerous + it won't happen
>    often.  Thinking about possible subtle bugs is scary.

But the question is what is per cpu memory.  Per cpu memory is
something we can access quickly without creating cross cpu
cache line contention.

Accessing that memory from other cpus implies we create that
contention and will be the slow path.   We need to do cross
cpu access for the rollup of the statistics, but we clearly
don't want to do it often.

So I expect most times we will want to store a pointer to per
cpu data will be a bug. 

per cpu memory is not something we ever want to use to lightly.
So as long as the rules are clear we should be ok.  And simply
removing the address of function for per cpu data would make it impossible
to point into it.  So I think it is worth a look, to see if we can
move live per cpu data.  As it noticeably simplifies the problem of
growing a per cpu area in the rare case when we need to.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/