linux-kernel - Re: regarding the x86_64 zero-based percpu patches

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <m1k58zx305.fsf@frodo.ebiederm.org>
Date:	Mon, 12 Jan 2009 20:07:38 -0800
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Tejun Heo <tj@...nel.org>
Cc:	Christoph Lameter <cl@...ux-foundation.org>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Ingo Molnar <mingo@...e.hu>, travis@....com,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	Andrew Morton <akpm@...ux-foundation.org>, steiner@....com,
	Hugh Dickins <hugh@...itas.com>
Subject: Re: regarding the x86_64 zero-based percpu patches

Tejun Heo <tj@...nel.org> writes:

> Hello, Eric.
>
> Eric W. Biederman wrote:
>> Right, there are cases where you could hit 2MB but they aren't likely
>> to be that common.
>> 
>> In particular the common case is to allocate a single word of per
>> cpu data, with a given allocation request.  To get 2 2MB with
>> 8byte requests requires 262144 different, which is a lot
>> more than I expect to be common any time soon.
>> 
>> So I figure reserving a 2MB tlb entry is not likely what we want,
>> in the common case.
>
> Yeap, it probably won't hit 2MB in common cases but it still needs to
> scale for uncommon cases and if 4K TLB pressure becomes too high for
> such cases, promoting to 2MB TLB makes sense for those.  IIUC, that's
> what Christoph Lameter is intending to do (haven't looked at the code
> yet tho).
>
>>>> The most common user of per cpu data I am aware of is allocating one
>>>> word per cpu for counters.
>>>>
>>>> What would be better is simply to: 
>>>> - Require a lock to access another cpus per cpu data.
>>>> - Do large page allocations for the per cpu data.
>>>>
>>>> At which point we could grow the per cpu data by simply reallocating it on
>>>> each cpu and updating the register that holds the base pointer.
>>> I don't think moving live objects is such a good idea for the
>>> following reasons.
>>>
>>> 1. Programming convenience is usually much more important than people
>>>    think it is.  Even in the kernel.  I think it's very likely that
>>>    we'll have unending stream of small feature requirements which
>>>    would step just outside the supported bounds and ever smart
>>>    workaround until the restriction is finally removed years later.
>> 
>>> 2. Moving live objects is inherently dangerous + it won't happen
>>>    often.  Thinking about possible subtle bugs is scary.
>> 
>> But the question is what is per cpu memory.  Per cpu memory is
>> something we can access quickly without creating cross cpu
>> cache line contention.
>> 
>> Accessing that memory from other cpus implies we create that
>> contention and will be the slow path.   We need to do cross
>> cpu access for the rollup of the statistics, but we clearly
>> don't want to do it often.
>> 
>> So I expect most times we will want to store a pointer to per
>> cpu data will be a bug. 
>> 
>> per cpu memory is not something we ever want to use to lightly.
>> So as long as the rules are clear we should be ok.  And simply
>> removing the address of function for per cpu data would make it impossible
>> to point into it.  So I think it is worth a look, to see if we can
>> move live per cpu data.  As it noticeably simplifies the problem of
>> growing a per cpu area in the rare case when we need to.
>
> I don't know.  I think it's a dangerous thing which can be avoided.
> If there's no other solution, then we might have to live with it but I
> don't see the winning benefit of such design over per-cpu virtual
> mapping.

It isn't incompatible with a per-cpu virtual mapping.  It allows the
possibility of each cpu reusing the same chunk of virtual address
space for per cpu memory.

On x86_64 and other architectures with enough address space bits it allows
us to share the large pages that we use for the normal memory mapping with
the ones for per cpu access.

I definitely think the work of combining the pda and the percpu areas
into a common area is worthwhile.

I think it would be nice if the percpu area could grow and would not be
a fixed size at boot time, I'm not particularly convinced it has to.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/