[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <496D6300.9070402@kernel.org>
Date: Wed, 14 Jan 2009 12:58:56 +0900
From: Tejun Heo <tj@...nel.org>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
CC: Christoph Lameter <cl@...ux-foundation.org>,
Rusty Russell <rusty@...tcorp.com.au>,
Ingo Molnar <mingo@...e.hu>, travis@....com,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"H. Peter Anvin" <hpa@...or.com>,
Andrew Morton <akpm@...ux-foundation.org>, steiner@....com,
Hugh Dickins <hugh@...itas.com>
Subject: Re: regarding the x86_64 zero-based percpu patches
Hello, Eric.
Eric W. Biederman wrote:
> Tejun Heo <tj@...nel.org> writes:
>> I don't know. I think it's a dangerous thing which can be avoided.
>> If there's no other solution, then we might have to live with it but I
>> don't see the winning benefit of such design over per-cpu virtual
>> mapping.
>
> It isn't incompatible with a per-cpu virtual mapping. It allows the
> possibility of each cpu reusing the same chunk of virtual address
> space for per cpu memory.
>
> On x86_64 and other architectures with enough address space bits it allows
> us to share the large pages that we use for the normal memory mapping with
> the ones for per cpu access.
>
> I definitely think the work of combining the pda and the percpu areas
> into a common area is worthwhile.
Yeah, it's gonna be necessary regardless of which way we go.
> I think it would be nice if the percpu area could grow and would not be
> a fixed size at boot time, I'm not particularly convinced it has to.
The main problem is that the area needs to be congruent which
basically mandates them to be contiguous. The three alternatives on
table are...
1. Just reserve memory from the get-go. Simplest. No additional TLB
pressure but memory is likely to be wasted and more importantly
scalability suffers.
2. Reserve address space and map memory as necessary. We can be much
more generous about reserving address space especially on 64bit
machines and probably can mostly forget about scalability issue
there. However, getting things just right for address space
contrained 32bit might not be too easy but then again nothing
really is scalable on 32bit these days, so we probably can live
with boot time parameter or something.
Another issue is added TLB pressure as it's likely to consume 4K
TLB entries in addition to the default kernel mapping 2M TLB
entries. The TLB pressure can be mostly avoided if percpu area is
sufficiently large to justify 2MB page allocation but it isn't.
3. Do realloc(). This doesn't impose scalability issues or add to TLB
pressure but it does contrain how the percpu variables can be used
and introduces certain amount of possibility for scary
once-in-a-blue-moon never-reproducible bugs. Maybe such
possibility can be reduced by putting some restriction on the
interface but I don't know. It still scares me.
Hmm... IIUC, the biggest drawback of #2 is the added TLB pressure,
right? What if we reserve percpu allocation by 2MB chunks? ie. use
4k mapping but always allocate the percpu pages from aligned 2MB
chunks. That way it won't waste 2MB per cpu and although it will use
additional 4K TLB entries, it will free up 2MB TLB entries.
Thanks.
--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists