linux-kernel - Re: regarding the x86_64 zero-based percpu patches

Message-ID: <496D6300.9070402@kernel.org>
Date:	Wed, 14 Jan 2009 12:58:56 +0900
From:	Tejun Heo <tj@...nel.org>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
CC:	Christoph Lameter <cl@...ux-foundation.org>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Ingo Molnar <mingo@...e.hu>, travis@....com,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	Andrew Morton <akpm@...ux-foundation.org>, steiner@....com,
	Hugh Dickins <hugh@...itas.com>
Subject: Re: regarding the x86_64 zero-based percpu patches

Hello, Eric.

Eric W. Biederman wrote:
> Tejun Heo <tj@...nel.org> writes:
>> I don't know.  I think it's a dangerous thing which can be avoided.
>> If there's no other solution, then we might have to live with it but I
>> don't see the winning benefit of such design over per-cpu virtual
>> mapping.
> 
> It isn't incompatible with a per-cpu virtual mapping.  It allows the
> possibility of each cpu reusing the same chunk of virtual address
> space for per cpu memory.
> 
> On x86_64 and other architectures with enough address space bits it allows
> us to share the large pages that we use for the normal memory mapping with
> the ones for per cpu access.
> 
> I definitely think the work of combining the pda and the percpu areas
> into a common area is worthwhile.

Yeah, it's gonna be necessary regardless of which way we go.

> I think it would be nice if the percpu area could grow and would not be
> a fixed size at boot time, I'm not particularly convinced it has to.

The main problem is that the per-cpu areas need to be congruent, which
basically mandates that they be contiguous.  The three alternatives on
the table are...

1. Just reserve memory from the get-go.  Simplest.  No additional TLB
   pressure but memory is likely to be wasted and more importantly
   scalability suffers.

2. Reserve address space and map memory as necessary.  We can be much
   more generous about reserving address space, especially on 64bit
   machines, and can probably mostly forget about the scalability
   issue there.  However, getting things just right for address space
   constrained 32bit might not be too easy, but then again nothing
   really is scalable on 32bit these days, so we can probably live
   with a boot time parameter or something.

   Another issue is added TLB pressure, as it's likely to consume 4K
   TLB entries in addition to the 2MB TLB entries of the default
   kernel mapping.  The TLB pressure could be mostly avoided if the
   percpu area were large enough to justify 2MB page allocation, but
   it isn't.

3. Do realloc().  This doesn't impose scalability issues or add to TLB
   pressure, but it does constrain how the percpu variables can be
   used and introduces a certain possibility of scary
   once-in-a-blue-moon never-reproducible bugs.  Maybe that
   possibility can be reduced by putting some restrictions on the
   interface, but I don't know.  It still scares me.
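
To make the congruency requirement and the realloc() worry concrete,
here is a rough user-space sketch.  It is only a model under stated
assumptions: struct percpu_model, model_ptr(), model_init() and
model_grow() are made-up names, not kernel interfaces, and calloc()
stands in for whatever the real allocator would be.  The point is that
a (cpu, offset) pair keeps working after the area is moved, while any
raw pointer taken before the move does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space model of congruent per-cpu areas.
 * None of these names are kernel API. */
struct percpu_model {
	char *base;        /* start of the block holding every cpu's unit */
	size_t unit_size;  /* bytes reserved for each cpu */
	int nr_cpus;
};

/* Congruent layout: the same offset is valid in every cpu's unit. */
static void *model_ptr(const struct percpu_model *m, int cpu, size_t off)
{
	assert(cpu >= 0 && cpu < m->nr_cpus && off < m->unit_size);
	return m->base + (size_t)cpu * m->unit_size + off;
}

static int model_init(struct percpu_model *m, int nr_cpus, size_t unit_size)
{
	m->base = calloc((size_t)nr_cpus, unit_size);
	if (!m->base)
		return -1;
	m->unit_size = unit_size;
	m->nr_cpus = nr_cpus;
	return 0;
}

/* Alternative 3: grow by allocating a larger block and copying.
 * Offsets stay valid; raw pointers taken before the call do not. */
static int model_grow(struct percpu_model *m, size_t new_unit_size)
{
	char *new_base;
	int cpu;

	assert(new_unit_size >= m->unit_size);
	new_base = calloc((size_t)m->nr_cpus, new_unit_size);
	if (!new_base)
		return -1;
	for (cpu = 0; cpu < m->nr_cpus; cpu++)
		memcpy(new_base + (size_t)cpu * new_unit_size,
		       m->base + (size_t)cpu * m->unit_size, m->unit_size);
	free(m->base);          /* every old model_ptr() result is now stale */
	m->base = new_base;
	m->unit_size = new_unit_size;
	return 0;
}
```

Anything that stashed the result of model_ptr() across model_grow()
would be reading freed memory, which is the kind of restriction on
usage that alternative 3 would have to impose.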

Hmm... IIUC, the biggest drawback of #2 is the added TLB pressure,
right?  What if we reserve percpu allocation in 2MB chunks?  i.e. use
4K mappings but always allocate the percpu pages from aligned 2MB
chunks.  That way it won't waste 2MB per cpu and although it will use
additional 4K TLB entries, it will free up 2MB TLB entries.
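
A back-of-the-envelope sketch of the memory side of that idea, in C.
It assumes the 4K pages of all cpus are packed together into shared,
aligned 2MB chunks; that reading, the helper names percpu_pages() and
percpu_chunks(), and the unit size used in the note below are my
assumptions for illustration only.

```c
#include <assert.h>

#define PAGE_SIZE_4K    (4UL * 1024)
#define CHUNK_SIZE_2M   (2UL * 1024 * 1024)
#define PAGES_PER_CHUNK (CHUNK_SIZE_2M / PAGE_SIZE_4K)	/* 512 */

/* 4K pages needed to back one unit of unit_bytes on each of nr_cpus cpus. */
static unsigned long percpu_pages(unsigned long nr_cpus,
				  unsigned long unit_bytes)
{
	assert(nr_cpus > 0 && unit_bytes > 0);
	return nr_cpus * ((unit_bytes + PAGE_SIZE_4K - 1) / PAGE_SIZE_4K);
}

/* Aligned 2MB chunks needed if all of those pages are packed together. */
static unsigned long percpu_chunks(unsigned long nr_cpus,
				   unsigned long unit_bytes)
{
	return (percpu_pages(nr_cpus, unit_bytes) + PAGES_PER_CHUNK - 1) /
		PAGES_PER_CHUNK;
}
```

With 16 cpus and a hypothetical 64KB unit per cpu, that is 256 4K
pages, which fit in a single 2MB chunk, versus 32MB if every cpu were
handed its own 2MB page.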

Thanks.

-- 
tejun