Message-ID: <20090112223220.GK23848@one.firstfloor.org>
Date:	Mon, 12 Jan 2009 23:32:20 +0100
From:	Andi Kleen <andi@...stfloor.org>
To:	Frederik Deweerdt <frederik.deweerdt@...og.eu>
Cc:	Andi Kleen <andi@...stfloor.org>, mingo@...e.hu,
	tglx@...utronix.de, hpa@...or.com, linux-kernel@...r.kernel.org
Subject: Re: [patch] tlb flush_data: replace per_cpu with an array

On Mon, Jan 12, 2009 at 11:10:23PM +0100, Frederik Deweerdt wrote:
> On Mon, Jan 12, 2009 at 10:57:02PM +0100, Andi Kleen wrote:
> > On Mon, Jan 12, 2009 at 10:35:42PM +0100, Frederik Deweerdt wrote:
> > > Hi,
> > > 
> > > On x86_64 flush tlb data is stored in per_cpu variables. This is
> > > unnecessary because only the first NUM_INVALIDATE_TLB_VECTORS entries
> > > are accessed.
> > > This patch aims at making the code less confusing (there's nothing
> > > really "per_cpu") by using a plain array. It also would save some memory
> > > on most distros out there (Ubuntu x86_64 has NR_CPUS=64 by default).
> > 
> > Nope, it doesn't save memory on most systems, because per-cpu data is only
> > allocated for the CPUs that are actually present. And if you have more than 8
> > cores, you can likely afford a few bytes per CPU.
> I did not understand that, thanks for clarifying.
> > 
> > You would then need to cache-line pad each entry, otherwise you risk
> > false sharing. That would make the array 1K on a system with 128-byte
> > cache lines. This means that on small systems it would actually waste
> > much more memory.
> > 
> > per cpu avoids that problem completely.
> It is also slower (or so percpu.h says), and confusing, I'd say.

Well, it's something like 3 instructions versus one. You would
have a hard time benchmarking it unless you ran it in a very tight
loop. It will be lost in the noise compared to all the other costs
of the IPI.
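To make the "3 instructions versus one" remark concrete, here is a simplified userspace model of the two access patterns. It assumes a per-CPU variable is reached by loading that CPU's offset from a table and adding it to a base address, while a plain array entry needs only one indexed address computation. All names (`cpu_offset_table`, `flush_entry`, `percpu_ptr`, `array_ptr`) are hypothetical, this is not the kernel's actual `per_cpu()` implementation, and real instruction counts depend on architecture, compiler and kernel version.

```c
#include <stddef.h>

/* Simplified, hypothetical model; NOT the kernel's per_cpu() code.
 * For simplicity all per-CPU copies live in one contiguous buffer here;
 * real per-CPU areas are separate chunks reached roughly the same way,
 * via a per-CPU offset added to a base address. */
#define MODEL_CPUS 4

struct flush_entry {            /* hypothetical stand-in for flush state */
    unsigned long flush_va;
};

static struct flush_entry percpu_area[MODEL_CPUS];  /* backing storage */
static size_t cpu_offset_table[MODEL_CPUS];         /* byte offset per CPU */
static struct flush_entry plain_array[MODEL_CPUS];

/* Fill the offset table once, as boot code would. */
static void init_cpu_offsets(void)
{
    for (size_t cpu = 0; cpu < MODEL_CPUS; cpu++)
        cpu_offset_table[cpu] = cpu * sizeof(struct flush_entry);
}

/* Per-CPU style: load this CPU's offset, then add it to the base address. */
static struct flush_entry *percpu_ptr(unsigned int cpu)
{
    char *base = (char *)percpu_area;
    return (struct flush_entry *)(base + cpu_offset_table[cpu]);
}

/* Array style: one scaled-index address computation, no extra load. */
static struct flush_entry *array_ptr(unsigned int cpu)
{
    return &plain_array[cpu];
}
```

After `init_cpu_offsets()`, `percpu_ptr(2)` and `&percpu_area[2]` are the same address; the per-CPU path just pays one extra load for the offset. That difference is tiny next to the cost of sending and handling an IPI.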

Another reason I don't like this patch is that on the typical small single/dual
core system running a distro kernel built for 128-byte cache lines, you now
always pay the 1K cost, while with per cpu it only needed one or two entries.
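The memory trade-off can be sketched in C. This assumes 8 invalidate vectors and a 128-byte cache line (the numbers behind the 1K figure, since 8 * 128 = 1024) and uses a hypothetical `padded_flush_state` struct rather than the kernel's real definition. The slot choice `cpu % NUM_INVALIDATE_TLB_VECTORS` is likewise an assumption about how sending CPUs share the entries.

```c
#include <stddef.h>

/* Assumed values for illustration: 8 invalidate vectors, 128-byte lines. */
#define NUM_INVALIDATE_TLB_VECTORS 8
#define CACHE_LINE_BYTES 128

/* Hypothetical flush-state entry; the real kernel struct differs.
 * Aligning each entry to a full cache line stops two CPUs writing
 * different entries from bouncing one line between them (false sharing),
 * at the price of padding every entry out to 128 bytes. */
struct padded_flush_state {
    _Alignas(CACHE_LINE_BYTES) void *flush_mm;
    unsigned long flush_va;
};

/* Fixed array: all of it is paid for, however many CPUs exist. */
static struct padded_flush_state flush_array[NUM_INVALIDATE_TLB_VECTORS];

static size_t array_cost_bytes(void)
{
    return sizeof(flush_array);
}

/* Per-CPU model: one entry per CPU actually present; separate per-CPU
 * areas already avoid false sharing, so no cache-line padding is assumed. */
static size_t percpu_cost_bytes(unsigned int ncpus, size_t entry_bytes)
{
    return (size_t)ncpus * entry_bytes;
}

/* Slot a sending CPU would use, assuming entries are shared modulo
 * the number of vectors (so only the first 8 are ever touched). */
static unsigned int flush_slot(unsigned int cpu)
{
    return cpu % NUM_INVALIDATE_TLB_VECTORS;
}
```

On a dual-core machine with, say, 16-byte entries, the per-CPU model costs 32 bytes against 1024 for the padded array, which is the asymmetry described above.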

Admittedly it could have been better commented.

Not that it matters now; unfortunately, it's already applied. Sometimes
I wonder why I still bother doing patch review...

-Andi

-- 
ak@...ux.intel.com -- Speaking for myself only.
