Date:	Tue, 13 Jan 2009 13:00:16 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Ravikiran G Thirumalai <kiran@...lex86.org>,
	Frederik Deweerdt <frederik.deweerdt@...og.eu>,
	andi@...stfloor.org, tglx@...utronix.de, hpa@...or.com,
	linux-kernel@...r.kernel.org
Subject: Re: [patch] tlb flush_data: replace per_cpu with an array

On Tue, 2009-01-13 at 00:00 +0100, Ingo Molnar wrote:

> From 23d9dc8bffc759c131b09a48b5215cc2b37a5ac3 Mon Sep 17 00:00:00 2001
> From: Frederik Deweerdt <frederik.deweerdt@...og.eu>
> Date: Mon, 12 Jan 2009 22:35:42 +0100
> Subject: [PATCH] x86, tlb flush_data: replace per_cpu with an array
> 
> Impact: micro-optimization, memory reduction
> 
> On x86_64 flush tlb data is stored in per_cpu variables. This is
> unnecessary because only the first NUM_INVALIDATE_TLB_VECTORS entries
> are accessed.
> 
> This patch aims at making the code less confusing (there's nothing
> really "per_cpu") by using a plain array. It also would save some memory
> on most distros out there (Ubuntu x86_64 has NR_CPUS=64 by default).
> 
> [ Ravikiran G Thirumalai also pointed out that the correct alignment
>   is ____cacheline_internodealigned_in_smp, so that there's no
>   bouncing on vsmp. ]
> 
> Signed-off-by: Frederik Deweerdt <frederik.deweerdt@...og.eu>
> Signed-off-by: Ingo Molnar <mingo@...e.hu>
> ---
>  arch/x86/kernel/tlb_64.c |   16 ++++++++--------
>  1 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/tlb_64.c b/arch/x86/kernel/tlb_64.c
> index f8be6f1..c5a6c6f 100644
> --- a/arch/x86/kernel/tlb_64.c
> +++ b/arch/x86/kernel/tlb_64.c
> @@ -33,7 +33,7 @@
>   *	To avoid global state use 8 different call vectors.
>   *	Each CPU uses a specific vector to trigger flushes on other
>   *	CPUs. Depending on the received vector the target CPUs look into
> - *	the right per cpu variable for the flush data.
> + *	the right array slot for the flush data.
>   *
>   *	With more than 8 CPUs they are hashed to the 8 available
>   *	vectors. The limited global vector space forces us to this right now.
> @@ -48,13 +48,13 @@ union smp_flush_state {
>  		unsigned long flush_va;
>  		spinlock_t tlbstate_lock;
>  	};
> -	char pad[SMP_CACHE_BYTES];
> -} ____cacheline_aligned;
> +	char pad[X86_INTERNODE_CACHE_BYTES];
> +} ____cacheline_internodealigned_in_smp;

That will make the array below 8*4096 bytes on VSMP, which pushes the
break-even point for any memory savings up to 256 CPUs.

I'm really dubious this patch is worth it.

>  /* State is put into the per CPU data section, but padded
>     to a full cache line because other CPUs can access it and we don't
>     want false sharing in the per cpu data segment. */
> -static DEFINE_PER_CPU(union smp_flush_state, flush_state);
> +static union smp_flush_state flush_state[NUM_INVALIDATE_TLB_VECTORS];
>  
>  /*
>   * We cannot call mmdrop() because we are in interrupt context,
> @@ -129,7 +129,7 @@ asmlinkage void smp_invalidate_interrupt(struct pt_regs *regs)
>  	 * Use that to determine where the sender put the data.
>  	 */
>  	sender = ~regs->orig_ax - INVALIDATE_TLB_VECTOR_START;
> -	f = &per_cpu(flush_state, sender);
> +	f = &flush_state[sender];
>  
>  	if (!cpu_isset(cpu, f->flush_cpumask))
>  		goto out;
> @@ -169,7 +169,7 @@ void native_flush_tlb_others(const cpumask_t *cpumaskp, struct mm_struct *mm,
>  
>  	/* Caller has disabled preemption */
>  	sender = smp_processor_id() % NUM_INVALIDATE_TLB_VECTORS;
> -	f = &per_cpu(flush_state, sender);
> +	f = &flush_state[sender];
>  
>  	/*
>  	 * Could avoid this lock when
> @@ -205,8 +205,8 @@ static int __cpuinit init_smp_flush(void)
>  {
>  	int i;
>  
> -	for_each_possible_cpu(i)
> -		spin_lock_init(&per_cpu(flush_state, i).tlbstate_lock);
> +	for (i = 0; i < ARRAY_SIZE(flush_state); i++)
> +		spin_lock_init(&flush_state[i].tlbstate_lock);
>  
>  	return 0;
>  }
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/