Message-Id: <4C8E3CF70200005A00071931@soto.provo.novell.com>
Date:	Mon, 13 Sep 2010 13:02:15 -0600
From:	"Gregory Haskins" <ghaskins@...ell.com>
To:	<rostedt@...dmis.org>, "Chris Mason" <chris.mason@...cle.com>
Cc:	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RFC] get rid of cpupri lock

Hi Chris,

>>> On 9/13/2010 at 02:04 PM, in message <20100913180415.GT21374@...nk>, Chris Mason <chris.mason@...cle.com> wrote:
> I recently had the chance to try to tune 2.6.32 kernels running Oracle
> on OLTP workloads.  One of the things Oracle loves to do for tuning
> these benchmarks is make all the database tasks involved SCHED_FIFO.
> This is because they have their own userland locks and if they get
> scheduled out, lock contention goes up.
> 
> <please insert flames about userland locking here>
> 
> <and here>
> 
> <and here>
> 
> The box I was tuning had 8 sockets and the first thing that jumped out
> at me during the run was that we were spending all our system time
> inside cpupri_set.  Since the rq lock must be held in cpupri_set, I
> don't think we need the cpupri lock at all.

I haven't had a chance to study your patch yet, but this comment concerns me.  I think the issue is that the rq locks are per CPU, while the cpupri locks are per priority vector.  I therefore don't think it is safe to rely on the rq lock to guard this structure, since other CPUs/rqs may be updating the same vector concurrently.  That said, the cpupri lock has bothered me since the beginning, so any clever scheme that avoids it would be very welcome.
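
To make that concrete with something you can actually run: here is a quick userspace sketch (illustrative only, nothing to do with the real kernel code paths; the file name is made up) of why a plain shared increment is unsafe when the writers are serialized only by *different* locks, and why making the count atomic at least keeps the counter itself consistent:

/*
 * Userspace sketch only, not kernel code.  Two threads stand in for
 * two CPUs that each hold only their own rq->lock: nothing serializes
 * their access to the shared counter, so the plain increment loses
 * updates while the atomic one does not.
 *
 * Build with: gcc -O2 -pthread count_race.c
 */
#include <pthread.h>
#include <stdio.h>

#define ITERS 1000000

static int plain_count;     /* like vec->count without the vec->lock      */
static int atomic_count;    /* like the proposed atomic_t, via atomic RMW */

static void *hammer(void *arg)
{
	int i;

	for (i = 0; i < ITERS; i++) {
		plain_count++;                           /* racy read-modify-write    */
		__sync_fetch_and_add(&atomic_count, 1);  /* atomic, like atomic_inc() */
	}
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, hammer, NULL);
	pthread_create(&b, NULL, hammer, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	printf("plain:  %d (expected %d)\n", plain_count, 2 * ITERS);
	printf("atomic: %d\n", atomic_count);
	return 0;
}

Even with the counts made atomic, though, I would want to look carefully at the ordering between the cpumask updates and the pri_active bits once the per-vector lock is gone.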

I'll try to spend some time later going over your proposal.

Kind Regards,
-Greg

> 
> The patch below is entirely untested, mostly because I'm hoping for
> hints on good ways to test it.  Clearly Oracle RT isn't the workload we
> really want to tune for, but I think this change is generally useful if
> we can do it safely.
> 
> cpusets could also be used to mitigate this problem, but if we can just
> avoid the lock it would be nice.
> 
> diff --git a/kernel/sched_cpupri.c b/kernel/sched_cpupri.c
> index 2722dc1..dd51302 100644
> --- a/kernel/sched_cpupri.c
> +++ b/kernel/sched_cpupri.c
> @@ -115,7 +115,6 @@ void cpupri_set(struct cpupri *cp, int cpu, int newpri)
>  {
>  	int                 *currpri = &cp->cpu_to_pri[cpu];
>  	int                  oldpri  = *currpri;
> -	unsigned long        flags;
>  
>  	newpri = convert_prio(newpri);
>  
> @@ -134,26 +133,15 @@ void cpupri_set(struct cpupri *cp, int cpu, int newpri)
>  	if (likely(newpri != CPUPRI_INVALID)) {
>  		struct cpupri_vec *vec = &cp->pri_to_cpu[newpri];
>  
> -		raw_spin_lock_irqsave(&vec->lock, flags);
> -
>  		cpumask_set_cpu(cpu, vec->mask);
> -		vec->count++;
> -		if (vec->count == 1)
> +		if (atomic_inc_return(&vec->count) == 1)
>  			set_bit(newpri, cp->pri_active);
> -
> -		raw_spin_unlock_irqrestore(&vec->lock, flags);
>  	}
>  	if (likely(oldpri != CPUPRI_INVALID)) {
>  		struct cpupri_vec *vec  = &cp->pri_to_cpu[oldpri];
> -
> -		raw_spin_lock_irqsave(&vec->lock, flags);
> -
> -		vec->count--;
> -		if (!vec->count)
> +		if (atomic_dec_and_test(&vec->count))
>  			clear_bit(oldpri, cp->pri_active);
>  		cpumask_clear_cpu(cpu, vec->mask);
> -
> -		raw_spin_unlock_irqrestore(&vec->lock, flags);
>  	}
>  
>  	*currpri = newpri;
> @@ -174,9 +162,8 @@ int cpupri_init(struct cpupri *cp)
>  
>  	for (i = 0; i < CPUPRI_NR_PRIORITIES; i++) {
>  		struct cpupri_vec *vec = &cp->pri_to_cpu[i];
> +		atomic_set(&vec->count, 0);
>  
> -		raw_spin_lock_init(&vec->lock);
> -		vec->count = 0;
>  		if (!zalloc_cpumask_var(&vec->mask, GFP_KERNEL))
>  			goto cleanup;
>  	}
> diff --git a/kernel/sched_cpupri.h b/kernel/sched_cpupri.h
> index 9fc7d38..fe07002 100644
> --- a/kernel/sched_cpupri.h
> +++ b/kernel/sched_cpupri.h
> @@ -12,8 +12,7 @@
>  /* values 2-101 are RT priorities 0-99 */
>  
>  struct cpupri_vec {
> -	raw_spinlock_t lock;
> -	int        count;
> +	atomic_t   count;
>  	cpumask_var_t mask;
>  };


