Message-ID: <544ABC47.2000700@hp.com>
Date: Fri, 24 Oct 2014 16:53:27 -0400
From: Waiman Long <waiman.long@...com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, linux-arch@...r.kernel.org,
x86@...nel.org, linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org,
xen-devel@...ts.xenproject.org, kvm@...r.kernel.org,
Paolo Bonzini <paolo.bonzini@...il.com>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Rik van Riel <riel@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>,
David Vrabel <david.vrabel@...rix.com>,
Oleg Nesterov <oleg@...hat.com>,
Scott J Norton <scott.norton@...com>,
Douglas Hatch <doug.hatch@...com>
Subject: Re: [PATCH v12 09/11] pvqspinlock, x86: Add para-virtualization support

On 10/24/2014 04:47 AM, Peter Zijlstra wrote:
> On Thu, Oct 16, 2014 at 02:10:38PM -0400, Waiman Long wrote:
>> +static inline void pv_init_node(struct mcs_spinlock *node)
>> +{
>> + struct pv_qnode *pn = (struct pv_qnode *)node;
>> +
>> + BUILD_BUG_ON(sizeof(struct pv_qnode)> 5*sizeof(struct mcs_spinlock));
>> +
>> + if (!pv_enabled())
>> + return;
>> +
>> + pn->cpustate = PV_CPU_ACTIVE;
>> + pn->mayhalt = false;
>> + pn->mycpu = smp_processor_id();
>> + pn->head = PV_INVALID_HEAD;
>> +}
>
>> @@ -333,6 +393,7 @@ queue:
>> node += idx;
>> node->locked = 0;
>> node->next = NULL;
>> + pv_init_node(node);
>>
>> /*
>> * We touched a (possibly) cold cacheline in the per-cpu queue node;
>
> So even if !pv_enabled() the compiler will still have to emit the code
> for that inline, which will generate additional register pressure,
> icache pressure and lovely stuff like that.
>
> The patch I had used pv-ops for these things that would turn into NOPs
> in the regular case and callee-saved function calls for the PV case.
>
> That still does not entirely eliminate cost, but does reduce it
> significantly. Please consider using that.

The additional register pressure may just cause a few more register
moves, which should be negligible for overall performance. The
additional icache pressure, however, may have some impact on
performance. I was trying to balance the performance of the pv and
non-pv versions so that we won't penalize the pv code too much for a bit
more performance in the non-pv code. Doing it your way will add a lot of
function calls and register saving/restoring to the pv code.

Another alternative that I can think of is to generate 2 versions of the
slowpath code, one pv and one non-pv, out of the same source code. The
non-pv code will call into the pv code once if pv is enabled. This way,
it won't increase the icache and register pressure of the non-pv code.
However, it may make the source code a bit harder to read.

Please let me know your thoughts on this alternate approach.
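
A rough userspace sketch of the two-version idea, for illustration only.
All names here (struct qnode, pv_enabled_flag, pv_hook_calls,
slowpath_common, pv_slowpath, native_slowpath) are hypothetical stand-ins
and not from the patch, and the GCC/Clang always_inline/noinline
attributes are assumed:

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
struct qnode {
	int locked;
	int cpustate;	/* only meaningful in the pv variant */
};

static bool pv_enabled_flag;	/* stand-in for pv_enabled() */
static int pv_hook_calls;	/* counts pv-only work, for illustration */

static inline void pv_init_node(struct qnode *node)
{
	node->cpustate = 1;
	pv_hook_calls++;
}

/*
 * Single source for the slowpath body. `pv` is a compile-time constant
 * at both call sites, so once this is inlined the compiler can drop the
 * pv hook from the non-pv copy.
 */
static inline __attribute__((always_inline))
void slowpath_common(struct qnode *node, const bool pv)
{
	node->locked = 0;
	if (pv)
		pv_init_node(node);
	/* ... queueing and spinning would go here ... */
	node->locked = 1;
}

/* pv copy: hooks compiled in. */
static __attribute__((noinline)) void pv_slowpath(struct qnode *node)
{
	slowpath_common(node, true);
}

/* non-pv copy: one check on entry, then hook-free code. */
static __attribute__((noinline)) void native_slowpath(struct qnode *node)
{
	if (pv_enabled_flag) {
		pv_slowpath(node);
		return;
	}
	slowpath_common(node, false);
}
```

In this sketch the non-pv copy still pays for one test on entry, but the
rest of its body carries none of the pv hooks.
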
-Longman