[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <57227791.2020908@hpe.com>
Date: Thu, 28 Apr 2016 16:50:25 -0400
From: Waiman Long <waiman.long@....com>
To: Boqun Feng <boqun.feng@...il.com>
CC: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, <linux-kernel@...r.kernel.org>,
Pan Xinhui <xinhui@...ux.vnet.ibm.com>,
Scott J Norton <scott.norton@....com>,
Douglas Hatch <doug.hatch@....com>
Subject: Re: [RESEND PATCH v3] locking/pvqspinlock: Add lock holder CPU argument
to pv_wait()
On 04/28/2016 09:32 AM, Boqun Feng wrote:
> Hi Waiman,
>
> On Thu, Apr 21, 2016 at 11:02:05AM -0400, Waiman Long wrote:
>> Pan Xinhui was asking for a lock holder cpu argument in pv_wait()
>> to help the porting of pvqspinlock to PPC. The new argument will can
>> help hypervisor expediate the execution of the critical section by
>> the lock holder, if its vCPU isn't running, so that it can release
>> the lock sooner.
>>
>> The pv_wait() function is now extended to have a third argument
>> that holds the vCPU number of the lock holder, if this is
>> known. Architecture specific hypervisor code can then make use of
>> that information to make better decision of which vCPU should be
>> running next.
>>
>> This patch also adds a new field in the pv_node structure to hold
>> the vCPU number of the previous queue entry. For the queue head,
>> the prev_cpu entry is likely to be the that of the lock holder,
>> though lock stealing may render this information inaccurate.
>>
>> In pv_wait_head_or_lock(), pv_wait() will now be called with that
>> vCPU number as it is likely to be the lock holder.
>>
>> In pv_wait_node(), the newly added pv_lookup_hash() function will
>> be called to look up the queue head and pass in the lock holder vCPU
>> number stored there, if found.
>>
>> Suggested-by: Pan Xinhui<xinhui@...ux.vnet.ibm.com>
>> Signed-off-by: Waiman Long<Waiman.Long@....com>
>> ---
>>
>> v2->v3:
>> - Rephrase the changelog and some of the comments to make the
>> intention of this patch more clear.
>> - Make pv_lookup_hash() architecture dependent to eliminate its cost
>> if an architecture doesn't need it. Also make the number of
>> cachelines searched in pv_lookup_hash() to between 1-4.
>> - [RESEND] Fix typo error.
>>
>> v1->v2:
>> - In pv_wait_node(), lookup the hash table for the queue head and pass
>> the lock holder cpu number to pv_wait().
>>
> [snip]
>
>> @@ -282,7 +328,8 @@ static void pv_init_node(struct mcs_spinlock *node)
>> * pv_kick_node() is used to set _Q_SLOW_VAL and fill in hash table on its
>> * behalf.
>> */
>> -static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
>> +static void pv_wait_node(struct qspinlock *lock, struct mcs_spinlock *node,
>> + struct mcs_spinlock *prev)
>> {
>> struct pv_node *pn = (struct pv_node *)node;
>> struct pv_node *pp = (struct pv_node *)prev;
>> @@ -290,6 +337,8 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
>> int loop;
>> bool wait_early;
>>
>> + pn->prev_cpu = pp->cpu; /* Save vCPU # of previous queue entry */
>> +
>> /* waitcnt processing will be compiled out if !QUEUED_LOCK_STAT */
>> for (;; waitcnt++) {
>> for (wait_early = false, loop = SPIN_THRESHOLD; loop; loop--) {
>> @@ -314,10 +363,23 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
>> smp_store_mb(pn->state, vcpu_halted);
>>
>> if (!READ_ONCE(node->locked)) {
>> + struct pv_node *ph;
>> +
>> qstat_inc(qstat_pv_wait_node, true);
>> qstat_inc(qstat_pv_wait_again, waitcnt);
>> qstat_inc(qstat_pv_wait_early, wait_early);
>> - pv_wait(&pn->state, vcpu_halted);
>> +
>> + /*
>> + * If the current queue head is in the hash table,
>> + * the prev_cpu field of its pv_node may contain the
>> + * vCPU # of the lock holder. However, lock stealing
>> + * may make that information inaccurate. Anyway, we
>> + * look up the hash table to try to get the lock
>> + * holder vCPU number.
>> + */
>> + ph = pv_lookup_hash(lock);
> I am thinking, could we save the hash lookup here if wait_early == true?
> Because in that case, the previous node is likely to get the lock soon,
> it makes sense we yield to that node rather than the holder, so may we
> can do something like:
>
> if (wait_early)
> pv_wait(&pn->state, vcpu_halted, pn->prev_cpu);
> else {
> ph = pv_lookup_hash(lock);
> pv_wait(&pn->state, vcpu_halted,
> ph ? ph->prev_cpu : -1);
> }
>
> Thoughts?
>
> Regards,
> Boqun
>
One reason why I don't want to just wake up the previous node vCPU is
because I am concerned that a vCPU may go into a suspend-wakeup loop. So
I will wait until we get experimental evidence that this really help
performance before we commit to this kind of change.
Cheers,
Longman
Powered by blists - more mailing lists