[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <62fa6343-e084-75c3-01c9-349a4617e67c@redhat.com>
Date: Wed, 8 Jul 2020 19:50:05 -0400
From: Waiman Long <longman@...hat.com>
To: Nicholas Piggin <npiggin@...il.com>, linuxppc-dev@...ts.ozlabs.org
Cc: Anton Blanchard <anton@...abs.org>,
Boqun Feng <boqun.feng@...il.com>, kvm-ppc@...r.kernel.org,
linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
virtualization@...ts.linux-foundation.org,
Will Deacon <will@...nel.org>
Subject: Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks
On 7/8/20 1:10 AM, Nicholas Piggin wrote:
> Excerpts from Waiman Long's message of July 8, 2020 1:33 pm:
>> On 7/7/20 1:57 AM, Nicholas Piggin wrote:
>>> Yes, powerpc could certainly get more performance out of the slow
>>> paths, and then there are a few parameters to tune.
>>>
>>> We don't have a good alternate patching for function calls yet, but
>>> that would be something to do for native vs pv.
>>>
>>> And then there seem to be one or two tunable parameters we could
>>> experiment with.
>>>
>>> The paravirt locks may need a bit more tuning. Some simple testing
>>> under KVM shows we might be a bit slower in some cases. Whether this
>>> is fairness or something else I'm not sure. The current simple pv
>>> spinlock code can do a directed yield to the lock holder CPU, whereas
>>> the pv qspl here just does a general yield. I think we might actually
>>> be able to change that to also support directed yield. Though I'm
>>> not sure if this is actually the cause of the slowdown yet.
>> Regarding the paravirt lock, I have taken a further look into the
>> current PPC spinlock code. There is an equivalent of pv_wait() but no
>> pv_kick(). Maybe PPC doesn't really need that.
> So powerpc has two types of wait, either undirected "all processors" or
> directed to a specific processor which has been preempted by the
> hypervisor.
>
> The simple spinlock code does a directed wait, because it knows the CPU
> which is holding the lock. In this case, there is a sequence that is
> used to ensure we don't wait if the condition has become true, and the
> target CPU does not need to kick the waiter it will happen automatically
> (see splpar_spin_yield). This is preferable because we only wait as
> needed and don't require the kick operation.
Thanks for the explanation.
>
> The pv spinlock code I did uses the undirected wait, because we don't
> know the CPU number which we are waiting on. This is undesirable because
> it's higher overhead and the wait is not so accurate.
>
> I think perhaps we could change things so we wait on the correct CPU
> when queued, which might be good enough (we could also put the lock
> owner CPU in the spinlock word, if we add another format).
The LS byte of the lock word is used to indicate locking status. If we
have less than 255 cpus, we can put the (cpu_nr + 1) into the lock byte.
The special 0xff value can be used to indicate a cpu number >= 255 for
indirect yield. The required change to the qspinlock code will be
minimal, I think.
>> Attached are two
>> additional qspinlock patches that adds a CONFIG_PARAVIRT_QSPINLOCKS_LITE
>> option to not require pv_kick(). There is also a fixup patch to be
>> applied after your patchset.
>>
>> I don't have access to a PPC LPAR with shared processor at the moment,
>> so I can't test the performance of the paravirt code. Would you mind
>> adding my patches and do some performance test on your end to see if it
>> gives better result?
> Great, I'll do some tests. Any suggestions for what to try?
I will just like to see if it will produce some better performance
result compared with your current version.
Cheers,
Longman
Powered by blists - more mailing lists