Message-ID: <570C03AC.7080404@hpe.com>
Date: Mon, 11 Apr 2016 16:06:04 -0400
From: Waiman Long <waiman.long@....com>
To: Thomas Gleixner <tglx@...utronix.de>
CC: Ingo Molnar <mingo@...hat.com>, "H. Peter Anvin" <hpa@...or.com>,
Jonathan Corbet <corbet@....net>,
LKML <linux-kernel@...r.kernel.org>, <linux-doc@...nel.org>,
<x86@...nel.org>, Borislav Petkov <bp@...e.de>,
Andy Lutomirski <luto@...nel.org>,
Scott J Norton <scott.norton@....com>,
Douglas Hatch <doug.hatch@....com>,
Randy Wright <rwright@....com>
Subject: Re: [PATCH v2] x86/hpet: Reduce HPET counter read contention
On 04/09/2016 08:19 PM, Thomas Gleixner wrote:
> On Fri, 8 Apr 2016, Waiman Long wrote:
>> This patch attempts to reduce HPET read contention by using the fact
>> that if more than one task is trying to access the HPET at the same
>> time, it is more efficient for one task in the group to read the HPET
>> counter and share it with the rest of the group than for each group
>> member to read the counter individually.
> That has nothing to do with tasks. clocksource reads can happen from almost
> any context. The problem is concurrent access on multiple cpus.
You are right. I should have used CPU instead.
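For the record, the idea boils down to the following standalone sketch
(illustrative names only; C11 atomics stand in for the kernel's
cmpxchg()/READ_ONCE()/smp_store_release(), and the HPET read is stubbed
out):

/*
 * The lsb of a sequence word serves as a lock. The CPU that wins the
 * cmpxchg reads the slow counter and publishes the result; every CPU
 * arriving in the meantime spins briefly and reuses that result.
 */
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t seq;		/* lsb set => read in progress */
static _Atomic uint32_t shared_val;	/* last published counter value */

/* Stub for the expensive hardware read (hpet_readl() in the patch). */
static uint32_t slow_counter_read(void)
{
	static _Atomic uint32_t fake;

	return atomic_fetch_add(&fake, 1);
}

static uint32_t shared_read(void)
{
	uint32_t s = atomic_load_explicit(&seq, memory_order_acquire);

	if (!(s & 1)) {				/* lock is free */
		uint32_t expected = s, locked = s + 1;

		if (atomic_compare_exchange_strong(&seq, &expected, locked)) {
			uint32_t t = slow_counter_read();

			atomic_store_explicit(&shared_val, t,
					      memory_order_relaxed);
			/* Unlock; release orders the store above first. */
			atomic_store_explicit(&seq, locked + 1,
					      memory_order_release);
			return t;
		}
		s = locked;	/* lost the race: wait on the locked value */
	}

	/* Wait for the winner to publish the value and bump the sequence. */
	while (atomic_load_explicit(&seq, memory_order_acquire) == s)
		;		/* cpu_relax() in the kernel */

	return atomic_load_explicit(&shared_val, memory_order_relaxed);
}

Whichever CPU wins the cmpxchg pays for the slow hardware read; everyone
else piggybacks on the published value.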
>> This optimization is enabled on systems with more than 32 CPUs. It can
>> also be explicitly enabled or disabled by using the new opt_read_hpet
>> kernel parameter.
> Please not. What's wrong with enabling it unconditionally?
>
There is nothing wrong with enabling it unconditionally. I was just not
sure if that was the right thing to do. Since both you and Andy have said
we should enable it unconditionally, I will do so in the next version of
the patch.
>> +/*
>>   * Clock source related code
>>   */
>>  static cycle_t read_hpet(struct clocksource *cs)
>>  {
>> -	return (cycle_t)hpet_readl(HPET_COUNTER);
>> +	int seq, cnt = 0;
>> +	u32 time;
>> +
>> +	if (opt_read_hpet <= 0)
>> +		return (cycle_t)hpet_readl(HPET_COUNTER);
> This wants to be conditional on CONFIG_SMP. No point in having all that
> muck around for a UP kernel.
Will do so.
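Probably something along these lines (just a sketch, not the final patch;
the UP build keeps the plain counter read):

#ifdef CONFIG_SMP
static cycle_t read_hpet(struct clocksource *cs)
{
	/* contention-avoiding path using hpet_save, as in this patch */
}
#else
static cycle_t read_hpet(struct clocksource *cs)
{
	return (cycle_t)hpet_readl(HPET_COUNTER);
}
#endif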
>> +	seq = READ_ONCE(hpet_save.seq);
>> +	if (!HPET_SEQ_LOCKED(seq)) {
>> +		int old, new = seq + 1;
>> +		unsigned long flags;
>> +
>> +		local_irq_save(flags);
>> +		/*
>> +		 * Set the lock bit (lsb) to get the right to read the HPET
>> +		 * counter directly. If successful, read the counter, save
>> +		 * its value, and increment the sequence number. Otherwise,
>> +		 * increment the sequence number to the expected locked
>> +		 * value for comparison later on.
>> +		 */
>> +		old = cmpxchg(&hpet_save.seq, seq, new);
>> +		if (old == seq) {
>> +			time = hpet_readl(HPET_COUNTER);
>> +			WRITE_ONCE(hpet_save.hpet, time);
>> +
>> +			/* Unlock */
>> +			smp_store_release(&hpet_save.seq, new + 1);
>> +			local_irq_restore(flags);
>> +			return (cycle_t)time;
>> +		}
>> +		local_irq_restore(flags);
>> +		seq = new;
>> +	}
>> +
>> +	/*
>> +	 * Wait until the locked sequence number changes which indicates
>> +	 * that the saved HPET value is up-to-date.
>> +	 */
>> +	while (READ_ONCE(hpet_save.seq) == seq) {
>> +		/*
>> +		 * Since reading the HPET is much slower than a single
>> +		 * cpu_relax() instruction, we use two here in an attempt
>> +		 * to reduce the amount of cacheline contention in the
>> +		 * hpet_save.seq cacheline.
>> +		 */
>> +		cpu_relax();
>> +		cpu_relax();
>> +
>> +		if (likely(++cnt <= HPET_RESET_THRESHOLD))
>> +			continue;
>> +
>> +		/*
>> +		 * In the unlikely event that it takes too long for the lock
>> +		 * holder to read the HPET, we do it ourselves and try to
>> +		 * reset the lock. This will also break a deadlock if it
>> +		 * happens, for example, when the process context lock holder
>> +		 * gets killed in the middle of reading the HPET counter.
>> +		 */
>> +		time = hpet_readl(HPET_COUNTER);
>> +		WRITE_ONCE(hpet_save.hpet, time);
>> +		if (READ_ONCE(hpet_save.seq) == seq) {
>> +			if (cmpxchg(&hpet_save.seq, seq, seq + 1) == seq)
>> +				pr_warn("read_hpet: reset hpet seq to 0x%x\n",
>> +					seq + 1);
> This is voodoo programming and actively dangerous.
>
> CPU0                    CPU1                    CPU2
> lock_hpet()
> T1 = read_hpet()                                wait_for_unlock()
> store_hpet(T1)
> ....
>                         T2 = read_hpet()
> unlock_hpet()
>                                                 lock_hpet()
>                                                 T3 = read_hpet()
>                                                 store_hpet(T3)
>                                                 unlock_hpet()
>                                                 return T3
> lock_hpet()
> T4 = read_hpet()                                wait_for_unlock()
> store_hpet(T4)
>                         store_hpet(T2)
> unlock_hpet()                                   return T2
>
> CPU2 will observe time going backwards.
>
> Thanks,
>
> tglx
That part is leftover code from my testing and debugging effort. I think
using local_irq_save() should allow the critical section to be executed
without interruption. In this case, I should be able to remove the
threshold checking code without harm.
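With the reset path and the kernel parameter gone, the SMP fast path would
shrink to roughly the following (an untested sketch of what I have in mind
for the next version):

static cycle_t read_hpet(struct clocksource *cs)
{
	int seq = READ_ONCE(hpet_save.seq);

	if (!HPET_SEQ_LOCKED(seq)) {
		int old, new = seq + 1;
		unsigned long flags;
		u32 time;

		local_irq_save(flags);
		old = cmpxchg(&hpet_save.seq, seq, new);
		if (old == seq) {
			time = hpet_readl(HPET_COUNTER);
			WRITE_ONCE(hpet_save.hpet, time);

			/* Unlock */
			smp_store_release(&hpet_save.seq, new + 1);
			local_irq_restore(flags);
			return (cycle_t)time;
		}
		local_irq_restore(flags);
		seq = new;
	}

	/* Wait for the lock holder to publish the new HPET value */
	while (READ_ONCE(hpet_save.seq) == seq)
		cpu_relax();

	return (cycle_t)READ_ONCE(hpet_save.hpet);
}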
Thanks for the review.
Cheers,
Longman