Message-ID: <570C03AC.7080404@hpe.com>
Date: Mon, 11 Apr 2016 16:06:04 -0400
From: Waiman Long <waiman.long@....com>
To: Thomas Gleixner <tglx@...utronix.de>
CC: Ingo Molnar <mingo@...hat.com>, "H. Peter Anvin" <hpa@...or.com>,
Jonathan Corbet <corbet@....net>,
LKML <linux-kernel@...r.kernel.org>, <linux-doc@...nel.org>,
<x86@...nel.org>, Borislav Petkov <bp@...e.de>,
Andy Lutomirski <luto@...nel.org>,
Scott J Norton <scott.norton@....com>,
Douglas Hatch <doug.hatch@....com>,
Randy Wright <rwright@....com>
Subject: Re: [PATCH v2] x86/hpet: Reduce HPET counter read contention
On 04/09/2016 08:19 PM, Thomas Gleixner wrote:
> On Fri, 8 Apr 2016, Waiman Long wrote:
>> This patch attempts to reduce HPET read contention by using the fact
>> that if more than one task is trying to access the HPET at the same
>> time, it is more efficient for one task in the group to read the HPET
>> counter and share it with the rest of the group than for each group
>> member to read the counter individually.
> That has nothing to do with tasks. clocksource reads can happen from almost
> any context. The problem is concurrent access on multiple cpus.
You are right. I should have used CPU instead.
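For the record, the idea boils down to the following standalone sketch
(illustrative names only; C11 atomics stand in for the kernel's
cmpxchg()/READ_ONCE()/smp_store_release(), and the HPET read is stubbed
out):

/*
 * The lsb of a sequence word serves as a lock. The CPU that wins the
 * cmpxchg reads the slow counter and publishes the result; every CPU
 * arriving in the meantime spins briefly and reuses that result.
 */
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t seq;		/* lsb set => read in progress */
static _Atomic uint32_t shared_val;	/* last published counter value */

/* Stub for the expensive hardware read (hpet_readl() in the patch). */
static uint32_t slow_counter_read(void)
{
	static _Atomic uint32_t fake;

	return atomic_fetch_add(&fake, 1);
}

static uint32_t shared_read(void)
{
	uint32_t s = atomic_load_explicit(&seq, memory_order_acquire);

	if (!(s & 1)) {				/* lock is free */
		uint32_t expected = s, locked = s + 1;

		if (atomic_compare_exchange_strong(&seq, &expected, locked)) {
			uint32_t t = slow_counter_read();

			atomic_store_explicit(&shared_val, t,
					      memory_order_relaxed);
			/* Unlock; release orders the store above first. */
			atomic_store_explicit(&seq, locked + 1,
					      memory_order_release);
			return t;
		}
		s = locked;	/* lost the race: wait on the locked value */
	}

	/* Wait for the winner to publish the value and bump the sequence. */
	while (atomic_load_explicit(&seq, memory_order_acquire) == s)
		;		/* cpu_relax() in the kernel */

	return atomic_load_explicit(&shared_val, memory_order_relaxed);
}

Whichever CPU wins the cmpxchg pays for the slow hardware read; everyone
else piggybacks on the published value.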
>> This optimization is enabled on systems with more than 32 CPUs. It can
>> also be explicitly enabled or disabled by using the new opt_read_hpet
>> kernel parameter.
> Please not. What's wrong with enabling it unconditionally?
>
There is nothing wrong with enabling it unconditionally. I was just not
sure if that was the right thing to do. Since both you and Andy have said
we should enable it unconditionally, I will do so in the next version of
the patch.
>> +/*
>>   * Clock source related code
>>   */
>>  static cycle_t read_hpet(struct clocksource *cs)
>>  {
>> -	return (cycle_t)hpet_readl(HPET_COUNTER);
>> +	int seq, cnt = 0;
>> +	u32 time;
>> +
>> +	if (opt_read_hpet <= 0)
>> +		return (cycle_t)hpet_readl(HPET_COUNTER);
> This wants to be conditional on CONFIG_SMP. No point in having all that
> muck around for a UP kernel.
Will do so.
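Probably something along these lines (just a sketch, not the final patch;
the UP build keeps the plain counter read):

#ifdef CONFIG_SMP
static cycle_t read_hpet(struct clocksource *cs)
{
	/* contention-avoiding path using hpet_save, as in this patch */
}
#else
static cycle_t read_hpet(struct clocksource *cs)
{
	return (cycle_t)hpet_readl(HPET_COUNTER);
}
#endif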
>> +	seq = READ_ONCE(hpet_save.seq);
>> +	if (!HPET_SEQ_LOCKED(seq)) {
>> +		int old, new = seq + 1;
>> +		unsigned long flags;
>> +
>> +		local_irq_save(flags);
>> +		/*
>> +		 * Set the lock bit (lsb) to get the right to read the HPET
>> +		 * counter directly. If successful, read the counter, save
>> +		 * its value, and increment the sequence number. Otherwise,
>> +		 * increment the sequence number to the expected locked
>> +		 * value for comparison later on.
>> +		 */
>> +		old = cmpxchg(&hpet_save.seq, seq, new);
>> +		if (old == seq) {
>> +			time = hpet_readl(HPET_COUNTER);
>> +			WRITE_ONCE(hpet_save.hpet, time);
>> +
>> +			/* Unlock */
>> +			smp_store_release(&hpet_save.seq, new + 1);
>> +			local_irq_restore(flags);
>> +			return (cycle_t)time;
>> +		}
>> +		local_irq_restore(flags);
>> +		seq = new;
>> +	}
>> +
>> +	/*
>> +	 * Wait until the locked sequence number changes which indicates
>> +	 * that the saved HPET value is up-to-date.
>> +	 */
>> +	while (READ_ONCE(hpet_save.seq) == seq) {
>> +		/*
>> +		 * Since reading the HPET is much slower than a single
>> +		 * cpu_relax() instruction, we use two here in an attempt
>> +		 * to reduce the amount of cacheline contention in the
>> +		 * hpet_save.seq cacheline.
>> +		 */
>> +		cpu_relax();
>> +		cpu_relax();
>> +
>> +		if (likely(++cnt <= HPET_RESET_THRESHOLD))
>> +			continue;
>> +
>> +		/*
>> +		 * In the unlikely event that it takes too long for the lock
>> +		 * holder to read the HPET, we do it ourselves and try to
>> +		 * reset the lock. This will also break a deadlock if it
>> +		 * happens, for example, when the process context lock holder
>> +		 * gets killed in the middle of reading the HPET counter.
>> +		 */
>> +		time = hpet_readl(HPET_COUNTER);
>> +		WRITE_ONCE(hpet_save.hpet, time);
>> +		if (READ_ONCE(hpet_save.seq) == seq) {
>> +			if (cmpxchg(&hpet_save.seq, seq, seq + 1) == seq)
>> +				pr_warn("read_hpet: reset hpet seq to 0x%x\n",
>> +					seq + 1);
> This is voodoo programming and actively dangerous.
>
> CPU0                    CPU1                    CPU2
> lock_hpet()
> T1 = read_hpet()                                wait_for_unlock()
> store_hpet(T1)
> ....
>                         T2 = read_hpet()
> unlock_hpet()
>                                                 lock_hpet()
>                                                 T3 = read_hpet()
>                                                 store_hpet(T3)
>                                                 unlock_hpet()
>                                                 return T3
> lock_hpet()
> T4 = read_hpet()                                wait_for_unlock()
> store_hpet(T4)
>                         store_hpet(T2)
> unlock_hpet()                                   return T2
>
> CPU2 will observe time going backwards.
>
> Thanks,
>
> tglx
That part is leftover code from my testing and debugging effort. I think
using local_irq_save() should allow the critical section to be executed
without interruption. In this case, I should be able to remove the
threshold checking code without harm.
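With the reset path and the kernel parameter gone, the SMP fast path would
shrink to roughly the following (an untested sketch of what I have in mind
for the next version):

static cycle_t read_hpet(struct clocksource *cs)
{
	int seq = READ_ONCE(hpet_save.seq);

	if (!HPET_SEQ_LOCKED(seq)) {
		int old, new = seq + 1;
		unsigned long flags;
		u32 time;

		local_irq_save(flags);
		old = cmpxchg(&hpet_save.seq, seq, new);
		if (old == seq) {
			time = hpet_readl(HPET_COUNTER);
			WRITE_ONCE(hpet_save.hpet, time);

			/* Unlock */
			smp_store_release(&hpet_save.seq, new + 1);
			local_irq_restore(flags);
			return (cycle_t)time;
		}
		local_irq_restore(flags);
		seq = new;
	}

	/* Wait for the lock holder to publish the new HPET value */
	while (READ_ONCE(hpet_save.seq) == seq)
		cpu_relax();

	return (cycle_t)READ_ONCE(hpet_save.hpet);
}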
Thanks for the review.
Cheers,
Longman