Message-ID: <alpine.DEB.2.11.1604091552540.3786@nanos>
Date: Sat, 9 Apr 2016 17:19:00 -0700 (PDT)
From: Thomas Gleixner <tglx@...utronix.de>
To: Waiman Long <Waiman.Long@....com>
cc: Ingo Molnar <mingo@...hat.com>, "H. Peter Anvin" <hpa@...or.com>,
Jonathan Corbet <corbet@....net>,
LKML <linux-kernel@...r.kernel.org>, linux-doc@...nel.org,
x86@...nel.org, Borislav Petkov <bp@...e.de>,
Andy Lutomirski <luto@...nel.org>,
Scott J Norton <scott.norton@....com>,
Douglas Hatch <doug.hatch@....com>,
Randy Wright <rwright@....com>
Subject: Re: [PATCH v2] x86/hpet: Reduce HPET counter read contention
On Fri, 8 Apr 2016, Waiman Long wrote:
> This patch attempts to reduce HPET read contention by using the fact
> that if more than one task is trying to access HPET at the same time,
> it will be more efficient if one task in the group reads the HPET
> counter and shares it with the rest of the group instead of each
> group member reading the HPET counter individually.
That has nothing to do with tasks. clocksource reads can happen from almost
any context. The problem is concurrent access on multiple cpus.
> This optimization is enabled on systems with more than 32 CPUs. It can
> also be explicitly enabled or disabled by using the new opt_read_hpet
> kernel parameter.
Please not. What's wrong with enabling it unconditionally?
> +/*
>   * Clock source related code
>   */
>  static cycle_t read_hpet(struct clocksource *cs)
>  {
> -        return (cycle_t)hpet_readl(HPET_COUNTER);
> +        int seq, cnt = 0;
> +        u32 time;
> +
> +        if (opt_read_hpet <= 0)
> +                return (cycle_t)hpet_readl(HPET_COUNTER);
This wants to be conditional on CONFIG_SMP. No point in having all that muck
around for an UP kernel.
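Something along these lines would keep the UP build clean. This is a rough,
untested sketch only; read_hpet_contended() is just a placeholder name for
the sequence/cmpxchg logic quoted above, it does not exist in the patch:

#ifdef CONFIG_SMP
static cycle_t read_hpet(struct clocksource *cs)
{
        /* Contended path: one CPU reads the counter and publishes it */
        return read_hpet_contended();        /* placeholder, see the quoted code */
}
#else
static cycle_t read_hpet(struct clocksource *cs)
{
        return (cycle_t)hpet_readl(HPET_COUNTER);
}
#endif

That keeps the plain counter read for UP kernels and compiles the whole
contention machinery out.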
> +        seq = READ_ONCE(hpet_save.seq);
> +        if (!HPET_SEQ_LOCKED(seq)) {
> +                int old, new = seq + 1;
> +                unsigned long flags;
> +
> +                local_irq_save(flags);
> +                /*
> +                 * Set the lock bit (lsb) to get the right to read HPET
> +                 * counter directly. If successful, read the counter, save
> +                 * its value, and increment the sequence number. Otherwise,
> +                 * increment the sequence number to the expected locked value
> +                 * for comparison later on.
> +                 */
> +                old = cmpxchg(&hpet_save.seq, seq, new);
> +                if (old == seq) {
> +                        time = hpet_readl(HPET_COUNTER);
> +                        WRITE_ONCE(hpet_save.hpet, time);
> +
> +                        /* Unlock */
> +                        smp_store_release(&hpet_save.seq, new + 1);
> +                        local_irq_restore(flags);
> +                        return (cycle_t)time;
> +                }
> +                local_irq_restore(flags);
> +                seq = new;
> +        }
> +
> +        /*
> +         * Wait until the locked sequence number changes which indicates
> +         * that the saved HPET value is up-to-date.
> +         */
> +        while (READ_ONCE(hpet_save.seq) == seq) {
> +                /*
> +                 * Since reading the HPET is much slower than a single
> +                 * cpu_relax() instruction, we use two here in an attempt
> +                 * to reduce the amount of cacheline contention in the
> +                 * hpet_save.seq cacheline.
> +                 */
> +                cpu_relax();
> +                cpu_relax();
> +
> +                if (likely(++cnt <= HPET_RESET_THRESHOLD))
> +                        continue;
> +
> +                /*
> +                 * In the unlikely event that it takes too long for the lock
> +                 * holder to read the HPET, we do it ourselves and try to
> +                 * reset the lock. This will also break a deadlock if it
> +                 * happens, for example, when the process context lock holder
> +                 * gets killed in the middle of reading the HPET counter.
> +                 */
> +                time = hpet_readl(HPET_COUNTER);
> +                WRITE_ONCE(hpet_save.hpet, time);
> +                if (READ_ONCE(hpet_save.seq) == seq) {
> +                        if (cmpxchg(&hpet_save.seq, seq, seq + 1) == seq)
> +                                pr_warn("read_hpet: reset hpet seq to 0x%x\n",
> +                                        seq + 1);
This is voodoo programming and actively dangerous.
CPU0                    CPU1                    CPU2
lock_hpet()
T1 = read_hpet()                                wait_for_unlock()
store_hpet(T1)
....
                                                T2 = read_hpet()
unlock_hpet()
                        lock_hpet()
                        T3 = read_hpet()
                        store_hpet(T3)
                        unlock_hpet()
                        return T3
lock_hpet()
T4 = read_hpet()                                wait_for_unlock()
store_hpet(T4)
                                                store_hpet(T2)
unlock_hpet()                                   return T2
CPU2 will observe time going backwards.
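To spell it out against the quoted code: the timeout path publishes its
counter value without owning the lock. Annotated copy of the lines above
(the comments are mine, added for illustration only; pr_warn() abbreviated):

        time = hpet_readl(HPET_COUNTER);        /* this read (T2 above) can already be stale */
        WRITE_ONCE(hpet_save.hpet, time);       /* unsynchronized store: can overwrite a newer
                                                 * value (T4) just published by the lock holder */
        if (READ_ONCE(hpet_save.seq) == seq) {
                /*
                 * "Resetting" the sequence releases the waiters with whatever
                 * value happens to sit in hpet_save.hpet at this point.
                 */
                if (cmpxchg(&hpet_save.seq, seq, seq + 1) == seq)
                        pr_warn(...);

Nothing in that path guarantees that the value it publishes is newer than
values which have already been handed out.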
Thanks,
tglx