lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Tue, 29 Sep 2020 20:00:21 -0500
From:   Samuel Holland <samuel@...lland.org>
To:     Mark Rutland <mark.rutland@....com>,
        Roman Stratiienko <r.stratiienko@...il.com>
Cc:     linux-sunxi@...glegroups.com, megous@...ous.com,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        maz@...nel.org
Subject: Re: [PATCH] RFC: arm64: arch_timer: Fix timer inconsistency test for
 A64

On 9/29/20 10:40 AM, Mark Rutland wrote:
> Hi,
> 
> Please Cc maintainers for drivers -- Marc and I maintain the arch timer
> driver.
> 
> On Tue, Sep 29, 2020 at 02:13:47PM +0300, Roman Stratiienko wrote:
>> Fixes linux_kselftest:timers_inconsistency-check_arm_64
>>
>> Test logs without the fix:
>> '''
>> binary returned non-zero. Exit code: 1, stderr: , stdout:
>> Consistent CLOCK_REALTIME
>> 1601335525:467086804
>> 1601335525:467087554
>> 1601335525:467088345
>> 1601335525:467089095
>> 1601335525:467089887
>> 1601335525:467090637
>> 1601335525:467091429
>> 1601335525:467092179
>> 1601335525:467092929
>> 1601335525:467093720
>> 1601335525:467094470
>> 1601335525:467095262
>> 1601335525:467096012
>> 1601335525:467096804
>> --------------------
>> 1601335525:467097554
>> 1601335525:467077012
> 
> That's 0x1BD757D2 followed by 0x1BD70794. The rollback is somewhere in
> bits 15:12 to go from 0x1BD75xxx to 0x1BD70xxx, which suggests the
> analysis in the existing comment is incomplete.

My analysis only looked at consecutive timer reads on a single core. Apparently,
from the vendor comment that they needed CLOCKSOURCE_VALIDATE_LAST_CYCLE (linked
below), there is another inconsistency with reads across cores.

>> --------------------
>> 1601335525:467099095
>> 1601335525:467099845
>> 1601335525:467100637
>> 1601335525:467101387
>> 1601335525:467102179
>> 1601335525:467102929
>> '''
> 
> It would be very helpful if the commit message could explain the rough
> idea behind the change, because the rationale is not clear to me.
> 
>> Signed-off-by: Roman Stratiienko <r.stratiienko@...il.com>
>> CC: linux-arm-kernel@...ts.infradead.org
>> CC: linux-kernel@...r.kernel.org
>> CC: linux-sunxi@...glegroups.com
>> CC: megous@...ous.com
>> ---
>>  drivers/clocksource/arm_arch_timer.c | 9 +++++----
>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
>> index 6c3e841801461..d50aa43cb654b 100644
>> --- a/drivers/clocksource/arm_arch_timer.c
>> +++ b/drivers/clocksource/arm_arch_timer.c
>> @@ -346,16 +346,17 @@ static u64 notrace arm64_858921_read_cntvct_el0(void)
>>   * number of CPU cycles in 3 consecutive 24 MHz counter periods.
>>   */
>>  #define __sun50i_a64_read_reg(reg) ({					\
>> -	u64 _val;							\
>> +	u64 _val1, _val2;						\
>>  	int _retries = 150;						\
>>  									\
>>  	do {								\
>> -		_val = read_sysreg(reg);				\
>> +		_val1 = read_sysreg(reg);				\
>> +		_val2 = read_sysreg(reg);				\
>>  		_retries--;						\
>> -	} while (((_val + 1) & GENMASK(9, 0)) <= 1 && _retries);	\
>> +	} while (((_val2 - _val1) > 0x10) && _retries);			\
> 
> This is going to fail quite often at low CPU frequencies, and it's not
> clear to me that this solves the problem any more generally. DO we know
> what the underlying erratum is here?

This is what we have from the vendor:

https://github.com/Allwinner-Homlet/H6-BSP4.9-linux/blob/master/arch/arm64/include/asm/arch_timer.h#L155
https://github.com/Allwinner-Homlet/H6-BSP4.9-linux/blob/master/drivers/clocksource/arm_arch_timer.c#L272

and they select CLOCKSOURCE_VALIDATE_LAST_CYCLE.

Everything else we know (e.g. the tables and explanation in my original commit
log) is from testing by users.

I had not seen the vendor changes to arch_timer.h when I wrote the existing
workaround. Their retry loop is similar to this patch, but as you mention, it
won't work if the CPU frequency is too low. That may not be a problem for the
vendor kernel, but it breaks badly on mainline.

As Ondrej referenced, the mainline DVFS implementation bypasses the PLL during
frequency changes. At that point, both the CPU and the timer are running at 24
MHz, so the chance of hitting the retry limit is high... and there's a udelay()
call in the PLL bypass code (ccu_mux_notifier_cb()) that will need to read the
timer.

> Thanks,
> Mark.

Cheers,
Samuel

>>  									\
>>  	WARN_ON_ONCE(!_retries);					\
>> -	_val;								\
>> +	_val2;								\
>>  })
>>  
>>  static u64 notrace sun50i_a64_read_cntpct_el0(void)
>> -- 
>> 2.25.1
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@...ts.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ