lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 06 Jun 2017 13:53:57 -0700 (PDT)
From:   Palmer Dabbelt <palmer@...belt.com>
To:     Arnd Bergmann <arnd@...db.de>
CC:     albert@...ive.com
Subject:     Re: [PATCH 5/7] RISC-V: arch/riscv/lib

On Tue, 06 Jun 2017 02:31:02 PDT (-0700), Arnd Bergmann wrote:
> On Tue, Jun 6, 2017 at 6:56 AM, Palmer Dabbelt <palmer@...belt.com> wrote:
>> On Fri, 26 May 2017 02:06:58 PDT (-0700), Arnd Bergmann wrote:
>>> On Thu, May 25, 2017 at 3:59 AM, Palmer Dabbelt <palmer@...belt.com> wrote:
>>>> On Tue, 23 May 2017 04:19:42 PDT (-0700), Arnd Bergmann wrote:
>>>>> On Tue, May 23, 2017 at 2:41 AM, Palmer Dabbelt <palmer@...belt.com> wrote:
>>>
>>>>> Also, it would be good to replace the multiply+div64
>>>>> with a single multiplication here, see how x86 and arm do it
>>>>> (for the tsc/__timer_delay case).
>>>>
>>>> Makes sense.  I think this should do it
>>>>
>>>>   https://github.com/riscv/riscv-linux/commit/d397332f6ebff42f3ecb385e9cf3284fdeda6776
>>>>
>>>> but I'm finding this hard to test as this only works for 2ms sleeps.  It seems
>>>> at least in the right ballpark
>>>
>>> + if (usecs > MAX_UDELAY_US) {
>>> + __delay((u64)usecs * riscv_timebase / 1000000ULL);
>>> + return;
>>> + }
>>>
>>> You still do the 64-bit division here. What I meant is to completely
>>> avoid the division and use a multiply+shift.
>>
>> The goal here was to avoid the error case that ARM has on overflow and instead
>> just delay for the requested time.  This should only divide when the delay is
>>>=2ms, so the division won't cost much in comparison.
>>
>> The normal case should have no division in it.
>>
>> I can copy ARM's error handling if you think that's better, but it seemed more
>> complicated than just computing the correct answer.
>
> I think the intention originally was to avoid overflowing the 32-bit
> argument in
>
>  void __delay(unsigned long cycles)
>
> If you need to delay for more than 4 billion clocksource cycles,
> your code is still broken.

Maybe I'm crazy, but I thought the goal was to avoid overflowing on the
multiply.  Specifically, the code looks like

  udelay(long input) {
    long a = input * MUL_VAL;
    long b = a >> SHIFT_VAL;
    __delay(b);
  }

so the place there's extra overflow is at computing a, not b (the input to
__delay).  When I modified the ARM code I went and recalculated the point at
which the multiply would overflow and it matched the value from the ARM code,
which is 2000us.

While I can buy the argument that 2000us is still too long, the real reason I
wrote the code this way is because I thought it was easier than having an error
case.  If you think the error is better then I'll do it that way.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ