Date:   Thu, 19 Oct 2017 11:27:29 +0200
From:   Daniel Lezcano <daniel.lezcano@...aro.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Matt Redfearn <matt.redfearn@...s.com>, linux-mips@...ux-mips.org,
        Matt Redfearn <matt.redfearn@...tec.com>,
        "# v3.19+" <stable@...r.kernel.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] clocksource/mips-gic-timer: Fix rcu_sched timeouts
 from multithreading

On 19/10/2017 11:18, Thomas Gleixner wrote:
> On Thu, 19 Oct 2017, Daniel Lezcano wrote:
>> On 18/10/2017 22:34, Thomas Gleixner wrote:
>>> On Wed, 11 Oct 2017, Matt Redfearn wrote:
>>>
>>>> When the MIPS GIC clockevent code was written, it appears to have
>>>> inherited the 0x300 cycle min delta from the MIPS CPU timer driver. This
>>>> is suboptimal for two reasons.
>>>>
>>>> Firstly, the CPU timer counts once every other cycle (i.e. half the
>>>> clock rate). The GIC counts once per clock. Assuming that the GIC and
>>>> CPU share the same clock this means the GIC is counting twice as fast,
>>>> and so the min delta should be (at least) doubled. Fix this by doubling
>>>> the min delta to 0x600.
>>>>
>>>> Secondly, the fixed min delta ignores the fact that with MIPS
>>>> multithreading active, execution resource within a core is shared
>>>> between the hardware threads within that core. An inconveniently timed
>>>> switch of executing thread within gic_next_event, between the read and
>>>> write of updated count, can result in the CPU writing an event in the
>>>> past, and subsequently not receiving a tick interrupt until the counter
>>>> wraps. This stalls the CPU from the RCU scheduler. Other CPUs detect
>>>> this and print rcu_sched timeout messages in the kernel log. It can
>>>> lead to other issues as well if the CPU is holding locks or other
>>>> resources at the point at which it stalls. Fix this by scaling the min
>>>> delta for the timer based on the number of threads in the core
>>>> (smp_num_siblings). This accounts for the greater average runtime of
>>>> CPUs within a multithreading core.
>>>
>>> I don't understand why this is not caught by the check at the end of the
>>> next_event() function:
>>>
>>>         res = ((int)(gic_read_count() - cnt) >= 0) ? -ETIME : 0;
>>>
>>> Btw, the local_irq_save() in this function is pointless as this function is
>>> always called with interrupts disabled from the core code.
>>
>> Would it be worth adding a comment in include/linux/clockchips.h, in
>> the structure definition, to say which of the callbacks are called
>> with IRQs disabled?
> 
> Yes. IIRC all callbacks are invoked with interrupts disabled. Care to check
> that and whip up a patch?

Sure, no problem.
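
For reference, the check Thomas quotes above relies on a signed difference so that it still works when the counter wraps. Below is a minimal userspace sketch of that idea, not the driver code: fake_counter, model_read_count(), deadline_passed() and model_next_event() are hypothetical names, the counter is modelled as 32 bits wide, and the "stall" parameter stands in for the time a sibling hardware thread might take between the read and the write of the compare value. The cast to int32_t assumes the usual two's complement conversion, as the kernel does.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware counter (not gic_read_count()). */
static uint32_t fake_counter;

static uint32_t model_read_count(void)
{
	return fake_counter;
}

/*
 * Nonzero if "now" has reached or passed "target". The unsigned
 * subtraction wraps modulo 2^32, and reinterpreting the result as
 * signed gives the right answer as long as the two values are less
 * than 2^31 counts apart.
 */
static int deadline_passed(uint32_t now, uint32_t target)
{
	return (int32_t)(now - target) >= 0;
}

/*
 * Model of a next_event callback: compute the target count, let
 * "stall" counts elapse (e.g. another hardware thread running),
 * then re-read the counter and report -ETIME if the target is
 * already in the past.
 */
static int model_next_event(uint32_t delta, uint32_t stall)
{
	uint32_t cnt = model_read_count() + delta;

	fake_counter += stall;	/* time passes before the compare write */
	/* a real driver would program the compare register with cnt here */

	return deadline_passed(model_read_count(), cnt) ? -ETIME : 0;
}
```

In this model, a stall shorter than delta returns 0 and a stall equal to or longer than delta returns -ETIME, including when the counter wraps in between. It does not model anything that happens after the final read, so it says nothing about whether the real driver can still miss an event.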


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
