Message-ID: <alpine.DEB.2.20.1710191117010.1971@nanos>
Date:   Thu, 19 Oct 2017 11:18:58 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Daniel Lezcano <daniel.lezcano@...aro.org>
cc:     Matt Redfearn <matt.redfearn@...s.com>, linux-mips@...ux-mips.org,
        Matt Redfearn <matt.redfearn@...tec.com>,
        "# v3 . 19 +" <stable@...r.kernel.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] clocksource/mips-gic-timer: Fix rcu_sched timeouts
 from multithreading

On Thu, 19 Oct 2017, Daniel Lezcano wrote:
> On 18/10/2017 22:34, Thomas Gleixner wrote:
> > On Wed, 11 Oct 2017, Matt Redfearn wrote:
> > 
> >> When the MIPS GIC clockevent code was written, it appears to have
> >> inherited the 0x300 cycle min delta from the MIPS CPU timer driver. This
> >> is suboptimal for two reasons.
> >>
> >> Firstly, the CPU timer counts once every other cycle (i.e. half the
> >> clock rate). The GIC counts once per clock. Assuming that the GIC and
> >> CPU share the same clock this means the GIC is counting twice as fast,
> >> and so the min delta should be (at least) doubled. Fix this by doubling
> >> the min delta to 0x600.
> >>
> >> Secondly, the fixed min delta ignores the fact that with MIPS
> >> multithreading active, execution resource within a core is shared
> >> between the hardware threads within that core. An inconveniently timed
> >> switch of executing thread within gic_next_event, between the read and
> >> write of updated count, can result in the CPU writing an event in the
> >> past, and subsequently not receiving a tick interrupt until the counter
> >> wraps. This stalls the CPU from the RCU scheduler. Other CPUs detect
> >> this and print rcu_sched timeout messages in the kernel log. It can
> >> lead to other issues as well if the CPU is holding locks or other
> >> resources at the point at which it stalls. Fix this by scaling the min
> >> delta for the timer based on the number of threads in the core
> >> (smp_num_siblings). This accounts for the greater average runtime of
> >> CPUs within a multithreading core.
> > 
> > I don't understand why this is not caught by the check at the end of the
> > next_event() function:
> > 
> >         res = ((int)(gic_read_count() - cnt) >= 0) ? -ETIME : 0;
> > 
> > Btw, the local_irq_save() in this function is pointless as this function is
> > always called with interrupts disabled from the core code.
> 
> Would it be worth adding a comment in include/linux/clockchips.h, in
> the structure definition for the different callbacks, to say which ones
> are called with irqs disabled?

Yes. IIRC all callbacks are invoked with interrupts disabled. Care to check
that and whip up a patch?

Thanks,

	tglx
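
For readers following the thread, the check quoted above relies on a signed
difference, so it stays correct when the counter wraps. Below is a minimal
userspace sketch of that idea, not the driver code: fake_count, read_count()
and check_expired() are hypothetical stand-ins, and a 32-bit counter is
assumed purely for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware counter (not gic_read_count()). */
static uint32_t fake_count;

static uint32_t read_count(void)
{
	return fake_count;
}

/*
 * Return -ETIME if the programmed compare value 'cnt' has already been
 * reached or passed, 0 if it is still in the future.
 *
 * The unsigned subtraction wraps modulo 2^32; reinterpreting the result
 * as signed gives "now - cnt" as long as the two values are less than
 * 2^31 ticks apart. The narrowing cast is implementation-defined in ISO C
 * but behaves as two's complement on the compilers the kernel supports.
 */
static int check_expired(uint32_t cnt)
{
	return ((int32_t)(read_count() - cnt) >= 0) ? -ETIME : 0;
}
```

With fake_count at 0xFFFFFFF0, a compare value of 0x10 lies 0x20 ticks ahead
across the wrap and is reported as pending (0). With the two values swapped,
the event is 0x20 ticks in the past and check_expired() returns -ETIME.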