linux-kernel - Re: [PATCH 1/3] clocksource/mips-gic-timer: Fix rcu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.DEB.2.20.1710191020510.1971@nanos>
Date:   Thu, 19 Oct 2017 10:22:11 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Matt Redfearn <matt.redfearn@...s.com>
cc:     James Hogan <james.hogan@...s.com>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        linux-mips@...ux-mips.org,
        Matt Redfearn <matt.redfearn@...tec.com>,
        "# v3 . 19 +" <stable@...r.kernel.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] clocksource/mips-gic-timer: Fix rcu_sched timeouts
 from multithreading

On Thu, 19 Oct 2017, Matt Redfearn wrote:
> On 18/10/17 21:34, Thomas Gleixner wrote:
> > On Wed, 11 Oct 2017, Matt Redfearn wrote:
> > > Secondly, the fixed min delta ignores the fact that with MIPS
> > > multithreading active, execution resource within a core is shared
> > > between the hardware threads within that core. An inconvenienly timed
> > > switch of executing thread within gic_next_event, between the read and
> > > write of updated count, can result in the CPU writing an event in the
> > > past, and subsequently not receiving a tick interrupt until the counter
> > > wraps. This stalls the CPU from the RCU scheduler. Other CPUs detect
> > > this and print rcu_sched timeout messages in  the kernel log. It can
> > > lead to other issues as well if the CPU is holding locks or other
> > > resources at the point at which it stalls. Fix this by scaling the min
> > > delta for the timer based on the number of threads in the core
> > > (smp_num_siblings). This accounts for the greater average runtime of
> > > CPUs within a multithreading core.
> >
> > I don't understand why this is not catched by the check at the end of the
> > next_event() function:
> > 
> >          res = ((int)(gic_read_count() - cnt) >= 0) ? -ETIME : 0;
> > 
> > Btw, the local_irq_save() in this function is pointless as this function is
> > always called with interrupts disabled from the core code.
> 
> This is an issue because in some cases (hrtimer_reprogram ->
> clockevents_program_event -> clockevents_program_min_delta, when
> CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=n) there is no retry performed in the
> case of -ETIME. There has been a patch pending for some time
> https://patchwork.kernel.org/patch/8909491/ which ought to address this and
> retry in the case of an event in the past on this call path. But in the
> meantime this patch vastly improves the situation.

I somehow missed that one. Care to repost so we get that solved at the
place where it should be solved.

Thanks,

	tglx