Date:   Mon, 26 Sep 2016 19:55:13 -0400 (EDT)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Rich Felker <dalias@...c.org>
cc:     Daniel Lezcano <daniel.lezcano@...aro.org>,
        devicetree@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-sh@...r.kernel.org, Rob Herring <robh+dt@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: [PATCH v7 2/2] clocksource: add J-Core timer/clocksource driver

On Mon, 26 Sep 2016, Rich Felker wrote:
> On Mon, Sep 26, 2016 at 11:27:14PM +0200, Daniel Lezcano wrote:
> Based on use of ftrace, I was able to see situations where a second
> timer hardirq happened immediately after one occurred, before the
> timer softirq could run. My theory is that this is causing some kind
> of feedback loop where new timer expirations keep getting scheduled
> with a very short interval such that the softirq never gets to run
> (until other interrupt activity disrupts the feedback loop). I tried
> reverting 4e85876a9d2a977b4a07389da8c07edf76d10825 which seemed
> relevant and it didn't help, but on further review (right now) there
> seem to be a few related commits just before it that might be
> responsible for the regression. I'll see if I can dig up anything else
> useful.

Interesting theory, but let's look at the data from the other thread:

          <idle>-0     [001] d.h.  2646.702790: irq_handler_entry: irq=72 name=jcore_pit
          <idle>-0     [001] d.h.  2646.703398: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] d.h.  2646.703607: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] d.h.  2646.703884: softirq_raise: vec=7 [action=SCHED]
          <idle>-0     [001] d.h.  2646.704191: irq_handler_exit: irq=72 ret=handled

So it takes 1.4ms (2646.704191 - 2646.702790) to finish the timer irq handler.

          <idle>-0     [001] d.H.  2646.704509: irq_handler_entry: irq=72 name=jcore_pit
          <idle>-0     [001] d.H.  2646.704990: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] d.H.  2646.705182: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] d.H.  2646.705465: softirq_raise: vec=7 [action=SCHED]
          <idle>-0     [001] d.H.  2646.705761: irq_handler_exit: irq=72 ret=handled

And 1.2ms (2646.705761 - 2646.704509) in this case.

          <idle>-0     [001] d.H.  2646.706071: irq_handler_entry: irq=72 name=jcore_pit
          <idle>-0     [001] d.H.  2646.706328: irq_handler_exit: irq=72 ret=handled

So this one is actually short enough that the soft interrupt can execute, and
obviously no new timer is scheduled close enough behind it.

          <idle>-0     [001] ..s.  2646.706530: softirq_entry: vec=1 [action=TIMER]

Now the obvious question arises: why is that handler taking that long? And
surely the next event being programmed shortly after the previous one has to
do with various factors (the HZ setting, whether high-resolution timers are
enabled). Without the corresponding hrtimer tracepoints (start and expire)
available, it's hard to tell what's going on.
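
FWIW, the events in question live under events/timer/ in tracefs. Assuming
tracefs is mounted at /sys/kernel/debug/tracing, something along these lines
should enable them:

    cd /sys/kernel/debug/tracing
    # arm/expiry tracepoints of the hrtimer subsystem
    echo 1 > events/timer/hrtimer_start/enable
    echo 1 > events/timer/hrtimer_expire_entry/enable
    echo 1 > events/timer/hrtimer_expire_exit/enable
    cat trace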

But I can tell for sure that there is no feedback loop, and it's not
disrupted by some other interrupt. The above trace shows merely the PIT irq
and nothing else.

Can you please provide a trace with the hrtimer tracepoints enabled, and tell
us the setting of CONFIG_HZ and whether high-resolution timers are enabled or
not.
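
If in doubt, the running kernel can tell you. Roughly (the first command
assumes CONFIG_IKCONFIG_PROC is set; otherwise grep the build .config):

    # HZ and high resolution timer support compiled in
    zcat /proc/config.gz | grep -E 'CONFIG_HZ=|CONFIG_HIGH_RES_TIMERS'
    # hres_active is 1 per cpu when highres mode is active at runtime
    grep hres_active /proc/timer_list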

That workaround of setting min_delta to some absurdly big number looks fishy
at best.

Thanks,

	tglx
