linux-kernel - Re: INFO: rcu detected stall in do

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACT4Y+Y3uvX+Zb9nG_sYfbRXoxD1XR-1PLyitwm-UQyjGv5r1A@mail.gmail.com>
Date:   Sat, 27 Oct 2018 12:16:52 +0100
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Juri Lelli <juri.lelli@...hat.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        luca abeni <luca.abeni@...tannapisa.it>,
        Thomas Gleixner <tglx@...utronix.de>,
        Juri Lelli <juri.lelli@...il.com>,
        syzbot <syzbot+385468161961cee80c31@...kaller.appspotmail.com>,
        Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>, nstange@...e.de,
        syzkaller-bugs@...glegroups.com, henrik@...tad.us,
        Tommaso Cucinotta <tommaso.cucinotta@...tannapisa.it>,
        Claudio Scordino <claudio@...dence.eu.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>
Subject: Re: INFO: rcu detected stall in do_idle

On Wed, Oct 24, 2018 at 1:03 PM, Juri Lelli <juri.lelli@...hat.com> wrote:
>
> On 19/10/18 22:50, luca abeni wrote:
> > On Fri, 19 Oct 2018 13:39:42 +0200
> > Peter Zijlstra <peterz@...radead.org> wrote:
> >
> > > On Thu, Oct 18, 2018 at 01:08:11PM +0200, luca abeni wrote:
> > > > Ok, I see the issue now: the problem is that the "while
> > > > (dl_se->runtime <= 0)" loop is executed at replenishment time, but
> > > > the deadline should be postponed at enforcement time.
> > > >
> > > > I mean: in update_curr_dl() we do:
> > > >   dl_se->runtime -= scaled_delta_exec;
> > > >   if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
> > > >           ...
> > > >           enqueue replenishment timer at dl_next_period(dl_se)
> > > > But dl_next_period() is based on a "wrong" deadline!
> > > >
> > > >
> > > > I think that inserting a
> > > >         while (dl_se->runtime <= -pi_se->dl_runtime) {
> > > >                 dl_se->deadline += pi_se->dl_period;
> > > >                 dl_se->runtime += pi_se->dl_runtime;
> > > >         }
> > > > immediately after "dl_se->runtime -= scaled_delta_exec;" would fix
> > > > the problem, no?
> > >
> > > That certainly makes sense to me.
> >
> > Good; I'll try to work on this idea in the weekend.
>
> So, we (me and Luca) managed to spend some more time on this and found a
> few more things worth sharing. I'll try to summarize what we have got so
> far (including what already discussed) because impression is that each
> point might deserve a fix or at least consideration (just amazing how a
> simple random fuzzer thing can highlight all that :).

1. Fuzzing finds bugs in any code. Always.
If a code wasn't fuzzed, there are bugs.

2. This fuzzer is not so simple ;)


> Apologies for the
> long email.
>
> Reproducer runs on a CONFIG_HZ=100, CONFIG_IRQ_TIME_ACCOUNTING kernel
> and does something like this (only the bits that seems to matter here)
>
> int main(void)
> {
>   [...]
>   [setup stuff at 0x2001d000]
>   syscall(__NR_perf_event_open, 0x2001d000, 0, -1, -1, 0);
>   *(uint32_t*)0x20000000 = 0;
>   *(uint32_t*)0x20000004 = 6;
>   *(uint64_t*)0x20000008 = 0;
>   *(uint32_t*)0x20000010 = 0;
>   *(uint32_t*)0x20000014 = 0;
>   *(uint64_t*)0x20000018 = 0x9917; <-- ~40us
>   *(uint64_t*)0x20000020 = 0xffff; <-- ~65us (~60% bandwidth)
>   *(uint64_t*)0x20000028 = 0;
>   syscall(__NR_sched_setattr, 0, 0x20000000, 0);
>   [busy loop]
>   return 0;
> }
>
> And this causes problems because the task is actually never throttled.
>
> Pain points:
>
>  1. Granularity of enforcement (at each tick) is huge compared with
>     the task runtime. This makes starting the replenishment timer,
>     when runtime is depleted, always to fail (because old deadline
>     is way in the past). So, the task is fully replenished and put
>     back to run.
>
>     - Luca's proposal should help here, since the deadline is postponed
>       at throttling time, and replenishment timer set to that (and it
>       should be in the future)
>
>  1.1 Even if we fix 1. in a configuration like this, the task would
>      still be able to run for ~10ms (worst case) and potentially starve
>      other tasks. It doesn't seem a too big interval maybe, but there
>      might be other very short activities that might miss an occasion
>      to run "quickly".
>
>      - Might be fixed by imposing (via sysctl) reasonable defaults for
>        minimum runtime (w.r.t. HZ, like HZ/2) and maximum for period
>        (as also a very small bandwidth task can have a big runtime if
>        period is big as well)
>
>  (1.2) When runtime becomes very negative (because delta_exec was big)
>        we seem to spend lot of time inside the replenishment loop.
>
>        - Not sure it's such a big problem, might need more profiling.
>          Feeling is that once the other points will be addressed this
>          won't matter anymore
>
>  2. This is related to perf_event_open syscall reproducer does before
>     becoming DEADLINE and entering the busy loop. Enabling of perf
>     swevents generates lot of hrtimers load that happens in the
>     reproducer task context. Now, DEADLINE uses rq_clock() for setting
>     deadlines, but rq_clock_task() for doing runtime enforcement.
>     In a situation like this it seems that the amount of irq pressure
>     becomes pretty big (I'm seeing this on kvm, real hw should maybe do
>     better, pain point remains I guess), so rq_clock() and
>     rq_clock_task() might become more a more skewed w.r.t. each other.
>     Since rq_clock() is only used when setting absolute deadlines for
>     the first time (or when resetting them in certain cases), after a
>     bit the replenishment code will start to see postponed deadlines
>     always in the past w.r.t. rq_clock(). And this brings us back to the
>     fact that the task is never stopped, since it can't keep up with
>     rq_clock().
>
>     - Not sure yet how we want to address this [1]. We could use
>       rq_clock() everywhere, but tasks might be penalized by irq
>       pressure (theoretically this would mandate that irqs are
>       explicitly accounted for I guess). I tried to use the skew between
>       the two clocks to "fix" deadlines, but that puts us at risks of
>       de-synchronizing userspace and kernel views of deadlines.
>
>  3. HRTICK is not started for new entities.
>
>     - Already got a patch for it.
>
> This should be it, I hope. Luca (thanks a lot for your help) and please
> add or correct me if I was wrong.
>
> Thoughts?
>
> Best,
>
> - Juri
>
> 1 - https://elixir.bootlin.com/linux/latest/source/kernel/sched/deadline.c#L1162
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@...glegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20181024120335.GE29272%40localhost.localdomain.
> For more options, visit https://groups.google.com/d/optout.