lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20181018122331.50ed3212@luca64>
Date:   Thu, 18 Oct 2018 12:23:31 +0200
From:   luca abeni <luca.abeni@...tannapisa.it>
To:     Juri Lelli <juri.lelli@...hat.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Juri Lelli <juri.lelli@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        syzbot <syzbot+385468161961cee80c31@...kaller.appspotmail.com>,
        Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        LKML <linux-kernel@...r.kernel.org>, mingo@...hat.com,
        nstange@...e.de, syzkaller-bugs@...glegroups.com, henrik@...tad.us,
        Tommaso Cucinotta <tommaso.cucinotta@...tannapisa.it>,
        Claudio Scordino <claudio@...dence.eu.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>
Subject: Re: INFO: rcu detected stall in do_idle

Hi Juri,

On Thu, 18 Oct 2018 10:28:38 +0200
Juri Lelli <juri.lelli@...hat.com> wrote:
[...]
>  struct sched_attr {
>     .size	= 0,
>     .policy	= 6,
>     .flags	= 0,
>     .nice	= 0,
>     .priority	= 0,
>     .runtime	= 0x9917,
>     .deadline	= 0xffff,
>     .period	= 0,
>  }
> 
> So, we seem to be correctly (in theory, see below) accepting the task.
> 
> What seems to generate the problem here is that CONFIG_HZ=100 and
> reproducer task has "tiny" runtime (~40us) and deadline (~66us)
> parameters, combination that "bypasses" the enforcing mechanism
> (performed at each tick).

Ok, so the task can execute for at most 1 tick before being throttled...
Which does not look too bad.

I missed the original emails, but maybe the issue is that the task
blocks before the tick, and when it wakes up again something goes wrong
with the deadline and runtime assignment? (maybe because the deadline
is in the past?)


> Another side problem seems also to be that with such tiny parameters
> we spend lot of time in the while (dl_se->runtime <= 0) loop of
> replenish_dl_ entity() (actually uselessly, as deadline is most
> probably going to still be in the past when eventually runtime
> becomes positive again), as delta_exec is huge w.r.t. runtime and
> runtime has to keep up with tiny increments of dl_runtime. I guess we
> could ameliorate things here by limiting the number of time we
> execute the loop before bailing out.

Actually, I think the loop will iterate at most 10ms / 39us times, which
is about 256 times, right? If this is too much (I do not know how much
time it is spent executing the loop), then the solution is (as you
suggest) to increase the minimum allowed runtime.

[...]
> So, I tend to think that we might want to play safe and put some
> higher minimum value for dl_runtime (it's currently at 1ULL <<
> DL_SCALE). Guess the problem is to pick a reasonable value, though.
> Maybe link it someway to HZ?

Yes, a value dependent on HZ looks like a good idea. I would propose
HZ / N, where N is the maximum number of times you want the loop above
to be executed.


> Then we might add a sysctl (or similar)
> thing with which knowledgeable users can do whatever they think their
> platform/config can support?

I guess this can be related to the utilization limits we were
discussing some time ago... I would propose a cgroup-based interface to
set all of these limits.



				Luca

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ