Message-ID: <CABk29NvOFsZYq6C0h4hRpd=CvoVxtYypgTx9neG4SKzUMQy2Tw@mail.gmail.com>
Date:   Fri, 13 Aug 2021 16:55:30 -0700
From:   Josh Don <joshdon@...gle.com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Paul Turner <pjt@...gle.com>,
        Oleg Rombakh <olegrom@...gle.com>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Steve Sistare <steven.sistare@...cle.com>,
        Tejun Heo <tj@...nel.org>, Rik van Riel <riel@...riel.com>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] sched: adjust SCHED_IDLE interactions

On Fri, Aug 13, 2021 at 5:43 AM Vincent Guittot
<vincent.guittot@...aro.org> wrote:
[snip]
> > >
> > > The 1ms of your test comes from the tick which could be a good
> > > candidate for a min value or the
> > > normalized_sysctl_sched_min_granularity which has the advantage of not
> > > increasing with the number of CPUs
> >
> > Fair point, this shouldn't completely ignore min granularity. Something like
> >
> > unsigned int sysctl_sched_idle_min_granularity = NSEC_PER_MSEC;
> >
> > (and still only using this value instead of the default
> > min_granularity when the SCHED_IDLE entity is competing with normal
> > entities)
>
> Yes that looks like a good option
>
> Also note that with a NSEC_PER_MSEC default value, the sched_idle
> entity will most probably run 2 ticks instead of the 1 tick (HZ=1000)
> that you have with your proposal because a bit less than a full tick
> is accounted to the running thread (the time spent in interrupt is not
> accounted as an example) so sysctl_sched_idle_min_granularity of 1ms
> with HZ=1000 will most probably run 2 ticks. Instead you could reuse
> the default 750000ULL value of sched_idle_min_granularity

Yes, great point. That's a better value here, with sufficient margin.
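
To make the tick arithmetic above concrete, here is a minimal sketch (not
kernel code). It assumes the preemption check runs once per tick and that
slightly less than a full tick is accounted to the running task each time.
The helper name and the 990000ns per-tick figure are hypothetical and chosen
only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): count how many ticks elapse
 * before the runtime accounted to a task reaches gran_ns, assuming each
 * tick accounts accounted_per_tick_ns of runtime to that task.
 */
static unsigned int ticks_until_granularity(uint64_t gran_ns,
                                            uint64_t accounted_per_tick_ns)
{
    uint64_t runtime_ns = 0;
    unsigned int ticks = 0;

    /* A tick that accounts no runtime would never reach the threshold. */
    assert(accounted_per_tick_ns > 0);

    do {
        runtime_ns += accounted_per_tick_ns;
        ticks++;
    } while (runtime_ns < gran_ns);

    return ticks;
}
```

With HZ=1000 and an assumed 990000ns accounted per tick, a 1000000ns
granularity is first satisfied at the second tick, while 750000ns is already
satisfied at the first.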

> That being said sysctl_sched_idle_min_granularity =
> normalized_sysctl_sched_min_granularity * scale_factor which means
> that normalized_sysctl_sched_min_granularity stays the same
> (750000ULL) whatever the number of cpus
>
> >
> > > > @@ -4216,7 +4228,15 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> > > >                 if (sched_feat(GENTLE_FAIR_SLEEPERS))
> > > >                         thresh >>= 1;
> > > >
> > > > -               vruntime -= thresh;
> > > > +               /*
> > > > +                * Don't give sleep credit to a SCHED_IDLE entity if we're
> > > > +                * placing it onto a cfs_rq with non SCHED_IDLE entities.
> > > > +                */
> > > > +               if (!se_is_idle(se) ||
> > > > +                   cfs_rq->h_nr_running == cfs_rq->idle_h_nr_running)
> > >
> > > Can't this condition above create unfairness between idle entities ?
> > > idle thread 1 wake up while normal thread is running
> > > normal thread sleeps immediately after
> > > idle thread 2 wakes up just after and gets some credits compared to the 1st one.
> >
> > Yes, this sacrifices some idle<->idle fairness when there is a normal
> > thread that comes and goes. One alternative is to simply further
> > reduce thresh for idle entities. That will interfere with idle<->idle
> > fairness when there are no normal threads, which is why I opted for
> > the former. On second thought though, the former fairness issue seems
> > more problematic. Thoughts on applying a smaller sleep credit
> > threshold universally to idle entities?
>
> This one is a bit more complex to set.
> With adding 1, you favor the already runnable tasks by ensuring that
> they have or will run a slice during this period before the sched_idle
> task.
> But as soon as you subtract something from min_vruntime, the task will
> most probably be scheduled at the next tick if other tasks already run
> for a while (at least a sched period). If we use
> sysctl_sched_min_granularity for sched_idle tasks that wake up instead
> of sysctl_sched_latency, we will ensure that a sched_idle task will
> not preempt a normal task, which woke up a few ms before, and we will
> keep some fairness for a sched_idle task that sleeps compared to others.
>
> so a thresh of sysctl_sched_min_granularity (3.75ms with 16 cpus)
> should not disturb your use case and keep some benefit for newly woken
> sched_idle tasks

If the normal task has already been running for at least a period, it
should be ok to preempt.
A thresh around the min_granularity seems like a good order of
magnitude; I'll experiment a bit.
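
As a rough illustration of the alternative discussed above, the sketch below
models wakeup placement with a smaller sleep credit for SCHED_IDLE entities.
It is a standalone approximation under stated assumptions, not the patch
itself: the function names and parameters are hypothetical, the halving for
GENTLE_FAIR_SLEEPERS is assumed to apply to both kinds of entity, and the
numeric values in the usage note are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe maximum of two vruntime values (signed delta compare). */
static uint64_t max_vruntime(uint64_t max_v, uint64_t v)
{
    int64_t delta = (int64_t)(v - max_v);

    if (delta > 0)
        max_v = v;
    return max_v;
}

/*
 * Hypothetical model of wakeup placement: a waking entity is placed at
 * min_vruntime minus a sleep credit (thresh), but never behind the
 * vruntime it already has. Idle entities get the smaller min granularity
 * as credit instead of the full latency.
 */
static uint64_t place_on_wakeup(uint64_t min_vruntime, uint64_t se_vruntime,
                                bool se_idle, uint64_t latency_ns,
                                uint64_t min_gran_ns, bool gentle_sleepers)
{
    uint64_t thresh = se_idle ? min_gran_ns : latency_ns;

    /* Assumption: the halving applies regardless of entity type. */
    if (gentle_sleepers)
        thresh >>= 1;

    return max_vruntime(se_vruntime, min_vruntime - thresh);
}
```

For example, with min_vruntime at 100ms, a 24ms latency, a 3ms min
granularity and halving enabled, a long-sleeping normal entity would be
placed at 88ms, whereas a long-sleeping idle entity would be placed at
98.5ms, much closer to the entities that are already runnable.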