Message-ID: <CA+q576PvB2-rCMeC6SiONLz4s+9JeAwMtq0vK=SQnR+WicQtew@mail.gmail.com>
Date:   Mon, 2 Oct 2023 10:55:07 -0500
From:   Youssef Esmat <youssefesmat@...omium.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Daniel Jordan <daniel.m.jordan@...cle.com>, mingo@...nel.org,
        vincent.guittot@...aro.org, linux-kernel@...r.kernel.org,
        juri.lelli@...hat.com, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, corbet@....net, qyousef@...alina.io,
        chris.hyser@...cle.com, patrick.bellasi@...bug.net, pjt@...gle.com,
        pavel@....cz, qperret@...gle.com, tim.c.chen@...ux.intel.com,
        joshdon@...gle.com, timj@....org, kprateek.nayak@....com,
        yu.c.chen@...el.com, joel@...lfernandes.org, efault@....de,
        tglx@...utronix.de
Subject: Re: [PATCH 00/15] sched: EEVDF and latency-nice and/or slice-attr

On Fri, Sep 29, 2023 at 11:54 AM Youssef Esmat
<youssefesmat@...omium.org> wrote:
>
> >
> > EEVDF fundamentally supports per task request/slice sizes, which is the
> > primary motivator for finally finishing these patches.
> >
> > So the plan is to extend sched_setattr() to allow tasks setting their
> > own ideal slice length. But we're not quite there yet.
> >
> > Having just returned from PTO the mailbox is an utter trainwreck, but
> > I'll try and refresh those few patches this week for consideration.
> >
> > In the meantime I think you found the right knob to twiddle.
>
> Hello Peter,
>
> I am trying to understand a little better the need for the eligibility
> check (entity_eligible). I understand the general concept, but I am
> trying to find a scenario where it is necessary, and maybe propose
> having it toggled by a feature flag.
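
To illustrate what I mean by a feature flag: a minimal sketch, where the
ELIGIBLE_CHECK name is made up and only the sched_feat() plumbing is the
existing mechanism, something like:

  /* kernel/sched/features.h: hypothetical knob, default-enabled */
  SCHED_FEAT(ELIGIBLE_CHECK, true)

  /* kernel/sched/fair.c */
  int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
  {
          /* With the knob off, treat every queued entity as eligible. */
          if (!sched_feat(ELIGIBLE_CHECK))
                  return 1;

          /* ... existing avg_vruntime-based lag test unchanged ... */
  }

The "disabled" runs below are effectively this, i.e. picking purely by
earliest deadline.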
>
> Some of my testing:
>
> All my testing was done on a two-core Celeron N400 CPU system at 1.1 GHz.
> It was done on the 6.5-rc3 kernel with the EEVDF changes ported.
>
> I have two CPU-bound tasks, one with a nice of -4 and the other with a
> nice of 0. They are both affinitized to CPU 0. (while 1 { i++ })
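
For reference, the load is essentially the following, run once with -4
and once with 0 as the argument (a sketch, not the exact source; note a
negative nice needs CAP_SYS_NICE):

  /* busy.c: pin to CPU 0, apply the requested nice value, spin forever */
  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdlib.h>
  #include <sys/resource.h>

  int main(int argc, char **argv)
  {
          cpu_set_t set;
          volatile unsigned long i = 0;

          CPU_ZERO(&set);
          CPU_SET(0, &set);                       /* affinitize to CPU 0 */
          sched_setaffinity(0, sizeof(set), &set);

          if (argc > 1)                           /* nice value, e.g. -4 or 0 */
                  setpriority(PRIO_PROCESS, 0, atoi(argv[1]));

          while (1)                               /* while 1 { i++ } */
                  i++;
  }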
>
> With entity_eligible *enabled* and with entity_eligible *disabled*
> (always returning 1), top showed consistent results: one task at ~70%
> and the other at ~30%.
>
> So it seems the deadline adjustment will naturally achieve fairness.
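
That ~70/30 split is also what the nice weights alone predict, if I'm
reading kernel/sched/core.c's sched_prio_to_weight[] right (1024 for
nice 0, 2501 for nice -4):

  #include <stdio.h>

  int main(void)
  {
          const double w_m4 = 2501.0, w_0 = 1024.0;   /* nice -4, nice 0 */

          printf("nice -4: %.1f%%\n", 100.0 * w_m4 / (w_m4 + w_0)); /* ~70.9% */
          printf("nice  0: %.1f%%\n", 100.0 * w_0  / (w_m4 + w_0)); /* ~29.1% */
          return 0;
  }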
>
> I also added a few trace_printks to see if there is a case where
> entity_eligible would have returned 0 before the deadline forced us to
> reschedule. There were a few such cases. The following snippet of
> prints shows an entity becoming ineligible two slices before its
> deadline expired. It seems like this will add more context switching
> but still achieve a similar result in the end.
>
> bprint:               pick_eevdf: eligibility check: tid=4568,
> eligible=0, deadline=26577257249, vruntime=26575761118
> bprint:               pick_eevdf: found best deadline: tid=4573,
> deadline=26575451399, vruntime=26574838855
> sched_switch:         prev_comm=loop prev_pid=4568 prev_prio=120
> prev_state=R ==> next_comm=loop next_pid=4573 next_prio=116
> bputs:                task_tick_fair: tick
> bputs:                task_tick_fair: tick
> bprint:               pick_eevdf: eligibility check: tid=4573,
> eligible=1, deadline=26576270304, vruntime=26575657159
> bprint:               pick_eevdf: found best deadline: tid=4573,
> deadline=26576270304, vruntime=26575657159
> bputs:                task_tick_fair: tick
> bputs:                task_tick_fair: tick
> bprint:               pick_eevdf: eligibility check: tid=4573,
> eligible=0, deadline=26577089170, vruntime=26576476006
> bprint:               pick_eevdf: found best deadline: tid=4573,
> deadline=26577089170, vruntime=26576476006
> bputs:                task_tick_fair: tick
> bputs:                task_tick_fair: tick
> bprint:               pick_eevdf: eligibility check: tid=4573,
> eligible=0, deadline=26577908042, vruntime=26577294838
> bprint:               pick_eevdf: found best deadline: tid=4568,
> deadline=26577257249, vruntime=26575761118
> sched_switch:         prev_comm=loop prev_pid=4573 prev_prio=116
> prev_state=R ==> next_comm=loop next_pid=4568 next_prio=120
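
The prints above come from roughly the following trace_printk()s in
pick_eevdf() (paraphrased; the exact hook points are from my local
patch):

  /* logged for the entity being considered */
  trace_printk("eligibility check: tid=%d, eligible=%d, deadline=%llu, vruntime=%llu\n",
               task_of(se)->pid, entity_eligible(cfs_rq, se),
               se->deadline, se->vruntime);

  /* logged once the earliest-deadline candidate has been chosen */
  trace_printk("found best deadline: tid=%d, deadline=%llu, vruntime=%llu\n",
               task_of(best)->pid, best->deadline, best->vruntime);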
>
> In a more practical example, I tried this with one of our benchmarks
> that involves running Meet and Docs side by side and measuring the
> input latency in the Docs document. The following is the average
> latency for 5 runs:
>
> (These numbers are after removing our cgroup hierarchy - that might be
> a discussion for a later time).
>
> CFS: 168ms
> EEVDF with eligibility: 206ms (regression from CFS)
> EEVDF *without* eligibility: 143ms (improvement to CFS)
> EEVDF *without* eligibility and with a 6ms base_slice_ns (was 1.5ms):
> 104ms (great improvement)
>
> Removing the eligibility check for this workload seemed to result in a
> great improvement. I haven't dug deeper, but I suspect it's related to
> reduced context switches on our two-core system.
> As an extra test I also increased the base_slice_ns and it further
> improved the input latency significantly.
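
(For context on why the slice size matters here: when a task's deadline
is reached, the next one is pushed out by the slice scaled by its
weight, roughly as in the posted series, quoted from memory:

  /* update_deadline(): EEVDF, vd_i = ve_i + r_i / w_i */
  se->slice = sysctl_sched_base_slice;
  se->deadline = se->vruntime + calc_delta_fair(se->slice, se);

so a larger base_slice_ns directly spaces deadlines, and with them the
tick-driven preemptions, further apart.)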
>
> I would love to hear your thoughts. Thanks!

For completeness I ran two more tests:

1. EEVDF with eligibility and a 6ms base_slice_ns.
2. EEVDF with eligibility and Benjamin Segall's patch
(https://lore.kernel.org/all/xm261qego72d.fsf_-_@google.com/).

I copied over all the previous results for easier comparison.

CFS:                              168ms
EEVDF, eligib, 1.5ms slice:       206ms
EEVDF, eligib, 6ms slice:         167ms
EEVDF_Fix, eligib, 1.5ms slice:   190ms
EEVDF, *no* eligib, 1.5ms slice:  143ms
EEVDF, *no* eligib, 6ms slice:    104ms

It does seem like Benjamin's fix did yield an improvement.
