lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231003210511.GB30978@maniforge>
Date:   Tue, 3 Oct 2023 16:05:11 -0500
From:   David Vernet <void@...ifault.com>
To:     Chen Yu <yu.c.chen@...el.com>
Cc:     K Prateek Nayak <kprateek.nayak@....com>,
        linux-kernel@...r.kernel.org, peterz@...radead.org,
        mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, vschneid@...hat.com, tj@...nel.org,
        roman.gushchin@...ux.dev, gautham.shenoy@....com,
        aaron.lu@...el.com, wuyun.abel@...edance.com, kernel-team@...a.com
Subject: Re: [RFC PATCH 3/3] sched/fair: Add a per-shard overload flag

On Wed, Sep 27, 2023 at 02:59:29PM +0800, Chen Yu wrote:
> Hi Prateek,

Hi Chenyu,

> On 2023-09-27 at 09:53:13 +0530, K Prateek Nayak wrote:
> > Hello David,
> > 
> > Some more test results (although this might be slightly irrelevant with
> > next version around the corner)
> > 
> > On 9/1/2023 12:41 AM, David Vernet wrote:
> > > On Thu, Aug 31, 2023 at 04:15:08PM +0530, K Prateek Nayak wrote:
> > > 
> > -> With EEVDF
> > 
> > o tl;dr
> > 
> > - Same as what was observed without EEVDF  but shared_runq shows
> >   serious regression with multiple more variants of tbench and
> >   netperf now.
> > 
> > o Kernels
> > 
> > eevdf			: tip:sched/core at commit b41bbb33cf75 ("Merge branch 'sched/eevdf' into sched/core")
> > shared_runq		: eevdf + correct time accounting with v3 of the series without any other changes
> > shared_runq_idle_check	: shared_runq + move the rq->avg_idle check before peeking into the shared_runq
> > 			  (the rd->overload check still remains below the shared_runq access)
> >
> 
> I did not see any obvious regression on a Sapphire Rapids server and it seems that
> the result on your platform suggests that C/S workload could be impacted
> by shared_runq. Meanwhile some individual workloads like HHVM in David's environment
> (no shared resource between tasks if I understand correctly) could benefit from

Correct, hhvmworkers are largely independent, though they do sometimes
synchronize, and they also sometimes rely on I/O happening in other
tasks.

> shared_runq a lot. This makes me wonder if we can let shared_runq skip the C/S tasks.

I'm also open to this possibility, but I worry that we'd be going down
the same rabbit hole as what fair.c does already, which is use
heuristics to determine when something should or shouldn't be migrated,
etc. I really do feel that there's value in SHARED_RUNQ providing
consistent and predictable work conservation behavior.

On the other hand, it's clear that there are things we can do to improve
performance for some of these client/server workloads that hammer the
runqueue on larger CCXs / sockets. If we can avoid those regressions
while still having reasonably high confidence that work conservation
won't disproportionately suffer, I'm open to us making some tradeoffs
and/or adding a bit of complexity to avoid some of this unnecessary
contention.

I think it's probably about time for v4 to be sent out. What do you
folks think about including:

1. A few various fixes / tweaks from v3, e.g. avoiding using the wrong
   shard on the task_dead_fair() path if the feature is disabled before
   a dying task is dequeued from a shard, fixing the build issues
   pointed out by lkp, etc.
2. Fix the issue that Prateek pointed out in [0] where we're not
   properly skipping the LLC domain due to using the for_each_domain()
   macro (this is also addressed by (3)).
3. Apply Prateek's suggestions (in some form) in [1] and [2]. For [2],
   I'm inclined to just avoid enqueuing a task on a shard if the rq it's on
   has nr_running == 0. Or, we can just add his patch to the series
   directly if it turns out that just looking at rq->nr_running is
   insufficient.

[0]: https://lore.kernel.org/all/3e32bec6-5e59-c66a-7676-7d15df2c961c@amd.com/
[1]: https://lore.kernel.org/all/20230831104508.7619-3-kprateek.nayak@amd.com/
[2]: https://lore.kernel.org/all/20230831104508.7619-4-kprateek.nayak@amd.com/

Prateek -- what do you think about this? I want to make sure you get
credit for your contributions to this series, so let me know how you'd
like to apply these changes. [1] essentially just improves much of the
logic from [3], so I'm not sure it would make sense to include it as a
separate patch. I'm happy to include a Co-authored-by tag, or to just
explicitly credit your contributions in the commit summary if you'd
prefer that.

[3]: https://lore.kernel.org/all/20230809221218.163894-7-void@manifault.com/

Thanks,
David

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ