Date:   Mon, 24 Jul 2023 11:11:10 -0400
From:   Barret Rhoden <brho@...gle.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     torvalds@...ux-foundation.org, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, vschneid@...hat.com, ast@...nel.org,
        daniel@...earbox.net, andrii@...nel.org, martin.lau@...nel.org,
        joshdon@...gle.com, pjt@...gle.com, derkling@...gle.com,
        haoluo@...gle.com, dvernet@...a.com, dschatzberg@...a.com,
        dskarlat@...cmu.edu, riel@...riel.com,
        linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
        kernel-team@...a.com
Subject: Re: [PATCHSET v4] sched: Implement BPF extensible scheduler class

Hi -

On 7/21/23 14:37, Tejun Heo wrote:
> Hello,
> 
> It's been more than half a year since the initial posting of the patchset
> and we are now at the fourth iteration. There have been some reviews around
> specifics (should be all addressed now except for the ones Andrea raised on
> this iteration) but none at high level. There also were some in-person and
> off-list discussions. Some, I believe, are addressed by the cover letter but
> it'd be nonetheless useful to delve into them on-list.
> 
> On our side, we've been diligently experimenting. 

On the Google side, we're still experimenting and developing schedulers 
based on ghost, which we think we can port over to sched_ext.

Specifically, I've been working on a framework to write multicore 
schedulers in BPF called 'Flux'.  The idea in brief is to compose a 
scheduler as a hierarchy of "subschedulers", where cpu allocations go 
up and down the tree.
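
As a rough illustration of the tree idea only (a hypothetical sketch, not 
Flux's actual code or API; the struct and function names are made up, and 
a real BPF scheduler would track specific cpus rather than a count):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node in a tree of subschedulers. */
struct subsched {
	struct subsched *parent;	/* NULL for the root */
	int nr_cpus;			/* cpus currently held by this node */
};

/*
 * Allocation going down the tree: move up to 'want' cpus from the
 * parent to 'child'.  Returns the number of cpus actually moved.
 */
int subsched_grant(struct subsched *child, int want)
{
	struct subsched *p;
	int n;

	assert(child != NULL);
	p = child->parent;
	if (p == NULL || want <= 0)
		return 0;
	n = want < p->nr_cpus ? want : p->nr_cpus;
	p->nr_cpus -= n;
	child->nr_cpus += n;
	return n;
}

/*
 * Allocation going back up the tree: return up to 'nr' cpus from
 * 'child' to its parent.  Returns the number of cpus actually moved.
 */
int subsched_yield(struct subsched *child, int nr)
{
	struct subsched *p;
	int n;

	assert(child != NULL);
	p = child->parent;
	if (p == NULL || nr <= 0)
		return 0;
	n = nr < child->nr_cpus ? nr : child->nr_cpus;
	child->nr_cpus -= n;
	p->nr_cpus += n;
	return n;
}
```

With a root holding 8 cpus and an empty leaf, subsched_grant(&leaf, 3) 
leaves the root with 5 and the leaf with 3.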

Flux is open-source, but it currently needs the ghost kernel and our BPF 
extensions (which are also open source, but harder for people to use).  
I'll send a proposal to talk about it at LPC in case people 
are interested - if not the scheduler framework itself, then as a "this 
is some crazy stuff people can do with BPF".

As far as results go, I wrote a custom scheduler with Flux for our 
Search app and have been testing it on our single-leaf loadtester.  The 
initial results out of the box were pretty great: 17% QPS increase, 43% 
p99 decrease (default settings for the loadtester).  But the loadtester 
varies a bit, so it's hard to get reliable numbers out of it for an A/B 
comparison of schedulers.  Overall, we run equal to or better than CFS.  I 
did a sweep across various offered loads, and we got 5% better QPS on 
average, 30% better p99 latency, and 6% lower utilization.  The better 
numbers come under higher load, as you'd expect, when there are more 
threads competing for the cpu.

The big caveat to those numbers is that the single-leaf loadtester isn't a 
representative test.  It's more of a microbenchmark.  Our next step is 
to run a full cluster load test, which will give us a better signal.

Anyway, this scheduler is highly specific to our app, including shared 
memory regions where the app's threads can tell us stuff like RPC 
deadlines.  It's the sort of thing you could only reasonably do with a 
pluggable scheduler like sched_ext or ghost.
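
To illustrate the kind of hint passing meant here, a minimal sketch of a 
shared region where app threads publish RPC deadlines and the scheduler 
picks the most urgent one.  Everything in it is an assumption for 
illustration (the layout, the names, the fixed slot count); it is not the 
actual interface, and real shared memory would need atomic accesses:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 4	/* hypothetical fixed number of slots */

/* One slot per app thread; a deadline of 0 means "none published". */
struct thread_hint {
	uint64_t rpc_deadline_ns;
};

/* Hypothetical region mapped by both the app and the scheduler. */
struct hint_region {
	struct thread_hint hints[MAX_THREADS];
};

/* App side: a thread records the deadline of the RPC it is serving. */
void publish_deadline(struct hint_region *r, size_t tid, uint64_t deadline_ns)
{
	assert(r != NULL && tid < MAX_THREADS);
	r->hints[tid].rpc_deadline_ns = deadline_ns;
}

/*
 * Scheduler side: return the slot with the earliest published deadline,
 * or -1 if no thread has published one.
 */
int pick_earliest(const struct hint_region *r)
{
	int best = -1;
	size_t i;

	assert(r != NULL);
	for (i = 0; i < MAX_THREADS; i++) {
		uint64_t d = r->hints[i].rpc_deadline_ns;

		if (d == 0)
			continue;
		if (best < 0 || d < r->hints[best].rpc_deadline_ns)
			best = (int)i;
	}
	return best;
}
```

The point of the sketch is only that the scheduler reads app-provided 
state directly, with no system call on the app's side.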


> We are comfortable with the current API. Everything we tried fit pretty
> well. It will continue to evolve but sched_ext now seems mature enough for
> initial inclusion. I suppose lack of response doesn't indicate tacit
> agreement from everyone, so what are you guys all thinking?

Btw, I backported your patchset to our "franken-kernel".  I was able to 
boot it on one of our nodes, and run the search loadtest on CFS. 
Nothing broke, performance was the same, etc.  Not a huge surprise, 
since I didn't turn on sched_ext.  I haven't been able to get a 
sched_ext scheduler to work yet with our kernel - there's more patch 
backporting needed for your schedulers themselves (the bpf_for iterators 
and whatnot).  I'll report back if/when I can get it running.

Thanks,

Barret

