linux-kernel - Re: [PATCHSET RFC] sched: Implement BPF extensible scheduler class

Message-ID: <4984b4f5-7bc5-6109-2523-77265141b3d2@google.com>
Date:   Wed, 14 Dec 2022 18:20:11 -0500
From:   Barret Rhoden <brho@...gle.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Josh Don <joshdon@...gle.com>, torvalds@...ux-foundation.org,
        mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, vschneid@...hat.com, ast@...nel.org,
        daniel@...earbox.net, andrii@...nel.org, martin.lau@...nel.org,
        pjt@...gle.com, derkling@...gle.com, haoluo@...gle.com,
        dvernet@...a.com, dschatzberg@...a.com, dskarlat@...cmu.edu,
        riel@...riel.com, linux-kernel@...r.kernel.org,
        bpf@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCHSET RFC] sched: Implement BPF extensible scheduler class

On 12/14/22 17:23, Tejun Heo wrote:
> Google guys probably have a lot to say here too and there may be many
> commonalities, but here's how things are on our end.

your email pretty much captures my experiences from the google side.  in 
fact, i think i'll save it for the next time someone asks me to 
summarize the challenges with both kernel rollouts and testing changes 
on workloads.  =)

>> I was given to believe this was a fairly rapid process.
> 
> Going back to the first phase where we're experimenting in a more controlled
> environment. Yes, that is a faster process but only in comparison to the
> second phase. Some controlled experiments, the faster ones, usually take
> several hours to obtain a meaningful result. It just takes a while for
> production workloads to start, jit-compile all the hot code paths, warm up
> caches and so on. Others, unfortunately, take a lot longer to ramp up to the
> point where they can be compared against production numbers. Some of the
> benchmarks stretch over multiple days.
> 
> With SCX, we can just keep hotswapping and tuning the scheduler
> behavior, getting results in tens of minutes instead of multiple hours and
> without worrying about crashing the test machines

for testing sched policies on one of our bigger apps, having to do an 
O(hours) kernel reboot instead of an O(minutes) reload of a BPF scheduler 
is a pain.  but that's only for a single machine; it can be much worse on 
a full cluster.

full-cluster tests are a different beast.  we are one of many groups 
that want to do testing, and we have to reserve time on the app team's 
cluster.  but changing the kernel actually took us weeks to coordinate on 
the app's large testing cluster: since we were using an unqualified 
kernel, we essentially 'blocked' all of the other testing.

> it's way easier and faster to have a running test environment setup and
> iterate through scheduling behavior changes without worrying about crashing
> the machine than having to cycle and re-setup test setup for each iteration.

i'm a newcomer to BPF, but for me the "interaction with a live machine" is 
a major BPF feature, both in SCX and also more broadly with the various 
tracing tools and other BPF uses.  (not to mention the per-workload or 
per-machine customization that BPF enables, but that's a separate 
discussion).

thanks,

barret