[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABk29Ntf1ZMAmvkVTzj6=HjanHgn6Qu3-J8gHHyMM30yiHM3_w@mail.gmail.com>
Date: Tue, 13 Dec 2022 18:11:38 -0800
From: Josh Don <joshdon@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Tejun Heo <tj@...nel.org>, torvalds@...ux-foundation.org,
mingo@...hat.com, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
bristot@...hat.com, vschneid@...hat.com, ast@...nel.org,
daniel@...earbox.net, andrii@...nel.org, martin.lau@...nel.org,
brho@...gle.com, pjt@...gle.com, derkling@...gle.com,
haoluo@...gle.com, dvernet@...a.com, dschatzberg@...a.com,
dskarlat@...cmu.edu, riel@...riel.com,
linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
kernel-team@...a.com
Subject: Re: [PATCHSET RFC] sched: Implement BPF extensible scheduler class
On Mon, Dec 12, 2022 at 2:14 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Tue, Nov 29, 2022 at 10:22:42PM -1000, Tejun Heo wrote:
>
> > Rolling out kernel upgrades is a slow and iterative process. At a large scale
> > it can take months to roll a new kernel out to a fleet of servers. While this
> > latency is expected and inevitable for normal kernel upgrades, it can become
> > highly problematic when kernel changes are required to fix bugs. Livepatch [9]
> > is available to quickly roll out critical security fixes to large fleets, but
> > the scope of changes that can be applied with livepatching is fairly limited,
> > and would likely not be usable for patching scheduling policies. With
> > sched_ext, new scheduling policies can be rapidly rolled out to production
> > environments.
>
> I don't think we can or should use this argument to push BPF into ever
> more places.
Improving scheduling performance requires rapid iteration to explore
new policies and tune parameters, especially as hardware becomes more
heterogeneous, and applications become more complex. Waiting months
between evaluating scheduler policy changes is simply not scalable,
but this is the reality with large fleets that require time for
testing, qualification, and progressive rollout. The security angle
should be clear from how involved it was to integrate core scheduling,
for example.
Powered by blists - more mailing lists