Message-ID: <Z6Z_CNBMp5greLf4@gpd3>
Date: Fri, 7 Feb 2025 22:45:44 +0100
From: Andrea Righi <arighi@...dia.com>
To: Tejun Heo <tj@...nel.org>
Cc: Changwoo Min <changwoo@...lia.com>, void@...ifault.com,
	kernel-dev@...lia.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] sched_ext: Add a core event and update scx schedulers

On Fri, Feb 07, 2025 at 11:38:31AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Fri, Feb 07, 2025 at 07:24:08AM +0100, Andrea Righi wrote:
> > On Fri, Feb 07, 2025 at 12:13:36PM +0900, Changwoo Min wrote:
> > > This patchset introduces a new event, SCX_EV_ENQ_SLICE_DFL, and updates
> > > two scx schedulers -- scx_qmap and scx_central -- to print out the new
> > > event.
> > > 
> > > SCX_EV_ENQ_SLICE_DFL counts how many times the tasks' time slice is set
> > > to the default value (SCX_SLICE_DFL) by the sched_ext core in the enqueue
> > > and pick_next paths.
> > > 
> > > Scheduling a task with SCX_SLICE_DFL unintentionally would be a source
> > > of latency spikes because SCX_SLICE_DFL is relatively long (20 msec).
> > > Thus, soaring the SCX_EV_ENQ_SLICE_DFL value would be a sign of BPF
> > > scheduler bugs, causing latency spikes.
> > 
> > Not directly related to this patch set, but as a general thought: would it
> > be useful to introduce ops->slice_ms (in sched_ext_ops) to override
> > SCX_SLICE_DFL?
> > 
> > With that, schedulers that care about latency could set a smaller default
> > time slice to prevent potential spikes caused by the implicit use of
> > SCX_SLICE_DFL.
> > 
> > Opinions?
> 
> I'm not sure. BPF schedulers should be able to avoid getting the default
> slice. Hopefully, with the added visibility, this should be easier now. I'm
> not sure how much overriding the default value in ops helps in terms of
> control. It's a very half-way measure. Instead, how about we add tracepoint
> to scx_add_event() so that folks who want to get backtrace of specific
> events can get them easily so that it's easier to debug where these counts
> are coming from? Let's just make it easier to avoid these events.

Yeah, that's a valid point. The implicit SCX_SLICE_DFL should be seen as a
countermeasure for unhandled situations. Instead of tuning the
countermeasure itself, we should try to prevent the situations that trigger
it, if they prove to be problematic. And I like the idea of having a way to
backtrace specific events.
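
To make the idea concrete, here is a rough user-space sketch of a counter
that also reports where each increment came from. It is only a toy model
under my own assumptions, not the sched_ext code: the 20 msec default comes
from the thread, while every identifier below (ev_add, ev_hook, enqueue,
struct task, ...) is hypothetical, and the "slice of 0 means unset"
convention is a simplification for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Toy user-space model. Only the 20 msec default and the purpose of the
 * event come from the thread; every identifier here is hypothetical.
 */
#define SLICE_DFL_NS (20ULL * 1000 * 1000)

enum ev_id { EV_ENQ_SLICE_DFL, EV_NR };

typedef void (*ev_hook_fn)(enum ev_id id, uint64_t delta, const char *site);

static uint64_t ev_counts[EV_NR];
static ev_hook_fn ev_hook;	/* stands in for a tracepoint; NULL = disabled */

/* Bump the counter and, if a hook is attached, report the call site. */
static void ev_add(enum ev_id id, uint64_t delta, const char *site)
{
	ev_counts[id] += delta;
	if (ev_hook)
		ev_hook(id, delta, site);
}

struct task {
	uint64_t slice_ns;	/* 0 means the scheduler did not set a slice */
};

/* Fall back to the default slice only when none was set, and count it. */
static void enqueue(struct task *t, const char *site)
{
	if (t->slice_ns == 0) {
		t->slice_ns = SLICE_DFL_NS;
		ev_add(EV_ENQ_SLICE_DFL, 1, site);
	}
}

/* Example hook: print which event fired and where. */
static void print_hook(enum ev_id id, uint64_t delta, const char *site)
{
	printf("event %d += %llu at %s\n",
	       (int)id, (unsigned long long)delta, site);
}
```

With ev_hook set to print_hook, each fallback to the default slice reports
its call site, so a growing counter can be traced back to the path that
caused it. A real tracepoint would let tooling capture a stack trace
instead of a hand-passed string.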

-Andrea
