linux-kernel - Re: [PATCH V3 1/2] sched: Reduce the default slice to avoid tasks getting an extra tick

Message-ID: <20250210225534.efd4onoo7mhodqhn@airbuntu>
Date: Mon, 10 Feb 2025 22:55:34 +0000
From: Qais Yousef <qyousef@...alina.io>
To: zihan zhou <15645113830zzh@...il.com>
Cc: bsegall@...gle.com, dietmar.eggemann@....com, juri.lelli@...hat.com,
	linux-kernel@...r.kernel.org, mgorman@...e.de, mingo@...hat.com,
	peterz@...radead.org, rostedt@...dmis.org,
	vincent.guittot@...aro.org, vschneid@...hat.com
Subject: Re: [PATCH V3 1/2] sched: Reduce the default slice to avoid tasks
 getting an extra tick

On 02/10/25 14:18, zihan zhou wrote:
> Thank you for your comments!
> 
> > I brought the topic up of these magic values with Peter and Vincent in LPC as
> > I think this logic is confusing. I have nothing against your patch, but if the
> > maintainers agree I am in favour of removing it completely in favour of setting
> > it to a single value that is the same across all systems.
> 
> Here is my shallow understanding:
> I think a machine with a small number of CPUs is usually a desktop. If the
> slice is relatively large there, a task sees a longer wake-up delay, which
> may result in a poorer user experience. A machine with a large number of
> CPUs is likely a server whose tasks are often batch workloads, so a slight
> increase in slice is acceptable. A server also often has idle or lightly
> loaded CPUs, so even with a larger slice the interactive experience is not
> bad.

I think the logic has served its purpose and it's time to retire it. Any system
with more than 8 CPUs gets the same mapping anyway. So let's simplify and make
it 3ms by default for everyone.

So the suggestion is to remove this logic and always set base_slice to 3ms on
all systems. No need to do the scaling anymore.
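
For reference, a minimal userspace sketch of the scaling being discussed. It
assumes the default logarithmic mode, a normalized value of 0.75ms and a cap
of 8 CPUs, which is how I read kernel/sched/fair.c before this series. The
helper names are hypothetical and this is not the kernel code itself.

```c
#include <assert.h>

/* Assumed defaults, modelled on kernel/sched/fair.c; illustrative only. */
#define NORMALIZED_BASE_SLICE_NS 750000ULL /* 0.75 ms per scaling step */
#define MAX_SCALING_CPUS 8U                /* scaling saturates at 8 CPUs */

/* Integer log2, rounding down; v must be non-zero. */
static unsigned int ilog2_floor(unsigned int v)
{
	unsigned int r = 0;

	assert(v != 0);
	while (v >>= 1)
		r++;
	return r;
}

/* Hypothetical helper: base slice in ns under logarithmic CPU scaling. */
static unsigned long long scaled_base_slice_ns(unsigned int online_cpus)
{
	unsigned int cpus = online_cpus < MAX_SCALING_CPUS ?
			    online_cpus : MAX_SCALING_CPUS;

	if (cpus == 0)
		cpus = 1;
	return NORMALIZED_BASE_SLICE_NS * (1 + ilog2_floor(cpus));
}
```

Under these assumptions 1 CPU gives 0.75ms, 2 CPUs 1.5ms, 4 CPUs 2.25ms, and
anything from 8 CPUs up gives 3ms, which is why a fixed 3ms changes nothing
for larger systems.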

> 
> > I do think 1ms makes more sense as a default value given how modern workloads
> > need faster responsiveness across the board. But keeping it 3ms to avoid much
> > disturbance would be fine. We could also make it equal to TICK_MSEC (this
> > define doesn't exist) if it is higher than 3ms.
> 
> I don't quite understand this. What is TICK_MSEC? If HZ=1000, then
> TICK_MSEC=1ms? Why is it said that more than 3ms (slice) equals 1ms (tick)?

I meant

	base_slice = max(3ms, TICK_USEC / USEC_PER_MSEC)

I was too lazy to type TICK_USEC / USEC_PER_MSEC and used TICK_MSEC instead.

But this is a bad idea. Please ignore it. With HZ=100 still selectable, doing
that would wreak havoc on wakeup preemption on those systems.
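
To put numbers on it, here is a small userspace sketch of what that max()
would produce for a few HZ values. The tick length is rounded the way I
believe the kernel's TICK_USEC is, but HZ is passed as a parameter here and
the function names are hypothetical, so treat it as an illustration only.

```c
#include <assert.h>

#define USEC_PER_SEC 1000000U
#define MIN_BASE_SLICE_USEC 3000U /* the 3ms floor from the suggestion above */

/* Tick length in usec for a given HZ, rounded to nearest (assumed rounding). */
static unsigned int tick_usec(unsigned int hz)
{
	assert(hz > 0);
	return (USEC_PER_SEC + hz / 2) / hz;
}

/* Hypothetical: base slice in usec if it were max(3ms, one tick). */
static unsigned int base_slice_usec(unsigned int hz)
{
	unsigned int tick = tick_usec(hz);

	return tick > MIN_BASE_SLICE_USEC ? tick : MIN_BASE_SLICE_USEC;
}
```

With HZ=1000 the slice stays at 3ms, with HZ=250 it becomes 4ms, and with
HZ=100 it jumps to 10ms, more than three times the current default.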

> 
> It seems that this value was originally designed for
> sysctl_sched_wakeup_granularity. CFS does not force a task to switch after
> running for this time, but EEVDF does, so a slice as small as 1ms does not
> look cache-friendly and is not good for batch workloads.
> 
> > Do you use HZ=100 by the way? If yes, are you able to share the reasons? This
> > configuration is too aggressive and bad for latencies and I doubt this tweak of
> > the formula will make things better to them anyway.
> 
> I don't use HZ=100, in fact, all the machines I use have HZ=1000 and more
> than 8 cpus, so I'm not familiar with some scenarios.
> 
> I think that if the slice is smaller than the tick (10ms), there is not much
> difference between a 3ms slice and a 1ms slice for tick preemption, but the
> two still differ for wake-up preemption. After all, wake-up preemption also
> goes through update_curr->update_deadline, so the wake-up latency should be
> slightly lower with a 1ms slice. So I think that with HZ=100, different
> slices still have an impact on latency.

I am trying to argue elsewhere for removing HZ=100. I was just curious whether
you actually use this value and, if so, why. Sorry, a bit of a tangent :)


Thanks!

--
Qais Yousef