Message-ID: <bd95a0f0-5589-2d9e-8fb0-a66322e556e4@scylladb.com>
Date: Wed, 30 Mar 2022 14:01:21 +0300
From: Avi Kivity <avi@...lladb.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Asias He <asias@...lladb.com>, linux-kernel@...r.kernel.org
Subject: sched_min_granularity_ns exile into debugfs
Hi Peter,
In 8a99b683 ("sched: Move SCHED_DEBUG sysctl to debugfs"), you moved
sched_min_granularity_ns to debugfs, citing that it is debug-only (true)
and undocumented (it is documented in sched-design-CFS.rst, under
the old name).
This breaks my application, Scylla[1]. We use sched_min_granularity_ns
to reduce the chances that a high networking backlog will starve the
application thread. It is a thread-per-core design, so there is no other
core we could move the application to: they are all busy (and, besides, the
application threads are pinned).
In addition to sched_min_granularity_ns, we also tune a few other
sysctls:
# Prevent auto-scaling from doing anything to our tunables
kernel.sched_tunable_scaling = 0
# Preempt sooner
kernel.sched_min_granularity_ns = 500000
# Don't delay unrelated workloads
kernel.sched_wakeup_granularity_ns = 450000
# Schedule all tasks in this period
kernel.sched_latency_ns = 1000000
# autogroup seems to prevent sched_latency_ns from being respected
kernel.sched_autogroup_enabled = 0
# Disable numa balancing
kernel.numa_balancing = 0
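To make the effect of these values concrete, here is a simplified Python model of how CFS (before the EEVDF rework) derives its scheduling period from sched_latency_ns and sched_min_granularity_ns, mirroring sched_nr_latency and __sched_period() in kernel/sched/fair.c. It is a sketch under stated assumptions: all tasks have equal (nice 0) weight, and cgroups, autogroup and tunable scaling are ignored. The comparison uses the unscaled fair.c base values (6 ms latency, 0.75 ms minimum granularity), which the kernel multiplies by up to 4 on machines with 8 or more CPUs when sched_tunable_scaling is left at its default.

```python
# Simplified model of how pre-EEVDF CFS sizes its scheduling period.
# Assumes equal task weights; ignores cgroups, autogroup and tunable scaling.


def sched_nr_latency(latency_ns: int, min_granularity_ns: int) -> int:
    """Number of runnable tasks that fit in one latency period (rounded up)."""
    return (latency_ns + min_granularity_ns - 1) // min_granularity_ns


def sched_period(nr_running: int, latency_ns: int, min_granularity_ns: int) -> int:
    """Period in which every runnable task should get to run once."""
    if nr_running > sched_nr_latency(latency_ns, min_granularity_ns):
        # Too many tasks: stretch the period so no slice drops below the minimum.
        return nr_running * min_granularity_ns
    return latency_ns


def equal_weight_slice(nr_running: int, latency_ns: int, min_granularity_ns: int) -> int:
    """Slice per task, assuming all tasks have the same (nice 0) weight."""
    return sched_period(nr_running, latency_ns, min_granularity_ns) // nr_running


if __name__ == "__main__":
    for n in (1, 2, 4):
        tuned = equal_weight_slice(n, 1_000_000, 500_000)  # values listed above
        base = equal_weight_slice(n, 6_000_000, 750_000)   # unscaled fair.c defaults
        print(f"{n} runnable: tuned slice {tuned} ns, default slice {base} ns")
```

With the tuned values, four runnable tasks each get a 0.5 ms slice in a 2 ms period; with the unscaled defaults, two runnable tasks would each get 3 ms before the networking thread or the application thread is preempted.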
While we can adapt to the move, I would much prefer that the old location
be restored. I think it even makes sense to make this a non-debug tunable:
it helps the application stay responsive without resorting to the realtime
class, which is its own can of worms (and would likely reduce
throughput).
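Adapting is possible; for example, a setup script can try the old /proc/sys path first and fall back to debugfs. This Python sketch assumes the moved knobs appear under /sys/kernel/debug/sched/ with the sched_ prefix dropped (e.g. min_granularity_ns), which requires a SCHED_DEBUG kernel with debugfs mounted. DEBUGFS_NAMES, candidate_paths() and apply_tunables() are hypothetical helpers, and the path table is an assumption rather than a stable interface; knobs not in the table are only tried at their /proc/sys location.

```python
import os

# Hypothetical mapping: where the sched_* sysctls are assumed to live after
# the move to debugfs (SCHED_DEBUG kernel, debugfs mounted). Not a stable ABI.
DEBUGFS_NAMES = {
    "kernel.sched_tunable_scaling": "sched/tunable_scaling",
    "kernel.sched_min_granularity_ns": "sched/min_granularity_ns",
    "kernel.sched_wakeup_granularity_ns": "sched/wakeup_granularity_ns",
    "kernel.sched_latency_ns": "sched/latency_ns",
}


def candidate_paths(name, procfs="/proc/sys", debugfs="/sys/kernel/debug"):
    """Paths to try for a sysctl: the old /proc/sys location first, then debugfs."""
    paths = [os.path.join(procfs, *name.split("."))]
    if name in DEBUGFS_NAMES:
        paths.append(os.path.join(debugfs, *DEBUGFS_NAMES[name].split("/")))
    return paths


def apply_tunables(tunables, procfs="/proc/sys", debugfs="/sys/kernel/debug"):
    """Write each value to the first candidate path that exists.

    Returns {name: path written, or None if no candidate path exists}.
    """
    applied = {}
    for name, value in tunables.items():
        applied[name] = None
        for path in candidate_paths(name, procfs, debugfs):
            if os.path.exists(path):
                with open(path, "w") as f:
                    f.write(str(value))
                applied[name] = path
                break
    return applied
```

Run as root, e.g. apply_tunables({"kernel.sched_min_granularity_ns": 500000, "kernel.numa_balancing": 0}); the returned dict shows where each value was written, and None means neither location exists on the running kernel. The procfs and debugfs parameters only exist so the lookup can be pointed at a scratch directory for testing.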
[1] https://github.com/scylladb/scylla