Message-Id: <20220825164111.29534-1-zhouchengming@bytedance.com>
Date: Fri, 26 Aug 2022 00:41:01 +0800
From: Chengming Zhou <zhouchengming@...edance.com>
To: hannes@...xchg.org, tj@...nel.org, mkoutny@...e.com,
surenb@...gle.com
Cc: mingo@...hat.com, peterz@...radead.org, gregkh@...uxfoundation.org,
corbet@....net, cgroups@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, songmuchun@...edance.com,
Chengming Zhou <zhouchengming@...edance.com>
Subject: [PATCH v4 00/10] sched/psi: some optimizations and extensions
Hi all,

This patch series contains some optimizations and extensions for PSI.

Patch 1/10 fixes the periodic aggregation shut-off problem introduced by
the earlier commit 4117cebf1a9f ("psi: Optimize task switch inside shared
cgroups").

Patches 2-4 are miscellaneous optimizations, so they are placed at the
front of the series.

Patch 5/10 optimizes task switches inside shared cgroups when the
in_memstall status of the prev task and the next task differ.

Patch 6/10 removes NR_ONCPU task accounting to save 4 bytes in the first
cacheline. Patch 7/10 uses those bytes to introduce a new PSI resource,
PSI_IRQ, which tracks IRQ/SOFTIRQ pressure stall information.

Patches 8-9 cache the parent psi_group in struct psi_group to speed up the
hot iteration path.
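To illustrate the idea behind the parent caching, here is a minimal
userspace sketch. It is not the kernel code: struct psi_group_sketch and
walk_to_root() are hypothetical names, and the real struct psi_group has
many more fields. It only shows that once each group stores a pointer to
its parent, walking from a leaf to the root is a plain pointer chase.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct psi_group. */
struct psi_group_sketch {
	const char *name;
	struct psi_group_sketch *parent;	/* cached parent, NULL for the root */
};

/*
 * Walk from @group up to the root through the cached parent pointers
 * and return how many groups were visited. In the kernel, each step
 * would update that group's PSI state instead of just counting.
 */
static int walk_to_root(const struct psi_group_sketch *group)
{
	int visited = 0;

	for (; group; group = group->parent)
		visited++;

	return visited;
}
```

For a chain root <- slice <- leaf, calling walk_to_root() on the leaf
visits three groups without looking up the cgroup hierarchy at each step.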
Patch 10/10 introduces a per-cgroup interface, "cgroup.pressure", to
disable or re-enable PSI at the cgroup level. It also hides and unhides
the pressure files per Tejun's suggestion[1], which depends on his work[2].

[1] https://lore.kernel.org/all/YvqjhqJQi2J8RG3X@slm.duckdns.org/
[2] https://lore.kernel.org/all/20220820000550.367085-1-tj@kernel.org/
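As a rough illustration of how userspace might use the new interface, the
sketch below writes to a cgroup's "cgroup.pressure" file. It assumes the
file accepts "0" to disable and "1" to re-enable PSI accounting, as in the
cgroup-v2 documentation added by patch 10/10. The helper name
set_cgroup_pressure() is hypothetical, not part of the series.

```c
#include <stdio.h>

/*
 * Write "1" (enable) or "0" (disable) to <cgroup_dir>/cgroup.pressure.
 * Assumption: the file takes the single values "0" and "1".
 * Returns 0 on success, -1 on failure.
 */
static int set_cgroup_pressure(const char *cgroup_dir, int enable)
{
	char path[4096];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/cgroup.pressure", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(enable ? "1" : "0", f) >= 0;
	/* A write error may only show up when the buffer is flushed. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

For example, set_cgroup_pressure("/sys/fs/cgroup/user.slice", 0) would
disable PSI accounting for that cgroup, assuming cgroup2 is mounted at
/sys/fs/cgroup and the caller has write permission.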
Performance test using mmtests/config-scheduler-perfpipe in
/user.slice/user-0.slice/session-4.scope:
                             next           patched patched/only-leaf
Min        Time     8.82 ( 0.00%)     8.49 ( 3.74%)     8.00 ( 9.32%)
1st-qrtle  Time     8.90 ( 0.00%)     8.58 ( 3.63%)     8.05 ( 9.58%)
2nd-qrtle  Time     8.94 ( 0.00%)     8.61 ( 3.65%)     8.09 ( 9.50%)
3rd-qrtle  Time     8.99 ( 0.00%)     8.65 ( 3.75%)     8.15 ( 9.35%)
Max-1      Time     8.82 ( 0.00%)     8.49 ( 3.74%)     8.00 ( 9.32%)
Max-5      Time     8.82 ( 0.00%)     8.49 ( 3.74%)     8.00 ( 9.32%)
Max-10     Time     8.84 ( 0.00%)     8.55 ( 3.20%)     8.04 ( 9.05%)
Max-90     Time     9.04 ( 0.00%)     8.67 ( 4.10%)     8.18 ( 9.51%)
Max-95     Time     9.04 ( 0.00%)     8.68 ( 4.03%)     8.20 ( 9.26%)
Max-99     Time     9.07 ( 0.00%)     8.73 ( 3.82%)     8.25 ( 9.11%)
Max        Time     9.12 ( 0.00%)     8.89 ( 2.54%)     8.27 ( 9.29%)
Amean      Time     8.95 ( 0.00%)     8.62 * 3.67%*     8.11 * 9.43%*

Big thanks to Johannes Weiner, Tejun Heo and Michal Koutný for your
suggestions and review!

Changes in v4:
 - Collect Acked-by tags from Johannes Weiner.
 - Add many clear comments and changelogs per Johannes Weiner.
 - Replace for_each_psi_group() with clearer open-coded loops.
 - Use the better names cgroup_pressure_show() and
   cgroup_pressure_write().
 - Use the better name psi_cgroup_restart() and only call it on
   enabling.

Changes in v3:
 - Rebase on linux-next and reorder the patches to put the misc
   optimization patches at the front of the series.
 - Drop patch "sched/psi: don't change task psi_flags when migrate
   CPU/group", since it caused a small performance regression and is
   only a code refactoring.
 - Don't define PSI_IRQ and PSI_IRQ_FULL when !CONFIG_IRQ_TIME_ACCOUNTING,
   in which case they are not used.
 - Add patch 8/10 "sched/psi: consolidate cgroup_psi()" so that
   cgroup_psi() can handle all cgroups, including the root cgroup,
   which makes patch 9/10 simpler.
 - Rename the interface to "cgroup.pressure" and add some explanation
   per Michal's suggestion.
 - Hide and unhide the pressure files when disabling/re-enabling cgroup
   PSI, which depends on Tejun's work.

Changes in v2:
 - Add Acked-by tags from Johannes Weiner. Thanks for the review!
 - Fix periodic aggregation wakeup for common ancestors in
   psi_task_switch().
 - Add patch 7/10 from Johannes Weiner, which removes NR_ONCPU
   task accounting to save 4 bytes in the first cacheline.
 - Remove the "psi_irq=" kernel cmdline parameter from the last version.
 - Add a per-cgroup interface "cgroup.psi" to disable/re-enable
   PSI stats accounting at the cgroup level.

Chengming Zhou (9):
  sched/psi: fix periodic aggregation shut off
  sched/psi: don't create cgroup PSI files when psi_disabled
  sched/psi: save percpu memory when !psi_cgroups_enabled
  sched/psi: move private helpers to sched/stats.h
  sched/psi: optimize task switch inside shared cgroups again
  sched/psi: add PSI_IRQ to track IRQ/SOFTIRQ pressure
  sched/psi: consolidate cgroup_psi()
  sched/psi: cache parent psi_group to speed up groups iterate
  sched/psi: per-cgroup PSI accounting disable/re-enable interface

Johannes Weiner (1):
  sched/psi: remove NR_ONCPU task accounting

 Documentation/admin-guide/cgroup-v2.rst |  23 ++
 include/linux/cgroup-defs.h             |   3 +
 include/linux/cgroup.h                  |   5 -
 include/linux/psi.h                     |  12 +-
 include/linux/psi_types.h               |  29 ++-
 kernel/cgroup/cgroup.c                  | 106 ++++++++-
 kernel/sched/core.c                     |   1 +
 kernel/sched/psi.c                      | 280 +++++++++++++++++-------
 kernel/sched/stats.h                    |   6 +
 9 files changed, 362 insertions(+), 103 deletions(-)

--
2.37.2