Message-ID: <ZJNaOumMiGSqK2-2@slm.duckdns.org>
Date: Wed, 21 Jun 2023 10:14:50 -1000
From: Tejun Heo <tj@...nel.org>
To: Sandeep Dhavale <dhavale@...gle.com>
Cc: jiangshanlai@...il.com, torvalds@...ux-foundation.org,
peterz@...radead.org, linux-kernel@...r.kernel.org,
kernel-team@...a.com, joshdon@...gle.com, brho@...gle.com,
briannorris@...omium.org, nhuck@...gle.com, agk@...hat.com,
snitzer@...nel.org, void@...ifault.com, kernel-team@...roid.com,
Swapnil Sapkal <swapnil.sapkal@....com>, kprateek.nayak@....com
Subject: Re: [PATCH 14/24] workqueue: Generalize unbound CPU pods
Hello,
On Wed, Jun 14, 2023 at 11:49:53AM -0700, Sandeep Dhavale wrote:
> Thank you for your patches! I tested affinity-scopes-v2 with app launch
> benchmarks. The numbers below are total scheduling latencies for erofs
> kworkers; the last column is with percpu highpri kthreads, i.e.
> CONFIG_EROFS_FS_PCPU_KTHREAD=y
> CONFIG_EROFS_FS_PCPU_KTHREAD_HIPRI=y
>
> Scheduling latency is the time between when a task becomes eligible to run
> and when it actually starts running. The test does 50 cold app launches for
> each configuration and aggregates the numbers.
>
> | Table        | Upstream | Cache nostrict | CPU nostrict | PCPU hpri |
> |--------------+----------+----------------+--------------+-----------|
> | Average (us) |    12286 |           7440 |         4435 |      2717 |
> | Median (us)  |    12528 |           3901 |         3258 |      2476 |
> | Minimum (us) |      287 |            555 |          638 |       357 |
> | Maximum (us) |    35600 |          35911 |        13364 |      6874 |
> | Stdev (us)   |     7918 |           7503 |         3323 |      1918 |
> |--------------+----------+----------------+--------------+-----------|
>
> We see here that with affinity-scopes-v2 (which defaults to cache nostrict),
> there is a good improvement compared to the current codebase. The "CPU
> nostrict" affinity scope for the erofs workqueue has even better numbers
> for my test launches, and it is logically similar to the percpu highpri
> kthreads approach. Percpu highpri kthreads have the lowest latency and
> variation, probably because they run at higher priority, as those threads
> are set up with sched_set_fifo_low().
If you set the workqueue to CPU strict and set its nice value to -19 through
the sysfs interface, it should behave similarly to the hardcoded PCPU hpri
setup. I'd also love to see a comparison between strict and nostrict if
possible.
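For reference, a sketch of the sysfs tuning described above. It assumes the
erofs workqueue was created with WQ_SYSFS and is exposed under the name
`erofs_worker` (the exact name may differ per kernel and filesystem
configuration, so treat the path as an assumption):

```shell
# Hypothetical sysfs path; adjust to the actual workqueue name on your system.
WQ=/sys/devices/virtual/workqueue/erofs_worker

# Select the "cpu" affinity scope and enforce it strictly, so workers
# stay on the CPU the work item was queued from.
echo cpu > "$WQ"/affinity_scope
echo 1   > "$WQ"/affinity_strict

# Raise worker priority (nice -19), approximating the PCPU highpri kthreads.
echo -19 > "$WQ"/nice
```

These attributes only appear for workqueues registered with WQ_SYSFS, and the
writes require root.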
> At a high level, the app launch numbers themselves improved with your
> series, as the entire workqueue subsystem improved across the board.
Glad to hear.
Thanks.
--
tejun