Message-ID: <ZJNaOumMiGSqK2-2@slm.duckdns.org>
Date: Wed, 21 Jun 2023 10:14:50 -1000
From: Tejun Heo <tj@...nel.org>
To: Sandeep Dhavale <dhavale@...gle.com>
Cc: jiangshanlai@...il.com, torvalds@...ux-foundation.org,
peterz@...radead.org, linux-kernel@...r.kernel.org,
kernel-team@...a.com, joshdon@...gle.com, brho@...gle.com,
briannorris@...omium.org, nhuck@...gle.com, agk@...hat.com,
snitzer@...nel.org, void@...ifault.com, kernel-team@...roid.com,
Swapnil Sapkal <swapnil.sapkal@....com>, kprateek.nayak@....com
Subject: Re: [PATCH 14/24] workqueue: Generalize unbound CPU pods
Hello,
On Wed, Jun 14, 2023 at 11:49:53AM -0700, Sandeep Dhavale wrote:
> Thank you for your patches! I tested affinity-scopes-v2 with app launch
> benchmarks. The numbers below are total scheduling latencies for erofs
> kworkers; the last column is with percpu highpri kthreads, i.e.
> CONFIG_EROFS_FS_PCPU_KTHREAD=y
> CONFIG_EROFS_FS_PCPU_KTHREAD_HIPRI=y
>
> Scheduling latency is the time between when a task becomes eligible to run
> and when it actually starts running. The test does 50 cold app launches for
> each configuration and aggregates the numbers.
>
> | Table        | Upstream | Cache nostrict | CPU nostrict | PCPU hpri |
> |--------------+----------+----------------+--------------+-----------|
> | Average (us) |    12286 |           7440 |         4435 |      2717 |
> | Median (us)  |    12528 |           3901 |         3258 |      2476 |
> | Minimum (us) |      287 |            555 |          638 |       357 |
> | Maximum (us) |    35600 |          35911 |        13364 |      6874 |
> | Stdev (us)   |     7918 |           7503 |         3323 |      1918 |
> |--------------+----------+----------------+--------------+-----------|
>
> We see here that with affinity-scopes-v2 (which defaults to cache nostrict),
> there is a good improvement compared to the current codebase. The "CPU
> nostrict" affinity scope for the erofs workqueue has even better numbers
> for my test launches, and it is logically similar to the percpu highpri
> kthreads approach. Percpu highpri kthreads have the lowest latency and
> variation, probably because they run at higher priority, as those threads
> are set up with sched_set_fifo_low().
If you set the workqueue to CPU strict and set its nice value to -19 through
the sysfs interface, it should behave similarly to the hardcoded PCPU hpri
setup. I'd also love to see a comparison between strict and nostrict if
possible.
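For reference, a sketch of the sysfs tuning described above. It assumes the
erofs workqueue was created with WQ_SYSFS and is exposed under the name
`erofs_worker` (the exact name may differ per kernel and filesystem
configuration, so treat the path as an assumption):

```shell
# Hypothetical sysfs path; adjust to the actual workqueue name on your system.
WQ=/sys/devices/virtual/workqueue/erofs_worker

# Select the "cpu" affinity scope and enforce it strictly, so workers
# stay on the CPU the work item was queued from.
echo cpu > "$WQ"/affinity_scope
echo 1   > "$WQ"/affinity_strict

# Raise worker priority (nice -19), approximating the PCPU highpri kthreads.
echo -19 > "$WQ"/nice
```

These attributes only appear for workqueues registered with WQ_SYSFS, and the
writes require root.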
> At a high level, the app launch numbers themselves improved with your
> series, as the entire workqueue subsystem improved across the board.
Glad to hear.
Thanks.
--
tejun