Open Source and information security mailing list archives
Message-ID: <CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279LmiRxKt7aQYg@mail.gmail.com>
Date: Fri, 27 Jan 2023 11:25:10 -0800
From: Nathan Huckleberry <nhuck@...gle.com>
To: Tejun Heo <tj@...nel.org>
Cc: Sandeep Dhavale <dhavale@...gle.com>, Daeho Jeong <daehojeong@...gle.com>, Eric Biggers <ebiggers@...nel.org>, Sami Tolvanen <samitolvanen@...gle.com>, Lai Jiangshan <jiangshanlai@...il.com>, Jonathan Corbet <corbet@....net>, linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] workqueue: Add WQ_SCHED_FIFO

On Wed, Jan 18, 2023 at 6:29 PM Tejun Heo <tj@...nel.org> wrote:
>
> Hello,
>
> On Wed, Jan 18, 2023 at 06:01:04PM -0800, Nathan Huckleberry wrote:
> > Do you think something similar should be done for WQ_UNBOUND? In most
> > places where WQ_HIGHPRI is used, WQ_UNBOUND is also used because it
> > boosts performance. However, I suspect that most of these benchmarks
> > were done on x86-64. I've found that WQ_UNBOUND significantly reduces
> > performance on arm64/Android.
>
> One attribute with per-cpu workqueues is that they're concurrency-level
> limited. ie. if you have two per-cpu work items queued, the second one might
> not run until the first one is done. Maybe people were trying to avoid
> possible latency spikes from that?
>
> Even aside from that, UNBOUND tends to give more consistent latency
> behaviors as you aren't necessarily bound to what's happening on that
> particular, so I guess maybe that's also why but I didn't really follow how
> each user is picking and justifying these flags, so my insight is pretty
> limited.
>
> > From the documentation, using WQ_UNBOUND for performance doesn't seem
> > correct. It's only supposed to be used for long-running work. It might
> > make more sense to get rid of WQ_UNBOUND altogether and only move work
> > to unbound worker pools once it has stuck around for long enough.
>
> UNBOUND says: Don't pin this to one cpu or subject it to workqueue's
> concurrency limit. Use workqueue as a generic thread pool.
>
> I don't know what you mean by performance but HIGHPRI | UNBOUND will
> definitely improve some aspects.
>
> > Android will probably need to remove WQ_UNBOUND from all of these
> > performance critical users.
> >
> > If there are performance benefits to using unbinding workqueues from
> > CPUs on x86-64, that should probably be a config flag, not controlled
> > by every user.
>
> It's unlikely that the instruction set is what's making the difference here,
> right? It probably would help if we understand why it's worse on arm.

I did some more digging. For dm-verity I think this is related to the
availability of SHA instructions. If SHA instructions are present,
WQ_UNBOUND is suboptimal because the work finishes very quickly.

That doesn't explain why EROFS is slower with WQ_UNBOUND though.

It might also be related to the heterogeneity of modern arm
processors. Locality may be more important for ARM processors than for
x86-64.

See the table below:

| open-prebuilt-camera  | UNBOUND | HIGHPRI | HIGHPRI ONLY  | SCHED_FIFO ONLY |
| erofs wait time (us)  | 357805            | 174205 (-51%) | 129861 (-63%)   |
| verity wait time (us) | 11746             | 119 (-98%)    | 0 (-100%)       |

The bigger issue seems to be WQ_UNBOUND, so I'm abandoning these
patches for now.

Thanks,
Huck

>
> I don't think ppl have been all that deliberate with these flags, so it's
> also likely that some of the usages can drop UNBOUND completely but I really
> think more data and analyses would help.
>
> Thanks.
>
> --
> tejun
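
For readers checking the table in the message above: the percentages appear to be changes relative to the UNBOUND | HIGHPRI column, truncated toward zero rather than rounded. The sketch below reproduces them under that assumption. The names WAIT_TIMES_US and change_pct are hypothetical and do not come from the thread; only the wait times themselves are taken from the table.

```python
# Wait times in microseconds, copied from the table in the message above.
# Column order: (UNBOUND | HIGHPRI baseline, HIGHPRI only, SCHED_FIFO only).
WAIT_TIMES_US = {
    "erofs": (357805, 174205, 129861),
    "verity": (11746, 119, 0),
}


def change_pct(baseline_us: int, measured_us: int) -> int:
    """Change relative to the baseline, in percent, truncated toward zero.

    Truncation (rather than rounding) is an assumption; it is chosen only
    because it matches the figures quoted in the table.
    """
    return int((measured_us - baseline_us) * 100 / baseline_us)


for name, (baseline, highpri_only, fifo_only) in WAIT_TIMES_US.items():
    print(
        f"{name}: HIGHPRI only {change_pct(baseline, highpri_only)}%, "
        f"SCHED_FIFO only {change_pct(baseline, fifo_only)}%"
    )
```

With rounding instead of truncation, the erofs SCHED_FIFO figure and the verity HIGHPRI figure would each come out one point larger in magnitude, which is why truncation is assumed here.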