Message-ID: <Y8iq6gLtmX1c8VSf@slm.duckdns.org>
Date:   Wed, 18 Jan 2023 16:28:58 -1000
From:   Tejun Heo <tj@...nel.org>
To:     Nathan Huckleberry <nhuck@...gle.com>
Cc:     Sandeep Dhavale <dhavale@...gle.com>,
        Daeho Jeong <daehojeong@...gle.com>,
        Eric Biggers <ebiggers@...nel.org>,
        Sami Tolvanen <samitolvanen@...gle.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Jonathan Corbet <corbet@....net>, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] workqueue: Add WQ_SCHED_FIFO

Hello,

On Wed, Jan 18, 2023 at 06:01:04PM -0800, Nathan Huckleberry wrote:
> Do you think something similar should be done for WQ_UNBOUND? In most
> places where WQ_HIGHPRI is used, WQ_UNBOUND is also used because it
> boosts performance. However, I suspect that most of these benchmarks
> were done on x86-64. I've found that WQ_UNBOUND significantly reduces
> performance on arm64/Android.

One attribute of per-cpu workqueues is that they're concurrency-level
limited, i.e. if you have two per-cpu work items queued, the second one might
not run until the first one is done. Maybe people were trying to avoid
possible latency spikes from that?
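
To make the concurrency limit concrete, here is a toy model in Python. It is
not kernel code, and the function name is made up for illustration. It assumes
every work item is queued at t=0 on the same CPU and never sleeps, so the
per-cpu pool runs the items strictly one after another:

```python
def percpu_start_delays(durations_ms):
    """Start delay (ms) of each work item queued at t=0 on one CPU.

    Toy model, not kernel code: assumes work items never sleep, so the
    per-cpu pool runs them back to back in queueing order.
    """
    delays = []
    now = 0
    for duration in durations_ms:
        delays.append(now)  # this item waits for everything queued before it
        now += duration
    return delays


# A 1 ms item queued behind a 5 ms item does not start until 5 ms have passed.
print(percpu_start_delays([5, 1]))  # [0, 5]
```

Under these assumptions, the second item's start latency is set entirely by
the item queued ahead of it, which is the kind of spike described above.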

Even aside from that, UNBOUND tends to give more consistent latency
behavior as you aren't necessarily bound to what's happening on that
particular cpu, so I guess maybe that's also why. But I didn't really follow
how each user is picking and justifying these flags, so my insight is pretty
limited.

> From the documentation, using WQ_UNBOUND for performance doesn't seem
> correct. It's only supposed to be used for long-running work. It might
> make more sense to get rid of WQ_UNBOUND altogether and only move work
> to unbound worker pools once it has stuck around for long enough.

UNBOUND says: Don't pin this to one cpu or subject it to workqueue's
concurrency limit. Use workqueue as a generic thread pool.
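
As a rough illustration of the "generic thread pool" behavior, here is a toy
model in Python. It is not kernel code; the function name and the fixed
`workers` count are assumptions made for the sketch. All items are queued at
t=0 and each goes to whichever worker becomes free first:

```python
import heapq


def unbound_start_delays(durations_ms, workers):
    """Start delay (ms) of each item queued at t=0 on a pool of `workers` threads.

    Toy model, not kernel code: items are not pinned to a CPU; each one is
    handed to whichever worker becomes free earliest.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    free_at = [0] * workers  # min-heap of times at which each worker is free
    delays = []
    for duration in durations_ms:
        start = heapq.heappop(free_at)  # earliest available worker
        delays.append(start)
        heapq.heappush(free_at, start + duration)
    return delays


# With two workers, the 1 ms item no longer waits behind the 5 ms item.
print(unbound_start_delays([5, 1], workers=2))  # [0, 0]
```

With `workers=1` the same model degenerates to running the items one after
another, so the difference in start latency comes only from not being tied to
a single execution context.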

I don't know what you mean by performance but HIGHPRI | UNBOUND will
definitely improve some aspects.

> Android will probably need to remove WQ_UNBOUND from all of these
> performance critical users.
> 
> If there are performance benefits to unbinding workqueues from
> CPUs on x86-64, that should probably be a config flag, not controlled
> by every user.

It's unlikely that the instruction set is what's making the difference here,
right? It probably would help if we understand why it's worse on arm.

I don't think people have been all that deliberate with these flags, so it's
also likely that some of the users can drop UNBOUND completely, but I really
think more data and analysis would help.

Thanks.

-- 
tejun
