netdev - Re: [PATCH net-next v1 2/2] net/rds: Give each connection its own workqueue

Message-ID: <4133dc32-d639-40a9-b49a-d893caae1821@redhat.com>
Date: Thu, 6 Nov 2025 11:52:03 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Allison Henderson <allison.henderson@...cle.com>,
 "achender@...nel.org" <achender@...nel.org>,
 "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next v1 2/2] net/rds: Give each connection its own
 workqueue

On 11/4/25 10:23 PM, Allison Henderson wrote:
> On Tue, 2025-11-04 at 15:57 +0100, Paolo Abeni wrote:
>> On 10/29/25 6:46 PM, Allison Henderson wrote:
>>> From: Allison Henderson <allison.henderson@...cle.com>
>>>
>>> RDS was written to require ordered workqueues for "cp->cp_wq":
>>> Work is executed in the order scheduled, one item at a time.
>>>
>>> If these workqueues are shared across connections,
>>> then work executed on behalf of one connection blocks work
>>> scheduled for a different and unrelated connection.
>>>
>>> Luckily we don't need to share these workqueues.
>>> While it obviously makes sense to limit the number of
>>> workers (processes) that ought to be allocated on a system,
>>> a workqueue that doesn't have a rescue worker attached
>>> has a tiny footprint compared to the connection as a whole:
>>> A workqueue costs ~800 bytes, while an RDS/IB connection
>>> totals ~5 MBytes.
>>
>> Still, a workqueue per connection feels like overkill. Have you considered
>> moving to WQ_PERCPU for rds_wq? Why does that not fit?
>>
>> Thanks,
>>
>> Paolo
>>
> Hi Paolo
> 
> I hadn't thought of WQ_PERCPU before, so I did some digging on it.  In our case, though, we need FIFO behavior
> per connection, so if we switched to per-CPU queues, we'd have to pin each connection to a CPU to get the right
> behavior.  That brings back head-of-line blocking, since all the items on that queue then have to share that CPU even
> if the other CPUs are idle.  So it wouldn't quite be an equivalent solution for what we're trying to do in this case.
> I hope that makes sense.  Let me know what you think.

Still, a workqueue per connection incurs significantly more overhead than
your estimate above. I guess ~800 bytes is sizeof(struct workqueue_struct)?

Please note that this struct contains pointers to several dynamically
allocated objects, among them per-CPU ones: the overall amount of memory
used is significantly greater than your estimate. You should provide a
more accurate one.

Much more importantly, using a workqueue per connection provides a
scalability gain only to the extent that each workqueue uses a
different pool and thus creates additional kthread(s). I haven't dived
into the workqueue implementation, but I think this is not the case. My
current guesstimate is that you measure some gain because the
per-connection WQ actually creates (or just uses) a single pool
different from rds_wq's.

Please double check the above.

Out of sheer ignorance, I suspect/hope that replacing the current
workqueue with alloc_ordered_workqueue() (possibly UNBOUND?!?) will give
the same scalability improvement at no cost.

/P
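
To make the head-of-line blocking concern in this thread concrete, here is a
toy userspace model. It is not kernel code and does not use the workqueue API.
It only compares finish times when all work shares one ordered (FIFO,
one-at-a-time) queue with the case where each connection has its own ordered
queue. The function name, the connection labels and the costs are made up for
illustration. The per-connection case assumes every queue can always run on
its own worker, which is exactly the point questioned above, so treat the
second result as a best case rather than a measurement.

```python
from collections import defaultdict


def completion_times(items, per_connection):
    """Return the finish time of each work item, in submission order.

    items: list of (connection, cost) tuples, all submitted at t=0.
    Each queue is ordered: it runs one item at a time, in FIFO order.
    ASSUMPTION: separate queues run fully in parallel (one worker each).
    """
    clock = defaultdict(float)  # elapsed time per queue
    finish = []
    for conn, cost in items:
        # Shared mode funnels every connection into a single queue.
        key = conn if per_connection else "shared"
        clock[key] += cost
        finish.append(clock[key])
    return finish


# Connection A submits one slow item ahead of B's cheap items.
work = [("A", 10.0), ("B", 1.0), ("A", 1.0), ("B", 1.0)]

shared = completion_times(work, per_connection=False)
split = completion_times(work, per_connection=True)

print("shared ordered queue:     ", shared)
print("per-connection queues:    ", split)
```

In the shared case B's first item cannot finish before A's slow item does. In
the split case it finishes after its own cost only, provided a second worker
really is available to run it.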