Message-Id: <20251206043357.478179-1-jackzxcui1989@163.com>
Date: Sat, 6 Dec 2025 12:33:57 +0800
From: Xin Zhao <jackzxcui1989@....com>
To: tj@...nel.org
Cc: hch@...radead.org,
jackzxcui1989@....com,
jiangshanlai@...il.com,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/3] workqueue: Add configure to reduce work latency
On Fri, 5 Dec 2025 07:47:40 -1000 Tejun Heo <tj@...nel.org> wrote:
> On Fri, Dec 05, 2025 at 08:54:42PM +0800, Xin Zhao wrote:
> > In a system with high real-time requirements, we have noticed that many
> > high-priority tasks, such as kernel threads responsible for dispatching
> > GPU tasks and receiving data sources, often experience latency spikes
> > due to insufficient real-time execution of work.
> > The existing sysfs can adjust nice value for unbound workqueues. Add new
> > 'policy' node to support three common policies: SCHED_NORMAL, SCHED_FIFO,
> > or SCHED_RR. The original 'nice' node is retained for compatibility, add
> > new 'rtprio' node to adjust real-time priority when 'policy' is SCHED_FIFO
> > or SCHED_RR. The value of 'rtprio' uses the same numerical meaning as user
> > space tool chrt.
> > Introduce variable 'nr_idle_extra', which allows user space to configure
> > unbound workqueue through sysfs according to the real-time requirement.
> > By default, workqueue created by system will set 'nr_idle_extra' to 0.
> > When the policy of workqueue is set to SCHED_FIFO or SCHED_RR via sysfs,
> > 'nr_idle_extra' will be set to WORKER_NR_RT_DEF(2) as default.
> > Supporting the private configuration aims to deterministically ensure that
> > tasks within one workqueue are not affected by tasks from other workqueues
> > with the same attributes. If the user has high real-time requirements,
> > they can increase the nr_idle_extra supported in the previous patch while
> > also setting the workqueue 'private', allowing it to independently use
> > kworker threads, thus ensuring scheduling-related work delays never occur.
>
> I don't think I'm applying this:
>
> - The rationale is too vague. What are you exactly running and observing?
> How does this improve the situation?
>
> - If wq supports private pools, then I don't think it makes sense to add wq
> interface to change their attributes. Once turned private, the worker
> threads are fixed and userspace can set whatever attributes they want to
> set, no?
Our system runs intelligent-driving workloads, which have explicit and
stringent real-time requirements. This is why I developed this patch set:
1. Data acquisition quality inspection relies on deterministic processing of
UART IMU data, which must stay within a specified latency range; otherwise,
some topic data is observed with higher latency. As you know, I have already
proposed a patch for the TTY flip buffer to improve this situation. Recent
tests show that even with the workqueue's nice value set to -20, after
long-term operation 2% of the entries in the quality-inspection totals still
show anomalies.
2. The GPU model processing must sustain at least 20 frames per second, and
the time budget for dispatching and running GPU tasks is 10ms. Excluding the
execution time of the GPU itself, the remaining time from scheduling the
dispatch of a work item to its actual execution is only about 1ms. Although
there are not many high real-time tasks on the system, there are still some.
With ordinary CFS kworkers, captured perfetto traces show that, due to
untimely scheduling of kworker/u37, the kernel submit cost from dispatching a
work item to its actual execution often exceeds 20ms.
3. The workqueue API is the most commonly used programming interface for task
processing in kernel drivers; the GPU and TTY drivers where we hit these
issues both use it. Switching to kthread_work would require changing the
drivers' logic and retesting, while adding functionality to the existing
workqueue API only requires testing the workqueue itself. In addition, the
workqueue API has mature and sophisticated logic, and its pool management
saves system resources. I believe providing real-time capability on top of
the current workqueue API is the better choice.
Assuming we agree to provide real-time capabilities based on the workqueue
API, let me address how to implement this:
1. Regarding your point that the wq interface is no longer needed once wq
supports private pools: kworker threads for worker pools are created and
released dynamically, and concurrently queueing multiple work items on a
workqueue may trigger creation of new threads. After a user sets the
scheduling attributes of the kworker threads belonging to a private wq, any
newly created kworker threads will not automatically inherit those
attributes, because their parent process is kthreadd.
2. In the commit log of the nr_idle_extra patch, I described two common types
of latency:
Type 1: need_more_worker() checks whether pool->nr_running is zero; if it
	is not, no idle kworker is woken to execute the queued work
	immediately, resulting in work execution latency.
Type 2: need_more_worker() has found that pool->nr_running is zero, but
	there is currently no idle kworker thread, so one must be created
	first, again delaying work execution.
The nr_idle_extra feature lets users optionally reduce execution latency
according to their real-time requirements.
As for test results for this patch set, I enabled the patches this week and
am running performance and stability tests. I will share the results once
they complete.
--
Xin Zhao