linux-kernel - Re: [RFC PATCH] drm/sched: Add new KUnit test suite for concurrent job submission

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <07c8c588-c2d6-463d-aabc-d472b8f38be8@ursulin.net>
Date: Thu, 5 Feb 2026 09:53:05 +0000
From: Tvrtko Ursulin <tursulin@...ulin.net>
To: Marco Pagani <marco.pagani@...ux.dev>
Cc: airlied@...il.com, ckoenig.leichtzumerken@...il.com, dakr@...nel.org,
 dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
 maarten.lankhorst@...ux.intel.com, marpagan@...hat.com,
 matthew.brost@...el.com, mripard@...nel.org, phasta@...nel.org,
 simona@...ll.ch, tzimmermann@...e.de
Subject: Re: [RFC PATCH] drm/sched: Add new KUnit test suite for concurrent
 job submission


On 04/02/2026 16:33, Marco Pagani wrote:

8><

>>>>> +	{
>>>>> +		.description = "Concurrently submit multiple jobs in a single entity",
>>>>> +		.job_base_us = 1000,
>>>>> +		.num_jobs = 10,
>>>>> +		.num_subs = 64,
>>>>> +	},
>>>>> +};
>>>>> +
>>>>> +static void
>>>>> +drm_sched_concurrent_desc(const struct drm_sched_concurrent_params *params, char *desc)
>>>>> +{
>>>>> +	strscpy(desc, params->description, KUNIT_PARAM_DESC_SIZE);
>>>>> +}
>>>>> +
>>>>> +KUNIT_ARRAY_PARAM(drm_sched_concurrent, drm_sched_concurrent_cases, drm_sched_concurrent_desc);
>>>>> +
>>>>> +struct submitter_data {
>>>>> +	struct work_struct work;
>>>>> +	struct sched_concurrent_test_context *ctx;
>>>>> +	struct drm_mock_sched_entity *entity;
>>>>> +	struct drm_mock_sched_job **jobs;
>>>>> +	struct kunit *test;
>>>>> +	unsigned int id;
>>>>> +	bool timedout;
>>>>> +};
>>>>> +
>>>>> +static void drm_sched_submitter_worker(struct work_struct *work)
>>>>> +{
>>>>> +	const struct drm_sched_concurrent_params *params;
>>>>> +	struct sched_concurrent_test_context *ctx;
>>>>> +	struct submitter_data *sub_data;
>>>>> +	unsigned int i, duration_us;
>>>>> +	unsigned long timeout_jiffies;
>>>>> +	bool done;
>>>>> +
>>>>> +	sub_data = container_of(work, struct submitter_data, work);
>>>>> +	ctx = sub_data->ctx;
>>>>> +	params = sub_data->test->param_value;
>>>>> +
>>>>> +	wait_for_completion(&ctx->wait_go);
>>>>> +
>>>>> +	for (i = 0; i < params->num_jobs; i++) {
>>>>> +		duration_us = params->job_base_us + (sub_data->id * 10);
>>>>
>>>> Why is job duration dependent by the submitter id?
>>>
>>> Just a simple way to have a deterministic distribution of durations.
>>> I can change it if it doesn't fit.
>>>
>>>> Would it be feasiable to not use auto-completing jobs and instead
>>>> advance the timeline manually? Given how the premise of the test seems
>>>> to be about concurrent submission it sounds plausible that what happens
>>>> after submission maybe isn't very relevant.
>>>
>>> Good idea! I'll run some experiments and see if it works.
>>
>> Cool, I will await your findings in v2. :)
> 
> 
> After fiddling a bit with the code, I came to the conclusion that
> changing the design to use manual timeline advancement is doable, but
> not beneficial, since it would require serializing job submission into
> "batches" using a two-step process, i.e., (i) workers submit jobs, and
> (ii) the main thread waits for all submissions, advances the timeline,
> and then releases the workers for the next iteration.

What do you mean by next iteration?

In the patch you have each worker submit all jobs in one go.

> This approach would partially defeat the purpose of a concurrency test
> as it would not allow job submission (KUnit process context) to overlap
> with job completion (hrtimer callback interrupt context) that models
> asynchronous hardware in the mock scheduler, limiting contention on the
> DRM scheduler internal locking. Moreover, while manual advancement might
> appear to make the test deterministic, it does not since the order in
> which the workers are scheduled will still be non-deterministic.

Ah, so it depends what is the test actually wanting to test. In my view 
exercising parallel submit is one thing, while interleaving submission 
with completion is something else.

In the test as written I think there is very little of the latter. Each 
worker submits all their jobs in one tight loop. Jobs you set to be 1ms 
so first job completion is 1ms away from when workers are released. A 
lot of the jobs can be submitted in 1ms and it would be interesting to 
see exactly how much, from how many workers.

If desire is to interleave completion and submission I think that should 
be made more explicit (less depending on how fast is the underlying 
machine). For example adding delays into the submit loop to actually 
guarantee that.

But I would also recommend parallel submit and parallel submit vs 
completions are tested in separate test cases. It should be easy to do 
with some flags and almost no new code. I was suggesting flags for some 
other thing in the initial review as well. Right, for auto-complete. So 
flag could be something like:

+static const struct drm_sched_concurrent_params 
drm_sched_concurrent_cases[] = {
+	{
+		.description = "Concurrently submit a single job in a single entity",
+		.job_base_us = 1000,
+		.num_jobs = 1,
+		.num_subs = 32,
		.flags = INTERLEAVE_SUBMIT_AND_COMPLETE,
+	},

In the submit loop:

+	for (i = 0; i < params->num_jobs; i++) {
+		duration_us = params->job_base_us + (sub_data->id * 10);
		if (flags & INTERLEAVE_SUBMIT_AND_COMPLETE) {
			drm_mock_sched_job_set_duration_us(sub_data->jobs[i], duration_us);
			// Add a delay based on time elapse and job duration to guarantee job 
completions start arriving
		}
+		drm_mock_sched_job_submit(sub_data->jobs[i]);
+	}


And of course handle the job waiting stage appropriately depending on 
auto-complete or not.

Anyway, testing as little as possible at a time is a best practice I 
recommend, but if you insist, there is also nothing fundamentally wrong 
with the test as you have it so I won't block it.

Regards,

Tvrtko