linux-kernel - Re: [RFC PATCH] drm/sched: Add new KUnit test suite for concurrent job submission

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <d5982a20-b43a-4548-b65a-2ffc5ae8bcc4@ursulin.net>
Date: Fri, 6 Feb 2026 12:12:10 +0000
From: Tvrtko Ursulin <tursulin@...ulin.net>
To: Marco Pagani <marco.pagani@...ux.dev>
Cc: airlied@...il.com, ckoenig.leichtzumerken@...il.com, dakr@...nel.org,
 dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
 maarten.lankhorst@...ux.intel.com, marpagan@...hat.com,
 matthew.brost@...el.com, mripard@...nel.org, phasta@...nel.org,
 simona@...ll.ch, tzimmermann@...e.de
Subject: Re: [RFC PATCH] drm/sched: Add new KUnit test suite for concurrent
 job submission


On 06/02/2026 10:36, Marco Pagani wrote:
> 
> 
> On 05/02/2026 10:53, Tvrtko Ursulin wrote:
>>
>> On 04/02/2026 16:33, Marco Pagani wrote:
>>
>> 8><
>>
>>>>>>> +	{
>>>>>>> +		.description = "Concurrently submit multiple jobs in a single entity",
>>>>>>> +		.job_base_us = 1000,
>>>>>>> +		.num_jobs = 10,
>>>>>>> +		.num_subs = 64,
>>>>>>> +	},
>>>>>>> +};
>>>>>>> +
>>>>>>> +static void
>>>>>>> +drm_sched_concurrent_desc(const struct drm_sched_concurrent_params *params, char *desc)
>>>>>>> +{
>>>>>>> +	strscpy(desc, params->description, KUNIT_PARAM_DESC_SIZE);
>>>>>>> +}
>>>>>>> +
>>>>>>> +KUNIT_ARRAY_PARAM(drm_sched_concurrent, drm_sched_concurrent_cases, drm_sched_concurrent_desc);
>>>>>>> +
>>>>>>> +struct submitter_data {
>>>>>>> +	struct work_struct work;
>>>>>>> +	struct sched_concurrent_test_context *ctx;
>>>>>>> +	struct drm_mock_sched_entity *entity;
>>>>>>> +	struct drm_mock_sched_job **jobs;
>>>>>>> +	struct kunit *test;
>>>>>>> +	unsigned int id;
>>>>>>> +	bool timedout;
>>>>>>> +};
>>>>>>> +
>>>>>>> +static void drm_sched_submitter_worker(struct work_struct *work)
>>>>>>> +{
>>>>>>> +	const struct drm_sched_concurrent_params *params;
>>>>>>> +	struct sched_concurrent_test_context *ctx;
>>>>>>> +	struct submitter_data *sub_data;
>>>>>>> +	unsigned int i, duration_us;
>>>>>>> +	unsigned long timeout_jiffies;
>>>>>>> +	bool done;
>>>>>>> +
>>>>>>> +	sub_data = container_of(work, struct submitter_data, work);
>>>>>>> +	ctx = sub_data->ctx;
>>>>>>> +	params = sub_data->test->param_value;
>>>>>>> +
>>>>>>> +	wait_for_completion(&ctx->wait_go);
>>>>>>> +
>>>>>>> +	for (i = 0; i < params->num_jobs; i++) {
>>>>>>> +		duration_us = params->job_base_us + (sub_data->id * 10);
>>>>>>
>>>>>> Why is job duration dependent by the submitter id?
>>>>>
>>>>> Just a simple way to have a deterministic distribution of durations.
>>>>> I can change it if it doesn't fit.
>>>>>
>>>>>> Would it be feasiable to not use auto-completing jobs and instead
>>>>>> advance the timeline manually? Given how the premise of the test seems
>>>>>> to be about concurrent submission it sounds plausible that what happens
>>>>>> after submission maybe isn't very relevant.
>>>>>
>>>>> Good idea! I'll run some experiments and see if it works.
>>>>
>>>> Cool, I will await your findings in v2. :)
>>>
>>>
>>> After fiddling a bit with the code, I came to the conclusion that
>>> changing the design to use manual timeline advancement is doable, but
>>> not beneficial, since it would require serializing job submission into
>>> "batches" using a two-step process, i.e., (i) workers submit jobs, and
>>> (ii) the main thread waits for all submissions, advances the timeline,
>>> and then releases the workers for the next iteration.
>>
>> What do you mean by next iteration?
>>
>> In the patch you have each worker submit all jobs in one go.
> 
> I mean, if I change the code to use manual timeline advancement, then I
> must introduce some synchronization logic that makes the main thread
> advance the timeline only after the workers have submitted their jobs.

Oh that, I was thinking advancing after flushing the workqueue would be 
enough for this use case. Since that one does not care about when 
completions happens they can just be cleaned up at the exit.

> Since workers submit multiple jobs, I was thinking it would be better to
> have the workers submit jobs in batches instead of all in one go.

No strong opinion from me, as long as it is clear what is being tested.

>>> This approach would partially defeat the purpose of a concurrency test
>>> as it would not allow job submission (KUnit process context) to overlap
>>> with job completion (hrtimer callback interrupt context) that models
>>> asynchronous hardware in the mock scheduler, limiting contention on the
>>> DRM scheduler internal locking. Moreover, while manual advancement might
>>> appear to make the test deterministic, it does not since the order in
>>> which the workers are scheduled will still be non-deterministic.
>>
>> Ah, so it depends what is the test actually wanting to test. In my view
>> exercising parallel submit is one thing, while interleaving submission
>> with completion is something else.
>>
>> In the test as written I think there is very little of the latter. Each
>> worker submits all their jobs in one tight loop. Jobs you set to be 1ms
>> so first job completion is 1ms away from when workers are released. A
>> lot of the jobs can be submitted in 1ms and it would be interesting to
>> see exactly how much, from how many workers.
>>
>> If desire is to interleave completion and submission I think that should
>> be made more explicit (less depending on how fast is the underlying
>> machine). For example adding delays into the submit loop to actually
>> guarantee that.
> 
> Fair point.
> 
>> But I would also recommend parallel submit and parallel submit vs
>> completions are tested in separate test cases. It should be easy to do
>> with some flags and almost no new code. I was suggesting flags for some
>> other thing in the initial review as well. Right, for auto-complete. So
>> flag could be something like:
>>
>> +static const struct drm_sched_concurrent_params
>> drm_sched_concurrent_cases[] = {
>> +	{
>> +		.description = "Concurrently submit a single job in a single entity",
>> +		.job_base_us = 1000,
>> +		.num_jobs = 1,
>> +		.num_subs = 32,
>> 		.flags = INTERLEAVE_SUBMIT_AND_COMPLETE,
>> +	},
>>
>> In the submit loop:
>>
>> +	for (i = 0; i < params->num_jobs; i++) {
>> +		duration_us = params->job_base_us + (sub_data->id * 10);
>> 		if (flags & INTERLEAVE_SUBMIT_AND_COMPLETE) {
>> 			drm_mock_sched_job_set_duration_us(sub_data->jobs[i], duration_us);
>> 			// Add a delay based on time elapse and job duration to guarantee job
>> completions start arriving
>> 		}
>> +		drm_mock_sched_job_submit(sub_data->jobs[i]);
>> +	}
>>
>>
>> And of course handle the job waiting stage appropriately depending on
>> auto-complete or not.
>>
>> Anyway, testing as little as possible at a time is a best practice I
>> recommend, but if you insist, there is also nothing fundamentally wrong
>> with the test as you have it so I won't block it.
> 
> Agreed. Unit tests should test one functionality at a time and be clear
> about which one. I'll follow your suggestions and have two separate test
> cases: a basic one for concurrent submissions with manual timeline
> advancement (no batches, workers submit all jobs at once) and then a
> second one with automatic timeline advancement for testing interleaving
> submissions with completions.
> 
> At this point though, I think it would be better to move the tests to a
> separate source file, as the partial similarity of the first concurrent
> test case with drm_sched_basic_submit could create some confusion.

Works for me, thank you!

Regards,

Tvrtko