[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <07c8c588-c2d6-463d-aabc-d472b8f38be8@ursulin.net>
Date: Thu, 5 Feb 2026 09:53:05 +0000
From: Tvrtko Ursulin <tursulin@...ulin.net>
To: Marco Pagani <marco.pagani@...ux.dev>
Cc: airlied@...il.com, ckoenig.leichtzumerken@...il.com, dakr@...nel.org,
dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
maarten.lankhorst@...ux.intel.com, marpagan@...hat.com,
matthew.brost@...el.com, mripard@...nel.org, phasta@...nel.org,
simona@...ll.ch, tzimmermann@...e.de
Subject: Re: [RFC PATCH] drm/sched: Add new KUnit test suite for concurrent
job submission
On 04/02/2026 16:33, Marco Pagani wrote:
8><
>>>>> + {
>>>>> + .description = "Concurrently submit multiple jobs in a single entity",
>>>>> + .job_base_us = 1000,
>>>>> + .num_jobs = 10,
>>>>> + .num_subs = 64,
>>>>> + },
>>>>> +};
>>>>> +
>>>>> +static void
>>>>> +drm_sched_concurrent_desc(const struct drm_sched_concurrent_params *params, char *desc)
>>>>> +{
>>>>> + strscpy(desc, params->description, KUNIT_PARAM_DESC_SIZE);
>>>>> +}
>>>>> +
>>>>> +KUNIT_ARRAY_PARAM(drm_sched_concurrent, drm_sched_concurrent_cases, drm_sched_concurrent_desc);
>>>>> +
>>>>> +struct submitter_data {
>>>>> + struct work_struct work;
>>>>> + struct sched_concurrent_test_context *ctx;
>>>>> + struct drm_mock_sched_entity *entity;
>>>>> + struct drm_mock_sched_job **jobs;
>>>>> + struct kunit *test;
>>>>> + unsigned int id;
>>>>> + bool timedout;
>>>>> +};
>>>>> +
>>>>> +static void drm_sched_submitter_worker(struct work_struct *work)
>>>>> +{
>>>>> + const struct drm_sched_concurrent_params *params;
>>>>> + struct sched_concurrent_test_context *ctx;
>>>>> + struct submitter_data *sub_data;
>>>>> + unsigned int i, duration_us;
>>>>> + unsigned long timeout_jiffies;
>>>>> + bool done;
>>>>> +
>>>>> + sub_data = container_of(work, struct submitter_data, work);
>>>>> + ctx = sub_data->ctx;
>>>>> + params = sub_data->test->param_value;
>>>>> +
>>>>> + wait_for_completion(&ctx->wait_go);
>>>>> +
>>>>> + for (i = 0; i < params->num_jobs; i++) {
>>>>> + duration_us = params->job_base_us + (sub_data->id * 10);
>>>>
>>>> Why is job duration dependent by the submitter id?
>>>
>>> Just a simple way to have a deterministic distribution of durations.
>>> I can change it if it doesn't fit.
>>>
>>>> Would it be feasiable to not use auto-completing jobs and instead
>>>> advance the timeline manually? Given how the premise of the test seems
>>>> to be about concurrent submission it sounds plausible that what happens
>>>> after submission maybe isn't very relevant.
>>>
>>> Good idea! I'll run some experiments and see if it works.
>>
>> Cool, I will await your findings in v2. :)
>
>
> After fiddling a bit with the code, I came to the conclusion that
> changing the design to use manual timeline advancement is doable, but
> not beneficial, since it would require serializing job submission into
> "batches" using a two-step process, i.e., (i) workers submit jobs, and
> (ii) the main thread waits for all submissions, advances the timeline,
> and then releases the workers for the next iteration.
What do you mean by next iteration?
In the patch you have each worker submit all jobs in one go.
> This approach would partially defeat the purpose of a concurrency test
> as it would not allow job submission (KUnit process context) to overlap
> with job completion (hrtimer callback interrupt context) that models
> asynchronous hardware in the mock scheduler, limiting contention on the
> DRM scheduler internal locking. Moreover, while manual advancement might
> appear to make the test deterministic, it does not since the order in
> which the workers are scheduled will still be non-deterministic.
Ah, so it depends what is the test actually wanting to test. In my view
exercising parallel submit is one thing, while interleaving submission
with completion is something else.
In the test as written I think there is very little of the latter. Each
worker submits all their jobs in one tight loop. Jobs you set to be 1ms
so first job completion is 1ms away from when workers are released. A
lot of the jobs can be submitted in 1ms and it would be interesting to
see exactly how much, from how many workers.
If desire is to interleave completion and submission I think that should
be made more explicit (less depending on how fast is the underlying
machine). For example adding delays into the submit loop to actually
guarantee that.
But I would also recommend parallel submit and parallel submit vs
completions are tested in separate test cases. It should be easy to do
with some flags and almost no new code. I was suggesting flags for some
other thing in the initial review as well. Right, for auto-complete. So
flag could be something like:
+static const struct drm_sched_concurrent_params
drm_sched_concurrent_cases[] = {
+ {
+ .description = "Concurrently submit a single job in a single entity",
+ .job_base_us = 1000,
+ .num_jobs = 1,
+ .num_subs = 32,
.flags = INTERLEAVE_SUBMIT_AND_COMPLETE,
+ },
In the submit loop:
+ for (i = 0; i < params->num_jobs; i++) {
+ duration_us = params->job_base_us + (sub_data->id * 10);
if (flags & INTERLEAVE_SUBMIT_AND_COMPLETE) {
drm_mock_sched_job_set_duration_us(sub_data->jobs[i], duration_us);
// Add a delay based on time elapse and job duration to guarantee job
completions start arriving
}
+ drm_mock_sched_job_submit(sub_data->jobs[i]);
+ }
And of course handle the job waiting stage appropriately depending on
auto-complete or not.
Anyway, testing as little as possible at a time is a best practice I
recommend, but if you insist, there is also nothing fundamentally wrong
with the test as you have it so I won't block it.
Regards,
Tvrtko
Powered by blists - more mailing lists