lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 19 Apr 2023 11:05:45 +0100
From:   Steven Price <steven.price@....com>
To:     Lucas Stach <l.stach@...gutronix.de>,
        Danilo Krummrich <dakr@...hat.com>, luben.tuikov@....com,
        airlied@...il.com, daniel@...ll.ch, christian.koenig@....com
Cc:     linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org
Subject: Re: [PATCH v2] drm/scheduler: set entity to NULL in
 drm_sched_entity_pop_job()

On 19/04/2023 10:53, Steven Price wrote:
> On 19/04/2023 10:44, Lucas Stach wrote:
>> Hi Steven,
>>
>> Am Mittwoch, dem 19.04.2023 um 10:39 +0100 schrieb Steven Price:
>>> On 18/04/2023 11:04, Danilo Krummrich wrote:
>>>> It already happend a few times that patches slipped through which
>>>> implemented access to an entity through a job that was already removed
>>>> from the entities queue. Since jobs and entities might have different
>>>> lifecycles, this can potentially cause UAF bugs.
>>>>
>>>> In order to make it obvious that a jobs entity pointer shouldn't be
>>>> accessed after drm_sched_entity_pop_job() was called successfully, set
>>>> the jobs entity pointer to NULL once the job is removed from the entity
>>>> queue.
>>>>
>>>> Moreover, debugging a potential NULL pointer dereference is way easier
>>>> than potentially corrupted memory through a UAF.
>>>>
>>>> Signed-off-by: Danilo Krummrich <dakr@...hat.com>
>>>
>>> This triggers a splat for me (with Panfrost driver), the cause of which
>>> is the following code in drm_sched_get_cleanup_job():
>>>
>>> 	if (job) {
>>> 		job->entity->elapsed_ns += ktime_to_ns(
>>> 			ktime_sub(job->s_fence->finished.timestamp,
>>> 				  job->s_fence->scheduled.timestamp));
>>> 	}
>>>
>>> which indeed is accessing entity after the job has been returned from
>>> drm_sched_entity_pop_job(). And obviously job->entity is a NULL pointer
>>> with this patch.
>>>
>>> I'm afraid I don't fully understand the lifecycle so I'm not sure if
>>> this is simply exposing an existing bug in drm_sched_get_cleanup_job()
>>> or if this commit needs to be reverted.
>>>
>> Not sure which tree you are on. The offending commit has been reverted
>> in 6.3-rc5.
> 
> This is in drm-misc-next - I'm not sure which "offending commit" you are
> referring to. I'm referring to:
> 
> 96c7c2f4d5bd ("drm/scheduler: set entity to NULL in
> drm_sched_entity_pop_job()")
> 
> which was merged yesterday to drm-misc-next (and is currently the top
> commit).
> 
> Is there another commit which has been reverted elsewhere which is
> conflicting?

Answering my own question, the conflicting commit is:

baad10973fdb ("Revert "drm/scheduler: track GPU active time per entity"")

But that commit isn't (yet) in drm-misc-next. Which unfortunately means
drm-misc-next is broken until a back-merge happens.

Steve

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ