Message-ID: <02789f9b-ff16-b419-097f-b97b56afad57@igalia.com>
Date: Thu, 29 Jun 2023 10:11:06 -0300
From: André Almeida <andrealmeid@...lia.com>
To: Christian König <ckoenig.leichtzumerken@...il.com>
Cc: pierre-eric.pelloux-prayer@....com,
Randy Dunlap <rdunlap@...radead.org>,
Daniel Vetter <daniel@...ll.ch>,
'Marek Olšák' <maraeo@...il.com>,
Michel Dänzer <michel.daenzer@...lbox.org>,
Simon Ser <contact@...rsion.fr>, linux-kernel@...r.kernel.org,
dri-devel@...ts.freedesktop.org,
Timur Kristóf <timur.kristof@...il.com>,
amd-gfx@...ts.freedesktop.org,
Pekka Paalanen <ppaalanen@...il.com>,
Daniel Stone <daniel@...ishbar.org>,
Rob Clark <robdclark@...il.com>,
Samuel Pitoiset <samuel.pitoiset@...il.com>,
kernel-dev@...lia.com, Bas Nieuwenhuizen <bas@...nieuwenhuizen.nl>,
alexander.deucher@....com,
Pekka Paalanen <pekka.paalanen@...labora.com>,
Dave Airlie <airlied@...il.com>, christian.koenig@....com
Subject: Re: [PATCH v5 1/1] drm/doc: Document DRM device reset expectations
On 27/06/2023 18:17, André Almeida wrote:
> On 27/06/2023 14:47, Christian König wrote:
>> On 27.06.23 at 15:23, André Almeida wrote:
>>> Create a section that specifies how to deal with DRM device resets for
>>> kernel and userspace drivers.
>>>
>>> Acked-by: Pekka Paalanen <pekka.paalanen@...labora.com>
>>> Signed-off-by: André Almeida <andrealmeid@...lia.com>
>>> ---
>>>
>>> v4:
>>> https://lore.kernel.org/lkml/20230626183347.55118-1-andrealmeid@igalia.com/
>>>
>>> Changes:
>>> - Grammar fixes (Randy)
>>>
>>> Documentation/gpu/drm-uapi.rst | 68 ++++++++++++++++++++++++++++++++++
>>> 1 file changed, 68 insertions(+)
>>>
>>> diff --git a/Documentation/gpu/drm-uapi.rst
>>> b/Documentation/gpu/drm-uapi.rst
>>> index 65fb3036a580..3cbffa25ed93 100644
>>> --- a/Documentation/gpu/drm-uapi.rst
>>> +++ b/Documentation/gpu/drm-uapi.rst
>>> @@ -285,6 +285,74 @@ for GPU1 and GPU2 from different vendors, and a
>>> third handler for
>>> mmapped regular files. Threads cause additional pain with signal
>>> handling as well.
>>> +Device reset
>>> +============
>>> +
>>> +The GPU stack is really complex and is prone to errors, from
>>> hardware bugs,
>>> +faulty applications and everything in between the many layers. Some
>>> errors
>>> +require resetting the device in order to make the device usable
>>> again. This
>>> +sections describes the expectations for DRM and usermode drivers when a
>>> +device resets and how to propagate the reset status.
>>> +
>>> +Kernel Mode Driver
>>> +------------------
>>> +
>>> +The KMD is responsible for checking if the device needs a reset, and
>>> to perform
>>> +it as needed. Usually a hang is detected when a job gets stuck
>>> executing. KMD
>>> +should keep track of resets, because userspace can query any time
>>> about the
>>> +reset stats for an specific context.
>>
>> Maybe drop the part "for a specific context". Essentially the reset
>> query could use global counters instead and we won't need the context
>> any more here.
>>
>
> Right, I wrote it like this to reflect how it's currently implemented.
>
> If I follow correctly what you meant, the KMD could always report the global
> count to the UMD, and we would move the responsibility for managing
> the reset counters to the UMD, right? This would also simplify my
> DRM_IOCTL_GET_RESET proposal. I'll apply your suggestion to the next doc
> version.
>
Actually, if we drop the context identifier, we would lose the ability to
track which context is guilty. The Vulkan API doesn't seem to care about
this, but OpenGL does.
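
To make the trade-off concrete, here is a minimal sketch in C. It is not
the existing DRM or amdgpu interface: every struct, enum and function name
below is hypothetical and only illustrates the idea. The status values are
loosely modeled on the guilty/innocent/unknown categories of the OpenGL
robustness extensions, whereas Vulkan only reports VK_ERROR_DEVICE_LOST.
With a global counter alone, a query can tell that a reset happened but
not who caused it; the per-context flag is what allows a guilty/innocent
answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status values, loosely modeled on OpenGL robustness. */
enum reset_status {
	RESET_NONE,
	RESET_GUILTY,
	RESET_INNOCENT,
	RESET_UNKNOWN,
};

/* Hypothetical per-device state: one counter bumped on every reset. */
struct sketch_device {
	uint64_t global_reset_count;
};

/* Hypothetical per-context state kept on behalf of one userspace context. */
struct sketch_context {
	uint64_t seen_reset_count; /* global count at the last query */
	bool guilty;               /* set when this context caused a reset */
};

/* KMD side: record a reset. culprit may be NULL if it is not known. */
void sketch_record_reset(struct sketch_device *dev,
			 struct sketch_context *culprit)
{
	assert(dev != NULL);
	dev->global_reset_count++;
	if (culprit)
		culprit->guilty = true;
}

/* Query using only the global counter: a reset is visible, blame is not. */
enum reset_status sketch_query_global(const struct sketch_device *dev,
				      struct sketch_context *ctx)
{
	assert(dev != NULL && ctx != NULL);
	if (dev->global_reset_count == ctx->seen_reset_count)
		return RESET_NONE;
	ctx->seen_reset_count = dev->global_reset_count;
	return RESET_UNKNOWN;
}

/* Query using per-context tracking: blame can be reported. */
enum reset_status sketch_query_per_context(const struct sketch_device *dev,
					   struct sketch_context *ctx)
{
	enum reset_status status;

	assert(dev != NULL && ctx != NULL);
	if (dev->global_reset_count == ctx->seen_reset_count)
		return RESET_NONE;
	ctx->seen_reset_count = dev->global_reset_count;
	status = ctx->guilty ? RESET_GUILTY : RESET_INNOCENT;
	ctx->guilty = false; /* report each reset's blame only once */
	return status;
}
```

In this sketch a Vulkan-style UMD would only need sketch_query_global(),
while an OpenGL-style UMD needs the per-context variant to distinguish
guilty from innocent contexts.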
>> Apart from that this sounds good to me, feel free to add my rb.
>>
>> Regards,
>> Christian.
>>
>>