[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230622112241.70d4e941@eldfell>
Date: Thu, 22 Jun 2023 11:22:41 +0300
From: Pekka Paalanen <ppaalanen@...il.com>
To: André Almeida <andrealmeid@...lia.com>
Cc: dri-devel@...ts.freedesktop.org, amd-gfx@...ts.freedesktop.org,
linux-kernel@...r.kernel.org, kernel-dev@...lia.com,
alexander.deucher@....com, christian.koenig@....com,
pierre-eric.pelloux-prayer@....com,
Simon Ser <contact@...rsion.fr>,
Rob Clark <robdclark@...il.com>,
Daniel Vetter <daniel@...ll.ch>,
Daniel Stone <daniel@...ishbar.org>,
'Marek Olšák' <maraeo@...il.com>,
Dave Airlie <airlied@...il.com>,
Michel Dänzer <michel.daenzer@...lbox.org>,
Samuel Pitoiset <samuel.pitoiset@...il.com>,
Timur Kristóf <timur.kristof@...il.com>,
Bas Nieuwenhuizen <bas@...nieuwenhuizen.nl>
Subject: Re: [RFC PATCH v3 2/4] drm: Create DRM_IOCTL_GET_RESET
On Wed, 21 Jun 2023 13:33:56 -0300
André Almeida <andrealmeid@...lia.com> wrote:
> Em 21/06/2023 05:09, Pekka Paalanen escreveu:
> > On Tue, 20 Jun 2023 21:57:17 -0300
> > André Almeida <andrealmeid@...lia.com> wrote:
> >
> >> Create a new DRM ioctl operation to get the numbers of resets for a
> >> given context. The numbers reflect just the resets that happened after
> >> the context was created, and not since the machine was booted.
> >>
> >> Create a debugfs interface to make easier to test the API without real
> >> resets.
> >>
> >> Signed-off-by: André Almeida <andrealmeid@...lia.com>
> >> ---
> >> drivers/gpu/drm/drm_debugfs.c | 2 ++
> >> drivers/gpu/drm/drm_ioctl.c | 58 +++++++++++++++++++++++++++++++++++
> >> include/drm/drm_device.h | 3 ++
> >> include/drm/drm_drv.h | 3 ++
> >> include/uapi/drm/drm.h | 21 +++++++++++++
> >> include/uapi/drm/drm_mode.h | 15 +++++++++
> >> 6 files changed, 102 insertions(+)
> >
> > ...
> >
> >> diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
> >> index a87bbbbca2d4..a84559aa0d77 100644
> >> --- a/include/uapi/drm/drm.h
> >> +++ b/include/uapi/drm/drm.h
> >> @@ -1169,6 +1169,27 @@ extern "C" {
> >> */
> >> #define DRM_IOCTL_MODE_GETFB2 DRM_IOWR(0xCE, struct drm_mode_fb_cmd2)
> >>
> >> +/**
> >> + * DRM_IOCTL_GET_RESET - Get information about device resets
> >> + *
> >> + * This operation requests from the device information about resets. It should
> >> + * consider only resets that happens after the context is created, therefore,
> >> + * the counter should be zero during context creation.
> >> + *
> >> + * dev_reset_count tells how many resets have happened on this device, and
> >> + * ctx_reset_count tells how many of such resets were caused by this context.
> >> + *
> >> + * Flags can be used to tell if a reset is in progress, and userspace should
> >> + * wait until it's not in progress anymore to be able to create a new context;
> >> + * and to tell if the VRAM is considered lost. There's no safe way to clean this
> >> + * flag so if a context see this flag set, it should be like that until the end
> >> + * of the context.
> >
> > Is "this flag" the VRAM_LOST? Or any flag?
> >
> > Does this mean that not all resets are fatal to the context? Is there
> > any kind of reset that should not be fatal to a context? All the
> > rendering APIs seem to assume that any reset is fatal and the context
> > must be destroyed.
>
> I got this flag from the `AMDGPU_CTX_OP_QUERY_STATE2` operation, and
> it's used to notify that the reset was fatal for a giving context,
> although the idea of non-fatal resets seems to be a bit controversial
> for now, so I think it will be better if I leave this flag for latter
> improvements of the API.
Which flag is "this flag"? There are RESET_IN_PROGRESS and VRAM_LOST.
Both are fine by me to exist.
I think I made a wrong conclusion here. Somehow I read that it would be
possible to have a reset happen, and if VRAM is not lost, then the
context could work again.
Should there be some wording added to say the context is permanently
broken on any kind of reset? Or is that for UMD to decide?
Thanks,
pq
> >
> >> + */
> >> +#define DRM_IOCTL_GET_RESET DRM_IOWR(0xCF, struct drm_get_reset)
> >> +
> >> +#define DRM_RESET_IN_PROGRESS 0x1
> >> +#define DRM_RESET_VRAM_LOST 0x2
> >
> > Ok, so the dmabuf lost is being communicated here, but how would a
> > userspace process know on which device a dmabuf resides on?
> >
> > Let's assume process A uses device 1 to draw, exports a dmabuf, sends
> > it to process B which imports it to device 2. Device 1 resets and loses
> > VRAM contents. How would process B notice that the dmabuf is lost when
> > it never touches device 1 itself?
> >
> >> +
> >> /*
> >> * Device specific ioctls should only be in their respective headers
> >> * The device specific ioctl range is from 0x40 to 0x9f.
> >> diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
> >> index 43691058d28f..c3257bd1af9c 100644
> >> --- a/include/uapi/drm/drm_mode.h
> >> +++ b/include/uapi/drm/drm_mode.h
> >> @@ -1308,6 +1308,21 @@ struct drm_mode_rect {
> >> __s32 y2;
> >> };
> >>
> >> +/**
> >> + * struct drm_get_reset - Get information about a DRM device resets
> >> + * @ctx_id: the context id to be queried about resets
> >> + * @flags: flags
> >> + * @dev_reset_count: global counter of resets for a given DRM device
> >> + * @ctx_reset_count: of all the resets counted by this device, how many were
> >> + * caused by this context.
> >> + */
> >> +struct drm_get_reset {
> >> + __u32 ctx_id;
> >> + __u32 flags;
> >> + __u64 dev_reset_count;
> >> + __u64 ctx_reset_count;
> >> +};
> >> +
> >> #if defined(__cplusplus)
> >> }
> >> #endif
> >
> > Thanks,
> > pq
Content of type "application/pgp-signature" skipped
Powered by blists - more mailing lists