lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACu1E7FduhsXY22BKpjt5WcnAcVtGu01eUiLc9T47OUR+yp_0Q@mail.gmail.com>
Date: Wed, 19 Mar 2025 12:15:36 -0400
From: Connor Abbott <cwabbott0@...il.com>
To: Rob Clark <robdclark@...il.com>
Cc: dri-devel@...ts.freedesktop.org, freedreno@...ts.freedesktop.org, 
	linux-arm-msm@...r.kernel.org, Rob Clark <robdclark@...omium.org>, 
	Abhinav Kumar <quic_abhinavk@...cinc.com>, Dmitry Baryshkov <lumag@...nel.org>, 
	Sean Paul <sean@...rly.run>, Marijn Suijten <marijn.suijten@...ainline.org>, 
	David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>, 
	Konrad Dybcio <konradybcio@...nel.org>, open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 16/34] drm/msm: Mark VM as unusable on faults

On Wed, Mar 19, 2025 at 10:55 AM Rob Clark <robdclark@...il.com> wrote:
>
> From: Rob Clark <robdclark@...omium.org>
>
> If userspace has opted-in to VM_BIND, then GPU faults and VM_BIND errors
> will mark the VM as unusable.
>
> Signed-off-by: Rob Clark <robdclark@...omium.org>
> ---
>  drivers/gpu/drm/msm/msm_gem.h        | 17 +++++++++++++++++
>  drivers/gpu/drm/msm/msm_gem_submit.c |  3 +++
>  drivers/gpu/drm/msm/msm_gpu.c        | 16 ++++++++++++++--
>  3 files changed, 34 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
> index acb976722580..7cb720137548 100644
> --- a/drivers/gpu/drm/msm/msm_gem.h
> +++ b/drivers/gpu/drm/msm/msm_gem.h
> @@ -82,6 +82,23 @@ struct msm_gem_vm {
>
>         /** @managed: is this a kernel managed VM? */
>         bool managed;
> +
> +       /**
> +        * @unusable: True if the VM has turned unusable because something
> +        * bad happened during an asynchronous request.
> +        *
> +        * We don't try to recover from such failures, because this implies
> +        * informing userspace about the specific operation that failed, and
> +        * hoping the userspace driver can replay things from there. This all
> +        * sounds very complicated for little gain.
> +        *
> +        * Instead, we should just flag the VM as unusable, and fail any
> +        * further request targeting this VM.
> +        *
> +        * As an analogy, this would be mapped to a VK_ERROR_DEVICE_LOST
> +        * situation, where the logical device needs to be re-created.
> +        */
> +       bool unusable;
>  };
>  #define to_msm_vm(x) container_of(x, struct msm_gem_vm, base)
>
> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> index 9731ad7993cf..9cef308a0ad1 100644
> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> @@ -668,6 +668,9 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
>         if (args->pad)
>                 return -EINVAL;
>
> +       if (to_msm_vm(ctx->vm)->unusable)
> +               return UERR(EPIPE, dev, "context is unusable");
> +
>         /* for now, we just have 3d pipe.. eventually this would need to
>          * be more clever to dispatch to appropriate gpu module:
>          */
> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> index 503e4dcc5a6f..4831f4e42fd9 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.c
> +++ b/drivers/gpu/drm/msm/msm_gpu.c
> @@ -386,8 +386,20 @@ static void recover_worker(struct kthread_work *work)
>
>         /* Increment the fault counts */
>         submit->queue->faults++;
> -       if (submit->vm)
> -               to_msm_vm(submit->vm)->faults++;
> +       if (submit->vm) {
> +               struct msm_gem_vm *vm = to_msm_vm(submit->vm);
> +
> +               vm->faults++;
> +
> +               /*
> +                * If userspace has opted-in to VM_BIND (and therefore userspace
> +                * management of the VM), faults mark the VM as unusuable.  This
> +                * matches vulkan expectations (vulkan is the main target for
> +                * VM_BIND)

The bit about this matching Vulkan expectations isn't exactly true.
Some Vulkan implementations do do this, but many will also just ignore
the fault and try to continue going, and the spec allows either. It's
a choice that we're making.

Connor

> +                */
> +               if (!vm->managed)
> +                       vm->unusable = true;
> +       }
>
>         get_comm_cmdline(submit, &comm, &cmd);
>
> --
> 2.48.1
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ