[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aWqe_ulADqRZBQj_@intel.com>
Date: Fri, 16 Jan 2026 15:26:38 -0500
From: Rodrigo Vivi <rodrigo.vivi@...el.com>
To: Riana Tauro <riana.tauro@...el.com>
CC: Zack McKevitt <zachary.mckevitt@....qualcomm.com>, Jakub Kicinski
<kuba@...nel.org>, <intel-xe@...ts.freedesktop.org>,
<dri-devel@...ts.freedesktop.org>, <aravind.iddamsetty@...ux.intel.com>,
<anshuman.gupta@...el.com>, <joonas.lahtinen@...ux.intel.com>,
<lukas@...ner.de>, <simona.vetter@...ll.ch>, <airlied@...il.com>,
<pratik.bari@...el.com>, <joshua.santosh.ranjan@...el.com>,
<ashwin.kumar.kulkarni@...el.com>, <shubham.kumar@...el.com>, Lijo Lazar
<lijo.lazar@....com>, Hawking Zhang <Hawking.Zhang@....com>, "David S.
Miller" <davem@...emloft.net>, Paolo Abeni <pabeni@...hat.com>, Eric Dumazet
<edumazet@...gle.com>, <netdev@...r.kernel.org>, Jeff Hugo
<jeff.hugo@....qualcomm.com>
Subject: Re: [PATCH v3 1/4] drm/ras: Introduce the DRM RAS infrastructure
over generic netlink
On Fri, Jan 16, 2026 at 11:26:36AM +0530, Riana Tauro wrote:
>
>
> On 1/16/2026 5:09 AM, Zack McKevitt wrote:
> >
> >
> > On 1/13/2026 1:20 AM, Riana Tauro wrote:
> > > > > > > diff --git
> > > > > > > a/Documentation/netlink/specs/drm_ras.yaml b/
> > > > > > > Documentation/netlink/specs/drm_ras.yaml
> > > > > > > new file mode 100644
> > > > > > > index 000000000000..be0e379c5bc9
> > > > > > > --- /dev/null
> > > > > > > +++ b/Documentation/netlink/specs/drm_ras.yaml
> > > > > > > @@ -0,0 +1,130 @@
> > > > > > > +# SPDX-License-Identifier: ((GPL-2.0 WITH
> > > > > > > Linux-syscall-note) OR BSD-3-Clause)
> > > > > > > +---
> > > > > > > +name: drm-ras
> > > > > > > +protocol: genetlink
> > > > > > > +uapi-header: drm/drm_ras.h
> > > > > > > +
> > > > > > > +doc: >-
> > > > > > > + DRM RAS (Reliability, Availability,
> > > > > > > Serviceability) over Generic Netlink.
> > > > > > > + Provides a standardized mechanism for DRM drivers
> > > > > > > to register "nodes"
> > > > > > > + representing hardware/software components capable
> > > > > > > of reporting error counters.
> > > > > > > + Userspace tools can query the list of nodes or
> > > > > > > individual error counters
> > > > > > > + via the Generic Netlink interface.
> > > > > > > +
> > > > > > > +definitions:
> > > > > > > + -
> > > > > > > + type: enum
> > > > > > > + name: node-type
> > > > > > > + value-start: 1
> > > > > > > + entries: [error-counter]
> > > > > > > + doc: >-
> > > > > > > + Type of the node. Currently, only error-counter nodes are
> > > > > > > + supported, which expose reliability
> > > > > > > counters for a hardware/software
> > > > > > > + component.
> > > > > > > +
> > > > > > > +attribute-sets:
> > > > > > > + -
> > > > > > > + name: node-attrs
> > > > > > > + attributes:
> > > > > > > + -
> > > > > > > + name: node-id
> > > > > > > + type: u32
> > > > > > > + doc: >-
> > > > > > > + Unique identifier for the node.
> > > > > > > + Assigned dynamically by the DRM RAS
> > > > > > > core upon registration.
> > > > > > > + -
> > > > > > > + name: device-name
> > > > > > > + type: string
> > > > > > > + doc: >-
> > > > > > > + Device name chosen by the driver at registration.
> > > > > > > + Can be a PCI BDF, UUID, or module name if unique.
> > > > > > > + -
> > > > > > > + name: node-name
> > > > > > > + type: string
> > > > > > > + doc: >-
> > > > > > > + Node name chosen by the driver at registration.
> > > > > > > + Can be an IP block name, or any name
> > > > > > > that identifies the
> > > > > > > + RAS node inside the device.
> > > > > > > + -
> > > > > > > + name: node-type
> > > > > > > + type: u32
> > > > > > > + doc: Type of this node, identifying its function.
> > > > > > > + enum: node-type
> > > > > > > + -
> > > > > > > + name: error-counter-attrs
> > > > > > > + attributes:
> > > > > > > + -
> > > > > > > + name: node-id
> > > > > > > + type: u32
> > > > > > > + doc: Node ID targeted by this error counter operation.
> > > > > > > + -
> > > > > > > + name: error-id
> > > > > > > + type: u32
> > > > > > > + doc: Unique identifier for a specific error
> > > > > > > counter within an node.
> > > > > > > + -
> > > > > > > + name: error-name
> > > > > > > + type: string
> > > > > > > + doc: Name of the error.
> > > > > > > + -
> > > > > > > + name: error-value
> > > > > > > + type: u32
> > > > > > > + doc: Current value of the requested error counter.
> > > > > > > +
> > > > > > > +operations:
> > > > > > > + list:
> > > > > > > + -
> > > > > > > + name: list-nodes
> > > > > > > + doc: >-
> > > > > > > + Retrieve the full list of currently
> > > > > > > registered DRM RAS nodes.
> > > > > > > + Each node includes its dynamically
> > > > > > > assigned ID, name, and type.
> > > > > > > + **Important:** User space must call this
> > > > > > > operation first to obtain
> > > > > > > + the node IDs. These IDs are required for all subsequent
> > > > > > > + operations on nodes, such as querying error counters.
> > > > >
> > > > > I am curious about security implications of this design.
> > > >
> > > > hmm... very good point you are raising here.
> > > >
> > > > This current design relies entirely in the CAP_NET_ADMIN.
> > > > No driver would have the flexibility to choose anything differently.
> > > > Please notice that the flag admin-perm is hardcoded in this yaml file.
> > > >
> > > > > If the complete
> > > > > list of RAS nodes is visible for any process on the system
> > > > > (and one wants to
> > > > > avoid requiring CAP_NET_ADMIN), there should be some way to enforce
> > > > > permission checks when performing these operations if desired.
> > > >
> > > > Right now, there's no way that the driver would choose not avoid
> > > > requiring
> > > > CAP_NET_ADMIN...
> > > >
> > > > Only way would be the admin to give the cap_net_admin to the tool with:
> > > >
> > > > $ sudo setcap cap_net_admin+ep /bin/drm_ras_tool
> > > >
> > > > but not ideal and not granular anyway...
> > > >
> > > > >
> > > > > For example, this might be implemented in the driver's definition of
> > > > > callback functions like query_error_counter; some drivers
> > > > > may want to ensure
> > > > > that the process can in fact open the file descriptor
> > > > > corresponding to the
> > > > > queried device before serving a netlink request. Is it
> > > > > enough for a driver
> > > > > to simply return -EPERM in this case? Any driver that doesnt
> > > > > wish to protect
> > > > > its RAS nodes need not implement checks in their callbacks.
> > > >
> > > > Fair enough. If we want to give the option to the drivers, then we need:
> > > >
> > > > 1. to first remove all the admin-perm flags below and leave the
> > > > driver to
> > > > pick up their policy on when to return something or -EPERM.
> > > > 2. Document this security responsibility and list a few possibilities.
> > > > 3. In our Xe case here I believe the easiest option is to use
> > > > something like:
> > > >
> > > > struct scm_creds *creds = NETLINK_CREDS(cb->skb);
> > > > if (!gid_eq(creds->gid, GLOBAL_ROOT_GID))
> > > > return -EPERM
> > >
> > > The driver currently does not have access to the callback or the
> > > skbuffer. Sending these details as param to driver won't be right as
> > > drm_ras needs to handle all the netlink buffers.
> > >
> > > How about using pre_doit & start calls? If driver has a pre callback
> > > , it's the responsibility of the driver to check permissions/any-pre
> > > conditions, else the CAP_NET_ADMIN permission will be checked.
> > >
> > > @Zack / @Rodrigo thoughts?
> > > @Zack Will this work for your usecase?
> > >
> > > yaml
> > > + dump:
> > > + pre: drm-ras-nl-pre-list-nodes
> > >
> > >
> > > drm_ras.c :
> > >
> > > + if (node->pre_list_nodes)
> > > + return node->pre_list_nodes(node);
> > > +
> > > + return check_permissions(cb->skb); //Checks creds
> > >
> > > Thanks
> > > Riana
> > >
> >
> > I agree that a pre_doit is probably the best solution for this.
> >
> > Not entirely sure what a driver specific implementation would look like
> > yet, but I think that as long as the driver callback has a way to access
> > the 'current' task_struct pointer corresponding to the user process then
> > this approach seems like the best option from the netlink side.
> >
> > Since this is mostly a concern for our specific use case, perhaps we can
> > incorporate this functionality in our change down the road when we
> > expand the interface for telemetry?
Yes, as it can be changed transparently, let's do that...
>
>
> Yeah using pre_doit we can allow driver to make decisions based on
> the private data or driver specific permissions. However we will need to
> check the CAP_NET_ADMIN when no driver callback is implemented in the
> netlink layer as a default .
>
> Thank you, you can incorporate the changes when you add telemetry nodes.
>
> For now, I will retain the admin-perm in flags.
Cool then, when they come with their case we remove it and force in the
pre_doit as well.
ack..
>
> I will address the rest of the review comments and send out a new revision
> shortly.
>
> Thank you
> Riana
>
>
> >
> > Let me know what you think.
> >
> > Zack
> >
> > > >
> > > > or something like that?!
> > > >
> > > > perhaps drivers could implement some form of cookie or pre-
> > > > authorization with
> > > > ioctls or sysfs, and then store in the priv?
> > > >
> > > > Thoughts?
> > > > Other options?
> > > >
> > > > >
> > > > > I dont see any such permissions checks in your driver
> > > > > implementation which
> > > > > is understandable given that it may not be necessary for
> > > > > your use cases.
> > > > > However, this would be a concern for our driver if we were
> > > > > to adopt this
> > > > > interface.
> > > >
> > > > yeap, this case was entirely with admin-perm, so not needed at all...
> > > > But I see your point and this is really not giving any flexibility to
> > > > other drivers.
> > > >
> >
> > > > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Zack
> > > > >
> > >
> >
>
Powered by blists - more mailing lists