netdev - Re: [PATCH v3 1/4] drm/ras: Introduce the DRM RAS infrastructure over generic netlink

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <919c0b7e-83d7-45e8-ae96-d9fb7a10995c@oss.qualcomm.com>
Date: Thu, 15 Jan 2026 16:39:02 -0700
From: Zack McKevitt <zachary.mckevitt@....qualcomm.com>
To: Riana Tauro <riana.tauro@...el.com>, Rodrigo Vivi <rodrigo.vivi@...el.com>
Cc: Jakub Kicinski <kuba@...nel.org>, intel-xe@...ts.freedesktop.org,
        dri-devel@...ts.freedesktop.org, aravind.iddamsetty@...ux.intel.com,
        anshuman.gupta@...el.com, joonas.lahtinen@...ux.intel.com,
        lukas@...ner.de, simona.vetter@...ll.ch, airlied@...il.com,
        pratik.bari@...el.com, joshua.santosh.ranjan@...el.com,
        ashwin.kumar.kulkarni@...el.com, shubham.kumar@...el.com,
        Lijo Lazar <lijo.lazar@....com>, Hawking Zhang <Hawking.Zhang@....com>,
        "David S. Miller"
 <davem@...emloft.net>,
        Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>,
        netdev@...r.kernel.org, Jeff Hugo <jeff.hugo@....qualcomm.com>
Subject: Re: [PATCH v3 1/4] drm/ras: Introduce the DRM RAS infrastructure over
 generic netlink



On 1/13/2026 1:20 AM, Riana Tauro wrote:
>>>>> diff --git a/Documentation/netlink/specs/drm_ras.yaml b/ 
>>>>> Documentation/netlink/specs/drm_ras.yaml
>>>>> new file mode 100644
>>>>> index 000000000000..be0e379c5bc9
>>>>> --- /dev/null
>>>>> +++ b/Documentation/netlink/specs/drm_ras.yaml
>>>>> @@ -0,0 +1,130 @@
>>>>> +# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
>>>>> BSD-3-Clause)
>>>>> +---
>>>>> +name: drm-ras
>>>>> +protocol: genetlink
>>>>> +uapi-header: drm/drm_ras.h
>>>>> +
>>>>> +doc: >-
>>>>> +  DRM RAS (Reliability, Availability, Serviceability) over Generic 
>>>>> Netlink.
>>>>> +  Provides a standardized mechanism for DRM drivers to register 
>>>>> "nodes"
>>>>> +  representing hardware/software components capable of reporting 
>>>>> error counters.
>>>>> +  Userspace tools can query the list of nodes or individual error 
>>>>> counters
>>>>> +  via the Generic Netlink interface.
>>>>> +
>>>>> +definitions:
>>>>> +  -
>>>>> +    type: enum
>>>>> +    name: node-type
>>>>> +    value-start: 1
>>>>> +    entries: [error-counter]
>>>>> +    doc: >-
>>>>> +         Type of the node. Currently, only error-counter nodes are
>>>>> +         supported, which expose reliability counters for a 
>>>>> hardware/software
>>>>> +         component.
>>>>> +
>>>>> +attribute-sets:
>>>>> +  -
>>>>> +    name: node-attrs
>>>>> +    attributes:
>>>>> +      -
>>>>> +        name: node-id
>>>>> +        type: u32
>>>>> +        doc: >-
>>>>> +             Unique identifier for the node.
>>>>> +             Assigned dynamically by the DRM RAS core upon 
>>>>> registration.
>>>>> +      -
>>>>> +        name: device-name
>>>>> +        type: string
>>>>> +        doc: >-
>>>>> +             Device name chosen by the driver at registration.
>>>>> +             Can be a PCI BDF, UUID, or module name if unique.
>>>>> +      -
>>>>> +        name: node-name
>>>>> +        type: string
>>>>> +        doc: >-
>>>>> +             Node name chosen by the driver at registration.
>>>>> +             Can be an IP block name, or any name that identifies the
>>>>> +             RAS node inside the device.
>>>>> +      -
>>>>> +        name: node-type
>>>>> +        type: u32
>>>>> +        doc: Type of this node, identifying its function.
>>>>> +        enum: node-type
>>>>> +  -
>>>>> +    name: error-counter-attrs
>>>>> +    attributes:
>>>>> +      -
>>>>> +        name: node-id
>>>>> +        type: u32
>>>>> +        doc:  Node ID targeted by this error counter operation.
>>>>> +      -
>>>>> +        name: error-id
>>>>> +        type: u32
>>>>> +        doc: Unique identifier for a specific error counter within 
>>>>> an node.
>>>>> +      -
>>>>> +        name: error-name
>>>>> +        type: string
>>>>> +        doc: Name of the error.
>>>>> +      -
>>>>> +        name: error-value
>>>>> +        type: u32
>>>>> +        doc: Current value of the requested error counter.
>>>>> +
>>>>> +operations:
>>>>> +  list:
>>>>> +    -
>>>>> +      name: list-nodes
>>>>> +      doc: >-
>>>>> +           Retrieve the full list of currently registered DRM RAS 
>>>>> nodes.
>>>>> +           Each node includes its dynamically assigned ID, name, 
>>>>> and type.
>>>>> +           **Important:** User space must call this operation 
>>>>> first to obtain
>>>>> +           the node IDs. These IDs are required for all subsequent
>>>>> +           operations on nodes, such as querying error counters.
>>>
>>> I am curious about security implications of this design.
>>
>> hmm... very good point you are raising here.
>>
>> This current design relies entirely in the CAP_NET_ADMIN.
>> No driver would have the flexibility to choose anything differently.
>> Please notice that the flag admin-perm is hardcoded in this yaml file.
>>
>>> If the complete
>>> list of RAS nodes is visible for any process on the system (and one 
>>> wants to
>>> avoid requiring CAP_NET_ADMIN), there should be some way to enforce
>>> permission checks when performing these operations if desired.
>>
>> Right now, there's no way that the driver would choose not avoid 
>> requiring
>> CAP_NET_ADMIN...
>>
>> Only way would be the admin to give the cap_net_admin to the tool with:
>>
>> $ sudo setcap cap_net_admin+ep /bin/drm_ras_tool
>>
>> but not ideal and not granular anyway...
>>
>>>
>>> For example, this might be implemented in the driver's definition of
>>> callback functions like query_error_counter; some drivers may want to 
>>> ensure
>>> that the process can in fact open the file descriptor corresponding 
>>> to the
>>> queried device before serving a netlink request. Is it enough for a 
>>> driver
>>> to simply return -EPERM in this case? Any driver that doesnt wish to 
>>> protect
>>> its RAS nodes need not implement checks in their callbacks.
>>
>> Fair enough. If we want to give the option to the drivers, then we need:
>>
>> 1. to first remove all the admin-perm flags below and leave the driver to
>> pick up their policy on when to return something or -EPERM.
>> 2. Document this security responsibility and list a few possibilities.
>> 3. In our Xe case here I believe the easiest option is to use 
>> something like:
>>
>> struct scm_creds *creds = NETLINK_CREDS(cb->skb);
>> if (!gid_eq(creds->gid, GLOBAL_ROOT_GID))
>>      return -EPERM
> 
> The driver currently does not have access to the callback or the 
> skbuffer. Sending these details as param to driver won't be right as
> drm_ras needs to handle all the netlink buffers.
> 
> How about using pre_doit & start calls? If driver has a pre callback , 
> it's the responsibility of the driver to check permissions/any-pre 
> conditions, else the CAP_NET_ADMIN permission will be checked.
> 
> @Zack / @Rodrigo thoughts?
> @Zack Will this work for your usecase?
> 
> yaml
> +    dump:
> +        pre: drm-ras-nl-pre-list-nodes
> 
> 
> drm_ras.c :
> 
> +       if (node->pre_list_nodes)
> +                return node->pre_list_nodes(node);
> +
> +        return check_permissions(cb->skb);  //Checks creds
> 
> Thanks
> Riana
> 

I agree that a pre_doit is probably the best solution for this.

Not entirely sure what a driver specific implementation would look like 
yet, but I think that as long as the driver callback has a way to access 
the 'current' task_struct pointer corresponding to the user process then 
this approach seems like the best option from the netlink side.

Since this is mostly a concern for our specific use case, perhaps we can 
incorporate this functionality in our change down the road when we 
expand the interface for telemetry?

Let me know what you think.

Zack

>>
>> or something like that?!
>>
>> perhaps drivers could implement some form of cookie or pre- 
>> authorization with
>> ioctls or sysfs, and then store in the priv?
>>
>> Thoughts?
>> Other options?
>>
>>>
>>> I dont see any such permissions checks in your driver implementation 
>>> which
>>> is understandable given that it may not be necessary for your use cases.
>>> However, this would be a concern for our driver if we were to adopt this
>>> interface.
>>
>> yeap, this case was entirely with admin-perm, so not needed at all...
>> But I see your point and this is really not giving any flexibility to
>> other drivers.
>>

>>>>>
>>>
>>> Thanks,
>>>
>>> Zack
>>>
>