lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <0f70fa7b-c6fa-4bb5-8f33-c40e9cfaaa80@ursulin.net>
Date: Tue, 7 Jan 2025 10:43:34 +0000
From: Tvrtko Ursulin <tursulin@...ulin.net>
To: Adrián Martínez Larumbe
 <adrian.larumbe@...labora.com>
Cc: Boris Brezillon <boris.brezillon@...labora.com>,
 Steven Price <steven.price@....com>, Liviu Dudau <liviu.dudau@....com>,
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
 kernel@...labora.com, dri-devel@...ts.freedesktop.org,
 linux-kernel@...r.kernel.org, Mihail Atanassov <mihail.atanassov@....com>
Subject: Re: [PATCH v5 2/2] Documentation/gpu: Add fdinfo meanings of
 drm-*-internal memory tags


On 06/01/2025 16:53, Adrián Martínez Larumbe wrote:
> On 03.01.2025 10:49, Tvrtko Ursulin wrote:
>> On 02/01/2025 22:18, Adrián Martínez Larumbe wrote:
>>> On 02.01.2025 21:59, Tvrtko Ursulin wrote:
>>>>
>>>> On 18/12/2024 18:18, Adrián Martínez Larumbe wrote:
>>>>> From: Adrián Larumbe <adrian.larumbe@...labora.com>
>>>>>
>>>>> A previous commit enabled display of driver-internal kernel BO sizes
>>>>> through the device file's fdinfo interface.
>>>>>
>>>>> Expand the description of the relevant driver-specific key:value pairs
>>>>> with the definitions of the new drm-*-internal ones.
>>>>>
>>>>> Signed-off-by: Adrián Larumbe <adrian.larumbe@...labora.com>
>>>>> Reviewed-by: Mihail Atanassov <mihail.atanassov@....com>
>>>>> ---
>>>>>     Documentation/gpu/panthor.rst | 14 ++++++++++++++
>>>>>     1 file changed, 14 insertions(+)
>>>>>
>>>>> diff --git a/Documentation/gpu/panthor.rst b/Documentation/gpu/panthor.rst
>>>>> index 3f8979fa2b86..23aa3d67c9d2 100644
>>>>> --- a/Documentation/gpu/panthor.rst
>>>>> +++ b/Documentation/gpu/panthor.rst
>>>>> @@ -26,6 +26,10 @@ the currently possible format options:
>>>>>          drm-cycles-panthor:     94439687187
>>>>>          drm-maxfreq-panthor:    1000000000 Hz
>>>>>          drm-curfreq-panthor:    1000000000 Hz
>>>>> +     drm-total-internal:     10396 KiB
>>>>> +     drm-shared-internal:    0
>>>>> +     drm-active-internal:    10396 KiB
>>>>> +     drm-resident-internal:  10396 KiB
>>>>>          drm-total-memory:       16480 KiB
>>>>>          drm-shared-memory:      0
>>>>>          drm-active-memory:      16200 KiB
>>>>> @@ -44,3 +48,13 @@ driver by writing into the appropriate sysfs node::
>>>>>     Where `N` is a bit mask where cycle and timestamp sampling are respectively
>>>>>     enabled by the first and second bits.
>>>>> +
>>>>> +Possible `drm-*-internal` keys are: `total`, `active`, `resident` and `shared`.
>>>>> +These values convey the sizes of the internal driver-owned shmem BO's that
>>>>> +aren't exposed to user-space through a DRM handle, like queue ring buffers,
>>>>> +sync object arrays and heap chunks. Because they are all allocated and pinned
>>>>> +at creation time, `drm-resident-internal` and `drm-total-internal` should always
>>>>> +be equal. `drm-active-internal` shows the size of kernel BO's associated with
>>>>> +VM's and groups currently being scheduled for execution by the GPU.
>>>>> +`drm-shared-internal` is unused at present, but in the future it might stand for
>>>>> +the size of executable FW regions, since they do not belong to an open file context.
>>>>
>>>> The description is way too specific, too tied to some of the implementations.
>>>
>>> These are panthor-specific key:value pairs. I was in the belief that drivers
>>> could define their own when it suits their interest beyond the DRM-wide ones
>>> defined in the drm-fdinfo spec.
>>>
>>>> I also don't remember that you ever explained why totting up the internal
>>>> objects into existing regions isn't good enough. I keep asking, you keep not
>>>> explaining. Or I missed your emails somehow.
>>>
>>> It's not that it's not good enough, but rather that it cannot be done in the
>>> current state of affairs. drm_show_memory_stats() defines its own
>>> drm_memory_stats struct as an automatic variable so we don't have access to it
>>> from anywhere else in the driver. In a previous revision of the patch series I
>>> had come up with a workaround that would let drivers pass a function pointer to
>>> drm_show_memory_stats() which would gather those numbers in a driver-specific
>>> way, but it didn't seem to get any traction.
>>
>> Side note - i915 and amdgpu manage to do it so it is not that it is not
>> possible.
>>
>>>> And you keep not copying me on the thread. Copying people who expressed
>>>> interest, gave past feedback, etc should be the norm.
>>>
>>> I did not CC you on this series because these are all panthor-specific changes
>>> which do not touch on any DRM fdinfo-wide code, and also because I didn't think
>>> that driver-specific key:value pairs needed the approval of the drm-fdinfo core
>>> maintainers.
>>
>> Ah my bad.. sorry! I saw drm-internal-* and did not spot it is actually *in*
>> panthor.rst. So I think you just need to rename those to panthor- prefix. Same
>> as amdgpu has its own private keys amd-evicted-vram etc.
> 
> This complicates things because that means I can no longer use
> drm_print_memory_stat(), since print_size() will prefix every single key with

Maybe you even shouldn't because it does not seem to fit that well? For 
instance you define total and resident must always match. And shared is 
unused. So why expose the duplicate keys to start with? You will not be 
able to change it later and keep userspace compatibility. And keys like 
shared "might be used for X in the future" is also not very useful. What 
does userspace do with those when parsing? It cannot make a decision.

> "drm-". But then not using print_size() means I'm giving up on the nice unit
> size selection loop, which I guess I could just copy and paste inside Panthor,
> but I do remember a recent patch series where the unit selection criteria
> changed slightly so wouldn't like to keep these manually sync'd.
> 
> There's also the thing that the units I'm displaying here match up nicely with
> those representing the size of DRM objects with a UM-facing handle, so crafting
> my own function to display these when the only difference is a single key prefix
> seems like an overkill. I guess drm_print_memory_stats() and the functions
> underneath should offer more flexibility, but I guess that's something that can
> be discussed in a later patch series.

What you describe here could be just some refactoring is needed rather 
than being a huge problem. Like export a new helper from the code which 
takes the prefix as argument.

> And then there's the following statement in Documentation/gpu/drm-usage-stats.rst:24:
> 
> "- All keys shall be prefixed with `drm-`."
> 
> It doesn't say Driver-specific keys should begin with the name of the driver. In
> fact, it seems neither AMD nor Intel drivers have theirs documented.
> 
> In light of all this, I'd much rather not modify the names of Panthor-specific
> fdinfo's internal memory keys.

Amdgpu indeed fails to document its specific keys but some lenience 
there since it predated the standardisation and a patch can be submitted 
to fix that. I've been making some changes too to make it use more of 
common keys and helpers.

No other drivers appear to have driver specific keys at this point. 
Which ones do you see on Intel side?

If indeed there are none then that leaves the question of what 
drm-usage-stats.rst means when it says:

- All keys shall be prefixed with `drm-`.

We could for instance clarify that applies to common keys and that the 
driver specific keys should use a different prefix.

To me it feels like doing that (clarifying documentation) and tweaking 
your series to add some core helpers you could use would lead to the 
better end result.

Regards,

Tvrtko

>>>> Until we can clarify the above points I don't think this can go in.
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>
>>> Adrian Larumbe
> 
> Adrian Larumbe

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ