Date:   Tue, 25 Jul 2023 15:08:40 +0100
From:   Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     Intel-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
        cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
        Johannes Weiner <hannes@...xchg.org>,
        Zefan Li <lizefan.x@...edance.com>,
        Dave Airlie <airlied@...hat.com>,
        Daniel Vetter <daniel.vetter@...ll.ch>,
        Rob Clark <robdclark@...omium.org>,
        Stéphane Marchesin <marcheu@...omium.org>,
        "T . J . Mercier" <tjmercier@...gle.com>, Kenny.Ho@....com,
        Christian König <christian.koenig@....com>,
        Brian Welty <brian.welty@...el.com>,
        Tvrtko Ursulin <tvrtko.ursulin@...el.com>,
        Eero Tamminen <eero.t.tamminen@...el.com>
Subject: Re: [PATCH 15/17] cgroup/drm: Expose GPU utilisation


On 21/07/2023 23:20, Tejun Heo wrote:
> On Fri, Jul 21, 2023 at 12:19:32PM -1000, Tejun Heo wrote:
>> On Wed, Jul 12, 2023 at 12:46:03PM +0100, Tvrtko Ursulin wrote:
>>> +  drm.active_us
>>> +	GPU time used by the group recursively including all child groups.
>>
>> Maybe instead add drm.stat and have "usage_usec" inside? That'd be more
>> consistent with cpu side.

Could be, no strong opinion from my side either way. Perhaps it boils down to what else could be put in the file, which would decide whether a keyed format makes sense or not.
  
> Also, shouldn't this be keyed by the drm device?
  
It could have that too, or it could come later. The fun with GPUs is that it could be keyed not only by the device, but also by the type of the GPU engine. (Engines are a) vendor specific and b) some are fully independent, some partially so, and some not at all - so it could get complicated semantics-wise really fast.)

If for now I'd go with drm.stat/usage_usec containing the total time spent, how would you suggest adding per device granularity? Files as documented are either flat or nested, not both at the same time. So something like:

usage_usec 100000
card0 usage_usec 50000
card1 usage_usec 50000

Would that fly or not? Or should there be two files, along the lines of drm.stat and drm.dev_stat?
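
To make the parsing cost of the mixed layout concrete, here is a rough userspace sketch in Python. It assumes exactly the hypothetical drm.stat content shown above (one flat "key value" line plus "device key value" lines); the function name and sample string are made up for illustration and nothing here is an existing interface.

```python
# Sketch only: parses the hypothetical mixed drm.stat layout from this email.
# Two-field lines are treated as group totals, three-field lines as per-device.

def parse_drm_stat(text):
    total = {}
    per_device = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 2:
            # Flat keyed entry, e.g. "usage_usec 100000"
            key, value = fields
            total[key] = int(value)
        elif len(fields) == 3:
            # Device-prefixed entry, e.g. "card0 usage_usec 50000"
            dev, key, value = fields
            per_device.setdefault(dev, {})[key] = int(value)
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return total, per_device


# Sample content copied from the proposal above.
SAMPLE_DRM_STAT = """\
usage_usec 100000
card0 usage_usec 50000
card1 usage_usec 50000
"""

total, per_device = parse_drm_stat(SAMPLE_DRM_STAT)
```

The parser has to tell the two line shapes apart by field count, which is the kind of special casing that a separate drm.dev_stat file would avoid.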

While on this general topic, you will notice that for memory stats I have _sort of_ nested keyed per device format, for example on integrated Intel GPU:

   $ cat drm.memory.stat
   card0 region=system total=12898304 shared=0 active=0 resident=12111872 purgeable=167936
   card0 region=stolen-system total=0 shared=0 active=0 resident=0 purgeable=0

On a discrete Intel GPU two more lines would appear, for the memory regions local and local-system. But then on some server class multi-tile GPUs there would be even further regions, with more than one device local memory region. And users do want to see this granularity, for container use cases at least.

Anyway, this may not be compatible with the nested key format as documented in cgroup-v2.rst, although it does not explicitly say.
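
A small Python sketch of where the friction could show up in userspace, assuming a generic parser that treats the first token of each line as a unique key (my reading of the nested keyed format, not something cgroup-v2.rst spells out). The function names are hypothetical and the sample is the integrated GPU output from above.

```python
# Sketch only: the same device key appearing on two lines collides in a
# parser that assumes one line per key.

SAMPLE = """\
card0 region=system total=12898304 shared=0 active=0 resident=12111872 purgeable=167936
card0 region=stolen-system total=0 shared=0 active=0 resident=0 purgeable=0
"""


def parse_nested_keyed(text):
    """Generic parser: first token is the key, the rest are k=v pairs."""
    out = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, *pairs = line.split()
        # A repeated key silently replaces the earlier line.
        out[key] = dict(p.split("=", 1) for p in pairs)
    return out


def parse_by_region(text):
    """Format-aware parser: index by (device, region) instead."""
    out = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        dev, *pairs = line.split()
        fields = dict(p.split("=", 1) for p in pairs)
        region = fields.pop("region")
        out[(dev, region)] = {k: int(v) for k, v in fields.items()}
    return out


naive = parse_nested_keyed(SAMPLE)
by_region = parse_by_region(SAMPLE)
```

With the generic parser only the last card0 line survives, so userspace would need to know that region= is part of the key.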

Should I cheat and create key names based on device and memory region name and let userspace parse it? Like:

   $ cat drm.memory.stat
   card0.system total=12898304 shared=0 active=0 resident=12111872 purgeable=167936
   card0.stolen-system total=0 shared=0 active=0 resident=0 purgeable=0
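
For comparison, a Python sketch of what userspace would have to do with the composite key variant. It assumes device names never contain a dot, so the key can be split at the first one; that is an assumption of this sketch, and the function name is hypothetical.

```python
# Sketch only: parses the "device.region" composite key variant shown above.

SAMPLE_COMPOSITE = """\
card0.system total=12898304 shared=0 active=0 resident=12111872 purgeable=167936
card0.stolen-system total=0 shared=0 active=0 resident=0 purgeable=0
"""


def parse_composite_keys(text):
    out = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, *pairs = line.split()
        # Split at the first dot: "card0.stolen-system" -> "card0", "stolen-system"
        dev, _, region = key.partition(".")
        values = {k: int(v) for k, v in (p.split("=", 1) for p in pairs)}
        out.setdefault(dev, {})[region] = values
    return out


stats = parse_composite_keys(SAMPLE_COMPOSITE)
```

Each line now has a unique key, so a generic nested keyed parser would also work unmodified, at the cost of userspace having to split the key to get back to device and region.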

Regards,

Tvrtko
