Message-ID: <ZDZntP+0wG6+QyHh@phenom.ffwll.local>
Date:   Wed, 12 Apr 2023 10:11:32 +0200
From:   Daniel Vetter <daniel@...ll.ch>
To:     Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
Cc:     Rob Clark <robdclark@...il.com>, dri-devel@...ts.freedesktop.org,
        Rob Clark <robdclark@...omium.org>,
        Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        linux-arm-msm@...r.kernel.org,
        Emil Velikov <emil.l.velikov@...il.com>,
        Christopher Healy <healych@...zon.com>,
        open list <linux-kernel@...r.kernel.org>,
        Sean Paul <sean@...rly.run>,
        Boris Brezillon <boris.brezillon@...labora.com>,
        freedreno@...ts.freedesktop.org
Subject: Re: [Freedreno] [PATCH v2 0/2] drm: fdinfo memory stats

On Wed, Apr 12, 2023 at 01:36:52AM +0300, Dmitry Baryshkov wrote:
> On 11/04/2023 21:28, Rob Clark wrote:
> > On Tue, Apr 11, 2023 at 10:36 AM Dmitry Baryshkov
> > <dmitry.baryshkov@...aro.org> wrote:
> > > 
> > > On Tue, 11 Apr 2023 at 20:13, Rob Clark <robdclark@...il.com> wrote:
> > > > 
> > > > On Tue, Apr 11, 2023 at 9:53 AM Daniel Vetter <daniel@...ll.ch> wrote:
> > > > > 
> > > > > On Tue, Apr 11, 2023 at 09:47:32AM -0700, Rob Clark wrote:
> > > > > > On Mon, Apr 10, 2023 at 2:06 PM Rob Clark <robdclark@...il.com> wrote:
> > > > > > > 
> > > > > > > From: Rob Clark <robdclark@...omium.org>
> > > > > > > 
> > > > > > > Similar motivation to another recent attempt[1], but with some
> > > > > > > shared code for this, as well as documentation.
> > > > > > > 
> > > > > > > It is probably a bit UMA-centric; I guess devices with VRAM might
> > > > > > > want some placement stats as well.  But this seems like a reasonable start.
> > > > > > > 
> > > > > > > Basic gputop support: https://patchwork.freedesktop.org/series/116236/
> > > > > > > And there is already nvtop support: https://github.com/Syllo/nvtop/pull/204
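For consumers like gputop or nvtop, the drm fdinfo data is a flat list of `key: value` lines. A minimal, hedged parsing sketch (the sample keys below are illustrative; the authoritative key list lives in Documentation/gpu/drm-usage-stats.rst and depends on the driver):

```python
def parse_drm_fdinfo(text):
    """Return a dict of the drm-* keys from a /proc/<pid>/fdinfo/<fd> dump.

    Generic fdinfo fields (pos:, flags:, ...) are skipped; only lines
    with a "drm-" prefix are kept.  Values keep their unit suffix
    (e.g. "4096 KiB") as-is, leaving interpretation to the caller.
    """
    stats = {}
    for line in text.splitlines():
        if not line.startswith("drm-"):
            continue  # skip non-drm fdinfo fields
        key, sep, value = line.partition(":")
        if sep:
            stats[key] = value.strip()
    return stats
```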
> > > > > > 
> > > > > > On a related topic, I'm wondering if it would make sense to report
> > > > > > some more global things (temp, freq, etc) via fdinfo?  Some of this,
> > > > > > tools like nvtop could get by trawling sysfs or via other driver-specific
> > > > > > means.  But maybe it makes sense to have these sorts of things reported
> > > > > > in a standardized way (even though they aren't really per-drm_file).
> > > > > 
> > > > > I think that's a bit too much of a layering violation; we'd essentially
> > > > > have to reinvent the hwmon sysfs uapi in fdinfo. Not really a business I
> > > > > want to be in :-)
> > > > 
> > > > I guess this is true for temp (where there are thermal zones with
> > > > potentially multiple temp sensors... but I'm still digging my way through
> > > > the thermal_cooling_device stuff)
> > > 
> > > It is slightly ugly. All thermal zones and cooling devices are virtual
> > > devices (so there is not even a connection to the particular tsens
> > > device). One can enumerate thermal zones either by checking
> > > /sys/class/thermal/thermal_zoneN/type or through /sys/class/hwmon. For
> > > cooling devices, the only enumeration is through
> > > /sys/class/thermal/cooling_deviceN/type.
> > > 
> > > It should probably be possible to push cooling devices and thermal
> > > zones under the corresponding providers. However, I do not know if there
> > > is a good way to correlate a cooling device (ideally a part of the GPU)
> > > to the thermal zone (which in our case is provided by tsens / temp_alarm
> > > rather than by the GPU itself).
> > > 
> > > > 
> > > > But what about freq?  I think, especially for cases where some "fw
> > > > thing" is controlling the freq, we end up needing to use GPU counters
> > > > to measure it.
> > > 
> > > For the freq it is slightly easier: /sys/class/devfreq/*, and the
> > > devices are registered under the proper parent (i.e. the GPU). So one can
> > > read /sys/class/devfreq/3d00000.gpu/cur_freq or
> > > /sys/bus/platform/devices/3d00000.gpu/devfreq/3d00000.gpu/cur_freq.
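Reading that attribute is straightforward; a hedged sketch, with the device name ("3d00000.gpu") and sysfs root taken from the paths quoted above and passed as parameters so the code stays testable without hardware:

```python
import os


def read_cur_freq(device, devfreq_root="/sys/class/devfreq"):
    """Return the device's current frequency in Hz, or None if unavailable.

    devfreq's cur_freq attribute is a plain decimal Hz value, so a single
    read and int() conversion is all that is needed.
    """
    path = os.path.join(devfreq_root, device, "cur_freq")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None  # no devfreq node, or unexpected contents
```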
> > > 
> > > However, because of the component framework usage, there is no link from
> > > /sys/class/drm/card0
> > > (/sys/devices/platform/soc@...e00000.display-subsystem/ae01000.display-controller/drm/card0)
> > > to /sys/devices/platform/soc@...d00000.gpu, the GPU unit.
> > > 
> > > Getting all these items together in a platform-independent way would
> > > definitely be an important but complex topic.
> > 
> > But I don't believe any of the PCI GPUs use devfreq ;-)
> > 
> > And also, you can't expect the CPU to actually know the freq when fw
> > is the one controlling it.  We can currently get a reasonable
> > approximation from devfreq, but that stops working if IFPC is
> > implemented.  And other GPUs have even less direct control.  So freq is
> > a thing that I don't think we should try to get from "common frameworks"
> 
> I think it might be useful to add another passive devfreq governor type for
> external frequencies. This way we can use the same interface to export
> non-CPU-controlled frequencies.

Yeah, this sounds like a decent idea to me too. It might also solve the fun
of various PCI devices having very non-standard freq controls in sysfs
(looking at least at i915 here ...)

I guess it would minimally be a good idea if we could document this, or
maybe have a reference implementation in nvtop or whatever the cool tool
is right now.
-Daniel

> 
> > 
> > BR,
> > -R
> > 
> > > > 
> > > > > What might be needed is better glue to go from the fd or fdinfo to the
> > > > > right hw device and then crawl around the hwmon in sysfs automatically.
> > > > > I would not be surprised at all if we really suck at this, probably
> > > > > more so on SoCs than on PCI GPUs, where at least everything should be
> > > > > under the main PCI sysfs device.
> > > > 
> > > > yeah, I *think* userspace would have to look at /proc/device-tree to
> > > > find the cooling device(s) associated with the GPU... at least I don't
> > > > see a straightforward way to figure it out just from sysfs
> > > > 
> > > > BR,
> > > > -R
> > > > 
> > > > > -Daniel
> > > > > 
> > > > > > 
> > > > > > BR,
> > > > > > -R
> > > > > > 
> > > > > > 
> > > > > > > [1] https://patchwork.freedesktop.org/series/112397/
> > > > > > > 
> > > > > > > Rob Clark (2):
> > > > > > >    drm: Add fdinfo memory stats
> > > > > > >    drm/msm: Add memory stats to fdinfo
> > > > > > > 
> > > > > > >   Documentation/gpu/drm-usage-stats.rst | 21 +++++++
> > > > > > >   drivers/gpu/drm/drm_file.c            | 79 +++++++++++++++++++++++++++
> > > > > > >   drivers/gpu/drm/msm/msm_drv.c         | 25 ++++++++-
> > > > > > >   drivers/gpu/drm/msm/msm_gpu.c         |  2 -
> > > > > > >   include/drm/drm_file.h                | 10 ++++
> > > > > > >   5 files changed, 134 insertions(+), 3 deletions(-)
> > > > > > > 
> > > > > > > --
> > > > > > > 2.39.2
> > > > > > > 
> > > > > 
> > > > > --
> > > > > Daniel Vetter
> > > > > Software Engineer, Intel Corporation
> > > > > http://blog.ffwll.ch
> > > 
> > > 
> > > 
> > > --
> > > With best wishes
> > > Dmitry
> 
> -- 
> With best wishes
> Dmitry
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
