lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240806-poetic-awesome-impala-fb6c2f@houat>
Date: Tue, 6 Aug 2024 10:19:56 +0200
From: Maxime Ripard <mripard@...nel.org>
To: Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>
Cc: intel-xe@...ts.freedesktop.org, linux-kernel@...r.kernel.org, 
	dri-devel@...ts.freedesktop.org, Tejun Heo <tj@...nel.org>, Zefan Li <lizefan.x@...edance.com>, 
	Johannes Weiner <hannes@...xchg.org>, Andrew Morton <akpm@...ux-foundation.org>, 
	Jonathan Corbet <corbet@....net>, David Airlie <airlied@...il.com>, 
	Daniel Vetter <daniel@...ll.ch>, Thomas Zimmermann <tzimmermann@...e.de>, 
	Friedrich Vock <friedrich.vock@....de>, cgroups@...r.kernel.org, linux-mm@...ck.org, 
	linux-doc@...r.kernel.org
Subject: Re: [RFC PATCH 2/6] drm/cgroup: Add memory accounting DRM cgroup

Hi Maarten,

On Mon, Jul 01, 2024 at 11:25:12AM GMT, Maarten Lankhorst wrote:
> Den 2024-06-28 kl. 16:04, skrev Maxime Ripard:
> > On Thu, Jun 27, 2024 at 09:22:56PM GMT, Maarten Lankhorst wrote:
> >> Den 2024-06-27 kl. 19:16, skrev Maxime Ripard:
> >>> Hi,
> >>>
> >>> Thanks for working on this!
> >>>
> >>> On Thu, Jun 27, 2024 at 05:47:21PM GMT, Maarten Lankhorst wrote:
> >>>> The initial version was based roughly on the rdma and misc cgroup
> >>>> controllers, with a lot of the accounting code borrowed from rdma.
> >>>>
> >>>> The current version is a complete rewrite with page counter; it uses
> >>>> the same min/low/max semantics as the memory cgroup as a result.
> >>>>
> >>>> There's a small mismatch as TTM uses u64, and page_counter long pages.
> >>>> In practice it's not a problem. 32-bits systems don't really come with
> >>>>> =4GB cards and as long as we're consistently wrong with units, it's
> >>>> fine. The device page size may not be in the same units as kernel page
> >>>> size, and each region might also have a different page size (VRAM vs GART
> >>>> for example).
> >>>>
> >>>> The interface is simple:
> >>>> - populate drmcgroup_device->regions[..] name and size for each active
> >>>>    region, set num_regions accordingly.
> >>>> - Call drm(m)cg_register_device()
> >>>> - Use drmcg_try_charge to check if you can allocate a chunk of memory,
> >>>>    use drmcg_uncharge when freeing it. This may return an error code,
> >>>>    or -EAGAIN when the cgroup limit is reached. In that case a reference
> >>>>    to the limiting pool is returned.
> >>>> - The limiting cs can be used as compare function for
> >>>>    drmcs_evict_valuable.
> >>>> - After having evicted enough, drop reference to limiting cs with
> >>>>    drmcs_pool_put.
> >>>>
> >>>> This API allows you to limit device resources with cgroups.
> >>>> You can see the supported cards in /sys/fs/cgroup/drm.capacity
> >>>> You need to echo +drm to cgroup.subtree_control, and then you can
> >>>> partition memory.
> >>>>
> >>>> Signed-off-by: Maarten Lankhorst<maarten.lankhorst@...ux.intel.com>
> >>>> Co-developed-by: Friedrich Vock<friedrich.vock@....de>
> >>> I'm sorry, I should have wrote minutes on the discussion we had with TJ
> >>> and Tvrtko the other day.
> >>>
> >>> We're all very interested in making this happen, but doing a "DRM"
> >>> cgroup doesn't look like the right path to us.
> >>>
> >>> Indeed, we have a significant number of drivers that won't have a
> >>> dedicated memory but will depend on DMA allocations one way or the
> >>> other, and those pools are shared between multiple frameworks (DRM,
> >>> V4L2, DMA-Buf Heaps, at least).
> >>>
> >>> This was also pointed out by Sima some time ago here:
> >>> https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/
> >>>
> >>> So we'll want that cgroup subsystem to be cross-framework. We settled on
> >>> a "device" cgroup during the discussion, but I'm sure we'll have plenty
> >>> of bikeshedding.
> >>>
> >>> The other thing we agreed on, based on the feedback TJ got on the last
> >>> iterations of his series was to go for memcg for drivers not using DMA
> >>> allocations.
> >>>
> >>> It's the part where I expect some discussion there too :)
> >>>
> >>> So we went back to a previous version of TJ's work, and I've started to
> >>> work on:
> >>>
> >>>    - Integration of the cgroup in the GEM DMA and GEM VRAM helpers (this
> >>>      works on tidss right now)
> >>>
> >>>    - Integration of all heaps into that cgroup but the system one
> >>>      (working on this at the moment)
> >>
> >> Should be similar to what I have then. I think you could use my work to
> >> continue it.
> >>
> >> I made nothing DRM specific except the name, if you renamed it the device
> >> resource management cgroup and changed the init function signature to take a
> >> name instead of a drm pointer, nothing would change. This is exactly what
> >> I'm hoping to accomplish, including reserving memory.
> > 
> > I've started to work on rebasing my current work onto your series today,
> > and I'm not entirely sure how what I described would best fit. Let's
> > assume we have two KMS device, one using shmem, one using DMA
> > allocations, two heaps, one using the page allocator, the other using
> > CMA, and one v4l2 device using dma allocations.
> > 
> > So we would have one KMS device and one heap using the page allocator,
> > and one KMS device, one heap, and one v4l2 driver using the DMA
> > allocator.
> > 
> > Would these make different cgroup devices, or different cgroup regions?
> 
> Each driver would register a device, whatever feels most logical for
> that device I suppose.
> 
> My guess is that a prefix would also be nice here, so register a
> device with name of drm/$name or v4l2/$name, heap/$name. I didn't give
> it much thought and we're still experimenting, so just try something.
> :)
> 
> There's no limit to amount of devices, I only fixed amount of pools to
> match TTM, but even that could be increased arbitrarily. I just don't
> think there is a point in doing so.

Sorry, it took a while, but I implemented what (I think) we all had in
mind here:

https://github.com/mripard/linux/tree/device-cgroups-maarten

It's rebased on top of 6.11, and with plenty of fixups to (hopefully :D)
make your life easier.

Let me know what you think,
Maxime

Download attachment "signature.asc" of type "application/pgp-signature" (229 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ