Message-ID: <70289c58-7947-4347-8600-658821a730b0@linux.intel.com>
Date: Mon, 1 Jul 2024 11:25:12 +0200
From: Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>
To: Maxime Ripard <mripard@...nel.org>
Cc: intel-xe@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
dri-devel@...ts.freedesktop.org, Tejun Heo <tj@...nel.org>,
Zefan Li <lizefan.x@...edance.com>, Johannes Weiner <hannes@...xchg.org>,
Andrew Morton <akpm@...ux-foundation.org>, Jonathan Corbet <corbet@....net>,
David Airlie <airlied@...il.com>, Daniel Vetter <daniel@...ll.ch>,
Thomas Zimmermann <tzimmermann@...e.de>,
Friedrich Vock <friedrich.vock@....de>, cgroups@...r.kernel.org,
linux-mm@...ck.org, linux-doc@...r.kernel.org
Subject: Re: [RFC PATCH 2/6] drm/cgroup: Add memory accounting DRM cgroup
On 2024-06-28 at 16:04, Maxime Ripard wrote:
> Hi,
>
> On Thu, Jun 27, 2024 at 09:22:56PM GMT, Maarten Lankhorst wrote:
>> On 2024-06-27 at 19:16, Maxime Ripard wrote:
>>> Hi,
>>>
>>> Thanks for working on this!
>>>
>>> On Thu, Jun 27, 2024 at 05:47:21PM GMT, Maarten Lankhorst wrote:
>>>> The initial version was based roughly on the rdma and misc cgroup
>>>> controllers, with a lot of the accounting code borrowed from rdma.
>>>>
>>>> The current version is a complete rewrite with page counter; it uses
>>>> the same min/low/max semantics as the memory cgroup as a result.
>>>>
>>>> There's a small mismatch, as TTM uses u64 while page_counter counts
>>>> long pages. In practice it's not a problem: 32-bit systems don't
>>>> really come with >=4GB cards, and as long as we're consistently wrong
>>>> with units, it's fine. The device page size may not be in the same
>>>> units as the kernel page size, and each region might also have a
>>>> different page size (VRAM vs GART, for example).
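
To illustrate that unit handling, a rough sketch (the pool layout and
the helper below are my guesses, not the actual patch code):

	/*
	 * Sketch only: charge a u64 allocation size to a long-based
	 * page_counter in the region's own page units. Field names
	 * are assumptions.
	 */
	static bool pool_try_charge(struct drmcg_pool *pool, u64 size)
	{
		struct page_counter *fail;
		/* u64 bytes -> unsigned long count of region pages; the
		 * possible truncation on 32-bit is the >=4GB case above. */
		unsigned long nr_pages =
			DIV_ROUND_UP(size, 1ULL << pool->page_shift);

		return page_counter_try_charge(&pool->cnt, nr_pages, &fail);
	}
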
>>>>
>>>> The interface is simple:
>>>> - Populate drmcgroup_device->regions[..] with a name and size for each
>>>>   active region, and set num_regions accordingly.
>>>> - Call drm(m)cg_register_device().
>>>> - Use drmcg_try_charge to check if you can allocate a chunk of memory,
>>>>   and drmcg_uncharge when freeing it. drmcg_try_charge may return an
>>>>   error code, or -EAGAIN when the cgroup limit is reached; in that
>>>>   case a reference to the limiting pool is returned.
>>>> - The limiting cs can be used as a compare function for
>>>>   drmcs_evict_valuable.
>>>> - After having evicted enough, drop the reference to the limiting cs
>>>>   with drmcs_pool_put.
>>>>
>>>> This API allows you to limit device resources with cgroups.
>>>> You can see the supported cards in /sys/fs/cgroup/drm.capacity
>>>> You need to echo +drm to cgroup.subtree_control, and then you can
>>>> partition memory.
>>>>
>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>
>>>> Co-developed-by: Friedrich Vock <friedrich.vock@....de>
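
To make those steps concrete, a rough caller sketch (only the function
names come from the list above; the signatures and struct layout are my
guesses, nothing here is final):

	static int my_driver_init(struct my_device *mdev)
	{
		/* Describe each active region, then register. */
		mdev->cgdev.regions[0].name = "vram";
		mdev->cgdev.regions[0].size = mdev->vram_size;
		mdev->cgdev.num_regions = 1;

		return drmcg_register_device(mdev->drm, &mdev->cgdev);
	}

	static int my_driver_alloc(struct my_device *mdev, u64 size)
	{
		struct drmcgroup_pool_state *limit = NULL;
		int ret;

		ret = drmcg_try_charge(&mdev->cgdev, 0 /* region */,
				       size, &limit);
		if (ret == -EAGAIN) {
			/* Limit reached: evict until the charge fits,
			 * comparing candidates against the limiting cs,
			 * then drop the reference and retry. */
			my_driver_evict(mdev, limit, size);
			drmcs_pool_put(limit);
		}
		return ret;
	}
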
>>> I'm sorry, I should have written minutes on the discussion we had with TJ
>>> and Tvrtko the other day.
>>>
>>> We're all very interested in making this happen, but doing a "DRM"
>>> cgroup doesn't look like the right path to us.
>>>
>>> Indeed, we have a significant number of drivers that won't have
>>> dedicated memory but will depend on DMA allocations one way or
>>> another, and those pools are shared between multiple frameworks (DRM,
>>> V4L2, DMA-Buf Heaps, at least).
>>>
>>> This was also pointed out by Sima some time ago here:
>>> https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/
>>>
>>> So we'll want that cgroup subsystem to be cross-framework. We settled on
>>> a "device" cgroup during the discussion, but I'm sure we'll have plenty
>>> of bikeshedding.
>>>
>>> The other thing we agreed on, based on the feedback TJ got on the last
>>> iterations of his series, was to go for memcg for drivers not using
>>> DMA allocations.
>>>
>>> That's the part where I expect some discussion too :)
>>>
>>> So we went back to a previous version of TJ's work, and I've started to
>>> work on:
>>>
>>> - Integration of the cgroup in the GEM DMA and GEM VRAM helpers (this
>>> works on tidss right now)
>>>
>>> - Integration of all heaps into that cgroup but the system one
>>> (working on this at the moment)
>>
>> Should be similar to what I have then. I think you could use my work to
>> continue it.
>>
>> I made nothing DRM-specific except the name; if you renamed it to the
>> device resource management cgroup and changed the init function
>> signature to take a name instead of a drm pointer, nothing would
>> change. This is exactly what I'm hoping to accomplish, including
>> reserving memory.
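
(To illustrate the rename I mean, hypothetically; neither prototype
below is in the posted patch:)

	/* current, DRM-flavoured */
	int drmcg_register_device(struct drm_device *drm,
				  struct drmcgroup_device *cgdev);

	/* hypothetical cross-framework variant */
	int devcg_register_device(const char *name,
				  struct devcgroup_device *cgdev);
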
>
> I've started to work on rebasing my current work onto your series today,
> and I'm not entirely sure how what I described would best fit. Let's
> assume we have two KMS devices, one using shmem, one using DMA
> allocations; two heaps, one using the page allocator, the other using
> CMA; and one v4l2 device using DMA allocations.
>
> So we would have one KMS device and one heap using the page allocator,
> and one KMS device, one heap, and one v4l2 driver using the DMA
> allocator.
>
> Would these make different cgroup devices, or different cgroup regions?
Each driver would register a device, whatever feels most logical for
that device, I suppose.

My guess is that a prefix would also be nice here, so register a device
with a name like drm/$name, v4l2/$name or heap/$name. I didn't give it
much thought and we're still experimenting, so just try something. :)

There's no limit to the number of devices; I only fixed the number of
pools to match TTM, but even that could be increased arbitrarily. I
just don't think there is a point in doing so.
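
Sketched with that hypothetical cross-framework registration, the
naming could look like:

	/* devcg_register_device() is the hypothetical renamed helper
	 * from above, not an existing function. */
	devcg_register_device("drm/card0",   &kms_cgdev);   /* KMS driver   */
	devcg_register_device("v4l2/video0", &v4l2_cgdev);  /* V4L2 driver  */
	devcg_register_device("heap/cma",    &heap_cgdev);  /* DMA-BUF heap */
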
>> The nice thing is that it should be similar to the memory cgroup controller
>> in semantics, so you would have the same memory behavior whether you use the
>> device cgroup or memory cgroup.
>>
>> I'm sad I missed the discussion, but hopefully we can coordinate more now
>> that we know we're both working on it. :)
>
> Yeah, definitely :)
>
> Maxime
Cheers,
~Maarten