[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <86cyltjqwy.fsf@scott-ph-mail.amperecomputing.com>
Date: Tue, 27 Aug 2024 19:57:01 -0700
From: D Scott Phillips <scott@...amperecomputing.com>
To: Dan Williams <dan.j.williams@...el.com>, Greg Kroah-Hartman
<gregkh@...uxfoundation.org>, Andy Shevchenko
<andriy.shevchenko@...ux.intel.com>, AKASHI Takahiro
<takahiro.akashi@...aro.org>, Alison Schofield
<alison.schofield@...el.com>, Dan Williams <dan.j.williams@...el.com>,
Baoquan He <bhe@...hat.com>, Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, linux-kernel@...r.kernel.org
Cc: linux-arm-kernel@...ts.infradead.org, Andrew Morton
<akpm@...ux-foundation.org>, "Kirill A. Shutemov"
<kirill.shutemov@...ux.intel.com>, patches@...erecomputing.com, Felix
Kuehling <Felix.Kuehling@....com>, amd-gfx@...ts.freedesktop.org
Subject: Re: [PATCH v2] resource: limit request_free_mem_region based on
arch_get_mappable_range
Dan Williams <dan.j.williams@...el.com> writes:
> D Scott Phillips wrote:
>> On arm64 prior to commit 32697ff38287 ("arm64: vmemmap: Avoid base2 order
>> of struct page size to dimension region"), the amdgpu driver could trip
>> over the warning of:
>>
>> `WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));`
>>
>> in vmemmap_populate()[1]. After that commit, it becomes a translation fault
>> and panic[2].
>>
>> The cause is that the amdgpu driver allocates some unused space from
>> iomem_resource and claims it as MEMORY_DEVICE_PRIVATE and
>> devm_memremap_pages() it. An address above those backed by the arm64
>> vmemmap is picked.
>>
>> Limit request_free_mem_region() so that only addresses within the
>> arch_get_mappable_range() can be chosen as device private addresses.
>
> It seems odd that devm_request_free_mem_region() needs to be careful
> about this restriction. The caller passes in the resource tree that is
> the bounds of valid address ranges. This change assumes that the caller
> wants to be restricted to vmemmap capable address ranges beyond the
> restrictions it already requested in the passed in @base argument. That
> restriction may be true with respect to request_mem_region(), but not
> necessarily other users of get_free_mem_region() like
> alloc_free_mem_region().
>
> So, 2 questions / change request options:
>
> 1/ Preferred: Is there a possibility for the AMD driver to trim the
> resource it is passing to be bound by arch_get_mappable_range()? For CXL
> this is achieved by inserting CXL aperture windows into the resource
> tree.
>
> In the future what happens in the MEMORY_DEVICE_PUBLIC case when the
> memory address is picked by a hardware aperture on the device? It occurs
> to me if that aperture is communicated to the device via some platform
> mechanism (to honor arch_get_mappable_range() restrictions), then maybe
> the same should be done here.
>
> I have always cringed at the request_free_mem_region() implementation
> playing fast and loose with the platform memory map. Maybe this episode
> is a sign that these constraints need more formal handling in the
> resource tree.
>
> I.e. IORES_DESC_DEVICE_PRIVATE_MEMORY becomes a platform communicated
> aperture rather than hoping that unused portions of iomem_resource can
> be repurposed like this.
Hi Dan, sorry for my incredibly delayed response, I lost your message to
a filter on my end :(
I'm happy to work toward your preferred approach here, though I'm not
sure I know how to achieve it. I think I understand how cxl is keeping
device_private_memory out, but I don't think I understand the resource
system well enough to see how amdgpu can make a properly trimmed
resource for request_free_mem_region. My novice attempt would be
something like:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 8ee3d07ffbdfa..d84de6d66ac45 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -1038,7 +1039,14 @@ int kgd2kfd_init_zone_device(struct amdgpu_device *adev)
pgmap->range.end = adev->gmc.aper_base + adev->gmc.aper_size - 1;
pgmap->type = MEMORY_DEVICE_COHERENT;
} else {
- res = devm_request_free_mem_region(adev->dev, &iomem_resource, size);
+ struct range mappable;
+ struct resource root;
+
+ mappable = arch_get_mappable_range();
+ root.start = mappable.start;
+ root.end = mappable.end;
+ root.child = iomem_resource.child;
+ res = devm_request_free_mem_region(adev->dev, &root, size);
if (IS_ERR(res))
return PTR_ERR(res);
pgmap->range.start = res->start;
Apart from this being wrong with respect to resource_lock, is that sort
of the idea? or am I missing the sensible way to hoist the vmemmap range
into iomem_resource? or maybe I'm just totally off in the weeds.
Powered by blists - more mailing lists