lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <86cyltjqwy.fsf@scott-ph-mail.amperecomputing.com>
Date: Tue, 27 Aug 2024 19:57:01 -0700
From: D Scott Phillips <scott@...amperecomputing.com>
To: Dan Williams <dan.j.williams@...el.com>, Greg Kroah-Hartman
 <gregkh@...uxfoundation.org>, Andy Shevchenko
 <andriy.shevchenko@...ux.intel.com>, AKASHI Takahiro
 <takahiro.akashi@...aro.org>, Alison Schofield
 <alison.schofield@...el.com>, Dan Williams <dan.j.williams@...el.com>,
 Baoquan He <bhe@...hat.com>, Catalin Marinas <catalin.marinas@....com>,
 Will Deacon <will@...nel.org>, linux-kernel@...r.kernel.org
Cc: linux-arm-kernel@...ts.infradead.org, Andrew Morton
 <akpm@...ux-foundation.org>, "Kirill A. Shutemov"
 <kirill.shutemov@...ux.intel.com>, patches@...erecomputing.com, Felix
 Kuehling <Felix.Kuehling@....com>, amd-gfx@...ts.freedesktop.org
Subject: Re: [PATCH v2] resource: limit request_free_mem_region based on
 arch_get_mappable_range

Dan Williams <dan.j.williams@...el.com> writes:

> D Scott Phillips wrote:
>> On arm64 prior to commit 32697ff38287 ("arm64: vmemmap: Avoid base2 order
>> of struct page size to dimension region"), the amdgpu driver could trip
>> over the warning of:
>> 
>> `WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));`
>> 
>> in vmemmap_populate()[1]. After that commit, it becomes a translation fault
>> and panic[2].
>> 
>> The cause is that the amdgpu driver allocates some unused space from
>> iomem_resource and claims it as MEMORY_DEVICE_PRIVATE and
>> devm_memremap_pages() it. An address above those backed by the arm64
>> vmemmap is picked.
>> 
>> Limit request_free_mem_region() so that only addresses within the
>> arch_get_mappable_range() can be chosen as device private addresses.
>
> It seems odd that devm_request_free_mem_region() needs to be careful
> about this restriction. The caller passes in the resource tree that is
> the bounds of valid address ranges. This change assumes that the caller
> wants to be restricted to vmemmap capable address ranges beyond the
> restrictions it already requested in the passed in @base argument. That
> restriction may be true with respect to request_mem_region(), but not
> necessarily other users of get_free_mem_region() like
> alloc_free_mem_region().
>
> So, 2 questions / change request options:
>
> 1/ Preferred: Is there a possibility for the AMD driver to trim the
> resource it is passing to be bound by arch_get_mappable_range()? For CXL
> this is achieved by inserting CXL aperture windows into the resource
> tree.
>
> In the future what happens in the MEMORY_DEVICE_PUBLIC case when the
> memory address is picked by a hardware aperture on the device? It occurs
> to me if that aperture is communicated to the device via some platform
> mechanism (to honor arch_get_mappable_range() restrictions), then maybe
> the same should be done here.
>
> I have always cringed at the request_free_mem_region() implementation
> playing fast and loose with the platform memory map. Maybe this episode
> is a sign that these constraints need more formal handling in the
> resource tree.
>
> I.e. IORES_DESC_DEVICE_PRIVATE_MEMORY becomes a platform communicated
> aperture rather than hoping that unused portions of iomem_resource can
> be repurposed like this.

Hi Dan, sorry for my incredibly delayed response, I lost your message to
a filter on my end :(

I'm happy to work toward your preferred approach here, though I'm not
sure I know how to achieve it. I think I understand how cxl is keeping
device_private_memory out, but I don't think I understand the resource
system well enough to see how amdgpu can make a properly trimmed
resource for request_free_mem_region. My novice attempt would be
something like:

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 8ee3d07ffbdfa..d84de6d66ac45 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -1038,7 +1039,14 @@ int kgd2kfd_init_zone_device(struct amdgpu_device *adev)
                pgmap->range.end = adev->gmc.aper_base + adev->gmc.aper_size - 1;
                pgmap->type = MEMORY_DEVICE_COHERENT;
        } else {
-               res = devm_request_free_mem_region(adev->dev, &iomem_resource, size);
+               struct range mappable;
+               struct resource root;
+
+               mappable = arch_get_mappable_range();
+               root.start = mappable.start;
+               root.end = mappable.end;
+               root.child = iomem_resource.child;
+               res = devm_request_free_mem_region(adev->dev, &root, size);
                if (IS_ERR(res))
                        return PTR_ERR(res);
                pgmap->range.start = res->start;

Apart from this being wrong with respect to resource_lock, is that sort
of the idea? or am I missing the sensible way to hoist the vmemmap range
into iomem_resource? or maybe I'm just totally off in the weeds.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ