Message-ID: <2e312099-bf47-831a-5d0e-3e95053cdb3f@redhat.com>
Date: Wed, 31 Mar 2021 08:41:00 +0200
From: David Hildenbrand <david@...hat.com>
To: Alistair Popple <apopple@...dia.com>
Cc: linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
daniel.vetter@...ll.ch, dan.j.williams@...el.com,
gregkh@...uxfoundation.org, jhubbard@...dia.com,
jglisse@...hat.com, linux-mm@...ck.org
Subject: Re: [PATCH v2] kernel/resource: Fix locking in
request_free_mem_region
On 31.03.21 08:19, Alistair Popple wrote:
> On Tuesday, 30 March 2021 8:13:32 PM AEDT David Hildenbrand wrote:
>> On 29.03.21 03:37, Alistair Popple wrote:
>>> On Friday, 26 March 2021 7:57:51 PM AEDT David Hildenbrand wrote:
>>>> On 26.03.21 02:20, Alistair Popple wrote:
>>>>> request_free_mem_region() is used to find an empty range of physical
>>>>> addresses for hotplugging ZONE_DEVICE memory. It does this by iterating
>>>>> over the range of possible addresses using region_intersects() to see if
>>>>> the range is free.
>>>>
>>>> Just a high-level question: how does this interact with memory
>>>> hot(un)plug? IOW, who defines and manages the "range of possible
>>>> addresses"?
>>>
>>> Both the driver and the maximum physical address bits available define the
>>> range of possible addresses for device private memory. From
>>> __request_free_mem_region():
>>>
>>> end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 1);
>>> addr = end - size + 1UL;
>>>
>>> There is no lower address range bound here so it is effectively zero. The
>>> code will try to allocate the highest possible physical address first and
>>> continue searching down for a free block. Does that answer your question?
>>
>> Oh, sorry, the first time I had a look I got it wrong - I thought (1UL <<
>> MAX_PHYSMEM_BITS) would be the lower address limit. That indeed looks
>> problematic to me.
>>
>> You might end up reserving an iomem region that could be used e.g., by
>> memory hotplug code later. If someone plugs a DIMM or adds memory via
>> different approaches (virtio-mem), memory hotplug (via add_memory())
>> would fail.
>>
>> You should never be touching physical memory areas reserved for memory
>> hotplug, i.e., via SRAT.
>>
>> What is the expectation here?
>
> Most drivers call request_free_mem_region() with iomem_resource as the base.
> So zone device private pages currently tend to get allocated from the top of
> that.
Okay, but you could still "steal" iomem space that does not belong to
you, and the firmware will be unaware of that (e.g., it might hotplug a
DIMM in these spots). This is really nasty (although I guess as you
allocate top down, it will happen rarely).
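
To make the concern concrete, here is a toy user-space model in C. It is
not the kernel's resource tree: struct iomem_model and model_request() are
hypothetical names and the addresses are made up. It only shows how a range
claimed first makes a later overlapping request fail:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 8

/* One claimed physical range; start and end are inclusive. */
struct region {
	uint64_t start, end;
	const char *name;
};

/* Hypothetical flat stand-in for the iomem resource tree. */
struct iomem_model {
	struct region r[MAX_REGIONS];
	size_t n;
};

/*
 * Claim [start, end]. Fails if the table is full or the range overlaps
 * anything that was claimed earlier, no matter who claimed it.
 */
static bool model_request(struct iomem_model *m, uint64_t start,
			  uint64_t end, const char *name)
{
	if (m->n == MAX_REGIONS)
		return false;

	for (size_t i = 0; i < m->n; i++)
		if (m->r[i].start <= end && start <= m->r[i].end)
			return false;

	m->r[m->n++] = (struct region){ start, end, name };
	return true;
}
```

If "device private" is claimed at the top of the range first, a later
request for an overlapping "hotplugged DIMM" range is rejected, while a
non-overlapping one still succeeds.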
>
> By definition ZONE_DEVICE private pages are unaddressable from the CPU. So in
> terms of expectation I think all that is really required for ZONE_DEVICE
> private pages (at least for Nouveau) is a valid range of physical addresses
> that allow page_to_pfn() and pfn_to_page() to work correctly. To make this
> work drivers add the pages via memremap_pages() -> pagemap_range() ->
> add_pages().
So you'd actually want some region above the hotpluggable/addressable
range -- e.g., above MAX_PHYSMEM_BITS.

The maximum number of sections we can have is defined by

  #define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)

You'd e.g., want an extra space like (to be improved)

  #define DEVMEM_BITS 1
  #define SECTIONS_SHIFT (MAX_PHYSMEM_BITS + DEVMEM_BITS - SECTION_SIZE_BITS)

And do the search only within that range.
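
A rough user-space sketch of that idea, in C. The values of
MAX_PHYSMEM_BITS and SECTION_SIZE_BITS are example values only (they differ
per architecture and config), and DEVMEM_START, DEVMEM_END and
devmem_find_free() are hypothetical names, not existing kernel symbols. The
search walks top down in steps of size, but stays inside the window above
the hotpluggable range:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Example values only; the real ones are per-architecture. */
#define MAX_PHYSMEM_BITS	46
#define SECTION_SIZE_BITS	27
#define DEVMEM_BITS		1	/* hypothetical extra bit(s) for device memory */
#define SECTIONS_SHIFT	(MAX_PHYSMEM_BITS + DEVMEM_BITS - SECTION_SIZE_BITS)

/* Inclusive bounds of the hypothetical device-only window. */
#define DEVMEM_START	(UINT64_C(1) << MAX_PHYSMEM_BITS)
#define DEVMEM_END	((UINT64_C(1) << (MAX_PHYSMEM_BITS + DEVMEM_BITS)) - 1)

struct range {
	uint64_t start, end;	/* inclusive */
};

static bool overlaps(const struct range *busy, size_t n,
		     uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < n; i++)
		if (busy[i].start <= end && start <= busy[i].end)
			return true;
	return false;
}

/*
 * Top-down search for a free block of 'size' bytes inside the window.
 * Returns the start address, or 0 if nothing fits (0 is never inside
 * the window, so it is safe as an error value here).
 */
static uint64_t devmem_find_free(const struct range *busy, size_t n,
				 uint64_t size)
{
	uint64_t addr;

	if (size == 0 || size > DEVMEM_END - DEVMEM_START + 1)
		return 0;

	addr = DEVMEM_END - size + 1;
	for (;;) {
		if (!overlaps(busy, n, addr, addr + size - 1))
			return addr;
		/* The next step down would leave the window: give up. */
		if (addr - DEVMEM_START < size)
			return 0;
		addr -= size;
	}
}
```

Because the window starts at 1 << MAX_PHYSMEM_BITS, nothing found by this
search can collide with a range that firmware or memory hotplug might use
later.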
--
Thanks,
David / dhildenb