Message-ID: <e6a9e3cc-9dca-946a-c3fc-f86753fe8fd4@amd.com>
Date: Wed, 1 Sep 2021 19:07:34 -0400
From: Felix Kuehling <felix.kuehling@....com>
To: Dave Chinner <david@...morbit.com>
Cc: Christoph Hellwig <hch@....de>,
"Sierra Guiza, Alejandro (Alex)" <alex.sierra@....com>,
akpm@...ux-foundation.org, linux-mm@...ck.org,
rcampbell@...dia.com, linux-ext4@...r.kernel.org,
linux-xfs@...r.kernel.org, amd-gfx@...ts.freedesktop.org,
dri-devel@...ts.freedesktop.org, jgg@...dia.com, jglisse@...hat.com
Subject: Re: [PATCH v1 03/14] mm: add iomem vma selection for memory migration
On 2021-09-01 6:03 p.m., Dave Chinner wrote:
> On Wed, Sep 01, 2021 at 11:40:43AM -0400, Felix Kuehling wrote:
>> On 2021-09-01 at 4:29 a.m., Christoph Hellwig wrote:
>>> On Mon, Aug 30, 2021 at 01:04:43PM -0400, Felix Kuehling wrote:
>>>>>> driver code is not really involved in updating the CPU mappings. Maybe
>>>>>> it's something we need to do in the migration helpers.
>>>>> It looks like I'm totally misunderstanding what you are adding here
>>>>> then. Why do we need any special treatment at all for memory that
>>>>> has normal struct pages and is part of the direct kernel map?
>>>> The pages are like normal memory for purposes of mapping them in CPU
>>>> page tables and for coherent access from the CPU.
>>> That's the user page tables. What about the kernel direct map?
>>> If there is a normal kernel struct page backing there really should
>>> be no need for the pgmap.
>> I'm not sure. The physical address ranges are in the UEFI system address
>> map as special-purpose memory. Does Linux create the struct pages and
>> kernel direct map for that without a pgmap call? I didn't see that last
>> time I went digging through that code.
>>
>>
>>>> From an application
>>>> perspective, we want file-backed and anonymous mappings to be able to
>>>> use DEVICE_PUBLIC pages with coherent CPU access. The goal is to
>>>> optimize performance for GPU heavy workloads while minimizing the need
>>>> to migrate data back-and-forth between system memory and device memory.
>>> I don't really understand that part. File-backed pages are always
>>> allocated by the file system using the pagecache helpers, that is,
>>> using the page allocator. Anonymous memory also always comes from
>>> the page allocator.
>> I'm coming at this from my experience with DEVICE_PRIVATE. Both
>> anonymous and file-backed pages should be migratable to DEVICE_PRIVATE
>> memory by the migrate_vma_* helpers for more efficient access by our
>> GPU. (*) It's part of the basic premise of HMM as I understand it. I
>> would expect the same thing to work for DEVICE_PUBLIC memory.
>>
>> (*) I believe migrating file-backed pages to DEVICE_PRIVATE doesn't
>> currently work, but that's something I'm hoping to fix at some point.
> FWIW, I'd love to see the architecture documents that define how
> filesystems are supposed to interact with this device private
> memory. This whole "hand filesystem controlled memory to other
> devices" is a minefield that is trivial to get wrong and very
> difficult to fix - just look at the historical mess that RDMA
> to/from file backed and/or DAX pages has been.
>
> So, really, from my perspective as a filesystem engineer, I want to
> see an actual specification for how this new memory type is going to
> interact with filesystem and the page cache so everyone has some
> idea of how this is going to work and can point out how it doesn't
> work before code that simply doesn't work is pushed out into
> production systems and then merged....
OK. To be clear, that's not part of this patch series. And I have no
authority to push anything in this part of the kernel, so you have
nothing to fear. ;)

FWIW, we already have the ability to map file-backed system memory pages
into device page tables with HMM and interval notifiers. But we cannot
currently migrate them to ZONE_DEVICE pages. Beyond that, my
understanding of how filesystems and the page cache work is rather
superficial at this point. I'll keep your name in mind for when I am
ready to discuss this in more detail.

Cheers,
  Felix
>
> Cheers,
>
> Dave.