Message-ID: <cffdd191-2fdd-4b5d-abf2-4cf77b96b681@amd.com>
Date: Mon, 9 Feb 2026 23:46:46 +0800
From: Honglei Huang <honghuan@....com>
To: Christian König <christian.koenig@....com>
Cc: Felix.Kuehling@....com, Philip.Yang@....com, Ray.Huang@....com,
alexander.deucher@....com, dmitry.osipenko@...labora.com,
Xinhui.Pan@....com, airlied@...il.com, daniel@...ll.ch,
amd-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, akpm@...ux-foundation.org
Subject: Re: [PATCH v3 0/8] drm/amdkfd: Add batch userptr allocation support
Agreed that with many ranges the probability of cross-invalidation
during sequential hmm_range_fault() calls increases, and in an extreme
scenario this could lead to excessive retries. I had been focused on
proving correctness and overlooked the scalability concern.
Here is my plan going forward:
I will add a retry limit similar to what DRM GPU SVM does with
DRM_GPUSVM_MAX_RETRIES. This bounds the worst case and should be enough
to keep the current batch userptr implementation usable.
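
Roughly what I have in mind, as a sketch only (the constant, the helper
name and its placement are placeholders, assuming the
update_invalid_user_pages() helper from this series keeps its current
(process_info, mm) shape):

/* Placeholder names: bound the whole-batch -EAGAIN retry the same way
 * DRM GPU SVM bounds its retries.
 */
#define KFD_BATCH_USERPTR_MAX_RETRIES   3

static int update_user_pages_bounded(struct amdkfd_process_info *process_info,
                                     struct mm_struct *mm)
{
    unsigned int retries;
    int ret;

    for (retries = 0; retries < KFD_BATCH_USERPTR_MAX_RETRIES; retries++) {
        ret = update_invalid_user_pages(process_info, mm);
        if (ret != -EAGAIN)
            return ret;
        /* A concurrent invalidation raced with us, re-fault the batch. */
    }

    /* Stop spinning; the caller can defer to the restore worker instead. */
    return -EBUSY;
}
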
And I agree that teaching hmm_range_fault() and the underlying
walk_page_range() to handle non-contiguous VA sets in a single walk
would be the proper long-term solution; that work would benefit more
than just KFD batch userptr. I will keep digging for a better solution.
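
For the long-term direction, here is a first sketch of the lookup
structure you describe below with the 3, 1, 5, 8, 7, 2 example, just to
check that I understand it. All names are made up, and the pte-level
callback still needs to be wired into the walk_page_range()/hmm ops:

#include <linux/bsearch.h>
#include <linux/sort.h>

/* Map each requested (page-aligned) user VA to the slot in the output
 * pfn array it should fill, so a single walk over [va_min, va_max] can
 * skip everything we are not interested in.
 */
struct batch_va_slot {
    unsigned long va;       /* page-aligned user VA */
    unsigned long out_idx;  /* position in the final pfn array */
};

static int batch_va_slot_cmp(const void *a, const void *b)
{
    const struct batch_va_slot *sa = a, *sb = b;

    if (sa->va != sb->va)
        return sa->va < sb->va ? -1 : 1;
    return 0;
}

/* For the example VAs 3, 1, 5, 8, 7, 2 this produces exactly
 * 1->1, 2->5, 3->0, 5->2, 7->4, 8->3 once sorted by VA.
 */
static void batch_build_index(struct batch_va_slot *slots,
                              const unsigned long *vas, unsigned long n)
{
    unsigned long i;

    for (i = 0; i < n; i++) {
        slots[i].va = vas[i];
        slots[i].out_idx = i;
    }
    sort(slots, n, sizeof(*slots), batch_va_slot_cmp, NULL);
}

/* The pte handler would then bsearch() the faulted address in slots[]
 * and, on a hit, store the pfn at slots[hit].out_idx; on a miss it just
 * skips the entry.
 */

If that matches what you meant, the hard part left is teaching the
pud/pmd walkers to descend and fault only where slots[] has entries.
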
Regards,
Honglei
On 2026/2/9 23:07, Christian König wrote:
> On 2/9/26 15:44, Honglei Huang wrote:
>> You said that DRM GPU SVM has the same pattern, but argued
>> that it is not designed for "batch userptr". However, this distinction
>> has no technical significance. The core problem is "multiple ranges
>> under one wide notifier doing per-range hmm_range_fault". Whether
>> these ranges are dynamically created by GPU page faults or
>> batch-specified via ioctl, the concurrency safety mechanism is the
>> same.
>>
>> You said "each hmm_range_fault() can invalidate the other ranges
>> while faulting them in". Yes, this can happen but this is precisely
>> the scenario that mem->invalid catches:
>>
>> 1. hmm_range_fault(A) succeeds
>> 2. hmm_range_fault(B) triggers reclaim → A's pages swapped out
>> → MMU notifier callback:
>> mutex_lock(notifier_lock)
>> range_A->valid = false
>> mem->invalid++
>> mutex_unlock(notifier_lock)
>> 3. hmm_range_fault(B) completes
>> 4. Commit phase:
>> mutex_lock(notifier_lock)
>> mem->invalid != saved_invalid
>> → return -EAGAIN, retry entire batch
>> mutex_unlock(notifier_lock)
>>
>> Invalid pages are never committed.
>
> Once more that is not the problem. I completely agree that this is all correctly handled.
>
> The problem is that the more hmm_ranges you get the more likely it is that getting another pfn invalidates a pfn you previously acquired.
>
> So this can end up in an endless loop, and that's why the GPUSVM code also has a timeout on the retry.
>
>
> What you need to figure out is how to teach hmm_range_fault() and the underlying walk_page_range() to skip entries which you are not interested in.
>
> Just a trivial example, assuming you have the following VAs you want your userptr to be filled in with: 3, 1, 5, 8, 7, 2
>
> To handle this case you need to build a data structure which tells you what is the smallest, the largest, and where each VA in the middle comes in. So you need something like: 1->1, 2->5, 3->0, 5->2, 7->4, 8->3
>
> Then you would call walk_page_range(mm, 1, 8, ops, data), the pud walk decides if it needs to go into pmd or eventually fault, the pmd walk decides if ptes need to be filled in, etc...
>
> The final pte handler then fills in the pfns linearly for the addresses you need.
>
> And yeah, I know perfectly well that this is horribly complicated, but as far as I can see everything else will just not scale.
>
> Creating hundreds of separate userptrs only scales up to a few megabytes and then falls apart.
>
> Regards,
> Christian.
>
>>
>> Regards,
>> Honglei
>>
>>
>> On 2026/2/9 22:25, Christian König wrote:
>>> On 2/9/26 15:16, Honglei Huang wrote:
>>>> The case you described: one hmm_range_fault() invalidating another's
>>>> seq under the same notifier, is already handled in the implementation.
>>>>
>>>> example: suppose ranges A, B, C share one notifier:
>>>>
>>>> 1. hmm_range_fault(A) succeeds, seq_A recorded
>>>> 2. External invalidation occurs, triggers callback:
>>>> mutex_lock(notifier_lock)
>>>> → mmu_interval_set_seq()
>>>> → range_A->valid = false
>>>> → mem->invalid++
>>>> mutex_unlock(notifier_lock)
>>>> 3. hmm_range_fault(B) succeeds
>>>> 4. Commit phase:
>>>> mutex_lock(notifier_lock)
>>>> → check mem->invalid != saved_invalid
>>>> → return -EAGAIN, retry the entire batch
>>>> mutex_unlock(notifier_lock)
>>>>
>>>> All concurrent invalidations are caught by the mem->invalid counter.
>>>> Additionally, amdgpu_ttm_tt_get_user_pages_done() in confirm_valid_user_pages_locked
>>>> performs a per-range mmu_interval_read_retry() as a final safety check.
>>>>
>>>> DRM GPU SVM uses the same approach: drm_gpusvm_get_pages() also calls
>>>> hmm_range_fault() per-range independently; there is no array version
>>>> of hmm_range_fault() in DRM GPU SVM either. If you consider this approach
>>>> unworkable, then DRM GPU SVM would be unworkable too, yet it has been
>>>> accepted upstream.
>>>>
>>>> The number of batch ranges is controllable. And even if it
>>>> scales to thousands, DRM GPU SVM faces exactly the same situation:
>>>> it does not need an array version of hmm_range_fault either, which
>>>> shows this is a correctness question, not a performance one. For
>>>> correctness, I believe DRM GPU SVM already demonstrates the approach
>>>> is ok.
>>>
>>> Well yes, GPU SVM would have exactly the same problems. But it also doesn't have an interface to create userptrs in bulk.
>>>
>>> The implementation is simply not made for this use case, and as far as I know no current upstream implementation is.
>>>
>>>> For performance, I have tested with thousands of ranges present:
>>>> performance reaches 80%~95% of the native driver, and all OpenCL
>>>> and ROCr test suites pass with no correctness issues.
>>>
>>> Testing can only falsify a system and not verify it.
>>>
>>>> Here is how DRM GPU SVM handles correctness with multiple ranges
>>>> under one wide notifier doing per-range hmm_range_fault:
>>>>
>>>> Invalidation: drm_gpusvm_notifier_invalidate()
>>>> - Acquires notifier_lock
>>>> - Calls mmu_interval_set_seq()
>>>> - Iterates affected ranges via driver callback (xe_svm_invalidate)
>>>> - Clears has_dma_mapping = false for each affected range (under lock)
>>>> - Releases notifier_lock
>>>>
>>>> Fault: drm_gpusvm_get_pages() (called per-range independently)
>>>> - mmu_interval_read_begin() to get seq
>>>> - hmm_range_fault() outside lock
>>>> - Acquires notifier_lock
>>>> - mmu_interval_read_retry() → if stale, release lock and retry
>>>> - DMA map pages + set has_dma_mapping = true (under lock)
>>>> - Releases notifier_lock
>>>>
>>>> Validation: drm_gpusvm_pages_valid()
>>>> - Checks has_dma_mapping flag (under lock), NOT seq
>>>>
>>>> If invalidation occurs between two per-range faults, the flag is
>>>> cleared under lock, and either mmu_interval_read_retry catches it
>>>> in the current fault, or drm_gpusvm_pages_valid() catches it at
>>>> validation time. No stale pages are ever committed.
>>>>
>>>> KFD batch userptr uses the same three-step pattern:
>>>>
>>>> Invalidation: amdgpu_amdkfd_evict_userptr_batch()
>>>> - Acquires notifier_lock
>>>> - Calls mmu_interval_set_seq()
>>>> - Iterates affected ranges via interval_tree
>>>> - Sets range->valid = false for each affected range (under lock)
>>>> - Increments mem->invalid (under lock)
>>>> - Releases notifier_lock
>>>>
>>>> Fault: update_invalid_user_pages()
>>>> - Per-range hmm_range_fault() outside lock
>>>
>>> And here the idea falls apart. Each hmm_range_fault() can invalidate the other ranges while faulting them in.
>>>
>>> That is not fundamentally solvable, but moving the handling further into hmm_range_fault() makes it much less likely that something goes wrong.
>>>
>>> So once more as long as this still uses this hacky approach I will clearly reject this implementation.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> - Acquires notifier_lock
>>>> - Checks mem->invalid != saved_invalid → if changed, -EAGAIN retry
>>>> - Sets range->valid = true for faulted ranges (under lock)
>>>> - Releases notifier_lock
>>>>
>>>> Validation: valid_user_pages_batch()
>>>> - Checks range->valid flag
>>>> - Calls amdgpu_ttm_tt_get_user_pages_done() (mmu_interval_read_retry)
>>>>
>>>> The logic is equivalent as far as I can see.
>>>>
>>>> Regards,
>>>> Honglei
>>>>
>>>>
>>>>
>>>> On 2026/2/9 21:27, Christian König wrote:
>>>>> On 2/9/26 14:11, Honglei Huang wrote:
>>>>>>
>>>>>> So the drm svm is also a NAK?
>>>>>>
>>>>>> This code has passed local testing (OpenCL and ROCr), and I also provided a detailed code path and analysis.
>>>>>> You only stated a conclusion without giving any reasons or evidence, which makes it difficult to be convinced so far.
>>>>>
>>>>> That sounds like you don't understand what the issue here is, so I will try to explain it once more with pseudo-code.
>>>>>
>>>>> Page tables are updated without holding a lock, so when you want to grab physical addresses from them you need to use an opportunistic, retry-based approach to make sure that the data you got is still valid.
>>>>>
>>>>> In other words, something like this is needed:
>>>>>
>>>>> retry:
>>>>>     hmm_range.notifier_seq = mmu_interval_read_begin(notifier);
>>>>>     hmm_range.hmm_pfns = kvmalloc_array(npages, ...);
>>>>>     ...
>>>>>     while (true) {
>>>>>         mmap_read_lock(mm);
>>>>>         err = hmm_range_fault(&hmm_range);
>>>>>         mmap_read_unlock(mm);
>>>>>
>>>>>         if (err == -EBUSY) {
>>>>>             if (time_after(jiffies, timeout))
>>>>>                 break;
>>>>>
>>>>>             hmm_range.notifier_seq =
>>>>>                 mmu_interval_read_begin(notifier);
>>>>>             continue;
>>>>>         }
>>>>>         break;
>>>>>     }
>>>>>     ...
>>>>>     for (i = 0, j = 0; i < npages; ++j) {
>>>>>         ...
>>>>>         dma_map_page(...)
>>>>>         ...
>>>>>     grab_notifier_lock();
>>>>>     if (mmu_interval_read_retry(notifier, hmm_range.notifier_seq))
>>>>>         goto retry;
>>>>>     restart_queues();
>>>>>     drop_notifier_lock();
>>>>>     ...
>>>>>
>>>>> Now hmm_range.notifier_seq indicates if your DMA addresses are still valid or not after you grabbed the notifier lock.
>>>>>
>>>>> The problem is that hmm_range works only on a single range/sequence combination, so when you do multiple calls to hmm_range_fault() for scattered VAs it can easily be that one call invalidates the ranges of another call.
>>>>>
>>>>> So as long as you only have a few hundred hmm_ranges for your userptrs that kind of works, but it doesn't scale up into the thousands of different VA addresses you get for scattered handling.
>>>>>
>>>>> That's why hmm_range_fault() needs to be modified to handle an array of VA addresses instead of just an A..B range.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>
>>>>>>
>>>>>> On 2026/2/9 20:59, Christian König wrote:
>>>>>>> On 2/9/26 13:52, Honglei Huang wrote:
>>>>>>>> DRM GPU SVM does use hmm_range_fault(), see drm_gpusvm_get_pages()
>>>>>>>
>>>>>>> I'm not sure what you are talking about; drm_gpusvm_get_pages() only supports a single range as well, not scatter-gather of VA addresses.
>>>>>>>
>>>>>>> As far as I can see that doesn't help in the slightest.
>>>>>>>
>>>>>>>> My implementation follows the same pattern. The detailed comparison
>>>>>>>> of the invalidation path was provided in the second half of my previous mail.
>>>>>>>
>>>>>>> Yeah, and as I said that is not very valuable because it doesn't solve the sequence problem.
>>>>>>>
>>>>>>> As far as I can see the approach you are trying here is a clear NAK from my side.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>> On 2026/2/9 18:16, Christian König wrote:
>>>>>>>>> On 2/9/26 07:14, Honglei Huang wrote:
>>>>>>>>>>
>>>>>>>>>> I've reworked the implementation in v4. The fix is actually inspired
>>>>>>>>>> by the DRM GPU SVM framework (drivers/gpu/drm/drm_gpusvm.c).
>>>>>>>>>>
>>>>>>>>>> DRM GPU SVM uses wide notifiers (recommended 512M or larger) to track
>>>>>>>>>> multiple user virtual address ranges under a single mmu_interval_notifier,
>>>>>>>>>> and these ranges can be non-contiguous which is essentially the same
>>>>>>>>>> problem that batch userptr needs to solve: one BO backed by multiple
>>>>>>>>>> non-contiguous CPU VA ranges sharing one notifier.
>>>>>>>>>
>>>>>>>>> That still doesn't solve the sequencing problem.
>>>>>>>>>
>>>>>>>>> As far as I can see you can't use hmm_range_fault with this approach or it would just not be very valuable.
>>>>>>>>>
>>>>>>>>> So how should that work with your patch set?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The wide notifier is created in drm_gpusvm_notifier_alloc:
>>>>>>>>>> notifier->itree.start = ALIGN_DOWN(fault_addr, gpusvm->notifier_size);
>>>>>>>>>> notifier->itree.last = ALIGN(fault_addr + 1, gpusvm->notifier_size) - 1;
>>>>>>>>>> The Xe driver passes
>>>>>>>>>> xe_modparam.svm_notifier_size * SZ_1M in xe_svm_init
>>>>>>>>>> as the notifier_size, so one notifier can cover many MB of VA space
>>>>>>>>>> containing multiple non-contiguous ranges.
>>>>>>>>>>
>>>>>>>>>> And DRM GPU SVM solves the per-range validity problem with flag-based
>>>>>>>>>> validation instead of seq-based validation in:
>>>>>>>>>> - drm_gpusvm_pages_valid() checks
>>>>>>>>>> flags.has_dma_mapping
>>>>>>>>>> not notifier_seq. The comment explicitly states:
>>>>>>>>>> "This is akin to a notifier seqno check in the HMM documentation
>>>>>>>>>> but due to wider notifiers (i.e., notifiers which span multiple
>>>>>>>>>> ranges) this function is required for finer grained checking"
>>>>>>>>>> - __drm_gpusvm_unmap_pages() clears
>>>>>>>>>> flags.has_dma_mapping = false under notifier_lock
>>>>>>>>>> - drm_gpusvm_get_pages() sets
>>>>>>>>>> flags.has_dma_mapping = true under notifier_lock
>>>>>>>>>> I adopted the same approach.
>>>>>>>>>>
>>>>>>>>>> DRM GPU SVM:
>>>>>>>>>> drm_gpusvm_notifier_invalidate()
>>>>>>>>>> down_write(&gpusvm->notifier_lock);
>>>>>>>>>> mmu_interval_set_seq(mni, cur_seq);
>>>>>>>>>> gpusvm->ops->invalidate()
>>>>>>>>>> -> xe_svm_invalidate()
>>>>>>>>>> drm_gpusvm_for_each_range()
>>>>>>>>>> -> __drm_gpusvm_unmap_pages()
>>>>>>>>>> WRITE_ONCE(flags.has_dma_mapping = false); // clear flag
>>>>>>>>>> up_write(&gpusvm->notifier_lock);
>>>>>>>>>>
>>>>>>>>>> KFD batch userptr:
>>>>>>>>>> amdgpu_amdkfd_evict_userptr_batch()
>>>>>>>>>> mutex_lock(&process_info->notifier_lock);
>>>>>>>>>> mmu_interval_set_seq(mni, cur_seq);
>>>>>>>>>> discard_invalid_ranges()
>>>>>>>>>> interval_tree_iter_first/next()
>>>>>>>>>> range_info->valid = false; // clear flag
>>>>>>>>>> mutex_unlock(&process_info->notifier_lock);
>>>>>>>>>>
>>>>>>>>>> Both implementations:
>>>>>>>>>> - Acquire notifier_lock FIRST, before any flag changes
>>>>>>>>>> - Call mmu_interval_set_seq() under the lock
>>>>>>>>>> - Use interval tree to find affected ranges within the wide notifier
>>>>>>>>>> - Mark per-range flag as invalid/valid under the lock
>>>>>>>>>>
>>>>>>>>>> The page fault path and final validation path also follow the same
>>>>>>>>>> pattern as DRM GPU SVM: fault outside the lock, set/check per-range
>>>>>>>>>> flag under the lock.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Honglei
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2026/2/6 21:56, Christian König wrote:
>>>>>>>>>>> On 2/6/26 07:25, Honglei Huang wrote:
>>>>>>>>>>>> From: Honglei Huang <honghuan@....com>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> This is v3 of the patch series to support allocating multiple non-contiguous
>>>>>>>>>>>> CPU virtual address ranges that map to a single contiguous GPU virtual address.
>>>>>>>>>>>>
>>>>>>>>>>>> v3:
>>>>>>>>>>>> 1. No new ioctl: Reuses existing AMDKFD_IOC_ALLOC_MEMORY_OF_GPU
>>>>>>>>>>>> - Adds only one flag: KFD_IOC_ALLOC_MEM_FLAGS_USERPTR_BATCH
>>>>>>>>>>>
>>>>>>>>>>> That is most likely not the best approach, but Felix or Philip need to comment here since I don't know such IOCTLs well either.
>>>>>>>>>>>
>>>>>>>>>>>> - When flag is set, mmap_offset field points to range array
>>>>>>>>>>>> - Minimal API surface change
>>>>>>>>>>>
>>>>>>>>>>> Why range of VA space for each entry?
>>>>>>>>>>>
>>>>>>>>>>>> 2. Improved MMU notifier handling:
>>>>>>>>>>>> - Single mmu_interval_notifier covering the VA span [va_min, va_max]
>>>>>>>>>>>> - Interval tree for efficient lookup of affected ranges during invalidation
>>>>>>>>>>>> - Avoids per-range notifier overhead mentioned in v2 review
>>>>>>>>>>>
>>>>>>>>>>> That won't work unless you also modify hmm_range_fault() to take multiple VA addresses (or ranges) at the same time.
>>>>>>>>>>>
>>>>>>>>>>> The problem is that we must rely on hmm_range.notifier_seq to detect changes to the page tables in question, but that in turn works only if you have one hmm_range structure and not multiple.
>>>>>>>>>>>
>>>>>>>>>>> What might work is doing an XOR or CRC over all hmm_range.notifier_seq you have, but that is a bit flaky.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 3. Better code organization: Split into 8 focused patches for easier review
>>>>>>>>>>>>
>>>>>>>>>>>> v2:
>>>>>>>>>>>> - Each CPU VA range gets its own mmu_interval_notifier for invalidation
>>>>>>>>>>>> - All ranges validated together and mapped to contiguous GPU VA
>>>>>>>>>>>> - Single kgd_mem object with array of user_range_info structures
>>>>>>>>>>>> - Unified eviction/restore path for all ranges in a batch
>>>>>>>>>>>>
>>>>>>>>>>>> Current Implementation Approach
>>>>>>>>>>>> ===============================
>>>>>>>>>>>>
>>>>>>>>>>>> This series implements a practical solution within existing kernel constraints:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Single MMU notifier for VA span: Register one notifier covering the
>>>>>>>>>>>> entire range from lowest to highest address in the batch
>>>>>>>>>>>>
>>>>>>>>>>>> 2. Interval tree filtering: Use interval tree to efficiently identify
>>>>>>>>>>>> which specific ranges are affected during invalidation callbacks,
>>>>>>>>>>>> avoiding unnecessary processing for unrelated address changes
>>>>>>>>>>>>
>>>>>>>>>>>> 3. Unified eviction/restore: All ranges in a batch share eviction and
>>>>>>>>>>>> restore paths, maintaining consistency with existing userptr handling
>>>>>>>>>>>>
>>>>>>>>>>>> Patch Series Overview
>>>>>>>>>>>> =====================
>>>>>>>>>>>>
>>>>>>>>>>>> Patch 1/8: Add userptr batch allocation UAPI structures
>>>>>>>>>>>> - KFD_IOC_ALLOC_MEM_FLAGS_USERPTR_BATCH flag
>>>>>>>>>>>> - kfd_ioctl_userptr_range and kfd_ioctl_userptr_ranges_data structures
>>>>>>>>>>>>
>>>>>>>>>>>> Patch 2/8: Add user_range_info infrastructure to kgd_mem
>>>>>>>>>>>> - user_range_info structure for per-range tracking
>>>>>>>>>>>> - Fields for batch allocation in kgd_mem
>>>>>>>>>>>>
>>>>>>>>>>>> Patch 3/8: Implement interval tree for userptr ranges
>>>>>>>>>>>> - Interval tree for efficient range lookup during invalidation
>>>>>>>>>>>> - mark_invalid_ranges() function
>>>>>>>>>>>>
>>>>>>>>>>>> Patch 4/8: Add batch MMU notifier support
>>>>>>>>>>>> - Single notifier for entire VA span
>>>>>>>>>>>> - Invalidation callback using interval tree filtering
>>>>>>>>>>>>
>>>>>>>>>>>> Patch 5/8: Implement batch userptr page management
>>>>>>>>>>>> - get_user_pages_batch() and set_user_pages_batch()
>>>>>>>>>>>> - Per-range page array management
>>>>>>>>>>>>
>>>>>>>>>>>> Patch 6/8: Add batch allocation function and export API
>>>>>>>>>>>> - init_user_pages_batch() main initialization
>>>>>>>>>>>> - amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu_batch() entry point
>>>>>>>>>>>>
>>>>>>>>>>>> Patch 7/8: Unify userptr cleanup and update paths
>>>>>>>>>>>> - Shared eviction/restore handling for batch allocations
>>>>>>>>>>>> - Integration with existing userptr validation flows
>>>>>>>>>>>>
>>>>>>>>>>>> Patch 8/8: Wire up batch allocation in ioctl handler
>>>>>>>>>>>> - Input validation and range array parsing
>>>>>>>>>>>> - Integration with existing alloc_memory_of_gpu path
>>>>>>>>>>>>
>>>>>>>>>>>> Testing
>>>>>>>>>>>> =======
>>>>>>>>>>>>
>>>>>>>>>>>> - Multiple scattered malloc() allocations (2-4000+ ranges)
>>>>>>>>>>>> - Various allocation sizes (4KB to 1G+ per range)
>>>>>>>>>>>> - Memory pressure scenarios and eviction/restore cycles
>>>>>>>>>>>> - OpenCL CTS and HIP catch tests in KVM guest environment
>>>>>>>>>>>> - AI workloads: Stable Diffusion, ComfyUI in virtualized environments
>>>>>>>>>>>> - Small LLM inference (3B-7B models)
>>>>>>>>>>>> - Benchmark score: 160,000 - 190,000 (80%-95% of bare metal)
>>>>>>>>>>>> - Performance improvement: 2x-2.4x faster than userspace approach
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for your review and feedback.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Honglei Huang
>>>>>>>>>>>>
>>>>>>>>>>>> Honglei Huang (8):
>>>>>>>>>>>> drm/amdkfd: Add userptr batch allocation UAPI structures
>>>>>>>>>>>> drm/amdkfd: Add user_range_info infrastructure to kgd_mem
>>>>>>>>>>>> drm/amdkfd: Implement interval tree for userptr ranges
>>>>>>>>>>>> drm/amdkfd: Add batch MMU notifier support
>>>>>>>>>>>> drm/amdkfd: Implement batch userptr page management
>>>>>>>>>>>> drm/amdkfd: Add batch allocation function and export API
>>>>>>>>>>>> drm/amdkfd: Unify userptr cleanup and update paths
>>>>>>>>>>>> drm/amdkfd: Wire up batch allocation in ioctl handler
>>>>>>>>>>>>
>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 23 +
>>>>>>>>>>>> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 539 +++++++++++++++++-
>>>>>>>>>>>> drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 128 ++++-
>>>>>>>>>>>> include/uapi/linux/kfd_ioctl.h | 31 +-
>>>>>>>>>>>> 4 files changed, 697 insertions(+), 24 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>