linux-kernel - Re: [RFC] mm: introduce page pinner


Date:   Sat, 11 Dec 2021 23:21:33 -0800
From:   John Hubbard <jhubbard@...dia.com>
To:     胡玮文 <huww98@...look.com>,
        Minchan Kim <minchan@...nel.org>
Cc:     胡玮文 <sehuww@...l.scut.edu.cn>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Michal Hocko <mhocko@...e.com>,
        David Hildenbrand <david@...hat.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        John Dias <joaodias@...gle.com>
Subject: Re: [RFC] mm: introduce page pinner

On 12/10/21 01:54, 胡玮文 wrote:
...
>> So you suspect that some kernel driver holds an additional refcount,
>> taken via get_user_pages() or a page get API?
> 
> Yes. By using the trace events in this patch, I have confirmed it is the nvidia
> kernel module that holds the refcount. I got a stack trace like this (from
> "perf script"):
> 
> cuda-EvtHandlr 31023 [000]  3244.976411:                   page_pinner:page_pinner_put: pfn=0x13e473 flags=0x8001e count=0 mapcount=0 mapping=(nil) mt=1
>          ffffffff82511be4 __page_pinner_put+0x54 (/lib/modules/5.15.6+/build/vmlinux)
>          ffffffff82511be4 __page_pinner_put+0x54 (/lib/modules/5.15.6+/build/vmlinux)
>          ffffffffc0b71e1f os_unlock_user_pages+0xbf ([nvidia])

The corresponding call to os_unlock_user_pages() is os_lock_user_pages(). And
os_lock_user_pages() does call get_user_pages().

This is part of normal operation for many CUDA (and OpenCL) programs: "host memory"
(host == CPU, device == GPU) is pinned, and GPU page tables are set up to point to it.

If your program calls cudaHostRegister() [1], then that will in turn work its way
down to os_lock_user_pages(), and if the program is still running on the GPU, then
it's reasonable for those pages to still be pinned. This is a very common pattern
for some programs, especially those that have tuned their access patterns and
know that most accesses come from the CPU side, with only rare accesses from the
GPU.
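
As a rough illustration of that pattern, here is a minimal sketch of a program that
registers an ordinary host buffer and later unregisters it. The buffer size and the
4096-byte alignment are assumptions made for the example, and the link from
cudaHostRegister() to os_lock_user_pages() is taken from the description above, not
from anything visible in this code:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 1 << 20;   // 1 MiB, arbitrary size for illustration
    void *host = nullptr;

    // Ordinary pageable host memory, page-aligned (4096 is an assumed page size).
    if (posix_memalign(&host, 4096, bytes) != 0)
        return 1;

    // Pin the buffer. While it stays registered, the driver is expected to
    // hold page references on it (per the description above, this path
    // reaches os_lock_user_pages() and get_user_pages()).
    cudaError_t err = cudaHostRegister(host, bytes, cudaHostRegisterDefault);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostRegister: %s\n", cudaGetErrorString(err));
        free(host);
        return 1;
    }

    // ... CPU and GPU work on the buffer would happen here ...

    // Unpin. This is the point where a matching unlock (and a page_pinner_put
    // style event like the one in the trace) would be expected.
    err = cudaHostUnregister(host);
    if (err != cudaSuccess)
        fprintf(stderr, "cudaHostUnregister: %s\n", cudaGetErrorString(err));

    free(host);
    return err == cudaSuccess ? 0 : 1;
}
```

If a program follows this shape but keeps the buffer registered for its whole
lifetime, the extra page references would be expected to persist until it
unregisters the buffer or exits.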

>          ffffffffc14a4546 _nv032165rm+0x96 ([nvidia])
> 
> Still not much information. NVIDIA does not want me to debug its module. Maybe
> the only thing I can do is to report this to NVIDIA.
> 

...or hope that someone here, maybe even from NVIDIA, can help! :)

Let me know if there are further questions, and if they are outside of the linux-mm
area, we can take it up in an off-list email thread if necessary.

[1] cudaHostRegister():
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1ge8d5c17670f16ac4fc8fcb4181cb490c

thanks,
-- 
John Hubbard
NVIDIA