Message-ID: <a26a71cb-101b-e7a2-9a2f-78995538dbca@oracle.com>
Date: Fri, 14 Sep 2018 11:04:54 -0700
From: Prakash Sangappa <prakash.sangappa@...cle.com>
To: Steven Sistare <steven.sistare@...cle.com>,
Michal Hocko <mhocko@...nel.org>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
dave.hansen@...el.com, nao.horiguchi@...il.com,
akpm@...ux-foundation.org, kirill.shutemov@...ux.intel.com,
khandual@...ux.vnet.ibm.com
Subject: Re: [PATCH V2 0/6] VA to numa node information
On 9/14/18 9:01 AM, Steven Sistare wrote:
> On 9/14/2018 1:56 AM, Michal Hocko wrote:
>> On Thu 13-09-18 15:32:25, prakash.sangappa wrote:
>>>
>>> The proc interface provides an efficient way to export address range
>>> to numa node id mapping information compared to using the API.
>> Do you have any numbers?
>>
>>> For example, for sparsely populated mappings, if a VMA has large portions
>>> that do not have any physical pages mapped, the page walk done through the
>>> /proc file interface can skip over non-existent PMDs / PTEs. Whereas using
>>> the API the application would have to scan the entire VMA in page size units.
>> What prevents you from pre-filtering by reading /proc/$pid/maps to get
>> ranges of interest?
> That works for skipping holes, but not for skipping huge pages. I did a
> quick experiment to time move_pages on a 3 GHz Xeon and a 4.18 kernel.
> Allocate 128 GB and touch every small page. Call move_pages with nodes=NULL
> to get the node id for all pages, passing 512 consecutive small pages per
> call to move_pages. The total move_pages time is 1.85 secs, and 55 nsec
> per page. Extrapolating to a 1 TB range, it would take 15 sec to retrieve
> the numa node for every small page in the range. That is not terrible, but
> it is not interactive, and it becomes terrible for multiple TB.
>
Also, for valid VMAs in the 'maps' file, if a VMA is sparsely populated
with physical pages, the page walk can skip over non-existent page table
entries (PMDs) and so can be faster.
For example, reading the VA range of a 400GB VMA that has a few pages
mapped at the beginning, a few pages at the end, and no pages in between
takes 0.001s using the /proc interface. With the move_pages() API,
passing 1024 consecutive small page addresses per call, it takes about
2.4 secs. This is on a similar system running a 4.19 kernel.
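
For reference, here is a minimal sketch of the move_pages(2) query path
being compared against above (my own illustration, not part of the patch).
With nodes == NULL the kernel does not migrate anything and instead fills
status[] with the current node id of each page. It assumes libnuma's
<numaif.h>, links with -lnuma, and trims error handling; the 512-page
batch just mirrors the numbers quoted earlier:

#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BATCH 512

int main(void)
{
        long page_size = sysconf(_SC_PAGESIZE);
        size_t len = (size_t)BATCH * page_size;
        char *buf = aligned_alloc(page_size, len);
        void *pages[BATCH];
        int status[BATCH];
        int i;

        /* Touch every small page so it is physically backed. */
        for (i = 0; i < BATCH; i++) {
                buf[(size_t)i * page_size] = 1;
                pages[i] = buf + (size_t)i * page_size;
        }

        /* nodes == NULL: no migration, status[] gets the node ids. */
        if (move_pages(0 /* self */, BATCH, pages, NULL, status, 0) < 0) {
                perror("move_pages");
                return 1;
        }

        for (i = 0; i < BATCH; i++)
                printf("va %p -> node %d\n", pages[i], status[i]);

        free(buf);
        return 0;
}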