Message-ID: <79d5e991-d9f6-65e2-cb77-0f999fa512fe@oracle.com>
Date: Mon, 26 Nov 2018 14:20:10 -0500
From: Steven Sistare <steven.sistare@...cle.com>
To: Prakash Sangappa <prakash.sangappa@...cle.com>,
Michal Hocko <mhocko@...nel.org>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
dave.hansen@...el.com, nao.horiguchi@...il.com,
akpm@...ux-foundation.org, kirill.shutemov@...ux.intel.com,
khandual@...ux.vnet.ibm.com
Subject: Re: [PATCH V2 0/6] VA to numa node information
On 11/9/2018 11:48 PM, Prakash Sangappa wrote:
> On 9/24/18 10:14 AM, Michal Hocko wrote:
>> On Fri 14-09-18 12:01:18, Steven Sistare wrote:
>>> On 9/14/2018 1:56 AM, Michal Hocko wrote:
>> [...]
>>>> Why does this matter for something that is for analysis purposes?
>>>> Reading the file for the whole address space is far from a free
>>>> operation. Is the page walk optimization really essential for usability?
>>>> Moreover, what prevents the move_pages implementation from being clever
>>>> about the page walk itself? In other words, why would we want to add a
>>>> new API rather than make the existing one faster for everybody?
>>> One could optimize move pages. If the caller passes a consecutive range
>>> of small pages, and the page walk sees that a VA is mapped by a huge page,
>>> then it can return the same numa node for each of the following VA's that fall
>>> into the huge page range. It would be faster than 55 nsec per small page, but
>>> hard to say how much faster, and the cost is still driven by the number of
>>> small pages.
>> This is exactly what I was arguing for. There is some room for
>> improvement in the existing interface. I have yet to hear an explicit
>> use case that would require even better performance than can be
>> achieved by the existing API.
>>
>
> The above-mentioned optimization to the move_pages() API helps when
> scanning mapped huge pages, but does not help if there are large sparse
> mappings with few pages mapped. Otherwise, consider adding page walk
> support in the move_pages() implementation and enhancing the API (new
> flag?) to return address range to numa node information. The page walk
> optimization would certainly make a difference for usability.
>
> We can have applications (like Oracle DB) whose processes have large
> sparse mappings (in TBs) with only some areas of the mapped address range
> being accessed, so large portions have no page tables backing them.
> This can become more prevalent on newer systems with multiple TBs of
> memory.
>
> Here is some data from pmap using the move_pages() API with the
> optimization. The following table compares the time pmap takes to print
> the address mapping of a large process with numa node information, using
> the move_pages() API vs. using the /proc numa_vamaps file.
>
> Running pmap command on a process with 1.3 TB of address space, with
> sparse mappings.
>
>                         ~1.3 TB sparse    250G dense segment with hugepages
> move_pages              8.33s             3.14s
> optimized move_pages    6.29s             0.92s
> /proc numa_vamaps       0.08s             0.04s
>
>
> The second column is the pmap time on a 250G address range of this
> process, which maps hugepages (THP & hugetlb).
The data look compelling to me. numa_vamaps provides a much smoother user
experience for the analyst who is casting a wide net looking for the root of
a performance issue. There is almost no waiting to see the data.
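
To illustrate why the per-page cost matters for sparse mappings, here is a
toy model in Python. It is a sketch under stated assumptions, not kernel code
and not the implementation of either interface: the page sizes, the EXTENTS
layout, and the helper names are all hypothetical. It counts the lookups a
per-page query would make, with and without the huge page shortcut discussed
above, and compares that with a walk that reports one entry per range.

```python
SMALL = 4 * 1024            # assumed base page size
HUGE = 2 * 1024 * 1024      # assumed huge page size

# Hypothetical address space: (start, length, node, page_size) per mapped
# extent. Everything outside these extents is an unmapped hole.
EXTENTS = [
    (0, 4 * HUGE, 0, HUGE),               # dense region backed by huge pages
    (1024 * HUGE, 8 * SMALL, 1, SMALL),   # a few small pages far away
]
END = 1024 * HUGE + 8 * SMALL


def find_extent(va):
    """Return the extent containing va, or None if va falls in a hole."""
    for ext in EXTENTS:
        start, length = ext[0], ext[1]
        if start <= va < start + length:
            return ext
    return None


def per_page(optimized):
    """Model a per-page query: one status entry per small-page VA.

    Returns (lookups, entries). With optimized=True, a VA found inside a
    huge page fills in the entries for the rest of that huge page without
    further lookups. Holes are still probed one small page at a time.
    """
    lookups = entries = 0
    va = 0
    while va < END:
        ext = find_extent(va)
        lookups += 1
        if optimized and ext is not None and ext[3] == HUGE:
            page_end = (va // HUGE + 1) * HUGE
            entries += (page_end - va) // SMALL
            va = page_end
        else:
            entries += 1
            va += SMALL
    return lookups, entries


def range_walk():
    """Model a range-oriented report: one (start, end, node) per range.

    Holes are reported with node None.
    """
    ranges, pos = [], 0
    for start, length, node, _ in sorted(EXTENTS):
        if start > pos:
            ranges.append((pos, start, None))
        ranges.append((start, start + length, node))
        pos = start + length
    return ranges


if __name__ == "__main__":
    print("per-page, plain:    ", per_page(False))
    print("per-page, optimized:", per_page(True))
    print("range walk entries: ", len(range_walk()))
```

In this model the shortcut saves lookups only inside the huge page region.
The hole between the two extents is still probed once per small page, and the
number of result entries does not change, while the range walk produces a
handful of entries regardless of how large the hole is.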
- Steve