Message-ID: <CAOSf1CHKY7LT0z+wpo7jUy3aYUDHCKDKwF0XoMwpKN4JwfYjeA@mail.gmail.com>
Date: Mon, 12 Sep 2016 16:29:55 +1000
From: "Oliver O'Halloran" <oohall@...il.com>
To: Dan Williams <dan.j.williams@...el.com>
Cc: linux-mm@...ck.org, Andrea Arcangeli <aarcange@...hat.com>,
Xiao Guangrong <guangrong.xiao@...ux.intel.com>,
Arnd Bergmann <arnd@...db.de>, linux-nvdimm@...ts.01.org,
linux-api@...r.kernel.org,
Dave Hansen <dave.hansen@...ux.intel.com>,
linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Subject: Re: [RFC PATCH 1/2] mm, mincore2(): retrieve dax and tlb-size
attributes of an address range
On Mon, Sep 12, 2016 at 3:31 AM, Dan Williams <dan.j.williams@...el.com> wrote:
> As evidenced by this bug report [1], userspace libraries are interested
> in whether a mapping is DAX mapped, i.e. no intervening page cache.
> Rather than using the ambiguous VM_MIXEDMAP flag in smaps, provide an
> explicit "is dax" indication as a new flag in the page vector populated
> by mincore.
>
> There are also cases, particularly for testing and validating a
> configuration to know the hardware mapping geometry of the pages in a
> given process address range. Consider filesystem-dax where a
> configuration needs to take care to align partitions and block
> allocations before huge page mappings might be used, or
> anonymous-transparent-huge-pages where a process is opportunistically
> assigned large pages. mincore2() allows these configurations to be
> surveyed and validated.
>
> The implementation takes advantage of the unused bits in the per-page
> byte returned for each PAGE_SIZE extent of a given address range. The
> new format of each vector byte is:
>
> (TLB_SHIFT - PAGE_SHIFT) << 2 | vma_is_dax() << 1 | page_present
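
For concreteness, here is a rough sketch of how userspace might unpack one
byte of vec under the format quoted above. The macro, struct and function
names are hypothetical (they are not taken from the patch), and page_shift
is passed in by the caller rather than assumed.

```c
#include <stdint.h>

/* Hypothetical names; only the bit layout comes from the quoted formula:
 * (TLB_SHIFT - PAGE_SHIFT) << 2 | vma_is_dax() << 1 | page_present
 */
#define VEC_PRESENT     0x1u
#define VEC_DAX         0x2u
#define VEC_ORDER_SHIFT 2

struct vec_info {
    int present;        /* bit 0: page is resident */
    int dax;            /* bit 1: vma is DAX mapped */
    unsigned tlb_shift; /* log2 of the hardware mapping size */
};

/* Decode a single vec byte; page_shift is the base page shift (e.g. 12). */
static struct vec_info vec_decode(uint8_t byte, unsigned page_shift)
{
    struct vec_info info;

    info.present = (byte & VEC_PRESENT) != 0;
    info.dax = (byte & VEC_DAX) != 0;
    /* The upper six bits hold TLB_SHIFT - PAGE_SHIFT. */
    info.tlb_shift = page_shift + (byte >> VEC_ORDER_SHIFT);
    return info;
}
```

With a 4K base page (page_shift of 12), a present DAX page mapped by a 2M
entry would be encoded as (21 - 12) << 2 | 1 << 1 | 1 = 0x27.
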
What is userspace expected to do with the information in vec? Whether
PMD or THP mappings can be used is going to depend more on the block
allocations done by the filesystem than on anything an application can
directly influence. Returning a vector entry for each page makes some
sense in the mincore() case, since the application can touch each page
to fault it in, but I don't see what it can do here.
Why not just get rid of vec entirely and make mincore2() a yes/no
check over the range for whatever is supplied in flags? That would
work for NVML's use case and it should be easier to extend if needed.
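
To illustrate the semantics I have in mind, the yes/no answer is what you
would get by folding the per-page bytes into a single result. The sketch
below does that fold in userspace over an already populated vec, purely as
an illustration; in the proposal the kernel would do the equivalent check
and no vec would be returned. The flag names and the function are
hypothetical, and the bit positions are assumed to match the quoted format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag names; bit positions assumed from the quoted format. */
#define EX_PRESENT 0x1u
#define EX_DAX     0x2u

/*
 * Return 1 if every page in the range has all of the requested flag bits
 * set, 0 otherwise. An empty range trivially returns 1.
 */
static int range_has_all(const uint8_t *vec, size_t npages, unsigned flags)
{
    for (size_t i = 0; i < npages; i++) {
        if ((vec[i] & flags) != flags)
            return 0;
    }
    return 1;
}
```

A caller that only needs to know "is this whole range DAX?" would then
test a single return value instead of walking the vector itself.
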
Oliver