Message-ID: <20140708204132.GA16195@nhori.redhat.com>
Date:	Tue, 8 Jul 2014 16:41:32 -0400
From:	Naoya Horiguchi <n-horiguchi@...jp.nec.com>
To:	Dave Hansen <dave.hansen@...el.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Konstantin Khlebnikov <koct9i@...il.com>,
	Wu Fengguang <fengguang.wu@...el.com>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Borislav Petkov <bp@...en8.de>,
	"Kirill A. Shutemov" <kirill@...temov.name>,
	Johannes Weiner <hannes@...xchg.org>,
	Rusty Russell <rusty@...tcorp.com.au>,
	David Miller <davem@...emloft.net>,
	Andres Freund <andres@...quadrant.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	Christoph Hellwig <hch@...radead.org>,
	Dave Chinner <david@...morbit.com>,
	Michael Kerrisk <mtk.manpages@...il.com>,
	Linux API <linux-api@...r.kernel.org>,
	Naoya Horiguchi <nao.horiguchi@...il.com>,
	Kees Cook <kees@...flux.net>
Subject: Re: [PATCH v3 1/3] mm: introduce fincore()

On Tue, Jul 08, 2014 at 12:42:58PM -0700, Dave Hansen wrote:
> On 07/08/2014 12:03 PM, Naoya Horiguchi wrote:
> >> > The biggest question for me, though, is whether we want to start
> >> > designing these per-page interfaces to consider different page sizes, or
> >> > whether we're going to just continue to pretend that the entire world is
> >> > 4k pages.  Using FINCORE_BMAP on 1GB hugetlbfs files would be a bit
> >> > silly, for instance.
> > I didn't answer this question, sorry.
> > 
> > In my opinion, hugetlbfs pages should be handled as one hugepage (not as
> > many 4kB pages) to avoid lots of meaningless data transfer, as you pointed
> > out. And the current patch already works like that.
> 
> Just reading the code, I don't see any way that pc_shift gets passed
> down in to the do_fincore() loop.

There is no need to pass it down, because operations on the page cache
tree use the page index internally to identify the in-file position and
don't care about page size. In a 2MB hugetlbfs file, for example, index 1
means byte offset 2MB (not offset 4kB). So radix_tree_for_each_slot()
advances iter.index like 0 -> 1 -> 2 ... (instead of 0 -> 512 -> 1024 ...).
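
To illustrate the index-to-offset relation, here is a minimal userspace
sketch. index_to_offset() and its page_shift argument are hypothetical
names for illustration only; they are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): byte offset in the file
 * that a page cache index refers to, given the page size as a shift
 * (12 for 4kB, 21 for 2MB, 30 for 1GB).
 */
uint64_t index_to_offset(uint64_t index, unsigned int page_shift)
{
	assert(page_shift < 64);
	return index << page_shift;
}
```

With this helper, index 1 maps to offset 4096 for page_shift 12 and to
offset 2097152 (2MB) for page_shift 21, while the index itself stays 1
in both cases.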

>  I don't see it getting reflected in
> to 'nr' or 'nr_pages' in there, and I can't see how:
> 
> 	jump = iter.index - fc->pgstart - nr;
> 
> can possibly be right since iter.index is being kept against the offset
> in the userspace buffer (4k pages) and 'nr' and fc->pgstart are
> essentially done in the huge page size.

... so iter.index, fc->pgstart, and nr are all in the same unit:
the page index (in hugepage size).
This is a pure index calculation, and do_fincore() works exactly the same
for 4kB pages and hugetlbfs pages.
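
A rough userspace model of that index arithmetic, as a sketch only: this
is not the do_fincore() code. fill_bitmap(), present[], and out[] are
made-up names, and the array of present indices stands in for what the
radix tree iteration would yield.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model, not the patch code. present[] holds the sorted
 * page cache indices found in the tree, pgstart is the first index of
 * the requested range, and out[] receives one byte per index.
 * Every quantity is in index units, whatever the page size is.
 * Returns the number of resident entries recorded.
 */
size_t fill_bitmap(const unsigned long *present, size_t n_present,
		   unsigned long pgstart, unsigned char *out, size_t out_len)
{
	size_t nr = 0;		/* entries written so far */
	size_t resident = 0;

	for (size_t i = 0; i < n_present; i++) {
		unsigned long index = present[i];

		if (index < pgstart)
			continue;

		/* hole between the last written entry and this page */
		size_t jump = index - pgstart - nr;

		if (jump >= out_len - nr)
			break;	/* page lies beyond the buffer */

		memset(out + nr, 0, jump);
		nr += jump;
		out[nr++] = 1;
		resident++;
	}

	/* everything after the last resident page is a hole */
	memset(out + nr, 0, out_len - nr);
	return resident;
}
```

Nothing in this loop depends on the page size, so the same code would
serve a file with 4kB pages and a hugetlbfs file alike.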

> If you had a 2-page 1GB-hpage_size() hugetlbfs file, you would only have
> two pages in the radix tree, and only two iterations of
> radix_tree_for_each_slot().

Correct.

>  It would only set the first two bytes of a
> 256k BMAP buffer since only two pages were encountered in the radix tree.

Hmm, this example shows me a problem, thanks.

If the user knows the fd refers to a 1GB hugetlbfs file, they can simply
prepare a 2-byte buffer, so there is no problem.
But if the user doesn't know whether the fd refers to a hugetlbfs file,
they must prepare a large buffer, even though only the first few bytes
are used. More problematic is that the user could interpret
the data in the buffer in two different ways:
  1. only the first two 4kB pages in the 2GB range are loaded,
  2. two 1GB pages are loaded.
So for such callers, fincore() must report the relevant page size
in some way on return.
Returning it via fincore_extra is my first thought, but I'm not sure
whether it's elegant enough.

Thanks,
Naoya Horiguchi