Date:	Thu, 13 Nov 2014 17:18:03 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	"Kirill A. Shutemov" <kirill@...temov.name>
Cc:	Jerome Glisse <j.glisse@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>, Joerg Roedel <joro@...tes.org>,
	Mel Gorman <mgorman@...e.de>, "H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Johannes Weiner <jweiner@...hat.com>,
	Larry Woodman <lwoodman@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Dave Airlie <airlied@...hat.com>,
	Brendan Conoboy <blc@...hat.com>,
	Joe Donohue <jdonohue@...hat.com>,
	Duncan Poole <dpoole@...dia.com>,
	Sherry Cheung <SCheung@...dia.com>,
	Subhash Gutti <sgutti@...dia.com>,
	John Hubbard <jhubbard@...dia.com>,
	Mark Hairgrove <mhairgrove@...dia.com>,
	Lucien Dunning <ldunning@...dia.com>,
	Cameron Buschardt <cabuschardt@...dia.com>,
	Arvind Gopalakrishnan <arvindg@...dia.com>,
	Shachar Raindel <raindel@...lanox.com>,
	Liran Liss <liranl@...lanox.com>,
	Roland Dreier <roland@...estorage.com>,
	Ben Sander <ben.sander@....com>,
	Greg Stoner <Greg.Stoner@....com>,
	John Bridgman <John.Bridgman@....com>,
	Michael Mantor <Michael.Mantor@....com>,
	Paul Blinzer <Paul.Blinzer@....com>,
	Laurent Morichetti <Laurent.Morichetti@....com>,
	Alexander Deucher <Alexander.Deucher@....com>,
	Oded Gabbay <Oded.Gabbay@....com>,
	Jérôme Glisse <jglisse@...hat.com>
Subject: Re: [PATCH 3/5] lib: lockless generic and arch independent page table
 (gpt) v2.

On Thu, Nov 13, 2014 at 4:58 PM, Kirill A. Shutemov
<kirill@...temov.name> wrote:
> On Thu, Nov 13, 2014 at 03:50:02PM -0800, Linus Torvalds wrote:
>> +/*
>> + * The 'tree_level' data only describes one particular level
>> + * of the tree. The upper levels are totally invisible to the
>> + * user of the tree walker, since the tree walker will walk
>> + * those using the tree definitions.
>> + *
>> + * NOTE! "struct tree_entry" is an opaque type, and is just
>> + * used as a pointer to the particular level. You can figure
>> + * out which level you are at by looking at the "tree_level",
>> + * but even better is to just use different "lookup()"
>> + * functions for different levels, at which point the
>> + * function is inherent to the level.
>
> Please, don't.
>
> We will end up with the same last-level centric code as we have now in mm
> subsystem: all code only cares about pte.

You realize that we have a name for this. It's called "reality".

> It makes implementing variable
> page size support really hard and leads to a copy-paste approach. And to
> a hugetlb parallel world...

No, go back and read the thing.

You're confusing two different issues: looking up the tree, and
actually walking the end result.

The "looking up different levels of the tree" absolutely _should_ use
different actors for different levels. Because the levels are not at
all guaranteed to be the same.

Sure, they often are. When you extend a tree, it's fairly reasonable
to try to make the different levels look identical. But "often" is not
at all "always".
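
As a rough illustration of "different actors for different levels", here is a minimal userspace sketch. It is not the code from the patch: the field names, the two lookup functions, the slot widths and the shift values are all assumptions made up for this example. The only point is that each level carries its own lookup function, so the two levels are free to have different layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque to users of the walker; only the per-level code knows the layout. */
struct tree_entry;

/* Hypothetical level descriptor: one per level, each with its own lookup. */
struct tree_level {
	unsigned int shift;       /* address bits resolved below this level */
	unsigned int nr_entries;  /* number of slots in a table at this level */
	struct tree_entry *(*lookup)(struct tree_entry *table, uintptr_t addr);
};

/* Assumed top level: 4 slots, each a 64-bit entry. */
static struct tree_entry *top_lookup(struct tree_entry *table, uintptr_t addr)
{
	uint64_t *slots = (uint64_t *)table;

	assert(table != NULL);
	return (struct tree_entry *)&slots[(addr >> 20) & 3];
}

/* Assumed bottom level: 256 slots, each a 32-bit entry (a different layout). */
static struct tree_entry *bottom_lookup(struct tree_entry *table, uintptr_t addr)
{
	uint32_t *slots = (uint32_t *)table;

	assert(table != NULL);
	return (struct tree_entry *)&slots[(addr >> 12) & 255];
}

/* The tree definition: the walker indexes this, the user never sees it. */
static const struct tree_level levels[] = {
	{ .shift = 20, .nr_entries = 4,   .lookup = top_lookup },
	{ .shift = 12, .nr_entries = 256, .lookup = bottom_lookup },
};
```

Because the function is tied to the level, neither lookup has to ask which level it is at, and nothing forces the two tables to share an entry format.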

More importantly, nobody should ever care. Because the whole *point*
of the tree walker is that the user never sees any of this. This is
purely an implementation detail of the tree itself. Somebody who just
*walks* the tree only sees the final end result.

And *that* is the "walk()" callback. Which gets the virtual address
and the length, exactly so that for a super-page you don't even really
see the difference between walking different levels (well, you do see
it, since the length will differ).
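
To make the address-plus-length idea concrete, here is a small userspace sketch of a two-level walk. Again, this is an assumption-laden illustration and not the code from the patch: the entry encoding, the sizes (4 KiB leaves, 1 MiB per top-level slot) and the names walk_tree() and count_bytes() are invented for the example. The callback only ever receives an address and a length, and a leaf found at the upper level simply shows up with a larger length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEAF_SHIFT 12u   /* assumed: bottom-level leaf covers 4 KiB */
#define TOP_SHIFT  20u   /* assumed: each top-level slot covers 1 MiB */
#define TOP_SLOTS  4u
#define LEAF_SLOTS 256u

/* Hypothetical entry encoding, for this sketch only. */
enum entry_kind { ENTRY_NONE, ENTRY_TABLE, ENTRY_LEAF };

struct entry {
	enum entry_kind kind;
	struct entry *table;   /* valid only when kind == ENTRY_TABLE */
};

/* The only thing the user of the walker supplies. */
typedef void (*walk_fn)(uintptr_t addr, size_t len, void *ctx);

static void walk_tree(struct entry *top, walk_fn walk, void *ctx)
{
	assert(top != NULL && walk != NULL);

	for (unsigned int i = 0; i < TOP_SLOTS; i++) {
		uintptr_t base = (uintptr_t)i << TOP_SHIFT;

		if (top[i].kind == ENTRY_LEAF) {
			/* Super-page: a leaf at the upper level, larger length. */
			walk(base, (size_t)1 << TOP_SHIFT, ctx);
		} else if (top[i].kind == ENTRY_TABLE) {
			struct entry *bottom = top[i].table;

			for (unsigned int j = 0; j < LEAF_SLOTS; j++) {
				if (bottom[j].kind == ENTRY_LEAF)
					walk(base + ((uintptr_t)j << LEAF_SHIFT),
					     (size_t)1 << LEAF_SHIFT, ctx);
			}
		}
	}
}

/* Example callback: adds up the bytes covered, ignoring the levels. */
static void count_bytes(uintptr_t addr, size_t len, void *ctx)
{
	(void)addr;
	*(size_t *)ctx += len;
}
```

The callback never learns which level a mapping came from; the differing length is the only visible trace of it.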

Now, I didn't actually try to make that whole thing very transparent.
In particular, somebody who just wants to see the data (and ignore as
much of the "tree" details as possible) would really want to have not
that "tree_entry", but the whole "struct tree_level *" and in
particular a way to *map* the page. I left that out entirely, because
it wasn't really central to the whole tree walking.

But thinking that the levels should look the same is fundamentally
bogus. For one, because they don't always look the same at all. For
another, because it's completely separate from the accessing of the
level data anyway.

                       Linus