linux-kernel - Re: [PATCH 06/36] HMM: add HMM page table v2.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20150701150707.GA9313@gmail.com>
Date:	Wed, 1 Jul 2015 11:07:08 -0400
From:	Jerome Glisse <j.glisse@...il.com>
To:	Mark Hairgrove <mhairgrove@...dia.com>
Cc:	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"joro@...tes.org" <joro@...tes.org>, Mel Gorman <mgorman@...e.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Johannes Weiner <jweiner@...hat.com>,
	Larry Woodman <lwoodman@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Dave Airlie <airlied@...hat.com>,
	Brendan Conoboy <blc@...hat.com>,
	Joe Donohue <jdonohue@...hat.com>,
	Duncan Poole <dpoole@...dia.com>,
	Sherry Cheung <SCheung@...dia.com>,
	Subhash Gutti <sgutti@...dia.com>,
	John Hubbard <jhubbard@...dia.com>,
	Lucien Dunning <ldunning@...dia.com>,
	Cameron Buschardt <cabuschardt@...dia.com>,
	Arvind Gopalakrishnan <arvindg@...dia.com>,
	Haggai Eran <haggaie@...lanox.com>,
	Shachar Raindel <raindel@...lanox.com>,
	Liran Liss <liranl@...lanox.com>,
	Roland Dreier <roland@...estorage.com>,
	Ben Sander <ben.sander@....com>,
	Greg Stoner <Greg.Stoner@....com>,
	John Bridgman <John.Bridgman@....com>,
	Michael Mantor <Michael.Mantor@....com>,
	Paul Blinzer <Paul.Blinzer@....com>,
	Laurent Morichetti <Laurent.Morichetti@....com>,
	Alexander Deucher <Alexander.Deucher@....com>,
	Oded Gabbay <Oded.Gabbay@....com>,
	Jérôme Glisse <jglisse@...hat.com>,
	Jatin Kumar <jakumar@...dia.com>
Subject: Re: [PATCH 06/36] HMM: add HMM page table v2.

On Tue, Jun 30, 2015 at 07:51:12PM -0700, Mark Hairgrove wrote:
> On Mon, 29 Jun 2015, Jerome Glisse wrote:
> > [...]
> > 
> > Iterator is what protect against concurrent freeing of the directory so it
> > has to return to caller on directory boundary (for 64bits arch with 64bits
> > pte it has return every 512 entries). Otherwise pt_iter_fini() would have
> > to walk over the whole directory range again just to drop reference and this
> > doesn't sound like a good idea.
> 
> I don't understand why it would have to return to the caller to unprotect 
> the directory. The iterator would simply drop the reference to the 
> previous directory, take a reference on the next one, and keep searching 
> for a valid entry.
> 
> Why would pt_iter_fini have to walk over the entire range? The iterator 
> would keep at most one directory per level referenced. _fini would walk 
> the per-level ptd array and unprotect each level, the same way it does 
> now.

I think here we are just misunderstanding each other. I am saying that
iterator have to return on directory boundary (ie when switching from one
directory to the next). The return part is not only for protection it is
also by design because iterator function should not test the page table
entry as different code path have different synchronization requirement.


> > So really with what you are asking it whould be:
> > 
> > hmm_pt_iter_init(&iter, start, end);
> > for(next=pt_iter_next(&iter,&ptep); next<end; next=pt_iter_next(&iter,&ptep))
> > {
> >    // Here ptep is valid until next address. Above you have to call
> >    // pt_iter_next() to switch to next directory.
> >    addr = max(start, next - (~HMM_PMD_MASK + 1));
> >    for (; addr < next; addr += PAGE_SIZE, ptep++) {
> >       // access ptep
> >    }
> > }
> > 
> > My point is that internally pt_iter_next() will do the exact same test it is
> > doing now btw cur and addr. Just that the addr is no longer explicit but iter
> > infer it.
> 
> But this way, the iteration across directories is more efficient because 
> the iterator can simply walk the directory array. Take a directory that 
> has one valid entry at the very end. The existing iteration will do this:
> 
> hmm_pt_iter_next(dir_addr[0], end)
>     Walk up the ptd array
>     Compute level start and end and compare them to dir_addr[0]
>     Compute dir_addr[1] using addr and pt->mask
>     Return dir_addr[1]
> hmm_pt_iter_update(dir_addr[1])
>     Walk up the ptd array, compute level start and end
>     Compute level index of dir_addr[1]
>     Read entry for dir_addr[1]
>     Return NULL
> hmm_pt_iter_next(dir_addr[1], end)
>     ...
> And so on 511 times until the last entry is read.
> 
> This is really more suited to a for loop iteration, which it could be if 
> this were fully contained within the _next call.

No, existing code does not necessarily do that. Current use pattern is :

for (addr = start; addr < end;) {
   ptep = hmm_pt_iter_update(iter, addr);
   if (!ptep) {
     addr = hmm_pt_iter_next(iter, addr, end);
     continue;
   }
   next = hmm_pt_level_next(pt, addr, end);
   for (; addr < next; addr += PAGE_SIZE, ptep++) {
     // Process addr using ptep.
   }
}

The inner loop is on directory boundary ie 512 entries on 64bits arch.
It is that way because on some case you do not want the iterator to
control the address, the outer loop might be accessing several different
mirror page table each might have different gap. So you really want to
have explicit address provided to the iterator function.

Also iterator can not really test for valid entry as locking requirement
and synchronization with other thread is different depending on which
code path is walking the page table. So testing inside the iterator
function is kind of pointless has the performed test might be no longer
revealent by the time it returns pointer and address to the caller.

Cheers,
Jérôme
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/