linux-kernel - Re: [PATCH 06/36] HMM: add HMM page table v2.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 29 Jun 2015 10:43:06 -0400
From:	Jerome Glisse <j.glisse@...il.com>
To:	Mark Hairgrove <mhairgrove@...dia.com>
Cc:	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"joro@...tes.org" <joro@...tes.org>, Mel Gorman <mgorman@...e.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Johannes Weiner <jweiner@...hat.com>,
	Larry Woodman <lwoodman@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Dave Airlie <airlied@...hat.com>,
	Brendan Conoboy <blc@...hat.com>,
	Joe Donohue <jdonohue@...hat.com>,
	Duncan Poole <dpoole@...dia.com>,
	Sherry Cheung <SCheung@...dia.com>,
	Subhash Gutti <sgutti@...dia.com>,
	John Hubbard <jhubbard@...dia.com>,
	Lucien Dunning <ldunning@...dia.com>,
	Cameron Buschardt <cabuschardt@...dia.com>,
	Arvind Gopalakrishnan <arvindg@...dia.com>,
	Haggai Eran <haggaie@...lanox.com>,
	Shachar Raindel <raindel@...lanox.com>,
	Liran Liss <liranl@...lanox.com>,
	Roland Dreier <roland@...estorage.com>,
	Ben Sander <ben.sander@....com>,
	Greg Stoner <Greg.Stoner@....com>,
	John Bridgman <John.Bridgman@....com>,
	Michael Mantor <Michael.Mantor@....com>,
	Paul Blinzer <Paul.Blinzer@....com>,
	Laurent Morichetti <Laurent.Morichetti@....com>,
	Alexander Deucher <Alexander.Deucher@....com>,
	Oded Gabbay <Oded.Gabbay@....com>,
	Jérôme Glisse <jglisse@...hat.com>,
	Jatin Kumar <jakumar@...dia.com>
Subject: Re: [PATCH 06/36] HMM: add HMM page table v2.

On Fri, Jun 26, 2015 at 06:34:16PM -0700, Mark Hairgrove wrote:
> 
> 
> On Fri, 26 Jun 2015, Jerome Glisse wrote:
> 
> > On Thu, Jun 25, 2015 at 03:57:29PM -0700, Mark Hairgrove wrote:
> > > On Thu, 21 May 2015, j.glisse@...il.com wrote:
> > > > From: Jérôme Glisse <jglisse@...hat.com>
> > > > [...]
> > > > +
> > > > +void hmm_pt_iter_init(struct hmm_pt_iter *iter);
> > > > +void hmm_pt_iter_fini(struct hmm_pt_iter *iter, struct hmm_pt *pt);
> > > > +unsigned long hmm_pt_iter_next(struct hmm_pt_iter *iter,
> > > > +			       struct hmm_pt *pt,
> > > > +			       unsigned long addr,
> > > > +			       unsigned long end);
> > > > +dma_addr_t *hmm_pt_iter_update(struct hmm_pt_iter *iter,
> > > > +			       struct hmm_pt *pt,
> > > > +			       unsigned long addr);
> > > > +dma_addr_t *hmm_pt_iter_fault(struct hmm_pt_iter *iter,
> > > > +			      struct hmm_pt *pt,
> > > > +			      unsigned long addr);
> > > 
> > > I've got a few more thoughts on hmm_pt_iter after looking at some of the 
> > > later patches. I think I've convinced myself that this patch functionally 
> > > works as-is, but I've got some suggestions and questions about the design.
> > > 
> > > Right now there are these three major functions:
> > > 
> > > 1) hmm_pt_iter_update(addr)
> > >    - Returns the hmm_pte * for addr, or NULL if none exists.
> > > 
> > > 2) hmm_pt_iter_fault(addr)
> > >    - Returns the hmm_pte * for addr, allocating a new one if none exists.
> > > 
> > > 3) hmm_pt_iter_next(addr, end)
> > >    - Returns the next possibly-valid address. The caller must use
> > >      hmm_pt_iter_update to check if there really is an hmm_pte there.
> > > 
> > > In my view, there are two sources of confusion here:
> > > - Naming. "update" shares a name with the HMM mirror callback, and it also
> > >   implies that the page tables are "updated" as a result of the call. 
> > >   "fault" likewise implies that the function handles a fault in some way.
> > >   Neither of these implications are true.
> > 
> > Maybe hmm_pt_iter_walk & hmm_pt_iter_populate are better name ?
> 
> hmm_pt_iter_populate sounds good. See below for _walk.
> 
> 
> > 
> > 
> > > - hmm_pt_iter_next and hmm_pt_iter_update have some overlapping
> > >   functionality when compared to traditional iterators, requiring the 
> > >   callers to all do this sort of thing:
> > > 
> > >         hmm_pte = hmm_pt_iter_update(&iter, &mirror->pt, addr);
> > >         if (!hmm_pte) {
> > >             addr = hmm_pt_iter_next(&iter, &mirror->pt,
> > >                         addr, event->end);
> > >             continue;
> > >         }
> > > 
> > > Wouldn't it be more efficient and simpler to have _next do all the 
> > > iteration internally so it always returns the next valid entry? Then you 
> > > could combine _update and _next into a single function, something along 
> > > these lines (which also addresses the naming concern):
> > > 
> > > void hmm_pt_iter_init(iter, pt, start, end);
> > > unsigned long hmm_pt_iter_next(iter, hmm_pte *);
> > > unsigned long hmm_pt_iter_next_alloc(iter, hmm_pte *);
> > > 
> > > hmm_pt_iter_next would return the address and ptep of the next valid 
> > > entry, taking the place of the existing _update and _next functions. 
> > > hmm_pt_iter_next_alloc takes the place of _fault.
> > > 
> > > Also, since the _next functions don't take in an address, the iterator 
> > > doesn't have to handle the input addr being different from iter->cur.
> > 
> > It would still need to do the same kind of test, this test is really to
> > know when you switch from one directory to the next and to drop and take
> > reference accordingly.
> 
> But all of the directory references are already hidden entirely in the 
> iterator _update function. The caller only has to worry about taking 
> references on the bottom level, so I don't understand why the iterator 
> needs to return to the caller when it hits the end of a directory. Or for 
> that matter, why it returns every possible index within a directory to the 
> caller whether that index is valid or not.

Iterator is what protect against concurrent freeing of the directory so it
has to return to caller on directory boundary (for 64bits arch with 64bits
pte it has return every 512 entries). Otherwise pt_iter_fini() would have
to walk over the whole directory range again just to drop reference and this
doesn't sound like a good idea.

So really with what you are asking it whould be:

hmm_pt_iter_init(&iter, start, end);
for(next=pt_iter_next(&iter,&ptep); next<end; next=pt_iter_next(&iter,&ptep))
{
   // Here ptep is valid until next address. Above you have to call
   // pt_iter_next() to switch to next directory.
   addr = max(start, next - (~HMM_PMD_MASK + 1));
   for (; addr < next; addr += PAGE_SIZE, ptep++) {
      // access ptep
   }
}

My point is that internally pt_iter_next() will do the exact same test it is
doing now btw cur and addr. Just that the addr is no longer explicit but iter
infer it.

> If _next only returned to the caller when it hit a valid hmm_pte (or end), 
> then only one function would be needed (_next) instead of two 
> (_update/_walk and _next).

On the valid entry side, this is because when you are walking the page table
you have no garanty that the entry will not be clear below you (in case of
concurrent invalidation). The only garanty you have is that if you are able
to read a valid entry from the update() callback then this entry is valid
until you get a new update() callback telling you otherwise.

Cheers,
Jérôme
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/