lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 1 Jun 2015 15:03:32 -0400
From:	Jerome Glisse <j.glisse@...il.com>
To:	John Hubbard <jhubbard@...dia.com>
Cc:	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, Linus Torvalds <torvalds@...ux-foundation.org>,
	joro@...tes.org, Mel Gorman <mgorman@...e.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Johannes Weiner <jweiner@...hat.com>,
	Larry Woodman <lwoodman@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Dave Airlie <airlied@...hat.com>,
	Brendan Conoboy <blc@...hat.com>,
	Joe Donohue <jdonohue@...hat.com>,
	Duncan Poole <dpoole@...dia.com>,
	Sherry Cheung <SCheung@...dia.com>,
	Subhash Gutti <sgutti@...dia.com>,
	Mark Hairgrove <mhairgrove@...dia.com>,
	Lucien Dunning <ldunning@...dia.com>,
	Cameron Buschardt <cabuschardt@...dia.com>,
	Arvind Gopalakrishnan <arvindg@...dia.com>,
	Haggai Eran <haggaie@...lanox.com>,
	Shachar Raindel <raindel@...lanox.com>,
	Liran Liss <liranl@...lanox.com>,
	Roland Dreier <roland@...estorage.com>,
	Ben Sander <ben.sander@....com>,
	Greg Stoner <Greg.Stoner@....com>,
	John Bridgman <John.Bridgman@....com>,
	Michael Mantor <Michael.Mantor@....com>,
	Paul Blinzer <Paul.Blinzer@....com>,
	Laurent Morichetti <Laurent.Morichetti@....com>,
	Alexander Deucher <Alexander.Deucher@....com>,
	Oded Gabbay <Oded.Gabbay@....com>,
	Jérôme Glisse <jglisse@...hat.com>
Subject: Re: [PATCH 01/36] mmu_notifier: add event information to address
 invalidation v7

On Fri, May 29, 2015 at 08:43:59PM -0700, John Hubbard wrote:
> On Thu, 21 May 2015, j.glisse@...il.com wrote:
> 
> > From: Jérôme Glisse <jglisse@...hat.com>
> > 
> > The event information will be useful for new user of mmu_notifier API.
> > The event argument differentiate between a vma disappearing, a page
> > being write protected or simply a page being unmaped. This allow new
> > user to take different path for different event for instance on unmap
> > the resource used to track a vma are still valid and should stay around.
> > While if the event is saying that a vma is being destroy it means that any
> > resources used to track this vma can be free.
> > 
> > Changed since v1:
> >   - renamed action into event (updated commit message too).
> >   - simplified the event names and clarified their usage
> >     also documenting what exceptation the listener can have in
> >     respect to each event.
> > 
> > Changed since v2:
> >   - Avoid crazy name.
> >   - Do not move code that do not need to move.
> > 
> > Changed since v3:
> >   - Separate hugue page split from mlock/munlock and softdirty.
> 
> Do we care about fixing up patch comments? If so:
> 
> s/hugue/huge/

I am noting them down and will go over them.


[...]
> > +	MMU_HSPLIT,
> 
> Let's rename MMU_HSPLIT to one of the following, take your pick:
> 
> MMU_HUGE_PAGE_SPLIT (too long, but you can't possibly misunderstand it)
> MMU_PAGE_SPLIT (my favorite: only huge pages are ever split, so it works)
> MMU_HUGE_SPLIT (ugly, but still hard to misunderstand)

I will go with MMU_HUGE_PAGE_SPLIT 


[...]
> 
> > +	MMU_ISDIRTY,
> 
> This MMU_ISDIRTY seems like a problem to me. First of all, it looks 
> backwards: the only place that invokes it is the clear_refs_write() 
> routine, for the soft-dirty tracking feature. And in that case, the pages 
> are *not* being made dirty! Rather, the kernel is actually making the 
> pages non-writable, in order to be able to trap the subsequent page fault 
> and figure out if the page is in active use.
> 
> So, given that there is only one call site, and that call site should 
> actually be setting MMU_WRITE_PROTECT instead (I think), let's just delete 
> MMU_ISDIRTY.
> 
> Come to think about it, there is no callback possible for "a page became 
> dirty", anyway. Because the dirty and accessed bits are actually set by 
> the hardware, and software is generally unable to know the current state.
> So MMU_ISDIRTY just seems inappropriate to me, across the board.
> 
> I'll take a look at the corresponding HMM_ISDIRTY, too.

Ok i need to rename that one to CLEAR_SOFT_DIRTY, the idea is that
for HMM i would rather not write protect the memory for the device
and just rely on the regular and conservative dirtying of page. The
soft dirty is really for migrating a process where you first clear
the soft dirty bit, then copy memory while process still running,
then freeze process an only copy memory that was dirtied since
first copy. Point being that adding soft dirty to HMM is something
that can be done down the road. We should have enough bit inside
the device page table for that.


> 
> > +	MMU_MIGRATE,
> > +	MMU_MPROT,
> 
> The MMU_PROT also looks questionable. Short answer: probably better to 
> read the protection, and pass either MMU_WRITE_PROTECT, MMU_READ_WRITE 
> (that's a new item, of course), or MMU_UNMAP.
> 
> Here's why: the call site knows the protection, but by the time it filters 
> down to HMM (in later patches), that information is lost, and HMM ends up 
> doing (ouch!) another find_vma() call in order to retrieve it--and then 
> translates it into only three possible things:
> 
> // hmm_mmu_mprot_to_etype() sets one of these:
> 
>    HMM_MUNMAP
>    HMM_WRITE_PROTECT
>    HMM_NONE

Linus complained of my previous version where i differenciated the
kind of protection change that was happening, hence why i only pass
down mprot.


> 
> 
> > +	MMU_MUNLOCK,
> 
> I think MMU_UNLOCK would be clearer. We already know the scope, so the 
> extra "M" isn't adding anything.

I named it that way so it matches syscall name munlock(). I think
it is clearer to use MUNLOCK, or maybe SYSCALL_MUNLOCK

> 
> > +	MMU_MUNMAP,
> 
> Same thing here: MMU_UNMAP seems better.

Well same idea here.


> 
> > +	MMU_WRITE_BACK,
> > +	MMU_WRITE_PROTECT,
> 
> We may have to add MMU_READ_WRITE (and maybe another one, I haven't 
> bottomed out on that), if you agree with the above approach of 
> always sending a precise event, instead of "protection changed".

I think Linus point made sense last time, but i would need to read
again the thread. The idea of that patch is really to provide context
information on what kind of CPU page table changes is happening and
why.

In that respect i should probably change MMU_WRITE_PROTECT to 
MMU_KSM_WRITE_PROTECT.


Cheers,
Jérôme
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists