linux-kernel - Re: [PATCH 01/36] mmu_notifier: add event information to address invalidation v7

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150603160711.GA2602@gmail.com>
Date:	Wed, 3 Jun 2015 12:07:12 -0400
From:	Jerome Glisse <j.glisse@...il.com>
To:	John Hubbard <jhubbard@...dia.com>
Cc:	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, Linus Torvalds <torvalds@...ux-foundation.org>,
	joro@...tes.org, Mel Gorman <mgorman@...e.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Johannes Weiner <jweiner@...hat.com>,
	Larry Woodman <lwoodman@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Dave Airlie <airlied@...hat.com>,
	Brendan Conoboy <blc@...hat.com>,
	Joe Donohue <jdonohue@...hat.com>,
	Duncan Poole <dpoole@...dia.com>,
	Sherry Cheung <SCheung@...dia.com>,
	Subhash Gutti <sgutti@...dia.com>,
	Mark Hairgrove <mhairgrove@...dia.com>,
	Lucien Dunning <ldunning@...dia.com>,
	Cameron Buschardt <cabuschardt@...dia.com>,
	Arvind Gopalakrishnan <arvindg@...dia.com>,
	Haggai Eran <haggaie@...lanox.com>,
	Shachar Raindel <raindel@...lanox.com>,
	Liran Liss <liranl@...lanox.com>,
	Roland Dreier <roland@...estorage.com>,
	Ben Sander <ben.sander@....com>,
	Greg Stoner <Greg.Stoner@....com>,
	John Bridgman <John.Bridgman@....com>,
	Michael Mantor <Michael.Mantor@....com>,
	Paul Blinzer <Paul.Blinzer@....com>,
	Laurent Morichetti <Laurent.Morichetti@....com>,
	Alexander Deucher <Alexander.Deucher@....com>,
	Oded Gabbay <Oded.Gabbay@....com>,
	Jérôme Glisse <jglisse@...hat.com>
Subject: Re: [PATCH 01/36] mmu_notifier: add event information to address
 invalidation v7

On Mon, Jun 01, 2015 at 04:10:46PM -0700, John Hubbard wrote:
> On Mon, 1 Jun 2015, Jerome Glisse wrote:
> > On Fri, May 29, 2015 at 08:43:59PM -0700, John Hubbard wrote:
> > > On Thu, 21 May 2015, j.glisse@...il.com wrote:
> > > > From: Jérôme Glisse <jglisse@...hat.com>

[...]
> > > > +	MMU_ISDIRTY,
> > > 
> > > This MMU_ISDIRTY seems like a problem to me. First of all, it looks 
> > > backwards: the only place that invokes it is the clear_refs_write() 
> > > routine, for the soft-dirty tracking feature. And in that case, the pages 
> > > are *not* being made dirty! Rather, the kernel is actually making the 
> > > pages non-writable, in order to be able to trap the subsequent page fault 
> > > and figure out if the page is in active use.
> > > 
> > > So, given that there is only one call site, and that call site should 
> > > actually be setting MMU_WRITE_PROTECT instead (I think), let's just delete 
> > > MMU_ISDIRTY.
> > > 
> > > Come to think about it, there is no callback possible for "a page became 
> > > dirty", anyway. Because the dirty and accessed bits are actually set by 
> > > the hardware, and software is generally unable to know the current state.
> > > So MMU_ISDIRTY just seems inappropriate to me, across the board.
> > > 
> > > I'll take a look at the corresponding HMM_ISDIRTY, too.
> > 
> > Ok i need to rename that one to CLEAR_SOFT_DIRTY, the idea is that
> > for HMM i would rather not write protect the memory for the device
> > and just rely on the regular and conservative dirtying of page. The
> > soft dirty is really for migrating a process where you first clear
> > the soft dirty bit, then copy memory while process still running,
> > then freeze process an only copy memory that was dirtied since
> > first copy. Point being that adding soft dirty to HMM is something
> > that can be done down the road. We should have enough bit inside
> > the device page table for that.
> > 
> 
> Yes, I think renaming it to CLEAR_SOFT_DIRTY will definitely allow more 
> accurate behavior in response to these events.
> 
> Looking ahead, a couple things:
> 
> 1. This mechanism is also used for general memory utilization tracking (I 
> see that Vladimir DavyDov has an "idle memory tracking" proposal that 
> assumes this works, for example: https://lwn.net/Articles/642202/ and 
> https://lkml.org/lkml/2015/5/12/449).
> 
> 2. It seems hard to avoid the need to eventually just write protect the 
> page, whether it is on the CPU or the remote device, if things like device 
> drivers or user space need to track write accesses to a virtual address. 
> Either you write protect the page, and trap the page faults, or you wait 
> until later and read the dirty bit (indirectly, via something like 
> unmap_mapping_range). Or did you have something else in mind?
> 
> Anyway, none of that needs to hold up this part of the patchset, because 
> the renaming fixes things up for the future code to do the right thing.

I will go over Vladimir patchset it was on my radar but haven't had yet a
chance to go over it. We will likely need to do the write protect for device
too. But as you said this is not an issue that this patch need a fix for,
only HMM would need to change. I will do that.


[...]
> > > We may have to add MMU_READ_WRITE (and maybe another one, I haven't 
> > > bottomed out on that), if you agree with the above approach of 
> > > always sending a precise event, instead of "protection changed".
> > 
> > I think Linus point made sense last time, but i would need to read
> > again the thread. The idea of that patch is really to provide context
> > information on what kind of CPU page table changes is happening and
> > why.
> >
> 
> Shoot, I tried to find that conversation, but my search foo is too weak. 
> If you have a link to that thread, I'd appreciate it, so I can refresh my 
> memory.
> 
> I was hoping to re-read it and see if anything has changed. It's not 
> really a huge problem to call find_vma() again, but I do want to be sure 
> that there's a good reason for doing so.
>  
> Otherwise, I'll just rely on your memory that Linus preferred your current 
> approach, and call it good, then.

http://lkml.iu.edu/hypermail/linux/kernel/1406.3/04880.html

I am working on doing some of the changes discussed so far, i will push my
tree to git://people.freedesktop.org/~glisse/linux hmm branch once i am done.

Cheers,
Jérôme
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/