Date:	Thu, 28 Jul 2016 02:05:04 +0300
From:	"Michael S. Tsirkin" <mst@...hat.com>
To:	Dave Hansen <dave.hansen@...el.com>
Cc:	Liang Li <liang.z.li@...el.com>, linux-kernel@...r.kernel.org,
	virtualization@...ts.linux-foundation.org, linux-mm@...ck.org,
	virtio-dev@...ts.oasis-open.org, kvm@...r.kernel.org,
	qemu-devel@...gnu.org, dgilbert@...hat.com, quintela@...hat.com,
	Andrew Morton <akpm@...ux-foundation.org>,
	Vlastimil Babka <vbabka@...e.cz>,
	Mel Gorman <mgorman@...hsingularity.net>,
	Paolo Bonzini <pbonzini@...hat.com>,
	Cornelia Huck <cornelia.huck@...ibm.com>,
	Amit Shah <amit.shah@...hat.com>
Subject: Re: [PATCH v2 repost 6/7] mm: add the related functions to get free
 page info

On Wed, Jul 27, 2016 at 03:16:57PM -0700, Dave Hansen wrote:
> On 07/27/2016 03:05 PM, Michael S. Tsirkin wrote:
> > On Wed, Jul 27, 2016 at 09:40:56AM -0700, Dave Hansen wrote:
> >> On 07/26/2016 06:23 PM, Liang Li wrote:
> >>> +	for_each_migratetype_order(order, t) {
> >>> +		list_for_each(curr, &zone->free_area[order].free_list[t]) {
> >>> +			pfn = page_to_pfn(list_entry(curr, struct page, lru));
> >>> +			if (pfn >= start_pfn && pfn <= end_pfn) {
> >>> +				page_num = 1UL << order;
> >>> +				if (pfn + page_num > end_pfn)
> >>> +					page_num = end_pfn - pfn;
> >>> +				bitmap_set(bitmap, pfn - start_pfn, page_num);
> >>> +			}
> >>> +		}
> >>> +	}
> >>
> >> Nit:  The 'page_num' nomenclature really confused me here.  It is the
> >> number of bits being set in the bitmap.  Seems like calling it nr_pages
> >> or num_pages would be more appropriate.
> >>
> >> Isn't this bitmap out of date by the time it's sent up to the
> >> hypervisor?  Is there something that makes the inaccuracy OK here?
> > 
> > Yes. Calling these free pages is unfortunate. It's likely to confuse
> > people thinking they can just discard these pages.
> > 
> > Hypervisor sends a request. We respond with this list of pages, and
> > the guarantee hypervisor needs is that these were free sometime between request
> > and response, so they are safe to free if they are unmodified
> > since the request. Hypervisor can detect modifications
> > itself and does not need guest help.
> 
> Ahh, that makes sense.
> 
> So the hypervisor is trying to figure out: "Which pages do I move?".  It
> wants to know which pages the guest thinks have good data and need to
> move.  But, the list of free pages is (likely) smaller than the list of
> pages with good data, so it asks for that instead.
> 
> A write to a page means that it has valuable data, regardless of whether
> it was in the free list or not.
> 
> The hypervisor only skips moving pages that were free *and* were never
> written to.

Right, except "never" is a long time, so we just need "not written to
since the request was received".

> So we never lose data, even if this "get free page info"
> stuff is totally out of date.

So if you include pages that were written to before the request,
then yes, data will be lost. This is why we do this scan
after we get the request and not, e.g., on boot :)
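
To make the rule concrete, here is a rough sketch of the check on the
hypervisor side. It is only an illustration, not code from the patch or
from any hypervisor: can_skip_page(), bit_is_set() and the two bitmaps
("reported" by the guest, "dirty" since the request) are made-up names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: test bit 'nr' in a bitmap of unsigned long words. */
static bool bit_is_set(const unsigned long *map, size_t nr)
{
	return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

/*
 * Hypothetical hypervisor-side check (assumption, not a real API).
 * 'reported' is the bitmap the guest returned for this request;
 * 'dirty' marks pages written to since the request was sent.
 * A page may be skipped only if it was reported AND not written since.
 */
static bool can_skip_page(const unsigned long *reported,
			  const unsigned long *dirty, size_t idx)
{
	assert(reported && dirty);
	return bit_is_set(reported, idx) && !bit_is_set(dirty, idx);
}
```

A page that was reported but then written to is moved anyway, which is
why a stale report taken after the request is harmless while one taken
before the request is not.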

> The patch description and code comments are, um, a _bit_ light for this
> level of subtlety. :)

Add to that: for any page, it is safe to skip it and not add it to the list.
So the requirement is about when a page must *not* be on this list:
it must not be there if it is needed by the guest but was not
modified since the request.

Calling it "free" will just keep confusing people.  Either use the
verbose "free or modified" or invent a new word like "discardable" and
add a comment explaining that a page is always discardable unless its
content is needed by Linux but was not modified since the request.
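
As a rough userspace sketch of what the renamed helper and its comment
might look like (not the patch itself): mark_discardable() and
set_bits() are made-up names, nr_pages follows Dave's naming nit, and
end_pfn is assumed to be exclusive here, which differs from the quoted
loop.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: set 'count' bits starting at bit 'first'. */
static void set_bits(unsigned long *map, size_t first, size_t count)
{
	for (size_t i = first; i < first + count; i++)
		map[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

/*
 * Mark one block of 2^order pages starting at 'pfn' as discardable.
 *
 * A page is always discardable unless its content is needed by the
 * guest but was not modified since the request. Skipping a page is
 * always safe, so blocks starting outside [start_pfn, end_pfn) are
 * simply ignored. Assumption: end_pfn is exclusive.
 *
 * Returns the number of bits set in 'bitmap'.
 */
static unsigned long mark_discardable(unsigned long *bitmap,
				      unsigned long start_pfn,
				      unsigned long end_pfn,
				      unsigned long pfn, unsigned int order)
{
	unsigned long nr_pages = 1UL << order;

	assert(start_pfn <= end_pfn);
	if (pfn < start_pfn || pfn >= end_pfn)
		return 0;
	/* Clamp a block that runs past the end of the range. */
	if (nr_pages > end_pfn - pfn)
		nr_pages = end_pfn - pfn;
	set_bits(bitmap, pfn - start_pfn, nr_pages);
	return nr_pages;
}
```

For example, with a range of [100, 110), an order-2 block at pfn 108 is
clamped to two pages, and a block starting at pfn 96 is skipped rather
than partially reported.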

-- 
MST
