[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <91355107-ed73-fce5-7051-3a746b526163@redhat.com>
Date:   Thu, 22 Aug 2019 12:18:56 -0400
From:   Nitesh Narayan Lal <nitesh@...hat.com>
To:     Alexander Duyck <alexander.duyck@...il.com>, kvm@...r.kernel.org,
        mst@...hat.com, david@...hat.com, dave.hansen@...el.com,
        linux-kernel@...r.kernel.org, willy@...radead.org,
        mhocko@...nel.org, linux-mm@...ck.org, akpm@...ux-foundation.org,
        virtio-dev@...ts.oasis-open.org, osalvador@...e.de
Cc:     yang.zhang.wz@...il.com, pagupta@...hat.com, riel@...riel.com,
        konrad.wilk@...cle.com, lcapitulino@...hat.com,
        wei.w.wang@...el.com, aarcange@...hat.com, pbonzini@...hat.com,
        dan.j.williams@...el.com, alexander.h.duyck@...ux.intel.com
Subject: Re: [virtio-dev] [PATCH v6 4/6] mm: Introduce Reported pages
On 8/21/19 10:59 AM, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@...ux.intel.com>
>
> In order to pave the way for free page reporting in virtualized
> environments we will need a way to get pages out of the free lists and
> identify those pages after they have been returned. To accomplish this,
> this patch adds the concept of a Reported Buddy, which is essentially
> meant to just be the Uptodate flag used in conjunction with the Buddy
> page type.
>
> It adds a set of pointers we shall call "boundary" which represents the
> upper boundary between the unreported and reported pages. The general idea
> is that in order for a page to cross from one side of the boundary to the
> other it will need to go through the reporting process. Ultimately a
> free_list has been fully processed when the boundary has been moved from
> the tail all they way up to occupying the first entry in the list.
>
> Doing this we should be able to make certain that we keep the reported
> pages as one contiguous block in each free list. This will allow us to
> efficiently manipulate the free lists whenever we need to go in and start
> sending reports to the hypervisor that there are new pages that have been
> freed and are no longer in use.
>
> An added advantage to this approach is that we should be reducing the
> overall memory footprint of the guest as it will be more likely to recycle
> warm pages versus trying to allocate the reported pages that were likely
> evicted from the guest memory.
>
> Since we will only be reporting one zone at a time we keep the boundary
> limited to being defined for just the zone we are currently reporting pages
> from. Doing this we can keep the number of additional pointers needed quite
> small. To flag that the boundaries are in place we use a single bit
> in the zone to indicate that reporting and the boundaries are active.
>
> The determination of when to start reporting is based on the tracking of
> the number of free pages in a given area versus the number of reported
> pages in that area. We keep track of the number of reported pages per
> free_area in a separate zone specific area. We do this to avoid modifying
> the free_area structure as this can lead to false sharing for the highest
> order with the zone lock which leads to a noticeable performance
> degradation.
[...]
> +
> +/* request page reporting on this zone */
> +void __page_reporting_request(struct zone *zone)
> +{
> +	struct page_reporting_dev_info *phdev;
> +
> +	rcu_read_lock();
> +
> +	/*
> +	 * We use RCU to protect the ph_dev_info pointer. In almost all
> +	 * cases this should be present, however in the unlikely case of
> +	 * a shutdown this will be NULL and we should exit.
> +	 */
> +	phdev = rcu_dereference(ph_dev_info);
> +	if (unlikely(!phdev))
> +		return;
> +
Just a minor comment here.
Although this is unlikely to trigger still I think you should release the
rcu_read_lock before returning.
> +	/*
> +	 * We can use separate test and set operations here as there
> +	 * is nothing else that can set or clear this bit while we are
> +	 * holding the zone lock. The advantage to doing it this way is
> +	 * that we don't have to dirty the cacheline unless we are
> +	 * changing the value.
> +	 */
> +	__set_bit(ZONE_PAGE_REPORTING_REQUESTED, &zone->flags);
> +
> +	/*
> +	 * Delay the start of work to allow a sizable queue to
> +	 * build. For now we are limiting this to running no more
> +	 * than 10 times per second.
> +	 */
> +	if (!atomic_fetch_inc(&phdev->refcnt))
> +		schedule_delayed_work(&phdev->work, HZ / 10);
> +
> +	rcu_read_unlock();
> +}
> +
[...]
> +	}
> +
> +	/* enable page reporting notification */
> +	static_branch_enable(&page_reporting_notify_enabled);
> +err_out:
> +	mutex_unlock(&page_reporting_mutex);
> +
> +	return err;
> +}
> +EXPORT_SYMBOL_GPL(page_reporting_startup);
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@...ts.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@...ts.oasis-open.org
>
-- 
Thanks
Nitesh
Powered by blists - more mailing lists
 
