Message-ID: <alpine.LNX.2.03.1506020214160.17700@nvidia.com>
Date: Tue, 2 Jun 2015 02:32:01 -0700
From: John Hubbard <jhubbard@...dia.com>
To: <j.glisse@...il.com>
CC: <akpm@...ux-foundation.org>, <linux-kernel@...r.kernel.org>,
<linux-mm@...ck.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
<joro@...tes.org>, Mel Gorman <mgorman@...e.de>,
"H. Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Johannes Weiner <jweiner@...hat.com>,
Larry Woodman <lwoodman@...hat.com>,
Rik van Riel <riel@...hat.com>,
Dave Airlie <airlied@...hat.com>,
Brendan Conoboy <blc@...hat.com>,
Joe Donohue <jdonohue@...hat.com>,
Duncan Poole <dpoole@...dia.com>,
Sherry Cheung <SCheung@...dia.com>,
Subhash Gutti <sgutti@...dia.com>,
Mark Hairgrove <mhairgrove@...dia.com>,
Lucien Dunning <ldunning@...dia.com>,
Cameron Buschardt <cabuschardt@...dia.com>,
Arvind Gopalakrishnan <arvindg@...dia.com>,
Haggai Eran <haggaie@...lanox.com>,
Shachar Raindel <raindel@...lanox.com>,
Liran Liss <liranl@...lanox.com>,
Roland Dreier <roland@...estorage.com>,
Ben Sander <ben.sander@....com>,
Greg Stoner <Greg.Stoner@....com>,
John Bridgman <John.Bridgman@....com>,
Michael Mantor <Michael.Mantor@....com>,
Paul Blinzer <Paul.Blinzer@....com>,
Laurent Morichetti <Laurent.Morichetti@....com>,
Alexander Deucher <Alexander.Deucher@....com>,
Oded Gabbay <Oded.Gabbay@....com>,
Jérôme Glisse <jglisse@...hat.com>
Subject: Re: [PATCH 02/36] mmu_notifier: keep track of active invalidation
ranges v3
On Thu, 21 May 2015, j.glisse@...il.com wrote:
> From: Jérôme Glisse <jglisse@...hat.com>
>
> The mmu_notifier_invalidate_range_start() and mmu_notifier_invalidate_range_end()
> calls can be considered as forming an "atomic" section from the CPU page table
> update point of view. Between these two functions the CPU page table content is
> unreliable for the address range being invalidated.
>
> Current users such as kvm need to know when they can trust the content of the
> CPU page table. This becomes even more important for new users of the
> mmu_notifier api (such as HMM or ODP).
>
> This patch uses a structure, defined at each call site of
> invalidate_range_start(), that is added to a list for the duration of the
> invalidation. It adds two new helpers that allow querying whether a range is
> being invalidated, or waiting for a range to become valid.
>
> For proper synchronization, the user must block new range invalidations from
> inside its invalidate_range_start() callback before calling the helper
> functions. Otherwise there is no guarantee that a new range invalidation will
> not be added after the call to the helper function that queries for existing
> ranges.
Hi Jerome,
Most of this information will make nice block comments for the new helper
routines. I can help tighten up the writing slightly, but first:
Question: in hmm.c's hmm_notifier_invalidate function (looking at the
entire patchset, for a moment), I don't see any blocking of new range
invalidations, even though you point out, above, that this is required. Am
I missing it, and if so, where should I be looking instead?
>
> Changed since v1:
> - Fix a possible deadlock in mmu_notifier_range_wait_valid()
>
> Changed since v2:
> - Add the range to invalid range list before calling ->range_start().
> - Del the range from invalid range list after calling ->range_end().
> - Remove useless list initialization.
>
> Signed-off-by: Jérôme Glisse <jglisse@...hat.com>
> Reviewed-by: Rik van Riel <riel@...hat.com>
> Reviewed-by: Haggai Eran <haggaie@...lanox.com>
> ---
> drivers/gpu/drm/i915/i915_gem_userptr.c | 9 ++--
> drivers/gpu/drm/radeon/radeon_mn.c | 14 +++---
> drivers/infiniband/core/umem_odp.c | 16 +++----
> drivers/misc/sgi-gru/grutlbpurge.c | 15 +++----
> drivers/xen/gntdev.c | 15 ++++---
> fs/proc/task_mmu.c | 11 +++--
> include/linux/mmu_notifier.h | 55 ++++++++++++-----------
> kernel/events/uprobes.c | 13 +++---
> mm/huge_memory.c | 78 ++++++++++++++------------------
> mm/hugetlb.c | 55 ++++++++++++-----------
> mm/ksm.c | 28 +++++-------
> mm/madvise.c | 20 ++++-----
> mm/memory.c | 72 +++++++++++++++++-------------
> mm/migrate.c | 36 +++++++--------
> mm/mmu_notifier.c | 79 ++++++++++++++++++++++++++++-----
> mm/mprotect.c | 18 ++++----
> mm/mremap.c | 14 +++---
> virt/kvm/kvm_main.c | 10 ++---
> 18 files changed, 302 insertions(+), 256 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 452e9b1..80fe72a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -131,16 +131,15 @@ restart:
>
> static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> struct mm_struct *mm,
> - unsigned long start,
> - unsigned long end,
> - enum mmu_event event)
> + const struct mmu_notifier_range *range)
> {
> struct i915_mmu_notifier *mn = container_of(_mn, struct i915_mmu_notifier, mn);
> struct interval_tree_node *it = NULL;
> - unsigned long next = start;
> + unsigned long next = range->start;
> unsigned long serial = 0;
> + /* interval ranges are inclusive, but invalidate range is exclusive */
> + unsigned long end = range->end - 1, start = range->start;
A *very* minor point, but doing it that way messes up the scope of the
comment. Something more like this might be cleaner:
unsigned long start = range->start;
unsigned long next = start;
unsigned long serial = 0;
/* interval ranges are inclusive, but invalidate range is exclusive */
unsigned long end = range->end - 1;
[...]
> - enum mmu_event event)
> + struct mmu_notifier_range *range)
>
> {
> struct mmu_notifier *mn;
> int id;
>
> + spin_lock(&mm->mmu_notifier_mm->lock);
> + list_add_tail(&range->list, &mm->mmu_notifier_mm->ranges);
> + mm->mmu_notifier_mm->nranges++;
Is this missing a call to wake_up(&mm->mmu_notifier_mm->wait_queue)? If
not, then it would be helpful to explain why that's only required for
nranges--, and not for the nranges++ case. The helper routine is merely
waiting for nranges to *change*, not looking for greater than or less
than.
> + spin_unlock(&mm->mmu_notifier_mm->lock);
> +
> id = srcu_read_lock(&srcu);
> hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
> if (mn->ops->invalidate_range_start)
> - mn->ops->invalidate_range_start(mn, mm, start,
> - end, event);
> + mn->ops->invalidate_range_start(mn, mm, range);
> }
> srcu_read_unlock(&srcu, id);
> }
> EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start);
>
> void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
> - unsigned long start,
> - unsigned long end,
> - enum mmu_event event)
> + struct mmu_notifier_range *range)
> {
> struct mmu_notifier *mn;
> int id;
> @@ -211,12 +211,23 @@ void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
> * (besides the pointer check).
> */
> if (mn->ops->invalidate_range)
> - mn->ops->invalidate_range(mn, mm, start, end);
> + mn->ops->invalidate_range(mn, mm,
> + range->start, range->end);
> if (mn->ops->invalidate_range_end)
> - mn->ops->invalidate_range_end(mn, mm, start,
> - end, event);
> + mn->ops->invalidate_range_end(mn, mm, range);
> }
> srcu_read_unlock(&srcu, id);
> +
> + spin_lock(&mm->mmu_notifier_mm->lock);
> + list_del_init(&range->list);
> + mm->mmu_notifier_mm->nranges--;
> + spin_unlock(&mm->mmu_notifier_mm->lock);
> +
> + /*
> + * Wakeup after callback so they can do their job before any of the
> + * waiters resume.
> + */
> + wake_up(&mm->mmu_notifier_mm->wait_queue);
> }
> EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_end);
>
> @@ -235,6 +246,49 @@ void __mmu_notifier_invalidate_range(struct mm_struct *mm,
> }
> EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range);
>
We definitely want to put a little documentation here.
> +static bool mmu_notifier_range_is_valid_locked(struct mm_struct *mm,
> + unsigned long start,
> + unsigned long end)
This routine is named "_range_is_valid_", but it takes in an implicit
range (start, end), and also a list of ranges (buried in mm), and so it's
a little confusing. I'd like to consider *maybe* changing either the name,
or the args (range* instead of start, end?), or something.
Could you please say a few words about the intent of this routine, to get
us started there?
> +{
> + struct mmu_notifier_range *range;
> +
> + list_for_each_entry(range, &mm->mmu_notifier_mm->ranges, list) {
> + if (!(range->end <= start || range->start >= end))
> + return false;
This has a lot of negatives in it, if you count the innermost "not in
range" expression. It can be simplified to this:
	if (range->end > start && range->start < end)
		return false;
> + }
> + return true;
> +}
> +
> +bool mmu_notifier_range_is_valid(struct mm_struct *mm,
> + unsigned long start,
> + unsigned long end)
> +{
> + bool valid;
> +
> + spin_lock(&mm->mmu_notifier_mm->lock);
> + valid = mmu_notifier_range_is_valid_locked(mm, start, end);
> + spin_unlock(&mm->mmu_notifier_mm->lock);
> + return valid;
> +}
> +EXPORT_SYMBOL_GPL(mmu_notifier_range_is_valid);
> +
> +void mmu_notifier_range_wait_valid(struct mm_struct *mm,
> + unsigned long start,
> + unsigned long end)
> +{
> + spin_lock(&mm->mmu_notifier_mm->lock);
> + while (!mmu_notifier_range_is_valid_locked(mm, start, end)) {
> + int nranges = mm->mmu_notifier_mm->nranges;
> +
> + spin_unlock(&mm->mmu_notifier_mm->lock);
> + wait_event(mm->mmu_notifier_mm->wait_queue,
> + nranges != mm->mmu_notifier_mm->nranges);
> + spin_lock(&mm->mmu_notifier_mm->lock);
> + }
> + spin_unlock(&mm->mmu_notifier_mm->lock);
> +}
> +EXPORT_SYMBOL_GPL(mmu_notifier_range_wait_valid);
> +
> static int do_mmu_notifier_register(struct mmu_notifier *mn,
> struct mm_struct *mm,
> int take_mmap_sem)
> @@ -264,6 +318,9 @@ static int do_mmu_notifier_register(struct mmu_notifier *mn,
> if (!mm_has_notifiers(mm)) {
[...]
That's all I could see to mention for this one, thanks,
john h