Message-ID: <20130712153205.GA18825@redhat.com>
Date: Fri, 12 Jul 2013 17:32:05 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: David Rientjes <rientjes@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
Andi Kleen <andi@...stfloor.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: mempolicy: turn vma_set_policy() into
vma_dup_policy()
On 07/11, David Rientjes wrote:
>
> On Wed, 10 Jul 2013, Oleg Nesterov wrote:
>
> > +int vma_dup_policy(struct vm_area_struct *src, struct vm_area_struct *dst)
> > +{
> > + struct mempolicy *pol = mpol_dup(vma_policy(src));
> > +
> > + if (IS_ERR(pol))
> > + return PTR_ERR(pol);
>
> PTR_ERR() returns long, so vma_dup_policy() needs to return long.
I think that "int" should be fine, or else we should fix IS_ERR/ERR_PTR. If
nothing else, the code this patch replaces did the same. And there are a lot
of other "int" functions which return PTR_ERR().
But I agree, this is only correct because vma_dup_policy() checks IS_ERR()
before PTR_ERR(), and because mpol_dup() doesn't misuse ERR_PTR().
For example, ERR_PTR(args->err) in hw_breakpoint_handler() looks really
strange and imho should be killed. But it is correct, because args->err is
not actually an error code there.
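To illustrate why returning PTR_ERR() from an "int" function is lossless once IS_ERR() has been checked, here is a minimal userspace sketch. The three helpers only mimic the ones in include/linux/err.h; toy_policy, toy_dup() and toy_dup_policy() are hypothetical names invented for this example and are not the kernel's mpol_dup() or vma_dup_policy().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Userspace stand-ins that mimic the helpers in include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Only the top MAX_ERRNO addresses encode an error. */
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical toy object standing in for struct mempolicy. */
struct toy_policy { int mode; };

/* Hypothetical duplicator: a copy on success, ERR_PTR(-ENOMEM) on failure. */
static struct toy_policy *toy_dup(const struct toy_policy *old, int fail)
{
	struct toy_policy *new;

	if (fail)
		return ERR_PTR(-ENOMEM);
	new = malloc(sizeof(*new));
	if (!new)
		return ERR_PTR(-ENOMEM);
	*new = *old;
	return new;
}

/* Same shape as the helper under review: 0 on success, -errno otherwise. */
static int toy_dup_policy(const struct toy_policy *src,
			  struct toy_policy **dst, int fail)
{
	struct toy_policy *pol = toy_dup(src, fail);

	if (IS_ERR(pol))
		/* After IS_ERR() the value is in -MAX_ERRNO..-1, so it fits in int. */
		return (int)PTR_ERR(pol);
	*dst = pol;
	return 0;
}
```

The narrowing from long to int is safe here only because IS_ERR() has already restricted the value to -MAX_ERRNO..-1. Calling PTR_ERR() on an arbitrary pointer would not give that guarantee.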
> > @@ -2505,12 +2504,9 @@ static int __split_vma(struct mm_struct * mm, struct vm_area_struct * vma,
> > new->vm_pgoff += ((addr - vma->vm_start) >> PAGE_SHIFT);
> > }
> >
> > - pol = mpol_dup(vma_policy(vma));
> > - if (IS_ERR(pol)) {
> > - err = PTR_ERR(pol);
> > + err = vma_dup_policy(vma, new);
> > + if (err)
> > goto out_free_vma;
> > - }
> > - vma_set_policy(new, pol);
> >
> > if (anon_vma_clone(new, vma))
> > goto out_free_mpol;
>
> This isn't the first occurrence in mm/mmap.c, what about vma_adjust()?
> Probably need to patch 3.10 or later.
Ah, sorry for the confusion, I forgot to mention that this is on top of
another -mm patch,
mm-mempolicy-fix-mbind_range-vma_adjust-interaction.patch
which is attached below just in case.
> Otherwise looks good.
Thanks for the review ;)
Oleg.
-----------------------------------------------------------------------
[PATCH] mm: mempolicy: fix mbind_range() && vma_adjust() interaction
vma_adjust() does vma_set_policy(vma, vma_policy(next)) and this
is doubly wrong:
1. This leaks vma->vm_policy if it is not NULL and not equal to
   next->vm_policy.
   This can happen if vma_merge() expands "area", not prev (case 8).
2. This sets the wrong policy if vma_merge() joins prev and area:
   area is the vma the caller needs to update, and it still has the
   old policy.
Revert 1444f92c "mm: merging memory blocks resets mempolicy" which
introduced these problems.
Change mbind_range() to recheck mpol_equal() after vma_merge() to
fix the problem 1444f92c tried to address.
Signed-off-by: Oleg Nesterov <oleg@...hat.com>
Cc: <stable@...r.kernel.org>
---
mm/mempolicy.c | 6 +++++-
mm/mmap.c | 2 +-
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 7431001..4baf12e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -732,7 +732,10 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 		if (prev) {
 			vma = prev;
 			next = vma->vm_next;
-			continue;
+			if (mpol_equal(vma_policy(vma), new_pol))
+				continue;
+			/* vma_merge() joined vma && vma->next, case 8 */
+			goto replace;
 		}
 		if (vma->vm_start != vmstart) {
 			err = split_vma(vma->vm_mm, vma, vmstart, 1);
@@ -744,6 +747,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 			if (err)
 				goto out;
 		}
+ replace:
 		err = vma_replace_policy(vma, new_pol);
 		if (err)
 			goto out;
diff --git a/mm/mmap.c b/mm/mmap.c
index 7fe7f0b..42234b8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -865,7 +865,7 @@ again: remove_next = 1 + (end > next->vm_end);
 		if (next->anon_vma)
 			anon_vma_merge(vma, next);
 		mm->map_count--;
-		vma_set_policy(vma, vma_policy(next));
+		mpol_put(vma_policy(next));
 		kmem_cache_free(vm_area_cachep, next);
 		/*
 		 * In mprotect's case 6 (see comments on vma_merge),
--
1.5.5.1
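
The leak described in point 1 of the changelog can be shown with a simplified userspace model. Everything here is hypothetical and exists only for illustration: toy_policy, toy_vma, toy_put(), merge_old() and merge_new() are invented names, and the plain integer refcount ignores the atomics and the freeing that the real mpol_put() performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted policy, loosely modelled on struct mempolicy. */
struct toy_policy { int refcnt; };

/* Hypothetical vma that holds one reference on its policy. */
struct toy_vma { struct toy_policy *policy; };

/* Drop one reference; the real mpol_put() would also free at zero. */
static void toy_put(struct toy_policy *pol)
{
	if (pol)
		pol->refcnt--;
}

/*
 * Old behaviour, modelled on vma_set_policy(vma, vma_policy(next)):
 * the pointer is overwritten, so the reference vma held on its previous
 * policy is never dropped.
 */
static void merge_old(struct toy_vma *vma, struct toy_vma *next)
{
	vma->policy = next->policy;
}

/*
 * Patched behaviour, modelled on mpol_put(vma_policy(next)):
 * vma keeps its own policy and only next's reference is dropped,
 * since next is about to be freed.
 */
static void merge_new(struct toy_vma *vma, struct toy_vma *next)
{
	(void)vma;
	toy_put(next->policy);
	next->policy = NULL;
}
```

In this model, merge_old() leaves the first policy with a nonzero refcount that nothing points to any more, while merge_new() leaves vma's policy untouched and drops next's reference to zero.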