[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090615035749.6f8236cc@woof.tlv.redhat.com>
Date: Mon, 15 Jun 2009 03:57:49 +0300
From: Izik Eidus <ieidus@...hat.com>
To: Izik Eidus <ieidus@...hat.com>
Cc: Hugh Dickins <hugh.dickins@...cali.co.uk>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/3] ksm: write protect pages from inside ksm
On Mon, 15 Jun 2009 03:05:14 +0300
Izik Eidus <ieidus@...hat.com> wrote:
> Izik Eidus wrote:
> > Izik Eidus wrote:
> >> Hugh Dickins wrote:
> >>> On Sat, 13 Jun 2009, Izik Eidus wrote:
> >>>
> >>>> Hugh, so untill here we are sync,
> >>>>
> >>>
> >>> Yes, that fits with what I have here, thanks (or where it didn't
> >>> quite fit, e.g. ' versus `, I've adjusted to what you have!). And
> >>> thanks for fixing my *orig_pte = *ptep bug, you did point that out
> >>> before, but I misunderstood at first.
> >>>
> >>>
> >>>> Question is what you want me to do now?,
> >>>> (Beacuse we are skipping 2.6.31, It is ok to you to tell me
> >>>> something like: "Shut up and let me see what i can get with this
> >>>> madvise" - that from one side.
> >>>> From another side if you want me to do anything please say.
> >>>>
> >>>
> >>> I had to get a bit further at my end before answering on that,
> >>> but now the answer is clear: please do some testing of your RFC
> >>> madvise() version (which is what I'm just tidying up a little),
> >>> and let me know any bugfixes you find. Try with SLAB or SLUB or
> >>> SLQB debug on e.g. CONFIG_SLUB=y, CONFIG_SLUB_DEBUG=y and boot
> >>> option "slub_debug".
> >>>
> >>
> >> Sure, let me check it.
> >> (You do have Andrea patch that fix the "used after free slab
> >> entries" ?)
> >
> > How fast is it crush opps to you?, I compiled it and ran it here on
> > 2.6.30-rc4-mm1 with:
> > "Enable SLQB debugging support" and "SLQB debugging on by default,
> > and it run and merge (i am using qemu processes to run virtual
> > machines to merge the pages between them)
> >
> > ("SLQB debugging on by defaul" mean i dont have to add boot
> > pareameter right?)
> >
> > Maybe i should try update into newer version of the mm tree? (last
> > commit here is Jul 22)
>
> OK, bug on my side, just got that oppss, will try to fix and send
> patch.
>
> (Sorry for the noise)
>
> >
> >>
> >>> I'm finding, whether with your RFC or my tidyup, that kksmd
> >>> soon oopses in get_next_mmlist (or perhaps find_vma): presumably
> >>> accessing a vma or mm which already got freed (if you don't have
> >>> slab debugging on, it's liable to hang instead).
> >>>
> >>> (I've also not seen it actually merging yet: if you register
> >>> or madvise a large anon area and memset it, the /dev/ksm version
> >>> would merge all its pages, but I've not seen the madvise version
> >>> do so yet - though maybe there's something stupidly wrong in my
> >>> testing, really I'm more worried about the oopses at present.)
> >>>
> >>> Note that mmotm includes a patch of Nick's which adds a function
> >>> madvise_behavior_valid() - you'll need to add your MADVs into its
> >>> list to get it to work at all there.
> >>>
> >>> Here's a patch I added a month or so ago, when trying to
> >>> experiment with KSM on all mms: shouldn't be necessary if your mm
> >>> refcounting is right, but might help to avoid extra weirdness
> >>> when things go wrong: exit_mmap() leaves stale vma pointers
> >>> around, reckoning that nobody can be interested by now; but maybe
> >>> KSM might peep so better to tidy them up at least while
> >>> debugging...
> >>>
> >>> Thanks,
> >>> Hugh
> >>>
> >>> --- old/mm/mmap.c 2009-05-01 13:47:45.000000000 +0100
> >>> +++ new/mm/mmap.c 2009-05-03 11:34:47.000000000 +0100
> >>> @@ -2112,6 +2112,14 @@ void exit_mmap(struct mm_struct *mm)
> >>> tlb_finish_mmu(tlb, 0, end);
> >>>
> >>> /*
> >>> + * Make sure get_user_pages() and find_vma() etc. will find
> >>> nothing:
> >>> + * this may be necessary for KSM.
> >>> + */
> >>> + mm->mmap = NULL;
> >>> + mm->mmap_cache = NULL;
> >>> + mm->mm_rb = RB_ROOT;
> >>> +
> >>> + /*
> >>> * Walk the list again, actually closing and freeing it,
> >>> * with preemption enabled, without holding any MM locks.
> >>> */
> >>>
> >>
> >>
> >
> >
>
>
Ok, below is ugly fix for the opss..
>From 3be1ad5a9f990113e8849fa1e74c4e74066af131 Mon Sep 17 00:00:00 2001
From: Izik Eidus <ieidus@...hat.com>
Date: Mon, 15 Jun 2009 03:52:05 +0300
Subject: [PATCH] ksm: madvise-rfc: really ugly fix for the oppss bug.
This patch is just so it can run without to crush with the madvise rfc patch.
True fix for this i think is adding another list for ksm inside the mm struct.
In the meanwhile i will try to think about other way how to fix this bug.
Hugh, i hope at least now you will be able to run it without it will crush to
you.
Signed-off-by: Izik Eidus <ieidus@...hat.com>
---
kernel/fork.c | 11 ++++++-----
1 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index e5ef58c..771b89a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -484,17 +484,18 @@ void mmput(struct mm_struct *mm)
{
might_sleep();
+ spin_lock(&mmlist_lock);
if (atomic_dec_and_test(&mm->mm_users)) {
+ if (!list_empty(&mm->mmlist))
+ list_del(&mm->mmlist);
+ spin_unlock(&mmlist_lock);
exit_aio(mm);
exit_mmap(mm);
set_mm_exe_file(mm, NULL);
- if (!list_empty(&mm->mmlist)) {
- spin_lock(&mmlist_lock);
- list_del(&mm->mmlist);
- spin_unlock(&mmlist_lock);
- }
put_swap_token(mm);
mmdrop(mm);
+ } else {
+ spin_unlock(&mmlist_lock);
}
}
EXPORT_SYMBOL_GPL(mmput);
--
1.5.6.5
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists