[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LSU.2.11.1410020154500.6444@eggly.anvils>
Date: Thu, 2 Oct 2014 02:23:08 -0700 (PDT)
From: Hugh Dickins <hughd@...gle.com>
To: Sasha Levin <sasha.levin@...cle.com>
cc: Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, hughd@...gle.com,
mgorman@...e.de
Subject: Re: [PATCH 0/5] mm: poison critical mm/ structs
On Wed, 1 Oct 2014, Sasha Levin wrote:
> On 10/01/2014 05:07 PM, Andrew Morton wrote:
> > On Mon, 29 Sep 2014 21:47:14 -0400 Sasha Levin <sasha.levin@...cle.com> wrote:
> >
> >> Currently we're seeing a few issues which are unexplainable by looking at the
> >> data we see and are most likely caused by a memory corruption caused
> >> elsewhere.
> >>
> >> This is wasting time for folks who are trying to figure out an issue provided
> >> a stack trace that can't really point out the real issue.
> >>
> >> This patch introduces poisoning on struct page, vm_area_struct, and mm_struct,
> >> and places checks in busy paths to catch corruption early.
> >>
> >> This series was tested, and it detects corruption in vm_area_struct. Right now
> >> I'm working on figuring out the source of the corruption, (which is a long
> >> standing bug) using KASan, but the current code is useful as it is.
> >
> > Is this still useful if/when kasan is in place?
>
> Yes, the corruption we're seeing happens inside the struct rather than around it.
> kasan doesn't look there.
>
> When kasan is merged, we could complement this patchset by making kasan trap on
> when the poison is getting written, rather than triggering a BUG in some place
> else after we saw the corruption.
>
> > It looks fairly cheap - I wonder if it should simply fall under
> > CONFIG_DEBUG_VM rather than the new CONFIG_DEBUG_VM_POISON.
>
> Config options are cheap as well :)
>
> I'd rather expand it further and add poison/kasan trapping into other places such
> as the vma interval tree rather than having to keep it "cheap".
I like to run with CONFIG_DEBUG_VM, and would not want this stuff
turned on in my builds (especially not the struct page enlargement);
so I'm certainly with you in preferring a separate option.
But it all seems very ad hoc to me. Are people going to be adding
more and more mm structures into it, ad infinitum? And adding
CONFIG_DEBUG_SCHED_POISON one day when someone notices corruption
of a scheduler structure? etc etc.
What does this add on top of slab poisoning? Some checks in some
mm places while the object is active, I guess: why not base those
on slab poisoning? And add them in as appropriate to the problem
at hand, when a problem is seen.
I think these patches are fine for investigating whatever is the
problem currently afflicting you and mm under trinity; but we all
have our temporary debugging patches, I don't think all deserve
preservation in everyone else's kernel, that amounts to far more
clutter than any are worth.
I'm glad to hear they've confirmed some vm_area_struct corruption:
any ideas on where that's coming from?
Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists