Message-ID: <50892917.30201@linux.vnet.ibm.com>
Date: Thu, 25 Oct 2012 04:57:11 -0700
From: Dave Hansen <dave@...ux.vnet.ibm.com>
To: Borislav Petkov <bp@...en8.de>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...e.cz>, linux-mm@...ck.org,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] add some drop_caches documentation and info messsge
On 10/25/2012 02:24 AM, Borislav Petkov wrote:
> But let's discuss this a bit further. So, for the benchmarking aspect,
> you're either going to have to always require dmesg along with
> benchmarking results or /proc/vmstat, depending on where the drop_caches
> stats end up.
>
> Is this how you envision it?
>
> And then there are the VM bug cases, where you might not always get
> full dmesg from a panicked system. In that case, you'd want the kernel
> tainting thing too, so that it at least appears in the oops backtrace.
>
> Although the tainting thing might not be enough - a user could
> drop_caches at some point in time and the oops happening much later
> could be unrelated but that can't be expressed in taint flags.

Here's the problem: Joe Kernel Developer gets a bug report, usually
something like "the kernel is slow" or "the kernel is eating up all my
memory". We then start digging into the problem with the usual tools.
We almost *ALWAYS* get dmesg, and it's reasonably common, but less
likely, that we get things like vmstat along with such a bug report.

Joe Kernel Developer digs through the statistics or the dmesg and tries
to figure out what happened. I've run into a couple of cases in practice
(and I assume Michal has too) where the bug reporter was using
drop_caches _heavily_ and did not realize the implications. It was
quite hard to track down exactly how the page cache and dentries/inodes
were getting purged.

There are rarely oopses involved in these scenarios.

The primary goal of this patch is to make debugging those scenarios
easier, so that we can quickly realize that drop_caches is the reason our
caches went away, not some anomalous VM activity. A secondary goal is
to tell the user: "Hey, maybe this isn't something you want to be doing
all the time."