Message-ID: <20121025142457.GB308@x1.osrc.amd.com>
Date: Thu, 25 Oct 2012 16:24:57 +0200
From: Borislav Petkov <bp@...en8.de>
To: Dave Hansen <dave@...ux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...e.cz>, linux-mm@...ck.org,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] add some drop_caches documentation and info messsge
On Thu, Oct 25, 2012 at 04:57:11AM -0700, Dave Hansen wrote:
> On 10/25/2012 02:24 AM, Borislav Petkov wrote:
> > But let's discuss this a bit further. So, for the benchmarking aspect,
> > you're either going to have to always require dmesg along with
> > benchmarking results or /proc/vmstat, depending on where the drop_caches
> > stats end up.
> >
> > Is this how you envision it?
> >
> > And then there are the VM bug cases, where you might not always get
> > full dmesg from a panicked system. In that case, you'd want the kernel
> > tainting thing too, so that it at least appears in the oops backtrace.
> >
> > Although the tainting thing might not be enough - a user could
> > drop_caches at some point in time and the oops happening much later
> > could be unrelated but that can't be expressed in taint flags.
>
> Here's the problem: Joe Kernel Developer gets a bug report, usually
> something like "the kernel is slow", or "the kernel is eating up all my
> memory". We then start going and digging in to the problem with the
> usual tools. We almost *ALWAYS* get dmesg, and it's reasonably common,
> but less likely, that we get things like vmstat along with such a bug
> report.
>
> Joe Kernel Developer digs in the statistics or the dmesg and tries to
> figure out what happened. I've run in to a couple of cases in practice
> (and I assume Michal has too) where the bug reporter was using
> drop_caches _heavily_ and did not realize the implications. It was
> quite hard to track down exactly how the page cache and dentries/inodes
> were getting purged.
>
> There are rarely oopses involved in these scenarios.
>
> The primary goal of this patch is to make debugging those scenarios
> easier so that we can quickly realize that drop_caches is the reason our
> caches went away, not some anomalous VM activity. A secondary goal is
> to tell the user: "Hey, maybe this isn't something you want to be doing
> all the time."
OK, understood. If you will be requiring dmesg anyway, then it makes sense.
This way you also get to see exactly when, and how many times,
drop_caches was used. For the "when", though, you'll need to add the
timestamp explicitly to the printk because CONFIG_PRINTK_TIME is not
always enabled.
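
To illustrate the timestamp point, here is a minimal sketch (Python) of how
someone reading a bug report might pull drop_caches events out of a dmesg
dump. The marker string "drop_caches", the function name and the sample
lines are assumptions, since the final message text depends on the patch.
The only thing taken as given is the usual "[seconds.microseconds]" prefix
that printk emits when CONFIG_PRINTK_TIME is on.

```python
import re

# Prefix printk adds when CONFIG_PRINTK_TIME is enabled, e.g. "[   12.345678] ".
TS_PREFIX = re.compile(r"^\[\s*(\d+)\.(\d{6})\]\s?")

# Hypothetical marker: the real message text depends on the final patch.
MARKER = "drop_caches"


def drop_cache_events(lines, marker=MARKER):
    """Return one entry per line containing `marker`.

    Each entry is the time in seconds since boot, or None when the line
    carries no printk timestamp prefix.
    """
    events = []
    for line in lines:
        if marker not in line:
            continue
        m = TS_PREFIX.match(line)
        if m:
            events.append(int(m.group(1)) + int(m.group(2)) / 1e6)
        else:
            events.append(None)
    return events
```

With timestamps off, len() of the result still gives the "how many times",
but every entry is None, so the "when" is lost unless the message itself
carries a timestamp.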
Thanks.
--
Regards/Gruss,
Boris.