linux-kernel - Re: [PATCH] add some drop_caches documentation and info messsge

Message-ID: <20121025142457.GB308@x1.osrc.amd.com>
Date:	Thu, 25 Oct 2012 16:24:57 +0200
From:	Borislav Petkov <bp@...en8.de>
To:	Dave Hansen <dave@...ux.vnet.ibm.com>
Cc:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Michal Hocko <mhocko@...e.cz>, linux-mm@...ck.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] add some drop_caches documentation and info messsge

On Thu, Oct 25, 2012 at 04:57:11AM -0700, Dave Hansen wrote:
> On 10/25/2012 02:24 AM, Borislav Petkov wrote:
> > But let's discuss this a bit further. So, for the benchmarking aspect,
> > you're either going to have to always require dmesg along with
> > benchmarking results or /proc/vmstat, depending on where the drop_caches
> > stats end up.
> > 
> > Is this how you envision it?
> > 
> > And then there are the VM bug cases, where you might not always get
> > full dmesg from a panicked system. In that case, you'd want the kernel
> > tainting thing too, so that it at least appears in the oops backtrace.
> > 
> > Although the tainting thing might not be enough - a user could
> > drop_caches at some point in time and the oops happening much later
> > could be unrelated but that can't be expressed in taint flags.
> 
> Here's the problem: Joe Kernel Developer gets a bug report, usually
> something like "the kernel is slow", or "the kernel is eating up all my
> memory".  We then start going and digging in to the problem with the
> usual tools.  We almost *ALWAYS* get dmesg, and it's reasonably common,
> but less likely, that we get things like vmstat along with such a bug
> report.
> 
> Joe Kernel Developer digs in the statistics or the dmesg and tries to
> figure out what happened.  I've run in to a couple of cases in practice
> (and I assume Michal has too) where the bug reporter was using
> drop_caches _heavily_ and did not realize the implications.  It was
> quite hard to track down exactly how the page cache and dentries/inodes
> were getting purged.
> 
> There are rarely oopses involved in these scenarios.
> 
> The primary goal of this patch is to make debugging those scenarios
> easier so that we can quickly realize that drop_caches is the reason our
> caches went away, not some anomalous VM activity.  A secondary goal is
> to tell the user: "Hey, maybe this isn't something you want to be doing
> all the time."

Ok, understood. If you will be requiring dmesg with such reports anyway,
then it makes sense.

This way you also see exactly when, and how many times, drop_caches was
used. For the timing part, though, you'll need to add the timestamp
explicitly to the printk because CONFIG_PRINTK_TIME is not always
enabled.
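
To illustrate what a bug-report reader would be left with, here is a rough
sketch (Python, userspace) that scans dmesg text for drop_caches messages.
The message shape "comm (pid): dropped kernel caches: N" is only an
assumption about what the patch prints, and drop_caches_events() is a
hypothetical helper name. Lines without the CONFIG_PRINTK_TIME prefix can
still be counted, but their timestamp comes back as None:

```python
import re

# Assumed message shape, e.g. "bash (1234): dropped kernel caches: 3".
# The leading "[  123.456789] " prefix is only present when the kernel
# was built with CONFIG_PRINTK_TIME (or printk timestamps are enabled).
LINE_RE = re.compile(
    r"^(?:\[\s*(?P<ts>\d+\.\d+)\]\s+)?"
    r"(?P<comm>.+?) \((?P<pid>\d+)\): dropped kernel caches: (?P<mode>\d+)\s*$"
)


def drop_caches_events(dmesg_text):
    """Return one dict per drop_caches message found in dmesg_text."""
    events = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # unrelated kernel message
        ts = m.group("ts")
        events.append({
            # None means the line carried no timestamp prefix at all.
            "timestamp": float(ts) if ts is not None else None,
            "comm": m.group("comm"),
            "pid": int(m.group("pid")),
            "mode": int(m.group("mode")),
        })
    return events
```

Counting the returned events gives the "how many times"; the "when" is
only available for events whose timestamp is not None, which is the case
an explicit timestamp in the message itself would cover.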

Thanks.

-- 
Regards/Gruss,
Boris.