linux-kernel - Re: [PATCH 1/3] Add a new field to struct shrinker

Message-ID: <20160728102513.GA2799@techsingularity.net>
Date:	Thu, 28 Jul 2016 11:25:13 +0100
From:	Mel Gorman <mgorman@...hsingularity.net>
To:	Dave Chinner <david@...morbit.com>
Cc:	Tony Jones <tonyj@...e.de>, Michal Hocko <mhocko@...e.cz>,
	Janani Ravichandran <janani.rvchndrn@...il.com>,
	Rik van Riel <riel@...riel.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
	hannes@...xchg.org, vdavydov@...tuozzo.com, vbabka@...e.cz,
	kirill.shutemov@...ux.intel.com, bywxiaobai@....com
Subject: Re: [PATCH 1/3] Add a new field to struct shrinker

On Thu, Jul 28, 2016 at 03:49:47PM +1000, Dave Chinner wrote:
> Seems you're all missing the obvious.
> 
> Add a tracepoint for a shrinker callback that includes a "name"
> field, have the shrinker callback fill it out appropriately. e.g
> in the superblock shrinker:
> 
> 	trace_shrinker_callback(shrinker, shrink_control, sb->s_type->name);
> 

That misses capturing the latency of the call unless there is a begin/end
tracepoint pair. I was aware of the function graph tracer but I don't know
how to convince it to give the following information:

1. The length of time spent in a given function
2. The tracepoint information that might explain why the stall occurred

Take the compaction tracepoints, for example:

	trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
				cc->free_pfn, end_pfn, sync);

	...

	trace_mm_compaction_end(start_pfn, cc->migrate_pfn,
				cc->free_pfn, end_pfn, sync, ret);

The function graph tracer can say that X time is spent in compact_zone() but
it cannot distinguish between a short time spent in that function because
compaction_suitable == false or compaction simply finished quickly. While
the cc struct parameters could be extracted, end_pfn is much harder to figure
out because a user would have to parse zoneinfo to figure it out and even
*that* would only work if there are no overlapping nodes. Extracting sync
would require making assumptions about the implementation of compact_zone()
that could change.
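
To make the begin/end argument concrete, here is a rough userspace sketch of
what a tool can do once both events exist: pair them per task and report the
duration together with the context carried by the end event. Everything in it
(struct trace_event, struct span, pair_events(), MAX_TASKS) is hypothetical
and made up for illustration; it is not a kernel or ftrace API, and keeping
only the most recent begin per task is just one possible policy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a parsed trace record. */
enum ev_kind { EV_BEGIN, EV_END };

struct trace_event {
	uint64_t ts_ns;		/* timestamp of the event */
	int pid;		/* task that emitted it */
	enum ev_kind kind;
	int ret;		/* result code, only meaningful for EV_END */
};

/* One completed begin/end pair. */
struct span {
	int pid;
	uint64_t duration_ns;
	int ret;
};

#define MAX_TASKS 64	/* assumption: small fixed table is enough here */

/*
 * Walk events in timestamp order and emit one span per begin/end pair.
 * A second begin for the same pid before an end (e.g. a retry loop)
 * overwrites the earlier begin; an end without a begin is ignored.
 * Returns the number of spans written to out.
 */
size_t pair_events(const struct trace_event *ev, size_t n,
		   struct span *out, size_t out_cap)
{
	struct {
		int used;
		int open;
		int pid;
		uint64_t begin_ts;
	} slots[MAX_TASKS] = { 0 };
	size_t n_out = 0;

	for (size_t i = 0; i < n; i++) {
		int idx = -1;

		/* Find this pid's slot, or claim the first free one. */
		for (int s = 0; s < MAX_TASKS; s++) {
			if (slots[s].used && slots[s].pid == ev[i].pid) {
				idx = s;
				break;
			}
			if (!slots[s].used && idx < 0)
				idx = s;
		}
		if (idx < 0)
			continue;	/* table full, drop the event */

		slots[idx].used = 1;
		slots[idx].pid = ev[i].pid;

		if (ev[i].kind == EV_BEGIN) {
			slots[idx].begin_ts = ev[i].ts_ns;
			slots[idx].open = 1;
		} else if (slots[idx].open && n_out < out_cap) {
			out[n_out].pid = ev[i].pid;
			out[n_out].duration_ns =
				ev[i].ts_ns - slots[idx].begin_ts;
			out[n_out].ret = ev[i].ret;
			n_out++;
			slots[idx].open = 0;
		}
	}
	return n_out;
}
```

With both the duration and the end event's fields in one record, a short
span can be told apart by its result code rather than by guessing from the
time alone.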

> And now you know exactly what shrinker is being run.
> 

Sure, and it's a good suggestion, but it does not say how long the shrinker
was running.

My understanding was the point of the tracepoints was to get detailed
information on points where the kernel is known to stall for long periods
of time. I don't actually know how to convince the function graph tracer
to get that type of information. Maybe it's possible and I just haven't
tried recently enough.

Potentially, the duration could be inferred by using a return probe on
the function, but that requires that the function the tracepoint is running
in is known by the tool, has not been inlined, and that there are no retry
loops that hit the begin tracepoint.

-- 
Mel Gorman
SUSE Labs