Date:   Thu, 23 Nov 2017 13:36:29 +0000
From:   Mel Gorman <>
To:     Michal Hocko <>
Cc:,,, Steven Rostedt <>,
        Ingo Molnar <>,
        Alex Deucher <>,
        "David S . Miller" <>,
        Harry Wentland <>,
        Greg Kroah-Hartman <>,
        Tony Cheng <>,
        Andrew Morton <>,
        Vlastimil Babka <>,
        Johannes Weiner <>,
        Pavel Tatashin <>
Subject: Re: [PATCH] Add slowpath enter/exit trace events

On Thu, Nov 23, 2017 at 01:25:30PM +0100, Michal Hocko wrote:
> On Thu 23-11-17 11:43:36, wrote:
> > From: Peter Enderborg <>
> > 
> > The warning of slow allocation has been removed, this is
> > another way to fetch that information. But you need
> > to enable the trace. The exit function also returns
> > information about the number of retries, how long
> > it was stalled and failure reason if that happened.
> I think this is just too excessive. We already have a tracepoint for the
> allocation exit. All we need is an entry to have a base to compare with.
> Another usecase would be to measure allocation latency. Information you
> are adding can be (partially) covered by existing tracepoints.

You can gather that by simply adding a probe to __alloc_pages_slowpath
(like what perf probe does) and matching the trigger with the existing
mm_page_alloc points. This is a bit approximate because you would need
to filter mm_page_alloc hits that do not have a corresponding hit with
__alloc_pages_slowpath but that is easy.
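
A minimal sketch of that matching step, assuming the trace has already been
reduced to (timestamp, pid, event name) tuples. The event name
"slowpath_entry" is hypothetical and stands for whatever the probe on
__alloc_pages_slowpath is called; matching on pid alone is an approximation.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical name for the probe placed on __alloc_pages_slowpath.
SLOWPATH = "slowpath_entry"
# Existing tracepoint that marks the end of an allocation.
PAGE_ALLOC = "mm_page_alloc"

Event = Tuple[float, int, str]  # (timestamp in seconds, pid, event name)


def slowpath_latencies(events: Iterable[Event]) -> List[Tuple[int, float]]:
    """Return (pid, latency) for each allocation that entered the slow path.

    Events are assumed to arrive in timestamp order.
    """
    pending: Dict[int, float] = {}  # pid -> time of the unmatched probe hit
    result: List[Tuple[int, float]] = []
    for ts, pid, name in events:
        if name == SLOWPATH:
            pending[pid] = ts
        elif name == PAGE_ALLOC:
            start = pending.pop(pid, None)
            if start is None:
                # No corresponding slowpath hit: filter this one out.
                continue
            result.append((pid, ts - start))
    return result
```

Only the mm_page_alloc hits that follow a probe hit from the same pid
produce a latency; everything else is dropped.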

With that probe, it's trivial to use systemtap to track the latencies between
those points on a per-process basis and then only do a dump_stack from
systemtap for the ones that are above a particular threshold. This can all
be done without introducing state-tracking code into the page allocator
that is active regardless of whether the tracepoint is in use. It also
has the benefit of working with many older kernels.

If systemtap is not an option then use ftrace directly to gather the
information from userspace. It can be done via trace_pipe with some overhead
or on a per-cpu basis like what trace-cmd does. It's important to note
that even *if* the tracepoints were introduced, it would still be necessary
to have something gather the information and report it in a sensible fashion.
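
As a rough illustration of the userspace side, the sketch below parses
text lines of the kind read from trace_pipe into structured records. The
line layout it expects ("comm-pid [cpu] flags timestamp: event: ...") is
an assumption about the default ftrace text output, and the names
TraceRecord, parse_line and read_records are made up for this example.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional


class TraceRecord(NamedTuple):
    comm: str
    pid: int
    cpu: int
    timestamp: float
    event: str


# Assumed layout: "  comm-pid  [cpu] flags  seconds.usecs: event: details".
# The flags column is treated as optional.
_LINE = re.compile(
    r"^\s*(?P<comm>.+)-(?P<pid>\d+)\s+"
    r"\[(?P<cpu>\d+)\]\s+"
    r"(?:\S+\s+)?"
    r"(?P<ts>\d+\.\d+):\s+"
    r"(?P<event>\w+):"
)


def parse_line(line: str) -> Optional[TraceRecord]:
    """Parse one trace line; return None for headers and unknown lines."""
    m = _LINE.match(line)
    if m is None:
        return None
    return TraceRecord(
        m["comm"], int(m["pid"]), int(m["cpu"]), float(m["ts"]), m["event"]
    )


def read_records(lines: Iterable[str]) -> Iterator[TraceRecord]:
    """Yield a record for every line that parses, skipping the rest."""
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            yield rec
```

read_records accepts any iterable of lines, so it can be fed an open
trace_pipe file or a saved trace; where tracefs is mounted varies between
systems.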

That probe+mm_page_alloc can tell you the frequency of allocation
attempts that take a long time but not the why. Compaction and direct
reclaim latencies can be checked via existing tracepoints and in the case
of compaction, detailed information can also be gathered from existing
tracepoints. Detailed information on why direct reclaim stalled can be
harder to get, but the most important check is whether reclaim stalls due
to congestion and, again, tracepoints already exist for that.

I'm not convinced that a new tracepoint is needed.

Mel Gorman
