Message-ID: <CALOAHbBkQbw49T=22zdiK9BzEvy7fEmCmhhJh3mTkm3JvjsD_g@mail.gmail.com>
Date: Fri, 17 Jul 2020 09:43:48 +0800
From: Yafang Shao <laoar.shao@...il.com>
To: Shakeel Butt <shakeelb@...gle.com>
Cc: Johannes Weiner <hannes@...xchg.org>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux MM <linux-mm@...ck.org>,
"open list:BLOCK LAYER" <linux-block@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 0/2] psi: enhance psi with the help of ebpf
On Fri, Jul 17, 2020 at 1:04 AM Shakeel Butt <shakeelb@...gle.com> wrote:
>
> On Wed, Jul 15, 2020 at 8:19 PM Yafang Shao <laoar.shao@...il.com> wrote:
> >
> > On Thu, Jul 16, 2020 at 12:36 AM Shakeel Butt <shakeelb@...gle.com> wrote:
> > >
> > > Hi Yafang,
> > >
> > > On Tue, Mar 31, 2020 at 3:05 AM Yafang Shao <laoar.shao@...il.com> wrote:
> > > >
> > > > PSI gives us a powerful way to analyze memory pressure issues, but we
> > > > can make it even more powerful with the help of tracepoints, kprobes,
> > > > ebpf, etc. Especially with ebpf we can flexibly get more details of
> > > > the memory pressure.
> > > >
> > > > In order to achieve this goal, a new parameter is added to
> > > > psi_memstall_{enter, leave}, which indicates the specific type of a
> > > > memstall. There are ten memstall types in total for now:
> > > > MEMSTALL_KSWAPD
> > > > MEMSTALL_RECLAIM_DIRECT
> > > > MEMSTALL_RECLAIM_MEMCG
> > > > MEMSTALL_RECLAIM_HIGH
> > > > MEMSTALL_KCOMPACTD
> > > > MEMSTALL_COMPACT
> > > > MEMSTALL_WORKINGSET_REFAULT
> > > > MEMSTALL_WORKINGSET_THRASH
> > > > MEMSTALL_MEMDELAY
> > > > MEMSTALL_SWAPIO
> > > > With the help of a kprobe or tracepoint to trace this newly added
> > > > argument, we can know which type of memstall it is and then make
> > > > corresponding improvements. It can also help us to analyze the
> > > > latency spikes caused by memory pressure.
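> > > >
> > > > As a rough illustration (the exact names and signatures in the
> > > > patches may differ slightly), the interface change looks like:
> > > >
> > > > enum memstall_type {
> > > >         MEMSTALL_KSWAPD,
> > > >         MEMSTALL_RECLAIM_DIRECT,
> > > >         MEMSTALL_RECLAIM_MEMCG,
> > > >         MEMSTALL_RECLAIM_HIGH,
> > > >         MEMSTALL_KCOMPACTD,
> > > >         MEMSTALL_COMPACT,
> > > >         MEMSTALL_WORKINGSET_REFAULT,
> > > >         MEMSTALL_WORKINGSET_THRASH,
> > > >         MEMSTALL_MEMDELAY,
> > > >         MEMSTALL_SWAPIO,
> > > > };
> > > >
> > > > /* e.g. in the direct reclaim path, try_to_free_pages(): */
> > > > unsigned long pflags;
> > > >
> > > > psi_memstall_enter(&pflags, MEMSTALL_RECLAIM_DIRECT);
> > > > nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
> > > > psi_memstall_leave(&pflags, MEMSTALL_RECLAIM_DIRECT);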
> > > >
> > > > But note that we can't use it to build memory pressure for a specific
> > > > type of memstall, e.g. memcg pressure, compaction pressure, etc.,
> > > > because it doesn't implement various types of task->in_memstall, e.g.
> > > > task->in_memcgstall, task->in_compactionstall and so on.
> > > >
> > > > Although there are already some tracepoints that can help us achieve
> > > > this goal, e.g.
> > > >     vmscan:mm_vmscan_kswapd_{wake, sleep}
> > > >     vmscan:mm_vmscan_direct_reclaim_{begin, end}
> > > >     vmscan:mm_vmscan_memcg_reclaim_{begin, end}
> > > >     /* no tracepoint for memcg high reclaim */
> > > >     compaction:mm_compaction_kcompactd_{wake, sleep}
> > > >     compaction:mm_compaction_{begin, end}
> > > >     /* no tracepoint for workingset refault */
> > > >     /* no tracepoint for workingset thrashing */
> > > >     /* no tracepoint for memdelay */
> > > >     /* no tracepoint for swapio */
> > > > psi_memstall_{enter, leave} give us a unified entry point for all
> > > > types of memstall, so we don't need to add the many begin/end
> > > > tracepoints that haven't been implemented yet.
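> > > >
> > > > For example, a minimal (hypothetical) ebpf program hooking one of the
> > > > existing tracepoints only covers that particular path:
> > > >
> > > > #include <linux/bpf.h>
> > > > #include <bpf/bpf_helpers.h>
> > > >
> > > > /* Only direct reclaim is visible here; kswapd, compaction,
> > > >  * workingset, etc. would each need their own (partly missing)
> > > >  * tracepoints.
> > > >  */
> > > > SEC("tracepoint/vmscan/mm_vmscan_direct_reclaim_begin")
> > > > int on_direct_reclaim_begin(void *ctx)
> > > > {
> > > >         bpf_printk("direct reclaim begin: pid=%d\n",
> > > >                    (int)(bpf_get_current_pid_tgid() >> 32));
> > > >         return 0;
> > > > }
> > > >
> > > > char LICENSE[] SEC("license") = "GPL";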
> > > >
> > > > Patch #2 gives us an example of how to use it with ebpf. With the
> > > > help of ebpf we can trace a specific task, application, container and
> > > > so on. It can also help us to analyze the spread of latencies and
> > > > whether they are clustered at a point in time or spread out over long
> > > > periods of time.
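> > > >
> > > > A simplified sketch along those lines (not patch #2 itself; it
> > > > assumes the new type is the second argument, as described above):
> > > >
> > > > #include <linux/types.h>
> > > > #include <linux/bpf.h>
> > > > #include <bpf/bpf_helpers.h>
> > > > #include <bpf/bpf_tracing.h>
> > > >
> > > > struct {
> > > >         __uint(type, BPF_MAP_TYPE_HASH);
> > > >         __uint(max_entries, 10240);
> > > >         __type(key, __u32);     /* tid */
> > > >         __type(value, __u64);   /* enter timestamp, ns */
> > > > } start SEC(".maps");
> > > >
> > > > SEC("kprobe/psi_memstall_enter")
> > > > int BPF_KPROBE(memstall_enter, unsigned long *pflags, int type)
> > > > {
> > > >         __u32 tid = (__u32)bpf_get_current_pid_tgid();
> > > >         __u64 ts = bpf_ktime_get_ns();
> > > >
> > > >         bpf_map_update_elem(&start, &tid, &ts, BPF_ANY);
> > > >         return 0;
> > > > }
> > > >
> > > > /* Pair with the leave probe to get the per-task stall latency. */
> > > > SEC("kprobe/psi_memstall_leave")
> > > > int BPF_KPROBE(memstall_leave, unsigned long *pflags, int type)
> > > > {
> > > >         __u32 tid = (__u32)bpf_get_current_pid_tgid();
> > > >         __u64 *tsp = bpf_map_lookup_elem(&start, &tid);
> > > >
> > > >         if (!tsp)
> > > >                 return 0;
> > > >         bpf_printk("memstall type=%d lat=%llu ns\n", type,
> > > >                    bpf_ktime_get_ns() - *tsp);
> > > >         bpf_map_delete_elem(&start, &tid);
> > > >         return 0;
> > > > }
> > > >
> > > > char LICENSE[] SEC("license") = "GPL";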
> > > >
> > > > To summarize, with the pressure data in /proc/pressure/memory we know
> > > > that the system is under memory pressure, and then with the newly
> > > > added tracing facility in this patchset we can find the reason for
> > > > this memory pressure and then decide how to address it.
> > > > The workflow can be illustrated as below.
> > > >
> > > >                    REASON       ACTION
> > > >                  | compaction | improve compaction |
> > > >                  | vmscan     | improve vmscan     |
> > > > Memory pressure -| workingset | improve workingset |
> > > >                  | etc        | ...                |
> > > >
> > >
> > > I have not looked at the patch series in detail but I wanted to get
> > > your thoughts on whether it is possible to achieve what I am trying
> > > to do with this patch series.
> > >
> > > At the moment I am only interested in global reclaim and I wanted to
> > > enable alerts like "alert if there is a process stuck in global
> > > reclaim for x seconds in the last y-seconds window" or "alert if all
> > > the processes are stuck in global reclaim for some z seconds".
> > >
> > > I see that using this series I can identify global reclaim but I am
> > > wondering if alerts or notifications are possible. Android is using
> > > psi monitors for such alerts but it does not use cgroups, so most of
> > > the memstalls are related to global reclaim stalls. For a cgroup
> > > environment, do we need to add support to the psi monitor similar to
> > > this patch series?
> > >
> >
> > Hi Shakeel,
> >
> > We use the PSI tracepoints in our kernel to analyze the individual
> > latency caused by memory pressure, but our PSI tracepoints are
> > implemented in a different form, as below:
> >     trace_psi_memstall_enter(_RET_IP_);
> >     trace_psi_memstall_leave(_RET_IP_);
> > We then use the _RET_IP_ to identify the specific PSI type.
> >
> > If the _RET_IP_ is at try_to_free_mem_cgroup_pages(), then it means
> > the pressure is caused by the memory cgroup, IOW, the limit of the
> > memcg is reached and it has to do memcg reclaim. Otherwise we can
> > consider it as global memory pressure.
> >     try_to_free_mem_cgroup_pages
> >       psi_memstall_enter
> >         if (static_branch_likely(&psi_disabled))
> >                 return;
> >
> >         *flags = current->in_memstall;
> >         if (*flags)
> >                 return;
> >         trace_psi_memstall_enter(_RET_IP_);  <<<<< memcg pressure
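> >
> > A hypothetical ebpf consumer of that tracepoint (the argument layout
> > is an assumption, since these tracepoints exist only in our kernel)
> > can emit the ip and resolve it against /proc/kallsyms in user space:
> >
> > #include <linux/bpf.h>
> > #include <bpf/bpf_helpers.h>
> >
> > SEC("raw_tracepoint/psi_memstall_enter")
> > int on_psi_memstall_enter(struct bpf_raw_tracepoint_args *ctx)
> > {
> >         unsigned long ip = ctx->args[0];    /* the recorded _RET_IP_ */
> >
> >         bpf_printk("psi_memstall_enter ip=%lx\n", ip);
> >         return 0;
> > }
> >
> > char LICENSE[] SEC("license") = "GPL";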
> >
>
> Thanks for the response. I am looking for 'always on' monitoring, more
> specifically defining system-level SLIs based on PSI. My concern with
> ftrace is its global shared state, and also it is not really meant for
> 'always on' monitoring. You have mentioned ebpf. Is ebpf fine for
> 'always on' monitoring, and is it possible for ebpf to notify user
> space on specific conditions (e.g. a process stuck in global reclaim
> for 60 seconds)?
>
From my experience, ebpf is fine for 'always on' monitoring, but I'm
not sure whether it is possible to notify user space on specific
conditions.
Notifying user space would be a useful feature, so I think we can give
it a try.
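
One direction we could try (a rough, untested sketch on my side): let
the leave kprobe compute the stall duration and push an event to user
space through a perf event array once it exceeds a threshold:

#include <linux/types.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#define THRESHOLD_NS    (60ULL * 1000000000ULL) /* e.g. 60 seconds */

struct stall_event {
        __u32 tid;
        __u64 lat_ns;
};

struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 10240);
        __type(key, __u32);     /* tid */
        __type(value, __u64);   /* enter timestamp, ns */
} start SEC(".maps");

struct {
        __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
        __uint(key_size, sizeof(__u32));
        __uint(value_size, sizeof(__u32));
} events SEC(".maps");

SEC("kprobe/psi_memstall_enter")
int BPF_KPROBE(memstall_enter)
{
        __u32 tid = (__u32)bpf_get_current_pid_tgid();
        __u64 ts = bpf_ktime_get_ns();

        bpf_map_update_elem(&start, &tid, &ts, BPF_ANY);
        return 0;
}

SEC("kprobe/psi_memstall_leave")
int BPF_KPROBE(memstall_leave)
{
        __u32 tid = (__u32)bpf_get_current_pid_tgid();
        __u64 *tsp = bpf_map_lookup_elem(&start, &tid);
        struct stall_event ev;

        if (!tsp)
                return 0;
        ev.tid = tid;
        ev.lat_ns = bpf_ktime_get_ns() - *tsp;
        bpf_map_delete_elem(&start, &tid);

        /* Wake user space only for stalls above the threshold. */
        if (ev.lat_ns >= THRESHOLD_NS)
                bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
                                      &ev, sizeof(ev));
        return 0;
}

char LICENSE[] SEC("license") = "GPL";

Note this only fires when a stall ends, so a task that is still stuck
would additionally need a periodic scan of the 'start' map from user
space.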
--
Thanks
Yafang