Message-ID: <20240409160241.GC1057805@cmpxchg.org>
Date: Tue, 9 Apr 2024 12:02:41 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Tejun Heo <tj@...nel.org>
Cc: Michal Koutný <mkoutny@...e.com>,
cgroups@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
Zefan Li <lizefan.x@...edance.com>,
Jonathan Corbet <corbet@....net>, Shuah Khan <shuah@...nel.org>
Subject: Re: [RFC PATCH v3 2/9] cgroup/pids: Separate semantics of
pids.events related to pids.max
On Mon, Apr 08, 2024 at 07:55:38AM -1000, Tejun Heo wrote:
> Hello,
>
> On Fri, Apr 05, 2024 at 07:05:41PM +0200, Michal Koutný wrote:
> > Currently, when pids.max limit is breached in the hierarchy, the event
> > is counted and reported in the cgroup where the forking task resides.
> >
> > This decouples the limit and the notification caused by the limit making
> > it hard to detect when the actual limit was effected.
> >
> > Let's introduce new events:
> >   max
> >       The number of times the limit of the cgroup was hit.
> >
> >   max.imposed
> >       The number of times fork failed in the cgroup because of self
> >       or ancestor limit.
>
> The whole series makes sense to me. I'm not sure about max.imposed field
> name. Maybe a name which clearly signifies rejection of forks would be
> clearer? Johannes, what do you think?
The max event at the level where the limit is set (and up, for
hierarchical accounting) makes sense to me.

max.imposed is conceptually not entirely unprecedented, but something
we've tried to avoid. Usually the idea is that events correspond to
specific cgroup limitations at that level. Failures due to constraints
higher up could be from anything, including system-level shortages.

IOW, events are supposed to be more about "how many times did this
limit here trigger", and less about "how many times did something
happen to the tasks local to this group".

It's a bit arbitrary and not perfectly followed everywhere, but I
think there is value in trying to maintain that distinction, so that
somebody looking at those files doesn't have to rack their brains or
look up every counter in the docs to figure out what it's tracking.

It's at least true for the misc controller, and for most of memcg -
with the weird exception of the swap.max events which we've tried to
fix before...

For "things that are happening to the tasks in this group", would it
make more sense to have, e.g., a pids.stat::forkfail instead?

(Or just not have that event at all? I'm not sure if it's actually
needed or whether you kept it only to maintain some form of the
information that is currently provided by the pr_info().)
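
To make the distinction concrete, here is a rough sketch of how a monitoring script might read the two counters as proposed in this RFC. It assumes pids.events stays a flat-keyed file (one "key value" pair per line). The helper name parse_flat_keyed and the sample numbers are made up for illustration, and max.imposed is only the name under discussion here, not a settled interface.

```python
def parse_flat_keyed(text):
    """Parse flat-keyed content ("key value" per line) into a dict of ints.

    Hypothetical helper for illustration; not part of any kernel tooling.
    """
    counters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, value = line.split(None, 1)
        counters[key] = int(value)
    return counters


# Invented sample content, shaped like the pids.events file proposed in the RFC.
SAMPLE_PIDS_EVENTS = "max 3\nmax.imposed 7\n"

events = parse_flat_keyed(SAMPLE_PIDS_EVENTS)

# "How many times did this limit here trigger?"
limit_hits = events.get("max", 0)

# "How many times did a fork fail for tasks in this group?"
# (the counter whose name and placement is being questioned above)
fork_failures = events.get("max.imposed", 0)

print(f"limit hits: {limit_hits}, fork failures: {fork_failures}")
```

On a real system the text would come from the cgroup's pids.events file rather than a string constant. Using .get() with a default keeps the script working on kernels that do not expose the second key.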