Message-ID: <Y9RjRxe5Ao2/u+1Y@P9FQF9L96D.corp.robot.car>
Date: Fri, 27 Jan 2023 15:50:31 -0800
From: Roman Gushchin <roman.gushchin@...ux.dev>
To: Leonardo Brás <leobras@...hat.com>
Cc: Michal Hocko <mhocko@...e.com>,
Marcelo Tosatti <mtosatti@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
Shakeel Butt <shakeelb@...gle.com>,
Muchun Song <muchun.song@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>,
cgroups@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/5] Introduce memcg_stock_pcp remote draining
On Fri, Jan 27, 2023 at 04:29:37PM -0300, Leonardo Brás wrote:
> On Fri, 2023-01-27 at 10:29 +0100, Michal Hocko wrote:
> > On Fri 27-01-23 04:35:22, Leonardo Brás wrote:
> > > On Fri, 2023-01-27 at 08:20 +0100, Michal Hocko wrote:
> > > > On Fri 27-01-23 04:14:19, Leonardo Brás wrote:
> > > > > On Thu, 2023-01-26 at 15:12 -0800, Roman Gushchin wrote:
> > > > [...]
> > > > > > I'd rather opt out of stock draining for isolated cpus: it might slightly reduce
> > > > > > the accuracy of memory limits and slightly increase the memory footprint (all
> > > > > > those dying memcgs...), but the impact will be limited. Actually it is limited
> > > > > > by the number of cpus.
> > > > >
> > > > > I was discussing this same idea with Marcelo yesterday morning.
> > > > >
> > > > > The questions we had on the topic were:
> > > > > a - About how many pages will the pcp cache hold before draining them itself?
> > > >
> > > > MEMCG_CHARGE_BATCH (64 currently). And one more clarification. The cache
> > > > doesn't really hold any pages. It is a mere counter of how many charges
> > > > have been accounted for the memcg page counter. So it is not really
> > > > consuming a proportional amount of resources. It just pins the
> > > > corresponding memcg. Have a look at consume_stock and refill_stock.
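
(For anyone following along, the relevant machinery is roughly the following -
a heavily simplified sketch of mm/memcontrol.c, with the locking, objcg and
memsw parts left out. The per-cpu cache is just a counter of pre-accounted
charges plus a reference on the owning memcg.)

struct memcg_stock_pcp {
	struct mem_cgroup *cached;	/* the only thing the cache pins */
	unsigned int nr_pages;		/* pre-accounted charges, not real pages */
	/* plus a work item and flags used for remote draining, omitted here */
};
static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);

/* charge fast path: reuse charges cached on this cpu for the same memcg */
static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
{
	struct memcg_stock_pcp *stock = this_cpu_ptr(&memcg_stock);

	if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
		stock->nr_pages -= nr_pages;
		return true;
	}
	return false;
}

/* return the cached charges to the memcg's page counter and unpin it */
static void drain_stock(struct memcg_stock_pcp *stock)
{
	struct mem_cgroup *old = stock->cached;

	if (!old)
		return;
	if (stock->nr_pages) {
		page_counter_uncharge(&old->memory, stock->nr_pages);
		stock->nr_pages = 0;
	}
	css_put(&old->css);
	stock->cached = NULL;
}

/* cache up to MEMCG_CHARGE_BATCH worth of charges for the local cpu */
static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
{
	struct memcg_stock_pcp *stock = this_cpu_ptr(&memcg_stock);

	if (stock->cached != memcg) {	/* switching memcgs resets the cache */
		drain_stock(stock);
		css_get(&memcg->css);
		stock->cached = memcg;
	}
	stock->nr_pages += nr_pages;

	if (stock->nr_pages > MEMCG_CHARGE_BATCH)
		drain_stock(stock);
}
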
> > >
> > > I see. Thanks for pointing that out!
> > >
> > > So in the worst case scenario the memcg would have reserved 64 pages * (numcpus - 1)
> >
> > s@numcpus@num_isolated_cpus@
>
> I was thinking of the worst case scenario being (ncpus - 1) cpus isolated.
>
> >
> > > that are not getting used, and may cause an 'earlier' OOM if this amount is
> > > needed but can't be freed.
> >
> > s@OOM@memcg OOM@
>
> > > Following up on that worst case: supposing a big powerpc machine, 256 CPUs, each
> > > holding 64 pages * 64k (page size) => 1GB of memory - 4MB (one cpu actually using
> > > its resources).
> > > It's starting to get too big, but still ok for a machine this size.
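
(Spelling that arithmetic out: with a 64k page size that is 64 pages * 64 KiB
= 4 MiB of cached charges per cpu, so 255 isolated-and-idle cpus out of the 256
can pin up to 255 * 4 MiB = 1020 MiB, i.e. roughly 1 GiB, against the memcg
limit.)
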
> >
> > It is more about the memcg limit rather than the size of the machine.
> > Again, let's focus on the actual usecase. What is the usual memcg setup with
> > those isolcpus?
>
> I understand it's about the limit, not the actually allocated memory. When I mention
> the machine size, I mean what is expected to be acceptable to a user of such a
> machine.
>
> >
> > > The thing is that it can present an odd behavior:
> > > You have a cgroup created earlier, now empty, and when you try to run a given
> > > application, it hits OOM.
> >
> > The application would either consume those cached charges or flush them
> > if it is running in a different memcg. Or what do you have in mind?
>
> 1 - Create a memcg with a VM inside, multiple vcpus pinned to isolated cpus.
> 2 - Run a multi-cpu task inside the VM; it allocates memory on every CPU and keeps
> the pcp caches filled.
> 3 - Try to run a single-cpu task (pinned?) inside the VM, which uses almost all
> the available memory.
> 4 - memcg OOM.
>
> Does it make sense?

It can happen now as well, you just need a competing drain request.
Honestly, I feel the probability of this scenario being a real problem is fairly low.
I don't recall any complaints about spurious OOMs caused by races in the draining code.
Usually machines which are tight on memory rarely have so many idle cpus.
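
To be concrete about what I meant by opting out of stock draining for isolated
cpus: something along these lines, an untested sketch against drain_all_stock()
in mm/memcontrol.c (whether HK_TYPE_DOMAIN is the right predicate here, or
whether nohz_full cpus want the same treatment, is a separate question):

	/* needs <linux/sched/isolation.h> */
	for_each_online_cpu(cpu) {
		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
		bool flush = false;

		/* ... unchanged: does this cpu cache charges of root_memcg? ... */

		if (flush &&
		    (cpu == curcpu || housekeeping_cpu(cpu, HK_TYPE_DOMAIN)) &&
		    !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
			if (cpu == curcpu)
				drain_local_stock(&stock->work);
			else
				schedule_work_on(cpu, &stock->work);
		}
		/* isolated cpus just keep their (at most MEMCG_CHARGE_BATCH) charges */
	}

The worst case cost is what we estimated above: up to MEMCG_CHARGE_BATCH pages
of charges sitting on each isolated cpu until that cpu itself consumes them or
charges from a different memcg.
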
Thanks!