Message-ID: <46c1c59e-1368-620d-e57a-f35c2c82084d@linux.dev>
Date: Mon, 11 Apr 2022 12:40:29 +0300
From: Vasily Averin <vasily.averin@...ux.dev>
To: Shakeel Butt <shakeelb@...gle.com>
Cc: "Eric W. Biederman" <ebiederm@...ssion.com>,
Vlastimil Babka <vbabka@...e.cz>, NeilBrown <neilb@...e.de>,
Michal Hocko <mhocko@...e.com>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Linux MM <linux-mm@...ck.org>, netdev@...r.kernel.org,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, Tejun Heo <tj@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Eric Dumazet <edumazet@...gle.com>,
Kees Cook <keescook@...omium.org>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
David Ahern <dsahern@...nel.org>, linux-kernel@...r.kernel.org,
kernel@...nvz.org, Luis Chamberlain <mcgrof@...nel.org>
Subject: problem with accounting of allocations called from __net_init hooks
On 3/1/22 21:09, Shakeel Butt wrote:
> On Mon, Feb 28, 2022 at 06:36:58AM -0800, Luis Chamberlain wrote:
>> On Mon, Feb 28, 2022 at 10:17:16AM +0300, Vasily Averin wrote:
>> > Following one-liner running inside memcg-limited container consumes
>> > huge number of host memory and can trigger global OOM.
>> >
>> > for i in `seq 1 xxx` ; do ip l a v$i type veth peer name vp$i ; done
>> >
>> > Patch accounts most part of these allocations and can protect host.
>> > ---[cut]---
>> > It is not polished, and perhaps should be splitted.
>> > obviously it affects other kind of netdevices too.
>> > Unfortunately I'm not sure that I will have enough time to handle it properly
>> > and decided to publish current patch version as is.
>> > OpenVz workaround it by using per-container limit for number of
>> > available netdevices, but upstream does not have any kind of
>> > per-container configuration.
>> > ------
I've noticed that __register_pernet_operations() executes the init hook of the
registered pernet_operations structure in every existing net namespace.

Usually these hooks are called by a process that belongs to the given net namespace,
and all marked allocations are accounted to the corresponding container:
i.e. objects related to the netns in container A are accounted to the memcg of container A,
objects allocated inside container B are accounted to the corresponding memcg B,
and so on.

However, __register_pernet_operations() calls the same hooks from a single context,
and as a result all marked allocations are accounted to one memcg.

It is quite a rare scenario, but the current behaviour looks incorrect to me.

I expect we can take the memcg from 'struct net', because this structure is itself accounted;
then we can call set_active_memcg() before executing the init hook.
However, I'm not sure this is fully correct.
Could you please advise a better solution?
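
To make the idea concrete, here is a rough userspace C model, not kernel code.
Every name in it (struct memcg, struct net_model, model_charge(),
model_set_active_memcg(), register_ops_current(), register_ops_proposed()) is
hypothetical and only imitates the behaviour described above: a marked
allocation is charged to the active memcg if one is set, otherwise to the
memcg of the calling task. It assumes each netns remembers the memcg it was
charged to; how to obtain that from 'struct net' in the kernel is exactly the
open question.

```c
#include <stddef.h>

/* Userspace model only: these are NOT the kernel's types or functions. */
struct memcg {
    const char *name;
    size_t charged;              /* bytes accounted to this memcg */
};

/* Stand-in for 'struct net'; owner models the memcg the netns was charged to. */
struct net_model {
    struct memcg *owner;
};

static struct memcg *current_memcg;  /* memcg of the calling task */
static struct memcg *active_memcg;   /* explicit override, NULL if unset */

/* Imitates set_active_memcg(): install an override, return the previous one. */
static struct memcg *model_set_active_memcg(struct memcg *m)
{
    struct memcg *old = active_memcg;

    active_memcg = m;
    return old;
}

/* Imitates a marked allocation: charge the override if set, else the caller. */
static void model_charge(size_t bytes)
{
    struct memcg *m = active_memcg ? active_memcg : current_memcg;

    if (m)
        m->charged += bytes;
}

/* Hypothetical init hook that allocates 100 accounted bytes per netns. */
static int model_init_hook(struct net_model *net)
{
    (void)net;
    model_charge(100);
    return 0;
}

/* Current behaviour: every hook runs in the caller's context. */
static void register_ops_current(struct net_model **nets, size_t n)
{
    for (size_t i = 0; i < n; i++)
        model_init_hook(nets[i]);
}

/* Proposed behaviour: switch to the netns owner's memcg around each hook. */
static void register_ops_proposed(struct net_model **nets, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct memcg *old = model_set_active_memcg(nets[i]->owner);

        model_init_hook(nets[i]);
        model_set_active_memcg(old);   /* always restore the previous value */
    }
}
```

With two namespaces owned by memcgs A and B and a caller running in A,
register_ops_current() charges both hooks to A, while register_ops_proposed()
charges one hook to each owner.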

Thank you,
Vasily Averin