Message-ID: <40ca86a1-ea36-0185-1ba5-c69005e46d3f@colorfullife.com>
Date: Tue, 4 Jan 2022 21:48:54 +0100
From: Manfred Spraul <manfred@...orfullife.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Alexey Gladkov <legion@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Linux Containers <containers@...ts.linux.dev>,
Alexander Mikhalitsyn <alexander.mikhalitsyn@...tuozzo.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Christian Brauner <christian.brauner@...ntu.com>,
Daniel Walsh <dwalsh@...hat.com>,
Davidlohr Bueso <dbueso@...e.de>,
Kirill Tkhai <ktkhai@...tuozzo.com>,
Serge Hallyn <serge@...lyn.com>,
Varad Gautam <varad.gautam@...e.com>,
Vasily Averin <vvs@...tuozzo.com>,
kernel test robot <lkp@...el.com>
Subject: Re: [PATCH v2] ipc: Store mqueue sysctls in the ipc namespace

Hello Eric,
On 1/4/22 19:42, Eric W. Biederman wrote:
> Manfred Spraul <manfred@...orfullife.com> writes:
>> Hi Alexey,
>> On 1/4/22 12:51, Alexey Gladkov wrote:
>>> Right now, the mqueue sysctls take ipc namespaces into account in a
>>> rather hacky way. This works in most cases, but does not respect the
>>> user namespace.
>>>
>>> Within the user namespace, the user cannot change the /proc/sys/fs/mqueue/*
>>> parameters. This poses a problem in rootless containers.
>>>
>>> To solve this I changed the implementation of the mqueue sysctls just
>>> like some other sysctls.
>>>
>>> Before this change:
>>>
>>> $ echo 5 | unshare -r -U -i tee /proc/sys/fs/mqueue/msg_max
>>> tee: /proc/sys/fs/mqueue/msg_max: Permission denied
>>> 5
>> Could you crosscheck that all (relevant) allocations in ipc/mqueue.c
>> use GFP_KERNEL_ACCOUNT?
> They are not.
>
>> We should not allow normal users to use up all memory.
>>
>> Otherwise:
>> The idea is good: the limits do not really prevent using up all
>> memory; _ACCOUNT is the better approach.
>> And with _ACCOUNT, it doesn't hurt that the namespace root is able to
>> set limits.
> Saying that the cgroup kernel memory limit is the only thing that
> works, and that it is always better, is silly.
>
>
> First, the cgroup kernel memory limits applied via ACCOUNT allocations
> are not acceptable on several kernel hot paths because they are so
> expensive.
I was not aware that ACCOUNT allocations are very expensive.
OTOH, adding ACCOUNT resolved various out-of-memory crashes, IIRC in
ipc/sem.c and/or ipc/msg.c. But we also do not have an RLIMIT for
ipc/sem.c or ipc/msg.c.
Let me rephrase my question:
When we allow non-root users to write to /proc/sys/fs/mqueue/msg_max,
are there any _relevant_ allocations that bypass _all_ limits?
As you write, we have RLIMIT_MSGQUEUE.
And several allocations in ipc/mqueue.c already use ACCOUNT:
- the messages themselves, via load_msg()/alloc_msg().
- the inodes, via the mqueue_inode_cachep slab cache (SLAB_ACCOUNT).
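For reference, this is roughly what that looks like in the current tree
(paraphrased from ipc/msgutil.c and ipc/mqueue.c, not a verbatim copy):

	/* ipc/msgutil.c, alloc_msg(): payload charged to the memcg */
	msg = kmalloc(sizeof(*msg) + alen, GFP_KERNEL_ACCOUNT);

	/* ipc/mqueue.c, init_mqueue_fs(): inode slab marked SLAB_ACCOUNT */
	mqueue_inode_cachep = kmem_cache_create("mqueue_inode_cache",
				sizeof(struct mqueue_inode_info), 0,
				SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT,
				init_once);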
> Further, the memory cgroup kernel memory limit is not always delegated
> to non-root users, which precludes using it in many situations.
>
>
> The RLIMIT_MSGQUEUE limit definitely works, and if I read the kernel
> source correctly it defaults to MQ_BYTES_MAX aka 819200. A limit of
> 800 KiB should prevent using up all of system memory, except on very
> low memory machines.
I'd agree that 800 kB per user is not a relevant amount. But we need to
be certain that there are no loopholes.
I do not see anything relevant, e.g. 0-byte messages should be covered
by mq_maxmsg, since the per-slot overhead is charged regardless of the
payload size (see the sketch below). But perhaps I am overlooking
something.
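For the record, the worst-case footprint charged at queue creation time
is computed roughly like this (paraphrased from mqueue_get_inode() in
ipc/mqueue.c, details trimmed):

	/* per-slot tree overhead, charged even for 0-byte messages */
	mq_treesize = info->attr.mq_maxmsg * sizeof(struct msg_msg) +
		min_t(unsigned int, info->attr.mq_maxmsg, MQ_PRIO_MAX) *
		sizeof(struct posix_msg_tree_node);

	mq_bytes = mq_treesize +
		info->attr.mq_maxmsg * info->attr.mq_msgsize;

mq_bytes is then charged against RLIMIT_MSGQUEUE, so mq_maxmsg bounds
the footprint even when mq_msgsize is 0.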
> So please let's not confuse apples and oranges, and let's use the tools
> in the kernel where they work, and not set them up in contest with each
> other.
>
> Rlimits with generous but real limits in general are good at catching
> when a program misbehaves. The cgroups are better at setting a total
> memory cap. In this case the rlimit cap is low enough that it simply
> should not matter.
>
> What has been fixed with the ucount rlimits is that (barring
> implementation bugs) it is now not possible to create a user namespace
> and escape your rlimits by using multiple users.
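For readers following along: the charge in mqueue_get_inode() now goes
through the namespace-aware ucount counters, roughly like this
(paraphrased, error unwinding trimmed):

	info->ucounts = get_ucounts(current_ucounts());
	if (info->ucounts) {
		long msgqueue;

		spin_lock(&mq_lock);
		msgqueue = inc_rlimit_ucounts(info->ucounts,
				UCOUNT_RLIMIT_MSGQUEUE, mq_bytes);
		if (msgqueue == LONG_MAX ||
		    msgqueue > rlimit(RLIMIT_MSGQUEUE)) {
			/* over the limit: undo the charge, fail (-EMFILE) */
			dec_rlimit_ucounts(info->ucounts,
					UCOUNT_RLIMIT_MSGQUEUE, mq_bytes);
			ret = -EMFILE;
		}
		spin_unlock(&mq_lock);
	}

The counter is hierarchical over user namespaces, so creating a new
user namespace does not reset the charge.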
I'll try to check the patch in detail in the next few days.
--
Manfred