lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 04 Jan 2022 12:42:26 -0600
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Manfred Spraul <manfred@...orfullife.com>
Cc:     Alexey Gladkov <legion@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Containers <containers@...ts.linux.dev>,
        Alexander Mikhalitsyn <alexander.mikhalitsyn@...tuozzo.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Daniel Walsh <dwalsh@...hat.com>,
        Davidlohr Bueso <dbueso@...e.de>,
        Kirill Tkhai <ktkhai@...tuozzo.com>,
        Serge Hallyn <serge@...lyn.com>,
        Varad Gautam <varad.gautam@...e.com>,
        Vasily Averin <vvs@...tuozzo.com>,
        kernel test robot <lkp@...el.com>
Subject: Re: [PATCH v2] ipc: Store mqueue sysctls in the ipc namespace


Manfred Spraul <manfred@...orfullife.com> writes:
>
 Hi Alexey,
>
> On 1/4/22 12:51, Alexey Gladkov wrote:
>> Right now, the mqueue sysctls take ipc namespaces into account in a
>> rather hacky way. This works in most cases, but does not respect the
>> user namespace.
>>
>> Within the user namespace, the user cannot change the /proc/sys/fs/mqueue/*
>> parametres. This poses a problem in the rootless containers.
>>
>> To solve this I changed the implementation of the mqueue sysctls just
>> like some other sysctls.
>>
>> Before this change:
>>
>> $ echo 5 | unshare -r -U -i tee /proc/sys/fs/mqueue/msg_max
>> tee: /proc/sys/fs/mqueue/msg_max: Permission denied
>> 5
>
> Could you crosscheck that all (relevant) allocations in ipc/mqueue.c
> use GFP_KERNEL_ACCOUNT?

They are not.

> We should not allow normal users to use up all memory.
>
> Otherwise:
> The idea is good, the limits do not really prevent using up all
> memory, _ACCOUNT is the better approach.
> And with _ACCOUNT, it doesn't hurt that the namespace root is able to
> set limits.

Saying the cgroup kernel memory limit is the only thing that works, and
that is always better is silly.


First the cgroup kernel memory limits noted with ACCOUNT are not
acceptable on several kernel hot paths because they are so expensive.

Further the memory cgroup kernel memory limit is not always delegated to
non-root users, which precludes using the memory cgroup kernel memory
limit in many situations.


The RLIMIT_MQUEUE limit definitely works, and as I read the kernel
source correct it defaults to MQ_BYTES_MAX aka 819200.  A limit of
800KiB should prevent using up all of system memory, except on very low
memory machines.


So please let's not confuse apples and oranges, and let's use the tools
in the kernel where they work, and not set them up in contest with each
other.

Rlimits with generous but real limits in general are good at catching
when a program misbehaves.  The cgroups are better at setting a total
memory cap.  In this case the rlimit cap is low enough it simply should
not matter.

What has been fixed with the ucount rlimits is that (baring
implementation bugs) it is now not possible to create a user namespace
and escape your rlimits by using multiple users.

Eric

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ