linux-kernel - Re: [RFC][PATCH] ipc: Remove IPCMNI

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <3e201de2-bed2-6f7d-0783-700d095142e0@colorfullife.com>
Date:   Thu, 29 Mar 2018 10:47:45 +0200
From:   Manfred Spraul <manfred@...orfullife.com>
To:     Davidlohr Bueso <dave@...olabs.net>,
        Waiman Long <longman@...hat.com>,
        Michael Kerrisk <mtk.manpages@...il.com>
Cc:     "Eric W. Biederman" <ebiederm@...ssion.com>,
        "Luis R. Rodriguez" <mcgrof@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Matthew Wilcox <willy@...radead.org>,
        Stanislav Kinsbursky <skinsbursky@...allels.com>,
        Linux Containers <containers@...ts.linux-foundation.org>,
        linux-api@...r.kernel.org
Subject: Re: [RFC][PATCH] ipc: Remove IPCMNI

Hello together,

On 03/29/2018 04:14 AM, Davidlohr Bueso wrote:
> Cc'ing mtk, Manfred and linux-api.
>
> See below.
>
> On Thu, 15 Mar 2018, Waiman Long wrote:
>
>> On 03/15/2018 03:00 PM, Eric W. Biederman wrote:
>>> Waiman Long <longman@...hat.com> writes:
>>>
>>>> On 03/14/2018 08:49 PM, Eric W. Biederman wrote:
>>>>> The define IPCMNI was originally the size of a statically sized 
>>>>> array in
>>>>> the kernel and that has long since been removed. Therefore there 
>>>>> is no
>>>>> fundamental reason for IPCMNI.
>>>>>
>>>>> The only remaining use IPCMNI serves is as a convoluted way to format
>>>>> the ipc id to userspace.  It does not appear that anything except for
>>>>> the CHECKPOINT_RESTORE code even cares about this variety of 
>>>>> assignment
>>>>> and the CHECKPOINT_RESTORE code only cares about this weirdness 
>>>>> because
>>>>> it has to restore these peculiar ids.
>>>>>
My assumption is that if an array is recreated, it should get a 
different id.
     a=semget(1234,,);
     semctl(a,,IPC_RMID);
     b=semget(1234,,);
now a!=b.

Rational: semop() calls only refer to the array by the id.
If there is a stale process in the system that tries to access the "old" 
array and the new array has the same id, then the locking gets corrupted.
>>>>> Therefore make the assignment of ipc ids match the description in
>>>>> Advanced Programming in the Unix Environment and assign the next id
>>>>> until INT_MAX is hit then loop around to the lower ids.
>>>>>
Ok, sounds good.
That way we really cycle through INT_MAX, right now a==b would happen 
after 128k RMID calls.
>>>>> This can be implemented trivially with the current code using 
>>>>> idr_alloc_cyclic.
>>>>>
Is there a performance impact?
Right now, the idr tree is only large if there are lots of objects.
What happens if we have only 1 object, with id=INT_MAX-1?

semop() that do not sleep are fairly fast.
The same applies for msgsnd/msgrcv, if the message is small enough.

@Davidlohr:
Do you know if there are application that frequently call semop() and it 
doesn't have to sleep?
 From the scalability that was pushed into the kernel, I assume that 
this exists.

I have myself only checked postgresql, and postgresql always sleeps.
(and this was long ago)
>>>>> To make it possible to keep checkpoint/restore working I have renamed
>>>>> the sysctls from xxx_next_id to xxx_nextid.  That is enough change 
>>>>> that
>>>>> a smart CRIU implementation can see that what is exported has 
>>>>> changed,
>>>>> and act accordingly.  New kernels will be able to restore the old 
>>>>> id's.
>>>>>
>>>>> This code still needs some real world testing to verify my 
>>>>> assumptions.
>>>>> And some work with the CRIU implementations to actually add the code
>>>>> that deals with the new for of id assignment.
>>>>>
It means that all existing checkpoint/restore application will not work 
with a new kernel.
Everyone must first update the checkpoint/restore application, then 
update the kernel.

Is this acceptable?

--
     Manfred