linux-kernel - Re: [PATCH v7 03/11] ceph: handle idmapped mounts in create_request

Message-ID: <a441f975-86c0-0f95-5ad0-ad6139a2d705@redhat.com>
Date:   Fri, 28 Jul 2023 18:12:07 +0800
From:   Xiubo Li <xiubli@...hat.com>
To:     Stéphane Graber <stgraber@...ntu.com>,
        Aleksandr Mikhalitsyn <aleksandr.mikhalitsyn@...onical.com>
Cc:     Christian Brauner <brauner@...nel.org>,
        linux-fsdevel@...r.kernel.org, Jeff Layton <jlayton@...nel.org>,
        Ilya Dryomov <idryomov@...il.com>, ceph-devel@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v7 03/11] ceph: handle idmapped mounts in
 create_request_message()


On 7/27/23 22:46, Stéphane Graber wrote:
> On Thu, Jul 27, 2023 at 5:48 AM Aleksandr Mikhalitsyn
> <aleksandr.mikhalitsyn@...onical.com> wrote:
>> On Thu, Jul 27, 2023 at 11:01 AM Christian Brauner <brauner@...nel.org> wrote:
>>> On Thu, Jul 27, 2023 at 08:36:40AM +0200, Aleksandr Mikhalitsyn wrote:
>>>> On Thu, Jul 27, 2023 at 7:30 AM Xiubo Li <xiubli@...hat.com> wrote:
>>>>>
>>>>> On 7/26/23 22:10, Alexander Mikhalitsyn wrote:
>>>>>> Inode operations that create a new filesystem object such as ->mknod,
>>>>>> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
>>>>>> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
>>>>>> filesystem object.
>>>>>>
>>>>>> In order to ensure that the correct {g,u}id is used map the caller's
>>>>>> fs{g,u}id for creation requests. This doesn't require complex changes.
>>>>>> It suffices to pass in the relevant idmapping recorded in the request
>>>>>> message. If this request message was triggered from an inode operation
>>>>>> that creates filesystem objects it will have passed down the relevant
>>>>>> idmapping. If this is a request message that was triggered from an inode
>>>>>> operation that doesn't need to take idmappings into account, the initial
>>>>>> idmapping is passed down, which is an identity mapping.
>>>>>>
>>>>>> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
>>>>>> which adds two new fields (owner_{u,g}id) to the request head structure.
>>>>>> So, we need to ensure that MDS supports it otherwise we need to fail
>>>>>> any IO that comes through an idmapped mount because we can't process it
>>>>>> in a proper way. MDS server without such an extension will use caller_{u,g}id
>>>>>> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
>>>>>> values are unmapped. At the same time we can't map these fields with an
>>>>>> idmapping as it can break UID/GID-based permission checks logic on the
>>>>>> MDS side. This problem was described with a lot of details at [1], [2].
>>>>>>
>>>>>> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
>>>>>> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
>>>>>>
>>>>>> Cc: Xiubo Li <xiubli@...hat.com>
>>>>>> Cc: Jeff Layton <jlayton@...nel.org>
>>>>>> Cc: Ilya Dryomov <idryomov@...il.com>
>>>>>> Cc: ceph-devel@...r.kernel.org
>>>>>> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@...onical.com>
>>>>>> Signed-off-by: Christian Brauner <christian.brauner@...ntu.com>
>>>>>> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@...onical.com>
>>>>>> ---
>>>>>> v7:
>>>>>>        - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
>>>>>> ---
>>>>>>    fs/ceph/mds_client.c         | 20 ++++++++++++++++++++
>>>>>>    fs/ceph/mds_client.h         |  5 ++++-
>>>>>>    include/linux/ceph/ceph_fs.h |  4 +++-
>>>>>>    3 files changed, 27 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>>>> index c641ab046e98..ac095a95f3d0 100644
>>>>>> --- a/fs/ceph/mds_client.c
>>>>>> +++ b/fs/ceph/mds_client.c
>>>>>> @@ -2923,6 +2923,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>>>    {
>>>>>>        int mds = session->s_mds;
>>>>>>        struct ceph_mds_client *mdsc = session->s_mdsc;
>>>>>> +     struct ceph_client *cl = mdsc->fsc->client;
>>>>>>        struct ceph_msg *msg;
>>>>>>        struct ceph_mds_request_head_legacy *lhead;
>>>>>>        const char *path1 = NULL;
>>>>>> @@ -3028,6 +3029,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>>>        lhead = find_legacy_request_head(msg->front.iov_base,
>>>>>>                                         session->s_con.peer_features);
>>>>>>
>>>>>> +     if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
>>>>>> +         !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
>>>>>> +             pr_err_ratelimited_client(cl,
>>>>>> +                     "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
>>>>>> +                     " is not supported by MDS. Fail request with -EIO.\n");
>>>>>> +
>>>>>> +             ret = -EIO;
>>>>>> +             goto out_err;
>>>>>> +     }
>>>>>> +
>>>>> I think this couldn't fail the mounting operation, right ?
>>>> This won't fail mounting. First of all, an idmapped mount is always an
>>>> additional mount: you always
>>>> start by doing a "normal" mount, and only after that can you use this
>>>> mount to create an idmapped one.
>>>> ( example: https://github.com/brauner/mount-idmapped/tree/master )
>>>>
>>>>> IMO we should fail the mounting from the beginning.
>>>> Unfortunately, we can't fail the mount from the beginning. The creation
>>>> of idmapped mounts
>>>> is handled not on the filesystem level, but on the VFS level
>>> Correct. It's a generic vfsmount feature.
>>>
>>>> (source: https://github.com/torvalds/linux/blob/0a8db05b571ad5b8d5c8774a004c0424260a90bd/fs/namespace.c#L4277
>>>> )
>>>>
>>>> The kernel performs all the required checks, such as:
>>>> - the filesystem type has declared support for idmappings
>>>> (fs_type->fs_flags & FS_ALLOW_IDMAP)
>>>> - the user who creates the idmapped mount must have CAP_SYS_ADMIN in the
>>>> user namespace that owns the superblock of the filesystem
>>>> (for cephfs it's always init_user_ns => user should be root on the host)
>>>>
>>>> So I would like to go this way because of the reasons mentioned above:
>>>> - the root user is someone who understands what he is doing.
>>>> - idmapped mounts are never "first" mounts. They are always created
>>>> after a "normal" mount.
>>>> - effectively, this check lets a "normal" mount work normally and
>>>> fails only requests that come through an idmapped mount,
>>>> with a reasonable error message. Obviously, all read operations will
>>>> work perfectly well; only the operations that create new inodes will
>>>> fail.
>>>> Btw, we already have analogous semantics on the VFS level for users
>>>> who have no UID/GID mapping to the host. Filesystem requests for
>>>> such users will fail with -EOVERFLOW. Here we have something close.
>>> Refusing requests coming from an idmapped mount if the server misses
>>> appropriate features is good enough as a first step imho. And yes, we do
>>> have similar logic on the vfs level for unmapped uid/gid.
>> Thanks, Christian!
>>
>> I wanted to add that an alternative here is to modify the caller_{u,g}id
>> fields, as was done in the first approach.
>> It will break the UID/GID-based permissions model for old MDS versions
>> (we can add a printk_once to inform the user about this),
>> but at the same time it will allow us to support idmapped mounts in
>> all cases. This support will not be ideal for old MDS versions,
>> but it will work perfectly well for new MDS versions.
>>
>> Alternatively, we can introduce a cephfs mount option like
>> "idmap_with_old_mds". If it's enabled, then we set caller_{u,g}id
>> for an MDS without CEPHFS_FEATURE_HAS_OWNER_UIDGID; if it's disabled
>> (default), we fail requests with -EIO. For
>> a new MDS everything works the right way.
>>
>> Kind regards,
>> Alex
> Hey there,
>
> A very strong +1 on there needing to be some way to make this work
> with older Ceph releases.
> Ceph Reef isn't out yet and we're in July 2023, so I'd really like not
> having to wait until Ceph Squid in mid 2024 to be able to make use of
> this!

IMO this shouldn't be an issue, because we can backport it to old releases.

Thanks

- Xiubo

>
> Some kind of mount option, module option or the like would all be fine for this.
>
> Stéphane
>
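
For readers following the thread, the decision being discussed can be
modelled in user space roughly as below. This is only a sketch, not kernel
code: struct model_session, struct model_request and model_gate_request()
are hypothetical names invented for illustration, the two bool fields stand
in for the test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, ...) and
req->r_mnt_idmap != &nop_mnt_idmap checks from the quoted hunk, and the
idmap_with_old_mds flag models the mount option that was only proposed in
this thread and does not exist.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-session MDS feature bits. */
struct model_session {
    bool has_owner_uidgid; /* models CEPHFS_FEATURE_HAS_OWNER_UIDGID */
};

/* Hypothetical stand-in for the request; true means the request came
 * through an idmapped mount (r_mnt_idmap != &nop_mnt_idmap). */
struct model_request {
    bool idmapped;
};

/*
 * Returns 0 if the request may be sent, -EIO if it must be refused.
 * *uses_caller_id_fallback is set when the proposed (hypothetical)
 * "idmap_with_old_mds" behaviour would be used, i.e. the mapping is
 * applied to caller_{u,g}id because the MDS lacks owner_{u,g}id.
 */
int model_gate_request(const struct model_session *s,
                       const struct model_request *req,
                       bool idmap_with_old_mds,
                       bool *uses_caller_id_fallback)
{
    assert(s != NULL && req != NULL && uses_caller_id_fallback != NULL);
    *uses_caller_id_fallback = false;

    /* Requests from a non-idmapped mount are never affected. */
    if (!req->idmapped)
        return 0;

    /* A new MDS understands owner_{u,g}id, so nothing special is needed. */
    if (s->has_owner_uidgid)
        return 0;

    /* Old MDS + idmapped mount: opt-in fallback, or refuse as in v7. */
    if (idmap_with_old_mds) {
        *uses_caller_id_fallback = true;
        return 0;
    }
    return -EIO;
}
```

With idmap_with_old_mds set to false the model matches the v7 hunk: only
the combination "idmapped mount plus MDS without the feature" yields -EIO.
Setting it to true models the opt-in fallback, which per the discussion
above would trade correct UID/GID-based permission checks on old MDS
versions for working idmapped mounts.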