linux-kernel - Re: [PATCH] kernel: automatically split user namespace extent

Message-ID: <87ft4pe7km.fsf@x220.int.ebiederm.org>
Date:   Tue, 01 Dec 2020 11:53:45 -0600
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Giuseppe Scrivano <gscrivan@...hat.com>
Cc:     linux-kernel@...r.kernel.org, christian.brauner@...ntu.com,
        serge@...lyn.com,
        Linux Containers <containers@...ts.linux-foundation.org>
Subject: Re: [PATCH] kernel: automatically split user namespace extent


Nit: The subject tag should have been "userns:" rather than "kernel:".

Giuseppe Scrivano <gscrivan@...hat.com> writes:

> writing to the id map fails when an extent overlaps multiple mappings
> in the parent user namespace, e.g.:
>
> $ cat /proc/self/uid_map
>          0       1000          1
>          1     100000      65536
> $ unshare -U sleep 100 &
> [1] 1029703
> $ printf "0 0 100\n" | tee /proc/$!/uid_map
> 0 0 100
> tee: /proc/1029703/uid_map: Operation not permitted
>
> To prevent it from happening, automatically split an extent so that
> each portion fits in one extent in the parent user namespace.

I don't see anything fundamentally wrong with relaxing this
restriction, but more code does have more room for bugs to hide.

What is the advantage of relaxing this restriction?
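
For readers following along, here is a rough user-space sketch of the
split described in the quoted commit message. It is not the patch
itself: struct extent and split_at_parent_boundary() are hypothetical
stand-ins for the kernel's struct uid_gid_extent and the proposed
helper, and the containment check on the first ID is slightly stricter
than the condition in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the kernel's struct uid_gid_extent. */
struct extent {
	uint32_t first;       /* first ID inside the namespace */
	uint32_t lower_first; /* first ID in the parent namespace */
	uint32_t count;       /* number of IDs covered */
};

/*
 * Hypothetical helper: scan the parent map linearly (as the patch does)
 * and cut ext where it runs past the end of the parent extent that
 * contains its first lower ID. Returns 1 and fills rest when a split
 * happened, 0 when ext was left untouched.
 */
static int split_at_parent_boundary(const struct extent *parent, size_t n,
				    struct extent *ext, struct extent *rest)
{
	uint32_t first, last;

	assert(ext->count > 0);
	first = ext->lower_first;
	last = ext->lower_first + ext->count - 1;

	for (size_t i = 0; i < n; i++) {
		uint32_t p_first = parent[i].first;
		uint32_t p_last = parent[i].first + parent[i].count - 1;

		if (first >= p_first && first <= p_last && last > p_last) {
			uint32_t size = p_last - first + 1;

			rest->first = ext->first + size;
			rest->lower_first = ext->lower_first + size;
			rest->count = ext->count - size;
			ext->count = size;
			return 1;
		}
	}
	return 0;
}
```

With the parent map from the commit message (0 1000 1 and
1 100000 65536), the extent "0 0 100" is cut into "0 0 1" and
"1 1 99", and a second call on the remainder finds nothing left to
split.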

> $ cat /proc/self/uid_map
>          0       1000          1
>          1     110000      65536
> $ unshare -U sleep 100 &
> [1] 1552
> $ printf "0 0 100\n" | tee /proc/$!/uid_map
> 0 0 100
> $ cat /proc/$!/uid_map
>          0          0          1
>          1          1         99
>
> Signed-off-by: Giuseppe Scrivano <gscrivan@...hat.com>
> ---
>  kernel/user_namespace.c | 62 ++++++++++++++++++++++++++++++++++-------
>  1 file changed, 52 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 87804e0371fe..b5542be2bd0a 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -706,6 +706,41 @@ const struct seq_operations proc_projid_seq_operations = {
>  	.show = projid_m_show,
>  };
>  
> +static void split_overlapping_mappings(struct uid_gid_map *parent_map,
> +				       struct uid_gid_extent *extent,
> +				       struct uid_gid_extent *overflow_extent)
> +{
> +	unsigned int idx;
> +
> +	overflow_extent->first = (u32) -1;
> +
> +	/* Split extent if it not fully contained in an extent from parent_map.  */
> +	for (idx = 0; idx < parent_map->nr_extents; idx++) {

Ouch!

For the larger maps we typically perform binary searches, and
here you are walking every entry unconditionally.

It looks like this takes the write from O(N log N) to O(N^2),
which is not desirable for a user-facing function.

I think something like insert_and_split_extent may be ok,
incorporating your loop and the part that inserts an element.

As written this almost doubles the complexity of the code
and makes it perform much worse, which is a problem.
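
To make the complexity point concrete, a minimal user-space sketch of
the binary-search alternative follows. It assumes the parent extents
are available in an array sorted by their first ID and do not overlap.
struct extent, cmp_id_extent(), find_parent_extent() and
leading_piece() are hypothetical names for illustration, not existing
kernel functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space stand-in for the kernel's struct uid_gid_extent. */
struct extent {
	uint32_t first;       /* first ID inside the namespace */
	uint32_t lower_first; /* first ID in the parent namespace */
	uint32_t count;       /* number of IDs covered */
};

/* bsearch() comparator: key is an ID, element is an extent. */
static int cmp_id_extent(const void *key, const void *elem)
{
	uint32_t id = *(const uint32_t *)key;
	const struct extent *e = elem;

	if (id < e->first)
		return -1;
	if (id - e->first >= e->count)
		return 1;
	return 0; /* id lies inside this extent */
}

/* O(log n) lookup of the parent extent containing id, or NULL. */
static const struct extent *find_parent_extent(const struct extent *sorted,
					       size_t n, uint32_t id)
{
	return bsearch(&id, sorted, n, sizeof(*sorted), cmp_id_extent);
}

/*
 * Number of leading IDs of ext that fit in a single parent extent,
 * or 0 if the first lower ID is not mapped in the parent at all.
 */
static uint32_t leading_piece(const struct extent *sorted, size_t n,
			      const struct extent *ext)
{
	const struct extent *p;
	uint32_t room;

	assert(ext->count > 0);
	p = find_parent_extent(sorted, n, ext->lower_first);
	if (!p)
		return 0;
	room = p->count - (ext->lower_first - p->first);
	return ext->count < room ? ext->count : room;
}
```

Each split then costs one lookup rather than a walk over every parent
entry. A caller would insert the leading piece, advance first and
lower_first by its length, and repeat until the count is used up.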


> +		struct uid_gid_extent *prev;
> +		u32 first, last, prev_last, size;
> +
> +		if (parent_map->nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
> +			prev = &parent_map->extent[idx];
> +		else
> +			prev = &parent_map->forward[idx];
> +
> +		first = extent->lower_first;
> +		last = extent->lower_first + extent->count - 1;
> +		prev_last = prev->first + prev->count - 1;
> +
> +		if ((first <= prev_last) && (last > prev_last)) {
> +			size = prev_last - first + 1;
> +
> +			overflow_extent->first = extent->first + size;
> +			overflow_extent->lower_first = extent->lower_first + size;
> +			overflow_extent->count = extent->count - size;
> +
> +			extent->count = size;
> +			return;
> +		}
> +	}
> +}
> +
>  static bool mappings_overlap(struct uid_gid_map *new_map,
>  			     struct uid_gid_extent *extent)
>  {
> @@ -852,6 +887,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>  	struct uid_gid_map new_map;
>  	unsigned idx;
>  	struct uid_gid_extent extent;
> +	struct uid_gid_extent overflow_extent;
>  	char *kbuf = NULL, *pos, *next_line;
>  	ssize_t ret;
>  
> @@ -946,18 +982,24 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>  		     extent.lower_first)
>  			goto out;
>  
> -		/* Do the ranges in extent overlap any previous extents? */
> -		if (mappings_overlap(&new_map, &extent))
> -			goto out;
> +		do {
> +			/* Do the ranges in extent overlap any previous extents? */
> +			if (mappings_overlap(&new_map, &extent))
> +				goto out;

Why should mappings_overlap be called in the loop?  Can splitting an
extent actually create overlapping mappings?
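
One way to probe that question outside the kernel is the small sketch
below. It assumes the overlap test compares both the upper and the
lower ID ranges of two extents, which is my reading of
mappings_overlap(). struct extent, ranges_overlap(), extents_overlap()
and split_extent() are hypothetical user-space helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-in for the kernel's struct uid_gid_extent. */
struct extent {
	uint32_t first;       /* first ID inside the namespace */
	uint32_t lower_first; /* first ID in the parent namespace */
	uint32_t count;       /* number of IDs covered */
};

/* True if [a, a+an) and [b, b+bn) share an ID; both counts must be > 0. */
static bool ranges_overlap(uint32_t a, uint32_t an, uint32_t b, uint32_t bn)
{
	uint32_t a_last = a + an - 1;
	uint32_t b_last = b + bn - 1;

	return !(a_last < b || b_last < a);
}

/* Overlap in either the upper or the lower range counts as a conflict. */
static bool extents_overlap(const struct extent *x, const struct extent *y)
{
	assert(x->count > 0 && y->count > 0);
	return ranges_overlap(x->first, x->count, y->first, y->count) ||
	       ranges_overlap(x->lower_first, x->count,
			      y->lower_first, y->count);
}

/* Cut ext after size IDs; tail receives the remainder. */
static void split_extent(struct extent *ext, uint32_t size,
			 struct extent *tail)
{
	assert(size > 0 && size < ext->count);
	tail->first = ext->first + size;
	tail->lower_first = ext->lower_first + size;
	tail->count = ext->count - size;
	ext->count = size;
}
```

Both pieces cover sub-ranges of the original extent and start where
the other ends, so in this model they cannot overlap each other, and
they cannot overlap anything the unsplit extent did not already
overlap. Under that assumption a single check before splitting would
be enough.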

> -		if ((new_map.nr_extents + 1) == UID_GID_MAP_MAX_EXTENTS &&
> -		    (next_line != NULL))
> -			goto out;
> +			if ((new_map.nr_extents + 1) == UID_GID_MAP_MAX_EXTENTS &&
> +			    (next_line != NULL))
> +				goto out;
>  
> -		ret = insert_extent(&new_map, &extent);
> -		if (ret < 0)
> -			goto out;
> -		ret = -EINVAL;
> +			split_overlapping_mappings(parent_map, &extent, &overflow_extent);
> +
> +			ret = insert_extent(&new_map, &extent);
> +			if (ret < 0)
> +				goto out;
> +			ret = -EINVAL;
> +
> +			extent = overflow_extent;
> +		} while (overflow_extent.first != (u32) -1);
>  	}
>  	/* Be very certaint the new map actually exists */
>  	if (new_map.nr_extents == 0)

Eric