linux-kernel - Re: [RFC PATCH 04/11] mm/mempolicy: modify get_mempolicy call stack to take a task argument

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZWX0IMfR3S3rRzen@tiehlicka>
Date:   Tue, 28 Nov 2023 15:07:28 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Gregory Price <gourry.memverge@...il.com>
Cc:     linux-mm@...ck.org, linux-doc@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-api@...r.kernel.org,
        linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
        akpm@...ux-foundation.org, arnd@...db.de, tglx@...utronix.de,
        luto@...nel.org, mingo@...hat.com, bp@...en8.de,
        dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com,
        tj@...nel.org, ying.huang@...el.com,
        Gregory Price <gregory.price@...verge.com>
Subject: Re: [RFC PATCH 04/11] mm/mempolicy: modify get_mempolicy call stack
 to take a task argument

On Wed 22-11-23 16:11:53, Gregory Price wrote:
[...]
> @@ -928,7 +929,16 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
>  		 * vma/shared policy at addr is NULL.  We
>  		 * want to return MPOL_DEFAULT in this case.
>  		 */
> -		mm = current->mm;
> +		if (task == current) {
> +			/*
> +			 * original behavior allows a kernel task changing its
> +			 * own policy to avoid the condition in get_task_mm,
> +			 * so we'll directly access
> +			 */
> +			mm = task->mm;
> +			mmget(mm);

Do we actually have any kernel thread that would call this? Does it
actually make sense to support?

> +		} else
> +			mm = get_task_mm(task);
>  		mmap_read_lock(mm);
>  		vma = vma_lookup(mm, addr);
>  		if (!vma) {
> @@ -947,8 +957,10 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
>  		return -EINVAL;
>  	else {
>  		/* take a reference of the task policy now */
> -		pol = current->mempolicy;
> +		task_lock(task);
> +		pol = task->mempolicy;
>  		mpol_get(pol);
> +		task_unlock(task);
>  	}
>  
>  	if (!pol) {
> @@ -962,12 +974,13 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
>  			vma = NULL;
>  			mmap_read_unlock(mm);
>  			err = lookup_node(mm, addr);
> +			mmput(mm);
>  			if (err < 0)
>  				goto out;
>  			*policy = err;
> -		} else if (pol == current->mempolicy &&
> +		} else if (pol == task->mempolicy &&
>  				pol->mode == MPOL_INTERLEAVE) {
> -			*policy = next_node_in(current->il_prev, pol->nodes);
> +			*policy = next_node_in(task->il_prev, pol->nodes);

This is racy without task_lock which I do not think is helde but it also
seems this is not a big deal. pol is ref. counted so it won't go away
and if the task->mempolicy changes then the return value could be bogus
but this seems acceptable. It would be good to put a comment here that
this is actually deliberate.

-- 
Michal Hocko
SUSE Labs