lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZV6Uhsg6WLBtNqU3@memverge.com>
Date:   Wed, 22 Nov 2023 18:53:42 -0500
From:   Gregory Price <gregory.price@...verge.com>
To:     Vinicius Petrucci <vpetrucci@...il.com>
Cc:     akpm@...ux-foundation.org, linux-mm@...r.kernel.org,
        linux-cxl@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-arch@...r.kernel.org, linux-api@...r.kernel.org,
        minchan@...nel.org, dave.hansen@...ux.intel.com, x86@...nel.org,
        Jonathan.Cameron@...wei.com, aneesh.kumar@...ux.ibm.com,
        ying.huang@...el.com, dan.j.williams@...el.com,
        hezhongkun.hzk@...edance.com, fvdl@...gle.com, surenb@...gle.com,
        rientjes@...gle.com, hannes@...xchg.org, mhocko@...e.com,
        Hasan.Maruf@....com, jgroves@...ron.com, ravis.opensrc@...ron.com,
        sthanneeru@...ron.com, emirakhur@...ron.com,
        vtavarespetr@...ron.com
Subject: Re: [RFC PATCH] mm/mbind: Introduce process_mbind() syscall for
 external memory binding

On Wed, Nov 22, 2023 at 03:31:05PM -0600, Vinicius Petrucci wrote:
> From: Vinicius Tavares Petrucci <vtavarespetr@...ron.com>
> 
> The proposed API is as follows:
> 
> long process_mbind(int pidfd, 
>                 const struct iovec *iovec, 
>                 unsigned long vlen, 
>                 unsigned long mode, 
>                 const unsigned long *nmask,
>                 unsigned int flags);
> 
> The `pidfd` argument is used to select the process that is identified by the PID file 
> descriptor provided in pidfd. (See pidofd_open(2) for more information)
>

This is probably a more maintainable interface than the pid_t interface
in my RFC, but I have not used pidfd extensively myself.

I can pivot my RFC to utilize pidfd as a whole and pop this on top, if
the other interfaces are generally useful.

> Please note the initial `maxnode` parameter from `mbind` was omitted 
> to ensure the API doesn't exceed 6 arguments. Instead, the constant 
> MAX_NUMNODES was utilized.
> 

I don't think this will work, users have traditionally been allowed to
shorten their nodemasks, and also for some level of portability.

We may want to consider an arg structure, rather than just chopping an
argument off.

> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 10a590ee1c89..91ee300fa728 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1215,11 +1215,10 @@ static struct folio *alloc_migration_target_by_mpol(struct folio *src,
>  }
>  #endif
>  
> -static long do_mbind(unsigned long start, unsigned long len,
> +static long do_mbind(struct mm_struct *mm, unsigned long start, unsigned long len,
>  		     unsigned short mode, unsigned short mode_flags,
>  		     nodemask_t *nmask, unsigned long flags)
>  {
> -	struct mm_struct *mm = current->mm;
>  	struct vm_area_struct *vma, *prev;
>  	struct vma_iterator vmi;
>  	struct migration_mpol mmpol;
> @@ -1465,10 +1464,84 @@ static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
>  	return 0;
>  }

This is a completely insufficient change to do_mbind.  do_mbind utilizes
`current` in a variety of places for nodemask (cpuset) validation and to
acquire the task's lock.  This will not work the way you intend it to,
you end up mixing up node masks between current and target task.

see here:
https://lore.kernel.org/all/20231122211200.31620-7-gregory.price@memverge.com/

I had to make do_mbind operate on the target task explicitly.

We may want to combine this change and with my change so that your iovec
changes can be re-used, because that is a very nice feature.

> +	unsigned long maxnode = MAX_NUMNODES;
> +	int err;
> +	nodemask_t nodes;
> +
> +	err = sanitize_mpol_flags(&lmode, &mode_flags);
> +	if (err)
> +		goto out;
> +
> +	err = get_nodes(&nodes, nmask, maxnode);

per above, userland MAX_NUMNODES may or may not be equal to kernel
MAX_NUMNODES, so i don't think you can just chop this argument off.


I think we can combine our RFCs here and get this sorted out.

~Gregory

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ