Message-ID: <CAEZ6=UNLuOYom1Qng28F2y6XocJM4cnbDG1yq3m1p8btuQ4tRQ@mail.gmail.com>
Date: Thu, 23 Nov 2023 09:21:08 -0600
From: Vinicius Petrucci <vpetrucci@...il.com>
To: Gregory Price <gregory.price@...verge.com>, linux-mm@...ck.org
Cc: akpm@...ux-foundation.org, linux-cxl@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
linux-api@...r.kernel.org, minchan@...nel.org,
dave.hansen@...ux.intel.com, x86@...nel.org,
Jonathan.Cameron@...wei.com, aneesh.kumar@...ux.ibm.com,
ying.huang@...el.com, dan.j.williams@...el.com,
hezhongkun.hzk@...edance.com, fvdl@...gle.com, surenb@...gle.com,
rientjes@...gle.com, hannes@...xchg.org, mhocko@...e.com,
Hasan.Maruf@....com, jgroves@...ron.com, ravis.opensrc@...ron.com,
sthanneeru@...ron.com, emirakhur@...ron.com,
vtavarespetr@...ron.com
Subject: Re: [RFC PATCH] mm/mbind: Introduce process_mbind() syscall for
external memory binding
Hi Greg!
Thanks a lot for quickly looking into this and sharing your notes here.
On Wed, Nov 22, 2023 at 5:53 PM Gregory Price
<gregory.price@...verge.com> wrote:
>
> > Please note the initial `maxnode` parameter from `mbind` was omitted
> > to ensure the API doesn't exceed 6 arguments. Instead, the constant
> > MAX_NUMNODES was utilized.
> >
>
> I don't think this will work: users have traditionally been allowed to
> shorten their nodemasks, which also provides some level of portability.
>
> We may want to consider an arg structure, rather than just chopping an
> argument off.
>
Yes, good point. An argument structure should be considered as the more
complete, long-term approach beyond this MVP.
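To make the argument-structure idea concrete, here is a minimal user-space sketch. `struct process_mbind_args`, `SKETCH_MAX_NUMNODES` and `copy_short_nodemask()` are hypothetical names, not an existing kernel ABI. Packing the parameters behind one pointer keeps `maxnode`, so a caller can still pass a nodemask shorter than the kernel's limit; the helper copies only the first `maxnode` bits and zero-fills the rest. The real `get_nodes()` in mm/mempolicy.c applies its own rules, which this sketch does not try to reproduce.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Illustrative stand-in for the kernel's MAX_NUMNODES (assumption). */
#define SKETCH_MAX_NUMNODES 1024
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_MASK_LONGS (SKETCH_MAX_NUMNODES / BITS_PER_ULONG)

/* Hypothetical argument block: one pointer replaces several scalar
 * arguments, so maxnode need not be dropped to stay within the
 * six-argument syscall limit. */
struct process_mbind_args {
    int pidfd;                  /* target process */
    const struct iovec *vec;    /* address ranges to bind */
    size_t vlen;                /* number of ranges */
    unsigned long mode;         /* MPOL_* mode */
    const unsigned long *nmask; /* caller's nodemask, possibly short */
    unsigned long maxnode;      /* number of valid bits in nmask */
    unsigned long flags;        /* MPOL_MF_* flags */
};

/* Copy the first maxnode bits of src into a full-size mask, clearing
 * every bit the caller did not supply. Returns -1 if maxnode is too big. */
static int copy_short_nodemask(unsigned long dst[SKETCH_MASK_LONGS],
                               const unsigned long *src,
                               unsigned long maxnode)
{
    memset(dst, 0, SKETCH_MASK_LONGS * sizeof(unsigned long));
    if (maxnode > SKETCH_MAX_NUMNODES)
        return -1;
    if (src == NULL || maxnode == 0)
        return 0;
    for (unsigned long bit = 0; bit < maxnode; bit++) {
        unsigned long word = bit / BITS_PER_ULONG;
        unsigned long off = bit % BITS_PER_ULONG;

        if (src[word] & (1UL << off))
            dst[word] |= 1UL << off;
    }
    return 0;
}
```

For example, a one-word mask with nodes 0 and 2 set and `maxnode = 2` yields a mask containing only node 0, because bit 2 lies outside the caller-supplied range.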
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 10a590ee1c89..91ee300fa728 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -1215,11 +1215,10 @@ static struct folio *alloc_migration_target_by_mpol(struct folio *src,
> > }
> > #endif
> >
> > -static long do_mbind(unsigned long start, unsigned long len,
> > +static long do_mbind(struct mm_struct *mm, unsigned long start, unsigned long len,
> > unsigned short mode, unsigned short mode_flags,
> > nodemask_t *nmask, unsigned long flags)
> > {
> > - struct mm_struct *mm = current->mm;
> > struct vm_area_struct *vma, *prev;
> > struct vma_iterator vmi;
> > struct migration_mpol mmpol;
> > @@ -1465,10 +1464,84 @@ static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
> > return 0;
> > }
>
> This is a completely insufficient change to do_mbind. do_mbind utilizes
> `current` in a variety of places for nodemask (cpuset) validation and to
> acquire the task's lock. This will not work the way you intend it to,
> you end up mixing up node masks between current and target task.
>
Oh oh. True! Good catch!
> see here:
> https://lore.kernel.org/all/20231122211200.31620-7-gregory.price@memverge.com/
>
Let me go over this... Thanks!
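A toy user-space model of the problem Greg describes, assuming a single-word nodemask and a hypothetical `struct sketch_task` that stands in for a task's cpuset-allowed nodes. It only illustrates why the allowed-node check must be made against the target task rather than the caller; it is not how mm/mempolicy.c structures the check.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a task: only the nodes its cpuset allows. */
struct sketch_task {
    int pid;
    unsigned long mems_allowed; /* bit N set => node N allowed */
};

/* Restrict a requested nodemask to the nodes the *target* may use.
 * Returns 0 and writes the effective mask, or -EINVAL if none remain. */
static int sketch_effective_nodes(const struct sketch_task *target,
                                  unsigned long requested,
                                  unsigned long *effective)
{
    unsigned long allowed = requested & target->mems_allowed;

    if (allowed == 0)
        return -EINVAL;
    *effective = allowed;
    return 0;
}
```

With a caller allowed nodes 0-1 and a target allowed nodes 2-3, a request for node 2 succeeds when checked against the target but fails when checked against the caller; the reverse mistake would let the caller bind the target's memory to nodes its cpuset forbids.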
> We may want to combine this change and with my change so that your iovec
> changes can be re-used, because that is a very nice feature.
>
Sounds good. Thanks again!
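Since the iovec handling is the part worth reusing, here is a minimal sketch of walking an array of address ranges in the style of process_madvise(). It assumes 4 KiB pages and that every range must be non-empty and page-aligned; `for_each_range()`, `range_fn` and `sum_len()` are hypothetical names, with the callback standing in for the per-range mbind work. All ranges are validated before any is applied, so a bad entry leaves nothing half-done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/* Hypothetical per-range callback standing in for the mbind work. */
typedef int (*range_fn)(uintptr_t start, size_t len, void *ctx);

/* Validate every range first, then apply fn to each one in order.
 * Returns 0 on success, -1 on an empty or misaligned range, or the
 * first non-zero value returned by fn. */
static int for_each_range(const struct iovec *vec, size_t vlen,
                          range_fn fn, void *ctx)
{
    for (size_t i = 0; i < vlen; i++) {
        uintptr_t start = (uintptr_t)vec[i].iov_base;

        if (vec[i].iov_len == 0 || start % SKETCH_PAGE_SIZE != 0)
            return -1;
    }
    for (size_t i = 0; i < vlen; i++) {
        int ret = fn((uintptr_t)vec[i].iov_base, vec[i].iov_len, ctx);

        if (ret != 0)
            return ret;
    }
    return 0;
}

/* Example callback: accumulate the total length of all ranges. */
static int sum_len(uintptr_t start, size_t len, void *ctx)
{
    (void)start;
    *(size_t *)ctx += len;
    return 0;
}
```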
Best,
Vinicius