Message-ID: <ZWizUEd/rsxSc0fW@memverge.com>
Date:   Thu, 30 Nov 2023 11:07:44 -0500
From:   Gregory Price <gregory.price@...verge.com>
To:     Zhongkun He <hezhongkun.hzk@...edance.com>
Cc:     Vinicius Petrucci <vpetrucci@...il.com>, akpm@...ux-foundation.org,
        linux-mm@...r.kernel.org, linux-cxl@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
        linux-api@...r.kernel.org, minchan@...nel.org,
        dave.hansen@...ux.intel.com, x86@...nel.org,
        Jonathan.Cameron@...wei.com, aneesh.kumar@...ux.ibm.com,
        ying.huang@...el.com, dan.j.williams@...el.com, fvdl@...gle.com,
        surenb@...gle.com, rientjes@...gle.com, hannes@...xchg.org,
        mhocko@...e.com, Hasan.Maruf@....com, jgroves@...ron.com,
        ravis.opensrc@...ron.com, sthanneeru@...ron.com,
        emirakhur@...ron.com, vtavarespetr@...ron.com
Subject: Re: [RFC PATCH] mm/mbind: Introduce process_mbind() syscall for
 external memory binding

On Thu, Nov 30, 2023 at 05:34:04PM +0800, Zhongkun He wrote:
> Hi Gregory, sorry for the late reply.
> 
> I tried pidfd_set_mempolicy() (suggested by Michal) about a year ago.
> There is a problem here that may need attention.
> 
> A mempolicy can be associated either with a process or with a VMA.
> All VMA manipulation is somewhat protected by a down_read on
> mmap_lock. In process context (in alloc_pages()) there is no locking
> because only the process accesses its own state.
> 
> Now we need to change the process-context mempolicy of the task
> specified by the pidfd. The old mempolicy may be about to be freed by
> pidfd_set_mempolicy() while alloc_pages() is still using it, and
> a race condition appears.
> 
> Say something like the following:
> 
> pidfd_set_mempolicy()        target task stack:
>                              alloc_pages:
>                                mpol = p->mempolicy;
>   task_lock(task);
>   old = task->mempolicy;
>   task->mempolicy = new;
>   task_unlock(task);
>   mpol_put(old);
>                                /* old mpol has been freed. */
>                                policy_node(...., mpol)
>                                __alloc_pages();
> 
> To reduce the use of locks and atomic operations (mpol_get/put)
> in the hot path, the task mempolicy is read here without taking
> a reference or a lock.
> 
> It would be great if your refactoring has a good solution.
> 
> Thanks.
> 
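
For comparison with the race in the quoted diagram, here is a minimal
userspace sketch of the reference-counted alternative that the hot path
currently avoids. It is not kernel code: struct policy, struct task,
policy_new(), policy_put(), task_policy_get() and task_policy_set() are
hypothetical stand-ins, and a pthread mutex models task_lock(). The
reader pins the policy under the lock, so the writer's final put cannot
free it while it is still in use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, not the kernel structures. */
struct policy {
    atomic_int refcnt;
    int mode;
};

struct task {
    pthread_mutex_t lock;   /* models task_lock()/task_unlock() */
    struct policy *policy;  /* models task->mempolicy */
};

/* Allocate a policy holding one reference (owned by the caller). */
static struct policy *policy_new(int mode)
{
    struct policy *p = malloc(sizeof(*p));
    if (!p)
        abort();
    atomic_init(&p->refcnt, 1);
    p->mode = mode;
    return p;
}

/* Drop one reference; free the policy when the last one goes away. */
static void policy_put(struct policy *p)
{
    int old = atomic_fetch_sub(&p->refcnt, 1);
    assert(old > 0);
    if (old == 1)
        free(p);
}

/* Reader: take a reference under the lock before using the policy. */
static struct policy *task_policy_get(struct task *t)
{
    pthread_mutex_lock(&t->lock);
    struct policy *p = t->policy;
    atomic_fetch_add(&p->refcnt, 1);
    pthread_mutex_unlock(&t->lock);
    return p;
}

/* Writer: swap under the lock, then drop the task's old reference. */
static void task_policy_set(struct task *t, struct policy *new)
{
    pthread_mutex_lock(&t->lock);
    struct policy *old = t->policy;
    t->policy = new;
    pthread_mutex_unlock(&t->lock);
    policy_put(old);
}
```

A reader calls task_policy_get(), uses the policy, then calls
policy_put(). The cost is one lock round trip and two atomic operations
per allocation, which is the overhead the quoted text says the hot path
is trying to avoid.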

Hi ZhongKun!

I actually just sent out a more general RFC on mempolicy updates that
discusses this more completely:

https://lore.kernel.org/linux-mm/ZWezcQk+BYEq%2FWiI@memverge.com/

and another post on even more issues with pidfd modifications to vma
mempolicies:

https://lore.kernel.org/linux-mm/ZWYsth2CtC4Ilvoz@memverge.com/

We may have to slow-walk the changes to vma policies due to there being
many more hidden accesses to (current) than expected. It's a rather
nasty rat's nest of mempolicy-vma-cpusets-shmem callbacks that obscure
these current-task accesses, and it will take time to work through.

As for hot-path reference counting - we may need to change the way
mempolicy is managed. Possibly we could leverage RCU to manage mempolicy
references in the hot path, rather than using locks.  In this scenario,
we would likely need to change the way the default policy is applied
(maybe not, I haven't fully explored it).
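
To make the RCU idea concrete, here is a rough userspace sketch of the
publish-then-defer-free pattern. It is an illustration under loud
assumptions, not a proposed implementation: read_lock(), read_unlock(),
policy_deref() and policy_replace() are hypothetical names, and the
shared active_readers counter is only a crude stand-in for a grace
period. Real kernel RCU readers do not touch a shared counter, which is
the whole point of using it in a hot path.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mempolicy. */
struct policy {
    int mode;
};

static _Atomic(struct policy *) task_policy; /* models task->mempolicy */
static atomic_int active_readers;            /* crude grace-period tracker */

/* Enter a read-side section; pointers read inside stay valid until unlock. */
static void read_lock(void)
{
    atomic_fetch_add(&active_readers, 1);
}

/* Leave the read-side section. */
static void read_unlock(void)
{
    int old = atomic_fetch_sub(&active_readers, 1);
    assert(old > 0);
}

/* Reader: only call between read_lock() and read_unlock(). */
static struct policy *policy_deref(void)
{
    return atomic_load(&task_policy);
}

/*
 * Writer: publish the new policy first, then wait until no reader that
 * might still hold the old pointer is active, and only then free it.
 * Readers that start after the exchange can only see the new pointer.
 */
static void policy_replace(struct policy *new)
{
    struct policy *old = atomic_exchange(&task_policy, new);

    while (atomic_load(&active_readers) != 0)
        sched_yield();
    free(old);
}
```

The ordering is what matters: the old policy is freed only after every
reader that could have loaded it has finished, so the window shown in
the quoted diagram cannot occur. In the kernel the rough equivalents
would be rcu_read_lock(), rcu_dereference(), rcu_assign_pointer() and a
deferred free after a grace period.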

Do you have thoughts on this?  Would very much like additional comments
before I go through the refactor work.

Regards,
Gregory
