Date: Fri, 11 Nov 2022 11:27:32 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Zhongkun He <hezhongkun.hzk@...edance.com>
Cc: corbet@....net, mhocko@...e.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
linux-doc@...r.kernel.org
Subject: Re: [PATCH v2] mm: add new syscall pidfd_set_mempolicy().
On Fri, 11 Nov 2022 16:40:51 +0800 Zhongkun He <hezhongkun.hzk@...edance.com> wrote:
> Page allocation usage of task or vma policy occurs in the fault
> path where we hold the mmap_lock for read. Because replacing the
> task or vma policy requires that the mmap_lock be held for write,
> the policy can't be freed out from under us while we're using
> it for page allocation. But there are some corner cases (e.g.
> alloc_pages()) which do not acquire any lock for read during the
> page allocation. For this reason, task_work is used in
> mpol_put_async() to free mempolicy in pidfd_set_mempolicy().
> Thus, it avoids race conditions.
This sounds a bit suspicious. Please share much more detail about
these races. If we proceed with this design then mpol_put_async()
should have comments which fully describe the need for the async free.
How do we *know* that these races are fully prevented with this
approach? How do we know that mpol_put_async() won't free the data
until the race window has fully passed?
Also, in some situations mpol_put_async() will free the data
synchronously anyway, so aren't these races still present?
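For reference, the ordering the design seems to rely on can be sketched in
userspace as below. This is only a toy model under stated assumptions, not the
patch's code: struct policy, set_policy_deferred() and run_deferred() are
hypothetical names, and the model is single-threaded, so it shows the intended
ordering without proving anything about real concurrency.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct policy {
	int refcnt;
	int mode;
	struct policy *next_deferred;	/* links policies awaiting release */
};

struct task {
	struct policy *policy;		/* read by the owner without a lock */
	struct policy *deferred;	/* old policies parked for later release */
};

static int freed_count;			/* counts actual frees, for inspection */

static struct policy *policy_new(int mode)
{
	struct policy *p = malloc(sizeof(*p));

	assert(p != NULL);
	p->refcnt = 1;
	p->mode = mode;
	p->next_deferred = NULL;
	return p;
}

static void policy_put(struct policy *p)
{
	if (p && --p->refcnt == 0) {
		free(p);
		freed_count++;
	}
}

/*
 * Called on behalf of another task: install the new policy, but park the
 * old one instead of dropping its reference immediately.
 */
static void set_policy_deferred(struct task *t, struct policy *new_policy)
{
	struct policy *old = t->policy;

	t->policy = new_policy;
	if (old) {
		old->next_deferred = t->deferred;
		t->deferred = old;
	}
}

/*
 * Run by the owning task at a point where it is assumed to hold no
 * borrowed policy pointers (the analogue of a task_work callback).
 */
static void run_deferred(struct task *t)
{
	while (t->deferred) {
		struct policy *p = t->deferred;

		t->deferred = p->next_deferred;
		policy_put(p);
	}
}
```

In this model a pointer read before set_policy_deferred() stays valid until
run_deferred() executes. What the changelog and comments would need to explain
is what guarantees the equivalent ordering in the kernel, and what happens on
any path where the release is not deferred.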
Secondly, why was the `flags' argument added? We might use it one day?
For what purpose? I mean, every syscall could have a does-nothing
`flags' arg, but we don't do that. What's the plan here?