lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y3IqLzvduM6HqPJV@dhcp22.suse.cz>
Date:   Mon, 14 Nov 2022 12:44:47 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Zhongkun He <hezhongkun.hzk@...edance.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, corbet@....net,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        linux-api@...r.kernel.org, linux-doc@...r.kernel.org
Subject: Re: [External] Re: [PATCH v2] mm: add new syscall
 pidfd_set_mempolicy().

On Mon 14-11-22 00:41:21, Zhongkun He wrote:
> Hi Andrew, thanks for your replay.
> 
> > This sounds a bit suspicious.  Please share much more detail about
> > these races.  If we proced with this design then mpol_put_async()
> > shouild have comments which fully describe the need for the async free.
> > 
> > How do we *know* that these races are fully prevented with this
> > approach?  How do we know that mpol_put_async() won't free the data
> > until the race window has fully passed?
> 
> A mempolicy can be either associated with a process or with a VMA.
> All vma manipulation is somewhat protected by a down_read on
> mmap_lock.In process context there is no locking because only
> the process accesses its own state before.

We shouldn't really rely on mmap_sem for this IMO. There is alloc_lock
(aka task lock) that makes sure the policy is stable so that caller can
atomically take a reference and hold on the policy. And we do not do
that consistently and this should be fixed. E.g. just looking at some
random places like allowed_mems_nr (relying on get_task_policy) is
completely lockless and some paths (like fadvise) do not use any of the
explicit (alloc_lock) or implicit (mmap_lock) locking. That means that
the task_work based approach cannot really work in this case, right?

Playing more tricks will not really help long term. So while your patch
tries to workaround the current state of the art I do not think we
really want that. As stated previously, I would much rather see proper
reference counting instead. I thought we have agreed this would be the
first approach unless the resulting performance is really bad. Have you
concluded that to be the case? I do not see any numbers or notes in the
changelog.
-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ