linux-kernel - Re: [External] Re: [PATCH v2] mm: add new syscall pidfd_set

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Y3T6SqZvAmSG5I6W@dhcp22.suse.cz>
Date:   Wed, 16 Nov 2022 15:57:14 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Zhongkun He <hezhongkun.hzk@...edance.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, corbet@....net,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        linux-api@...r.kernel.org, linux-doc@...r.kernel.org
Subject: Re: [External] Re: [PATCH v2] mm: add new syscall
 pidfd_set_mempolicy().

On Wed 16-11-22 19:28:10, Zhongkun He wrote:
> Hi Michal, I've done the performance testing, please check it out.
> 
> > > Yes this is all understood but the level of the overhead is not really
> > > clear. So the question is whether this will induce a visible overhead.
> > > Because from the maintainability point of view it is much less costly to
> > > have a clear life time model. Right now we have a mix of reference
> > > counting and per-task requirements which is rather subtle and easy to
> > > get wrong. In an ideal world we would have get_vma_policy always
> > > returning a reference counted policy or NULL. If we really need to
> > > optimize for cache line bouncing we can go with per cpu reference
> > > counters (something that was not available at the time the mempolicy
> > > code has been introduced).
> > > 
> > > So I am not saying that the task_work based solution is not possible I
> > > just think that this looks like a good opportunity to get from the
> > > existing subtle model.
> 
> Test tools:
> numactl -m 0-3 ./run-mmtests.sh -n -c configs/config-workload-
> aim9-pagealloc  test_name
> 
> Modification:
> Get_vma_policy(), get_task_policy() always returning a reference
> counted policy, except for the static policy(default_policy and
> preferred_node_policy[nid]).

It would be better to add the patch that has been tested.

> All vma manipulation is protected by a down_read, so mpol_get()
> can be called directly to take a refcount on the mpol. but there
> is no lock in task->mempolicy context.
> so task->mempolicy should be protected by task_lock.
> 
> struct mempolicy *get_task_policy(struct task_struct *p)
> {
> 	struct mempolicy *pol;
> 	int node;
> 
> 	if (p->mempolicy) {
> 		task_lock(p);
> 		pol = p->mempolicy;
> 		mpol_get(pol);
> 		task_unlock(p);
> 		if (pol)
> 			return pol;
> 	}


One way to deal with that would be to use a similar model as css_tryget

Btw. have you tried to profile those slowdowns to identify hotspots?

Thanks
-- 
Michal Hocko
SUSE Labs