linux-kernel - Re: [-mm] Add an owner to the mm

Message-ID: <47F79102.6090406@linux.vnet.ibm.com>
Date:	Sat, 05 Apr 2008 20:17:30 +0530
From:	Balbir Singh <balbir@...ux.vnet.ibm.com>
To:	Paul Menage <menage@...gle.com>
CC:	Pavel Emelianov <xemul@...nvz.org>,
	Hugh Dickins <hugh@...itas.com>,
	Sudhir Kumar <skumar@...ux.vnet.ibm.com>,
	YAMAMOTO Takashi <yamamoto@...inux.co.jp>, lizf@...fujitsu.com,
	linux-kernel@...r.kernel.org, taka@...inux.co.jp,
	linux-mm@...ck.org, David Rientjes <rientjes@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Subject: Re: [-mm] Add an owner to the mm_struct (v8)

Paul Menage wrote:
> On Fri, Apr 4, 2008 at 2:25 AM, Balbir Singh <balbir@...ux.vnet.ibm.com> wrote:
>>  >>  For other controllers,
>>  >>  they'll need to monitor exit() callbacks to know when the leader is dead :( (sigh).
>>  >
>>  > That sounds like a nightmare ...
>>  >
>>
>>  Yes, it would be, but worth the trouble. Is it really critical to move a dead
>>  cgroup leader to init_css_set in cgroup_exit()?
> 
> It struck me that this whole group leader optimization is broken as it
> stands since there could (in strange configurations) be multiple
> thread groups sharing the same mm.
> 
> I wonder if we can't just delay the exit_mm() call of a group leader
> until all its threads have exited?
> 

I'm not sure about this one. I suspect keeping the group_leader around is an
optimization, and I don't know how changing exit_mm() for the group_leader
would affect functionality or standards compliance. It might even break some
applications.

Repeating my earlier question:

Can we delay setting task->cgroups = &init_css_set for the group_leader until
all threads have exited? If the user is then unable to remove a cgroup node, it
will be for a valid reason: the group_leader is still around because its threads
are still around. In that case the user should wait for notify_on_release.
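
To make the question concrete, here is a rough userspace sketch of the
behaviour I have in mind. It is not kernel code: the toy_* types, the
live_threads counter and toy_cgroup_exit() are made up for illustration, and
only the idea of pointing the task back at init_css_set on exit comes from
the discussion above.

```c
#include <assert.h>

/* Toy stand-in for a css_set; not the kernel structure. */
struct toy_css_set { const char *name; };

/* Plays the role of init_css_set in this sketch. */
struct toy_css_set toy_init_css_set = { "init" };

struct toy_task;

/* Hypothetical per-thread-group bookkeeping. */
struct toy_group {
    struct toy_task *leader;
    int live_threads;          /* threads (leader included) not yet exited */
};

struct toy_task {
    struct toy_css_set *cgroups;
    struct toy_group *group;
};

/*
 * Called once for each exiting task. An ordinary thread is moved to the
 * init set immediately. The leader keeps its cgroup membership until the
 * last thread of the group has exited.
 */
void toy_cgroup_exit(struct toy_task *t)
{
    struct toy_group *g = t->group;

    assert(g->live_threads > 0);
    g->live_threads--;

    if (t != g->leader)
        t->cgroups = &toy_init_css_set;

    if (g->live_threads == 0)
        g->leader->cgroups = &toy_init_css_set;
}
```

With a leader and one worker in cgroup A, the leader exiting first leaves
leader.cgroups pointing at A; only the worker's exit moves both to the init
set, which is the point at which the cgroup node becomes removable.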

>>  > As long as we find someone to pass the mm to quickly, it shouldn't be
>>  > too bad - I think we're already optimized for that case. Generally the
>>  > group leader's first child will be the new owner, and any subsequent
>>  > times the owner exits, they're unlikely to have any children so
>>  > they'll go straight to the sibling check and pass the mm to the
>>  > parent's first child.
>>  >
>>  > Unless they all exit in strict sibling order and hence pass the mm
>>  > along the chain one by one, we should be fine. And if that exit
>>  > ordering does turn out to be common, then simply walking the child and
>>  > sibling lists in reverse order to find a victim will minimize the
>>  > amount of passing.
>>  >
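
For reference, the search order described above (own children first, then the
parent's children) can be sketched in userspace roughly as follows. The toy_*
structures, the fixed-size children array and pick_new_owner() are
illustrative assumptions, not the code from the patch, and the sketch simply
gives up where a real implementation would need a slower fallback.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* Toy stand-ins for mm_struct and task_struct; not the kernel definitions. */
struct toy_mm { int id; };

struct toy_task {
    struct toy_mm *mm;                        /* NULL once the task dropped its mm */
    struct toy_task *parent;
    struct toy_task *children[MAX_CHILDREN];  /* in creation order */
    size_t nr_children;
};

/* Return the first child of 'parent', other than 'skip', that still uses 'mm'. */
static struct toy_task *first_user_among_children(struct toy_task *parent,
                                                  struct toy_task *skip,
                                                  struct toy_mm *mm)
{
    for (size_t i = 0; i < parent->nr_children; i++) {
        struct toy_task *c = parent->children[i];

        if (c != skip && c->mm == mm)
            return c;
    }
    return NULL;
}

/*
 * Pick a new owner for 'mm' when 'exiting' goes away:
 *   1. the exiting task's own children, then
 *   2. its siblings (the parent's children).
 * Returns NULL if neither check finds another user of the mm.
 */
struct toy_task *pick_new_owner(struct toy_task *exiting, struct toy_mm *mm)
{
    struct toy_task *t;

    assert(exiting != NULL && mm != NULL);

    t = first_user_among_children(exiting, NULL, mm);
    if (t)
        return t;
    if (exiting->parent)
        return first_user_among_children(exiting->parent, exiting, mm);
    return NULL;
}
```

In this model a leader with two children hands the mm to its first child, and
when that child later exits it has no children of its own, so the sibling
check passes the mm to the leader's other child.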
>>
>>
>>  Finding the next mm might not be all that bad, but doing it each time a task
>>  exits can add overhead, especially for large multi-threaded programs.
> 
> Right, but we only have that overhead if we actually end up passing
> the mm from one to another each time they exit. It would be
> interesting to know what order the threads in a large multi-threaded
> process exit typically (when the main process exits and all the
> threads die).
> 
> I guess it's likely to be one of:
> 
> - in thread creation order (i.e. in order of parent->children list),
> in which case we should try to throw the mm to the parent's last child
> - in reverse creation order, in which case we should try to throw the
> mm to the parent's first child
> - in random order depending on which threads the scheduler runs first
> (in which case we can expect that a small fraction of the threads will
> have to throw the mm whichever end we start from)
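
A small toy model shows how the three cases above differ. It is only a
sketch under simplifying assumptions: threads are numbered 0..n-1 in creation
order, the initial owner is picked with the same policy as later hand-offs,
and count_handoffs(), enum pick_end and MAX_THREADS are names invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Which end of the parent's children list the mm is thrown to. */
enum pick_end { PICK_FIRST, PICK_LAST };

/* Return the index of the chosen live thread, or n if none is left. */
static size_t pick(const bool *alive, size_t n, enum pick_end policy)
{
    if (policy == PICK_FIRST) {
        for (size_t i = 0; i < n; i++)
            if (alive[i])
                return i;
    } else {
        for (size_t i = n; i-- > 0; )
            if (alive[i])
                return i;
    }
    return n;
}

/*
 * Count how often ownership has to move while n threads exit.
 * exit_order[k] is the creation-order index of the k-th thread to exit.
 */
size_t count_handoffs(const size_t *exit_order, size_t n, enum pick_end policy)
{
    bool alive[MAX_THREADS];
    size_t handoffs = 0;
    size_t owner;

    assert(n > 0 && n <= MAX_THREADS);
    for (size_t i = 0; i < n; i++)
        alive[i] = true;
    owner = pick(alive, n, policy);

    for (size_t k = 0; k < n; k++) {
        size_t t = exit_order[k];

        assert(t < n && alive[t]);
        alive[t] = false;
        if (t == owner) {
            owner = pick(alive, n, policy);
            if (owner < n)      /* someone is left to receive the mm */
                handoffs++;
        }
    }
    return handoffs;
}
```

In this model, exiting in creation order costs n-1 hand-offs with PICK_FIRST
and none with PICK_LAST; reverse creation order is the mirror image. A mixed
order falls somewhere in between for either policy.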
> 
>>  This can
>>  get severe if the new mm->owner belongs to a different cgroup, in which case we
>>  need to use callbacks as well.
>>
>>  If half the threads belonged to a different cgroup and the new mm->owner kept
>>  switching between cgroups, the overhead would be really high, with the callbacks
>>  and the mm->owner changing frequently.
> 
> To me, it seems that setting up a *virtual address space* cgroup
> hierarchy and then putting half your threads in one group and half in
> the other is asking for trouble. We need to not break in that
> situation, but I'm not sure it's a case to optimize for.

That could potentially happen, if the virtual address space cgroup and cpu
control cgroup were bound together in the same hierarchy by the sysadmin.

I measured the overhead of removing the delay_group_leader optimization and
found a 4% impact on throughput with volanomark, one of the multi-threaded
benchmarks I know of.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL