Message-ID: <57D95E7D.2090608@intel.com>
Date: Wed, 14 Sep 2016 07:28:13 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: Fenghua Yu <fenghua.yu@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <h.peter.anvin@...el.com>,
Ingo Molnar <mingo@...e.hu>,
Peter Zijlstra <peterz@...radead.org>,
Tejun Heo <tj@...nel.org>, Borislav Petkov <bp@...e.de>,
Stephane Eranian <eranian@...gle.com>,
Marcelo Tosatti <mtosatti@...hat.com>,
David Carrillo-Cisneros <davidcc@...gle.com>,
Shaohua Li <shli@...com>,
Ravi V Shankar <ravi.v.shankar@...el.com>,
Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
Sai Prakhya <sai.praneeth.prakhya@...el.com>,
linux-kernel <linux-kernel@...r.kernel.org>, x86 <x86@...nel.org>
Subject: Re: [PATCH v2 26/33] Task fork and exit for rdtgroup
On 09/13/2016 04:35 PM, Luck, Tony wrote:
> On Tue, Sep 13, 2016 at 04:13:04PM -0700, Dave Hansen wrote:
>> Yikes, is this a new global lock and possible atomic_inc() on a shared
>> variable in the fork() path? Has there been any performance or
>> scalability testing done on this code?
>>
>> That mutex could be a disaster for fork() once the filesystem is
>> mounted. Even if it goes away, if you have a large number of processes
>> in an rdtgroup and they are forking a lot, you're bound to see the
>> rdtgrp->refcount get bounced around a lot.
>
> The mutex is (almost certainly) going away.
Oh, cool. That's good to know.
> The atomic_inc()
> is likely staying (but it only applies to tasks that are in
> resource groups other than the default one). On a system
> where we partition the cache between containers/VMs, that may
> essentially be all processes.
Yeah, that's what worries me. We have had (and still have) quite a few
regressions from things running inside vs. outside of certain cgroups.
We definitely don't want to be adding more of those.
> We only really use the refcount to decide whether the group
> can be removed ... since that is the rare operation, perhaps
> we could put all the work there and have it count them with:
>
>	n = 0;
>	rcu_read_lock();
>	for_each_process(p)
>		if (p->rdtgroup == this_rdtgroup)
>			n++;
>	rcu_read_unlock();
>	if (n != 0)
>		return -EBUSY;
Yeah, that seems sane. I'm sure it can be optimized even more than
that, but that at least gets it out of the fast path.
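One obvious refinement is to stop scanning as soon as a single task is
found, since rmdir only needs to know whether the group is in use, not
how many users it has. A minimal sketch of that, reusing the p->rdtgroup
pointer from the quoted code (the rdtgroup_busy() helper name and the
rmdir call site are purely illustrative, not from the posted patches):

	/* Return true if any task still references this group. */
	static bool rdtgroup_busy(struct rdtgroup *rdtgrp)
	{
		struct task_struct *p;
		bool busy = false;

		rcu_read_lock();
		for_each_process(p) {
			if (p->rdtgroup == rdtgrp) {
				busy = true;
				break;
			}
		}
		rcu_read_unlock();

		return busy;
	}

	/* In the (hypothetical) rmdir path: */
	if (rdtgroup_busy(rdtgrp))
		return -EBUSY;

Bailing out at the first match keeps all of the cost in the rare rmdir
operation and leaves fork()/exit() with no shared refcount traffic at all.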