Message-ID: <CAOtvUMeMsd0Jk1k4wP9Y+7NW3FYZZAqV1-cRj5Zt4+eaugWoPg@mail.gmail.com>
Date:	Mon, 26 Sep 2011 11:35:21 +0300
From:	Gilad Ben-Yossef <gilad@...yossef.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	linux-kernel@...r.kernel.org,
	Frederic Weisbecker <fweisbec@...il.com>,
	Russell King <linux@....linux.org.uk>,
	Chris Metcalf <cmetcalf@...era.com>, linux-mm@...ck.org,
	Christoph Lameter <cl@...ux-foundation.org>,
	Pekka Enberg <penberg@...nel.org>,
	Matt Mackall <mpm@...enic.com>
Subject: Re: [PATCH 5/5] slub: Only IPI CPUs that have per cpu obj to flush

On Mon, Sep 26, 2011 at 10:33 AM, Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:
> On Sun, 2011-09-25 at 11:54 +0300, Gilad Ben-Yossef wrote:
>> +       if (likely(zalloc_cpumask_var(&cpus, GFP_ATOMIC))) {
>> +               for_each_online_cpu(cpu) {
>> +                       c = per_cpu_ptr(s->cpu_slab, cpu);
>> +                       if (c && c->page)
>> +                               cpumask_set_cpu(cpu, cpus);
>> +               }
>> +               on_each_cpu_mask(cpus, flush_cpu_slab, s, 1);
>> +               free_cpumask_var(cpus);
>
> Right, having to do that for_each_online_cpu() loop only to then IPI
> them can cause a massive cacheline bounce fest.. Ideally you'd want to
> keep a cpumask per kmem_cache, although I bet the memory overhead of
> that isn't attractive.
>
> Also, what Pekka says, having that alloc here isn't good either.

Yes, the alloc in the flush_all path definitely needs to go. I
wonder whether, just to resolve that, allocating the mask per cpu
rather than in kmem_cache itself would be better. After all, all we
need is a single mask per cpu when we wish to do a flush_all, not one
per cache. The memory overhead of that is slightly better. This
doesn't address the cache bounce issue, though.

My thinking was that since flush_all() is a rather rare operation,
it is preferable to do some more work/interference here if it allows
us to avoid doing more work in the hotter alloc/dealloc paths,
especially since it lets us send fewer IPIs, which I figured are more
intrusive than cacheline steals (are they?).

After all, for each CPU that actually needs to do a flush, we are
making the flush a bit more expensive because of the cache bounce just
before we send the IPI, but the IPI and the operations that follow are
expensive anyway. For CPUs that don't need to do a flush, I replaced
an IPI with a cacheline steal (or a few). I figured it was still a
good bargain.
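
To make the trade-off concrete, here is a rough userspace model of the
flush path discussed above. It is only a sketch, not kernel code and
not part of the patch: cache_model, cpu_slab_model, flush_all_model(),
flush_cpu_slab_model(), flush_scratch_mask and NR_CPUS = 8 are all
hypothetical stand-ins, a plain 64-bit word plays the role of the
cpumask, and a direct function call plays the role of the IPI. The
single global scratch mask is a simplification; a real version would
need one mask per cpu or some serialization between concurrent
flushers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8	/* hypothetical CPU count for this model */

/* Stand-in for the per-cpu slab state: only the field the patch tests. */
struct cpu_slab_model {
	void *page;	/* non-NULL when this CPU holds a slab to flush */
};

/* Stand-in for a cache with one per-cpu slab slot per CPU. */
struct cache_model {
	struct cpu_slab_model cpu_slab[NR_CPUS];
};

/* Scratch mask that exists up front, so the flush path never allocates. */
static uint64_t flush_scratch_mask;

/* Stand-in for the per-CPU flush handler: drop that CPU's slab. */
static void flush_cpu_slab_model(struct cache_model *s, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	s->cpu_slab[cpu].page = NULL;
}

/*
 * Read every CPU's slot (the cacheline steal), but only "interrupt"
 * the CPUs that actually have something to flush.
 * Returns the number of CPUs that would have received an IPI.
 */
static unsigned int flush_all_model(struct cache_model *s)
{
	unsigned int cpu, ipis = 0;

	flush_scratch_mask = 0;
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (s->cpu_slab[cpu].page)
			flush_scratch_mask |= UINT64_C(1) << cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (flush_scratch_mask & (UINT64_C(1) << cpu)) {
			flush_cpu_slab_model(s, cpu);
			ipis++;
		}
	}
	return ipis;
}
```

In this model, a cache with slabs on only two of the eight CPUs costs
eight reads of per-cpu state and two interrupts, where the
unconditional variant would interrupt all eight.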

I will spin a new patch that moves this to kmem_cache if you believe
this is the right way to go.

Thanks!
Gilad


-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@...yossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"I've seen things you people wouldn't believe. Goto statements used to
implement co-routines. I watched C structures being stored in
registers. All those moments will be lost in time... like tears in
rain... Time to die. "