Message-ID: <alpine.DEB.2.00.1109201055470.8056@router.home>
Date: Tue, 20 Sep 2011 10:59:29 -0500 (CDT)
From: Christoph Lameter <cl@...two.org>
To: Steven Rostedt <rostedt@...dmis.org>
cc: Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Christoph Lameter <cl@...ux.com>, Tejun Heo <htejun@...il.com>
Subject: Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
On Tue, 20 Sep 2011, Steven Rostedt wrote:
> On Tue, 2011-09-20 at 09:51 -0500, Christoph Lameter wrote:
>
> > I see that __this_cpu_xx operations may not work as intended in
> > preemptable contexts and there we could have more changes.
>
> Then why do you use it in slub.c?
>
> in mm/slub.c slab_alloc():
>
> redo:
>
> /*
> * Must read kmem_cache cpu data via this cpu ptr. Preemption is
> * enabled. We may switch back and forth between cpus while
> * reading from one cpu area. That does not matter as long
> * as we end up on the original cpu again when doing the cmpxchg.
> */
> c = __this_cpu_ptr(s->cpu_slab);
>
> The __this_cpu_*() is in a preempt-enabled location. In fact, the three
> this_cpu_write()'s in acquire_slab() are done within a spinlock and thus
> with preemption disabled.
So the this_cpu_write()s could become __this_cpu_write()s.
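
Something like this is all that is needed once preemption is already off,
e.g. under a spin lock (a sketch with a hypothetical per cpu variable, not
slub's actual fields): the task cannot migrate while it holds the lock, so
the cheaper, non-checking __this_cpu_write() variant is sufficient.

	/* Hypothetical per cpu state, only to show the pattern. */
	struct foo_state {
		struct page *page;
		int active;
	};
	DEFINE_PER_CPU(struct foo_state, foo_state);
	static DEFINE_SPINLOCK(foo_lock);

	static void foo_update(struct page *page)
	{
		spin_lock(&foo_lock);		/* disables preemption */
		/*
		 * No migration can happen while the lock is held, so the
		 * unchecked __this_cpu_write() is enough here.
		 */
		__this_cpu_write(foo_state.page, page);
		__this_cpu_write(foo_state.active, 1);
		spin_unlock(&foo_lock);
	}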
The __this_cpu_ptr operation in slab_alloc is special. We explicitly do
address calculation and do not care from which per cpu area we fetch
multiple per cpu data items because we can verify that we are on the same
per cpu area during the this_cpu_cmpxchg_double operation.
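Roughly, the fastpath pattern looks like this (a simplified sketch along the
lines of mm/slub.c of that time; slow path and node checks omitted, helpers
like next_tid()/get_freepointer() as in slub). The per cpu pointer and tid
are read with preemption enabled; the this_cpu_cmpxchg_double() only
succeeds if we are still on the cpu whose freelist/tid we read.

redo:
	c = __this_cpu_ptr(s->cpu_slab);	/* may race with migration */
	tid = c->tid;				/* transaction id of that cpu area */
	barrier();

	object = c->freelist;
	if (unlikely(!object))
		return __slab_alloc(s, gfpflags, node, addr, c);  /* slow path */

	/*
	 * Relocates to the current cpu's area again; fails and retries if we
	 * migrated in between or someone else changed freelist/tid.
	 */
	if (unlikely(!this_cpu_cmpxchg_double(
			s->cpu_slab->freelist, s->cpu_slab->tid,
			object, tid,
			get_freepointer(s, object), next_tid(tid))))
		goto redo;

	return object;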
> The point is, people get it wrong all the time. In fact, we should
> really require that ALL USES of this_cpu_*() must be with preemption
> disabled. Regardless. Because anytime you touch a per cpu variable,
NO!! This defeats the whole purpose of this_cpu_ops and makes the whole
scheme utterly useless.
> Monkeying around with per cpu data is tricky. If you start doing it in
> preempt enabled code, you are most certainly about to get it wrong. Why
> have this super optimization? A preempt_disable() is a single operation
> that touches cache hot data.
There are trivial cases, like counter increments, that are not a problem at
all, and most use cases are of that kind. More complex uses can be developed
to avoid overhead in performance-critical sections of the kernel.
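
A trivial example of such a counter (hypothetical per cpu variable):
this_cpu_inc() does the relocation to the local cpu and the increment as one
preempt/irq safe operation (a single instruction on x86, with preemption
disabled internally on architectures that need it), so no preempt_disable()
is needed around it.

	DEFINE_PER_CPU(unsigned long, foo_events);	/* hypothetical counter */

	static inline void foo_count_event(void)
	{
		/*
		 * Safe even with preemption enabled: whichever cpu we are on
		 * at the instant of the increment gets the count.
		 */
		this_cpu_inc(foo_events);
	}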