[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.02.1109201334590.2723@ionos>
Date: Tue, 20 Sep 2011 15:49:26 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: Christoph Lameter <cl@...two.org>
cc: Steven Rostedt <rostedt@...dmis.org>,
LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Christoph Lameter <cl@...ux.com>, Tejun Heo <htejun@...il.com>
Subject: Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
On Mon, 19 Sep 2011, Christoph Lameter wrote:
> On Mon, 19 Sep 2011, Steven Rostedt wrote:
>
> > From: Steven Rostedt <srostedt@...hat.com>
> >
> > The code in mod_state() is already made to handle the raciness of
> > this_cpu_read(). Have the code use __this_cpu_read() instead so
> > the debug code does not trigger warnings about it.
>
> Why would there be a warning triggered? this_cpu_read should take care of
> disabling preemption for the read if needed. In fact the fallback case
> does do exactly that.
And what exactly is the purpose of having a preempt_disable()/enable()
pair in this_cpu_read()?
Nothing, AFAICT. And looking at the usage:
arch/x86/lib/delay.c:124: (this_cpu_read(cpu_info.loops_per_jiffy) * (HZ/4)));
Pointless, as we do not care about which loops_per_jiffy we access.
arch/x86/mm/tlb.c:179: sender = this_cpu_read(tlb_vector_offset);
Pointless, as all callers have preemption disabled.
drivers/acpi/processor_throttling.c:719: if ((this_cpu_read(cpu_info.x86_vendor) != X86_VENDOR_INTEL) ||
drivers/acpi/processor_throttling.c:740: if ((this_cpu_read(cpu_info.x86_vendor) != X86_VENDOR_INTEL) ||
The conversion was made in commit 9d42a53e with a lousy changelog and
this_cpu_read() with the given semantics is simply crap as this code
wants to run on a given cpu as we access msrs which we don't want to
do in a random fashion. In fact if you had used __this_cpu_read() and
that code would have contained a debug check then the callchain coming
up from thermal_cooling_device_cur_state_store() would have triggered
it AFAICT.
drivers/dma/dmaengine.c:331: return this_cpu_read(channel_table[tx_type]->chan);
Blindly converted w/o noticing the obvious problem with this code
(even befor the conversion). We just blindly return a channel from the
cpu on which this happens to be called. Can all the calling code deal
with possibly being migrated away after that ? If yes, then this wants
to have a comment. If no, then the calling convention wants to be
documented.
drivers/input/gameport/gameport.c:124: return (this_cpu_read(cpu_info.loops_per_jiffy) *
Harmless
include/linux/interrupt.h:462: return this_cpu_read(ksoftirqd);
Pointless. This is called from account_system_vtime() and we already
use __this_cpu_read() in that very function.
kernel/irq_work.c:125: if (this_cpu_read(irq_work_list) == NULL)
Pointless, called from interrupts disabled context
kernel/sched.c:2013: latest_ns = this_cpu_read(cpu_hardirq_time);
Pointless, has interrupts disabled
kernel/sched.c:2028: latest_ns = this_cpu_read(cpu_softirq_time);
Ditto
mm/memcontrol.c:686: val = this_cpu_read(mem->stat->events[MEM_CGROUP_EVENTS_COUNT]);
mm/memcontrol.c:687: next = this_cpu_read(mem->stat->targets[target]);
Completely broken as Steven pointed out already
mm/memcontrol.c:696: val = this_cpu_read(mem->stat->events[MEM_CGROUP_EVENTS_COUNT]);
Broken as well, as it writes back to the same per cpu var, unless
called from atomic contexts.
mm/memcontrol.c:1321: return this_cpu_read(mem->stat->count[MEM_CGROUP_ON_MOVE]) > 0;
Either broken by returning random data or pointless when called from
atomic contexts.
mm/vmstat.c:331: t = this_cpu_read(pcp->stat_threshold);
mm/vmstat.c:333: o = this_cpu_read(*p);
Race known and dealt with.
So most of the this_cpu_read() sites are either pointless or broken. I
suspect that other this_cpu_* stuff has the same problems.
So what's the point of that interface again? Random access to random
cpu data protected with a preempt_disable/enable() pair to paper over
the real bugs?
The whole idea of having this_cpu_* which can be called from any
context is completely stupid as it is just a guarantee for misuse and
failure.
The best thing is to remove the this_cpu_* hackery alltogether and
just keep the __this_cpu_* versions (along with proper debugging) and
git rid of the silly underscores.
> I think it would make more sence if __this_cpu_read() could be made to
> trigger a warning if used in context where preemption could be off.
It should have had such a warning in the first place and the warning
needs to yell about preemptible (i.e. unprotected) context and not the
other way round.
But instead of just slapping smp_processor_id() checks into those
functions we should add a more sensible debug interface like:
debug_check_percpu_access()
and the per cpu sections which require protection over a series (1
.. N) of this_cpu_* operations want to have
this_cpu_start()
this_cpu_end()
or similar annotations around them.
This allows us to do proper analysis of this_cpu usage and makes the
code understandable.
Thanks,
tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists