Message-ID: <alpine.DEB.2.02.1311061223290.28798@pianoman.cluster.toy>
Date: Wed, 6 Nov 2013 12:31:53 -0500 (EST)
From: Vince Weaver <vince@...ter.net>
To: Peter Zijlstra <peterz@...radead.org>
cc: mingo@...nel.org, hpa@...or.com, anton@...ba.org,
mathieu.desnoyers@...ymtl.ca, linux-kernel@...r.kernel.org,
michael@...erman.id.au, paulmck@...ux.vnet.ibm.com,
benh@...nel.crashing.org, fweisbec@...il.com, VICTORK@...ibm.com,
tglx@...utronix.de, oleg@...hat.com, mikey@...ling.org,
linux-tip-commits@...r.kernel.org
Subject: Re: [tip:perf/core] tools/perf: Add required memory barriers
On Wed, 6 Nov 2013, Peter Zijlstra wrote:
> On Wed, Nov 06, 2013 at 03:44:56PM +0100, Peter Zijlstra wrote:
> > long head = ((__atomic long)pc->data_head).load(memory_order_acquire);
> >
> > coupled with:
> >
> > ((__atomic long)pc->data_tail).store(tail, memory_order_release);
> >
> > might be the 'right' and proper C11 incantations to avoid having to
> > touch kernel macros; but would obviously require a recent compiler.
> >
> > Barring that, I think we're stuck with:
> >
> > long head = ACCESS_ONCE(pc->data_head);
> > smp_rmb();
> >
> > ...
> >
> > smp_mb();
> > pc->data_tail = tail;
> >
> > And using the right asm goo for the barriers. That said, all these asm
> > barriers should include a compiler barrier (memory clobber) which
> > _should_ avoid the worst compiler trickery -- although I don't think it
> > completely obviates the need for ACCESS_ONCE() -- uncertain there.
>
> http://software.intel.com/en-us/articles/single-producer-single-consumer-queue/
>
> There's one for icc on x86.
>
I think the problem here is that this really isn't a good interface.

Most users just want the most recent batch of samples, something like:

	char buffer[4096];
	int count;

	do {
		count = perf_read_sample_buffer(buffer, sizeof(buffer));
		if (count > 0)
			process_samples(buffer, count);
	} while (count > 0);

where perf_read_sample_buffer() is a syscall that just copies the
currently valid samples to userspace.

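To make the copy semantics concrete, here is a minimal user-space sketch of what such a call might do internally. Everything in it is an assumption for illustration: struct mock_ring and mock_read_sample_buffer() are hypothetical names, the layout is not the real struct perf_event_mmap_page, and a real implementation would presumably copy whole sample records rather than raw bytes. It only shows the wrap-around copy and the tail update.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's ring buffer state; this is
 * NOT the real struct perf_event_mmap_page layout. */
struct mock_ring {
	uint64_t head;        /* total bytes ever written by the producer */
	uint64_t tail;        /* total bytes already consumed */
	size_t size;          /* capacity of data[], must be a power of two */
	unsigned char *data;  /* backing storage of the ring */
};

/*
 * Mock of the proposed perf_read_sample_buffer(): copy up to len bytes
 * of pending data into buf, advance the tail, and return the number of
 * bytes copied (0 when nothing is pending).
 */
static int mock_read_sample_buffer(struct mock_ring *rb, void *buf, size_t len)
{
	size_t avail = (size_t)(rb->head - rb->tail);
	size_t n = avail < len ? avail : len;
	size_t off = (size_t)(rb->tail & (rb->size - 1));
	size_t first = rb->size - off;  /* bytes available before the wrap point */

	if (first > n)
		first = n;

	/* Copy the part up to the end of the ring, then the wrapped part. */
	memcpy(buf, rb->data + off, first);
	memcpy((unsigned char *)buf + first, rb->data, n - first);

	rb->tail += n;
	return (int)n;
}
```

With a copy like this done on the kernel side, the caller would only ever see a flat buffer and a byte count, which is the point of the proposal.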
Yes, this is inefficient (requires an extra copy of the values) but the
kernel then could handle all the SMP/multithread/barrier/locking issues.

How much overhead is really introduced by making a copy?

Requiring the user of a kernel interface to have a deep knowledge of
optimizing compilers, barriers, and CPU memory models is just asking for
trouble.

Especially as this all needs to get documented in the manpage and I'm not
sure that's possible in a sane fashion.

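For comparison, this is roughly what every user of the mmap interface has to get right today, following the acquire/release pairing suggested in the quoted text. It is only a sketch under assumptions: struct mock_ctrl_page and the two helpers are hypothetical names, only the data_head and data_tail fields are modelled, and it uses the GCC/Clang __atomic builtins rather than the C11 syntax quoted above. Whether acquire/release is sufficient on every architecture is exactly the kind of question this thread is still debating.

```c
#include <stdint.h>

/* Simplified stand-in for the mmap'd perf control page. Only the two
 * fields discussed in this thread are modelled; the real
 * struct perf_event_mmap_page has many more. */
struct mock_ctrl_page {
	uint64_t data_head;  /* advanced by the kernel (producer) */
	uint64_t data_tail;  /* advanced by user space (consumer) */
};

/* Read the producer's head. The acquire load keeps the later reads of
 * sample data from being reordered before it. */
static inline uint64_t ring_read_head(struct mock_ctrl_page *pc)
{
	return __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE);
}

/* Publish the new tail. The release store keeps the earlier reads of
 * sample data from being reordered after it. */
static inline void ring_write_tail(struct mock_ctrl_page *pc, uint64_t tail)
{
	__atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE);
}
```

A consumer would call ring_read_head(), process the bytes between the old tail and that head, and only then call ring_write_tail() with the new position.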
Vince