Message-ID: <alpine.DEB.2.02.1311061223290.28798@pianoman.cluster.toy>
Date:	Wed, 6 Nov 2013 12:31:53 -0500 (EST)
From:	Vince Weaver <vince@...ter.net>
To:	Peter Zijlstra <peterz@...radead.org>
cc:	mingo@...nel.org, hpa@...or.com, anton@...ba.org,
	mathieu.desnoyers@...ymtl.ca, linux-kernel@...r.kernel.org,
	michael@...erman.id.au, paulmck@...ux.vnet.ibm.com,
	benh@...nel.crashing.org, fweisbec@...il.com, VICTORK@...ibm.com,
	tglx@...utronix.de, oleg@...hat.com, mikey@...ling.org,
	linux-tip-commits@...r.kernel.org
Subject: Re: [tip:perf/core] tools/perf: Add required memory barriers

On Wed, 6 Nov 2013, Peter Zijlstra wrote:

> On Wed, Nov 06, 2013 at 03:44:56PM +0100, Peter Zijlstra wrote:
> > long head = ((__atomic long)pc->data_head).load(memory_order_acquire);
> > 
> > coupled with:
> > 
> > ((__atomic long)pc->data_tail).store(tail, memory_order_release);
> > 
> > might be the 'right' and proper C11 incantations to avoid having to
> > touch kernel macros; but would obviously require a recent compiler.
> > 
> > Barring that, I think we're stuck with:
> > 
> > long head = ACCESS_ONCE(pc->data_head);
> > smp_rmb();
> > 
> > ...
> > 
> > smp_mb();
> > pc->data_tail = tail;
> > 
> > And using the right asm goo for the barriers. That said, all these asm
> > barriers should include a compiler barrier (memory clobber), which
> > _should_ avoid the worst compiler trickery -- although I don't think it
> > completely obviates the need for ACCESS_ONCE() -- uncertain there.
> 
> http://software.intel.com/en-us/articles/single-producer-single-consumer-queue/
> 
> There's one for icc on x86.
> 

I think the real problem here is that this isn't a good interface.

Most users just want the most recent batch of samples, something like:

    char buffer[4096];
    int count;

    /* count tells process_samples() how much of buffer is valid */
    while ((count = perf_read_sample_buffer(buffer, sizeof(buffer))) > 0)
       process_samples(buffer, count);

where perf_read_sample_buffer() is a syscall that just copies the current 
valid samples to userspace.

Yes, this is inefficient (it requires an extra copy of the samples), but the 
kernel could then handle all the SMP/multithread/barrier/locking issues.
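
To make the idea concrete, here is a rough userspace sketch of the kind of
copy-out helper that could sit behind such a call, with the acquire/release
ordering hidden inside it. Everything here is an assumption for illustration:
struct ring, RING_SIZE and read_sample_buffer() are made-up names, the head
and tail fields are declared as C11 atomics (the real mmap page does not
declare them that way), and the copy ignores record boundaries, which a real
implementation would have to respect.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 4096u   /* power of two, so masking yields the offset */

/* Hypothetical stand-in for the shared buffer: the producer advances
 * data_head, the consumer advances data_tail. Both count total bytes. */
struct ring {
    _Atomic size_t data_head;
    _Atomic size_t data_tail;
    unsigned char data[RING_SIZE];
};

/* Copy up to len unread bytes into buf; returns bytes copied (0 = empty). */
size_t read_sample_buffer(struct ring *r, void *buf, size_t len)
{
    assert(r != NULL && buf != NULL);

    /* Acquire: data written before head was published is visible to us. */
    size_t head = atomic_load_explicit(&r->data_head, memory_order_acquire);
    size_t tail = atomic_load_explicit(&r->data_tail, memory_order_relaxed);

    size_t avail = head - tail;
    size_t n = avail < len ? avail : len;
    size_t off = tail & (RING_SIZE - 1);
    size_t first = RING_SIZE - off;      /* bytes available before the wrap */
    if (first > n)
        first = n;

    memcpy(buf, r->data + off, first);
    memcpy((unsigned char *)buf + first, r->data, n - first);

    /* Release: our reads finish before the producer may reuse the space. */
    atomic_store_explicit(&r->data_tail, tail + n, memory_order_release);
    return n;
}
```

The caller only ever sees a plain buffer and a count; the memory-ordering
details stay in one place instead of in every tool that reads samples.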

How much overhead is really introduced by making a copy?

Requiring the user of a kernel interface to have a deep knowledge of 
optimizing compilers, barriers, and CPU memory models is just asking for 
trouble.

Especially as this all needs to get documented in the manpage and I'm not 
sure that's possible in a sane fashion.

Vince

