linux-kernel - Re: Question about barriers for ARM on tools/perf/


Message-ID: <20150508143729.GJ7862@kernel.org>
Date:	Fri, 8 May 2015 11:37:29 -0300
From:	Arnaldo Carvalho de Melo <acme@...nel.org>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Will Deacon <will.deacon@....com>, Ingo Molnar <mingo@...nel.org>,
	David Ahern <dsahern@...il.com>, Jiri Olsa <jolsa@...hat.com>,
	Namhyung Kim <namhyung@...il.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Question about barriers for ARM on tools/perf/

Em Fri, May 08, 2015 at 04:25:13PM +0200, Peter Zijlstra escreveu:
> On Fri, May 08, 2015 at 03:21:08PM +0100, Will Deacon wrote:
> > Wouldn't it be better to go the other way, and use compiler builtins for
> > the memory barriers instead of relying on the kernel? It looks like the
> > perf_mmap__{read,write}_head functions are basically just acquire/release
> > operations and could therefore be implemented using something like
> > __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE) and
> > __atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE).
 
> He wants to do smp refcounting, which needs atomic_inc() /
> atomic_inc_non_zero() / atomic_dec_return() etc..

Right, Will concentrated on what we use those barriers for right now in
tools/perf.
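For reference, the acquire/release pairing Will describes could be sketched roughly as below. This is only an illustration, not the tools/perf code: struct ring_page and the ring__*() helpers are hypothetical stand-ins for the mmap control page and the perf_mmap__{read,write}_head functions, and it assumes a compiler that provides the __atomic builtins (gcc >= 4.7 or clang).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two control-page fields that matter here. */
struct ring_page {
	uint64_t data_head;	/* advanced by the producer (the kernel) */
	uint64_t data_tail;	/* advanced by the consumer (the tool) */
};

/* Acquire load: later reads of ring data cannot move before this load. */
static inline uint64_t ring__read_head(struct ring_page *pc)
{
	return __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE);
}

/* Release store: earlier reads of ring data cannot move after this store. */
static inline void ring__write_tail(struct ring_page *pc, uint64_t tail)
{
	__atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE);
}

/* Consume everything currently available; returns the number of bytes. */
static inline uint64_t ring__consume_all(struct ring_page *pc)
{
	uint64_t head = ring__read_head(pc);
	uint64_t tail = pc->data_tail;	/* only the consumer writes data_tail */

	/* Assumption: head and tail are free-running byte counters. */
	assert(head >= tail);

	/* ... records in [tail, head) would be processed here ... */

	ring__write_tail(pc, head);
	return head - tail;
}
```

This covers only the ring buffer head/tail handshake; it does not give us the read-modify-write operations needed for refcounting.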

What I am doing right now is exposing what we use in perf to a wider
audience, i.e. code being developed in tools/, with the current intent
of implementing reference counting for multithreaded tools/perf/ tools.
Right now that is only 'perf top', but there are patches floating around
to load a perf.data file using as many CPUs as one would like, IIRC
initially one per available CPU.

As a fallback I am using the gcc intrinsics, but I've heard I should
rather not use those, even though they seemed to work well for x86_64
and sparc64:

-------------------------------------------

/**
 * atomic_inc - increment atomic variable
 * @v: pointer of type atomic_t
 *
 * Atomically increments @v by 1.
 */
static inline void atomic_inc(atomic_t *v)
{
       __sync_add_and_fetch(&v->counter, 1);
}

/**
 * atomic_dec_and_test - decrement and test
 * @v: pointer of type atomic_t
 *
 * Atomically decrements @v by 1 and
 * returns true if the result is 0, or false for all other
 * cases.
 */
static inline int atomic_dec_and_test(atomic_t *v)
{
       return __sync_sub_and_fetch(&v->counter, 1) == 0;
}

-------------------------------------------
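To show how the two helpers above would be used for refcounting, here is a minimal sketch. The atomic_t layout, struct thing, the thing__*() functions and the things_freed counter are all hypothetical and exist only for illustration. atomic_inc_not_zero() (the kernel's spelling of the "inc non zero" operation Peter mentions) is built here on __sync_val_compare_and_swap as one possible fallback, not as the kernel's implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed layout, mirroring the v->counter accesses in the helpers above. */
typedef struct { int counter; } atomic_t;

static inline void atomic_inc(atomic_t *v)
{
	__sync_add_and_fetch(&v->counter, 1);
}

static inline int atomic_dec_and_test(atomic_t *v)
{
	return __sync_sub_and_fetch(&v->counter, 1) == 0;
}

/* Increment @v unless it is zero; returns 1 on success, 0 otherwise. */
static inline int atomic_inc_not_zero(atomic_t *v)
{
	int old = v->counter;

	while (old != 0) {
		int seen = __sync_val_compare_and_swap(&v->counter, old, old + 1);

		if (seen == old)
			return 1;
		old = seen;	/* lost a race, retry with the value we saw */
	}
	return 0;
}

/* Hypothetical refcounted object. */
struct thing {
	atomic_t refcnt;
};

static int things_freed;	/* illustration only: counts released objects */

static struct thing *thing__new(void)
{
	struct thing *t = malloc(sizeof(*t));

	if (t)
		t->refcnt.counter = 1;	/* the creator holds the first reference */
	return t;
}

static struct thing *thing__get(struct thing *t)
{
	/* Taking a reference on an object with no references left is a bug. */
	assert(t->refcnt.counter > 0);
	atomic_inc(&t->refcnt);
	return t;
}

static void thing__put(struct thing *t)
{
	if (t && atomic_dec_and_test(&t->refcnt)) {
		free(t);
		things_freed++;
	}
}
```

Whoever drops the last reference frees the object, so the counter must never be touched after a thing__put() that might have been the final one.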

One of the byproducts I was hoping for was to take advantage of
improvements made to that code in the kernel, etc.

At the very least I will use the same API, i.e. barrier(), mb(), rmb(),
wmb(), atomic_{inc,dec_and_test,read_init}; getting the whole shebang
would be even cooler.
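As a sketch of what keeping the same API on top of compiler builtins could look like, a deliberately conservative fallback might map all three hardware barriers to a full barrier. This is an assumption for illustration, not what the kernel or tools/perf does per architecture; payload, ready, publish() and try_consume() are hypothetical names used only to show the wmb()/rmb() pairing.

```c
#include <assert.h>
#include <stddef.h>

/* Compiler-only barrier: stops the compiler from reordering memory accesses. */
#define barrier()	__asm__ __volatile__("" ::: "memory")

/*
 * Conservative fallback: __sync_synchronize() is a full memory barrier,
 * which is at least as strong as what rmb() and wmb() require.
 */
#define mb()		__sync_synchronize()
#define rmb()		__sync_synchronize()
#define wmb()		__sync_synchronize()

static int payload;
static volatile int ready;

/* Writer: make the payload visible before the flag that announces it. */
static void publish(int value)
{
	payload = value;
	wmb();
	ready = 1;
}

/* Reader: only read the payload after having seen the flag. */
static int try_consume(int *out)
{
	assert(out != NULL);
	if (!ready)
		return 0;
	rmb();
	*out = payload;
	return 1;
}
```

An architecture-specific version can then replace any of these macros with something cheaper without touching the callers.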

- Arnaldo