Message-ID: <20090317151436.GA10092@Krystal>
Date:	Tue, 17 Mar 2009 11:14:37 -0400
From:	Mathieu Desnoyers <compudj@...stal.dyndns.org>
To:	Nick Piggin <nickpiggin@...oo.com.au>
Cc:	ltt-dev@...ts.casi.polymtl.ca, Ingo Molnar <mingo@...e.hu>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Josh Boyer <jwboyer@...ux.vnet.ibm.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [ltt-dev] cli/sti vs local_cmpxchg and local_add_return

* Nick Piggin (nickpiggin@...oo.com.au) wrote:
> On Tuesday 17 March 2009 12:32:20 Mathieu Desnoyers wrote:
> > Hi,
> >
> > I am trying to get access to some non-x86 hardware to run some atomic
> > primitive benchmarks for a paper on LTTng I am preparing. The results
> > should help argue about the performance benefit of per-cpu atomic
> > operations vs interrupt disabling. I would like to run the following
> > benchmark module on CONFIG_SMP kernels on:
> >
> > - PowerPC
> > - MIPS
> > - ia64
> > - alpha
> >
> > usage :
> > make
> > insmod test-cmpxchg-nolock.ko
> > insmod: error inserting 'test-cmpxchg-nolock.ko': -1 Resource temporarily unavailable
> > dmesg (see dmesg output)
> >
> > If some of you would be kind enough to run my test module provided below
> > and send the results on a recent kernel (2.6.26 to 2.6.29 should be fine)
> > along with your cpuinfo, I would greatly appreciate it.
> >
> > Here are the CAS results for various x86 architectures:
> >
> > Architecture         | Speedup                     |     CAS      |        Interrupts
> >                      | (cli + sti) / local cmpxchg | local | sync | Enable (sti) | Disable (cli)
> > ------------------------------------------------------------------------------------------------
> > Intel Pentium 4      | 5.24                        |  25   |  81  |  70          |  61
> > AMD Athlon(tm)64 X2  | 4.57                        |   7   |  17  |  17          |  15
> > Intel Core2          | 6.33                        |   6   |  30  |  20          |  18
> > Intel Xeon E5405     | 5.25                        |   8   |  24  |  20          |  22
> >
> > The benefit expected on PowerPC, ia64 and alpha should principally come
> > from removed memory barriers in the local primitives.
> 
> Benefit versus what? I think all of those architectures can do SMP
> atomic compare exchange sequences without barriers, can't they?
> 

Hi Nick,

I want to find out whether it is faster to synchronize the tracing hot
path with respect to interrupts by using SMP CAS without barriers, or by
disabling interrupts. The choice depends on the benchmark I propose,
because it measures the time taken by both.

Overall, the benchmarks will let us choose between these two simplified
hot-path pseudo-codes (offset is global to the buffer; commit_count is
per-subbuffer).


* lockless :

do {
  old_offset = local_read(&offset);
  get_cycles();
  compute needed size.
  new_offset = old_offset + size;
} while (local_cmpxchg(&offset, old_offset, new_offset) != old_offset);

/*
 * note : writing to buffer is done out-of-order wrt buffer slot
 * physical order.
 */
write_to_buffer(old_offset, size);

/*
 * Make sure the data is written in the buffer before commit count is
 * incremented.
 */
smp_wmb();

/* note : incrementing the commit count is also done out-of-order */
count = local_add_return(size, &commit_count[subbuf_index]);
if (count is filling a subbuffer)
  allow to wake up readers


* irq off :

(note : offset and commit count would each be written to atomically
(type unsigned long))

local_irq_save(flags);

get_cycles();
compute needed size;
old_offset = offset;
offset += size;

write_to_buffer(old_offset, size);

/*
 * Make sure the data is written in the buffer before commit count is
 * incremented.
 */
smp_wmb();

commit_count[subbuf_index] += size;
if (commit_count[subbuf_index] is filling a subbuffer)
  allow to wake up readers

local_irq_restore(flags);


* read-side

Basically, the data reader keeps its own consumed data offset,
"consumed", and reads the commit count of the subbuffer it is about to
read. Its pseudo-code is:

(note: commit_count and offset are each read atomically)

consumed_old = atomic_long_read(&consumed);
compute consumed_idx from consumed_old
count = commit_count[consumed_idx];
(or count = local_read(&commit_count[consumed_idx]) for lockless)

/*
 * read commit count before reading the buffer data and write offset.
 */
smp_rmb();

write_offset = offset;
(or write_offset = local_read(&offset))

if (consumed_old and count show subbuffer not full)
  return -EAGAIN;

Allow reading subbuffer.


Mathieu


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
