Message-ID: <1292013157.13513.69.camel@laptop>
Date:	Fri, 10 Dec 2010 21:32:37 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Christoph Lameter <cl@...ux.com>
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Venkatesh Pallipadi <venki@...gle.com>,
	Russell King - ARM Linux <linux@....linux.org.uk>,
	Mikael Pettersson <mikpe@...uu.se>,
	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org,
	John Stultz <johnstul@...ibm.com>
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 2010-12-10 at 14:23 -0600, Christoph Lameter wrote:
> On Fri, 10 Dec 2010, Peter Zijlstra wrote:
> 
> > It's not about passing per-cpu pointers, it's about passing long pointers.
> >
> > When I write:
> >
> > void foo(u64 *bla)
> > {
> > 	(*bla)++;
> > }
> >
> > DEFINE_PER_CPU(u64, plop);
> >
> > void bar(void)
> > {
> > 	foo(__this_cpu_ptr(&plop));
> > }
> >
> > I want gcc to emit the equivalent to:
> >
> > __this_cpu_inc(plop); /* incq %fs:(%0) */
> >
> > Now I guess the C type system will get in the way of this ever working,
> > since a long pointer would have a distinct type from a regular
> > pointer :/
> >
> > The idea is to use 'regular' functions with the per-cpu data in a
> > transparent manner so as not to have to replicate all logic.
> 
> That would mean you would have to pass information in the pointer at
> runtime indicating that this particular pointer is a per cpu pointer.
> 
> Code for the Itanium arch can do that because it has per cpu virtual
> mappings. So you define a virtual area for per cpu data and then map it
> differently for each processor. If we had a different page table
> for each processor then we could avoid using a segment register and do
> the same on x86.

I don't think it's a runtime issue, it's a compile-time issue. At compile
time the compiler can see that the argument is a long pointer,
%fs:(addr,idx,size), and could propagate this into the caller.

The above example will compute the effective address by doing something
like:

  lea %fs:(addr,idx,size),%ebx

and will then do something like

  inc (%ebx)

Where it could easily have optimized this into:

  inc %fs:(addr,idx,size)

especially when foo is inlined. If it's an actual call site you need
function overloading, because a long pointer has a different signature
from a regular pointer, and that is something C doesn't do.
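
A rough userspace analogue of the example above, as a sketch only. It
assumes x86-64 Linux, where thread-local data is addressed relative to
%fs, and uses a __thread variable as a stand-in for
DEFINE_PER_CPU(u64, plop). The names bar_via_pointer() and bar_direct()
are hypothetical and not kernel API.

```c
#include <stdint.h>

/* Stand-in for DEFINE_PER_CPU(u64, plop): one instance per thread. */
static __thread uint64_t plop;

/* 'Regular' helper that only ever sees a plain pointer. */
static inline void foo(uint64_t *bla)
{
	(*bla)++;
}

/* Takes the address of the thread-local and passes it to foo(). */
uint64_t bar_via_pointer(void)
{
	foo(&plop);
	return plop;
}

/* Operates on the thread-local directly, like __this_cpu_inc(plop). */
uint64_t bar_direct(void)
{
	plop++;
	return plop;
}
```

Comparing the output of "gcc -O2 -S" for the two functions shows whether
the compiler folds the inlined pointer access back into a single
segment-relative increment. If foo() is not inlined, the callee only
receives a flat address, which is the signature problem described above.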

> > > Seems that you do not have that use case in mind. So a seqlock restricted
> > > to a single processor? If so then you won't need any of those smp write
> > > barriers mentioned earlier. A simple compiler barrier() is sufficient.
> >
> > The seqcount is sometimes read by different CPUs, but I don't see why we
> > couldn't do what Eric suggested.
> 
> But you would have to define a per cpu seqlock. Each cpu would have
> its own seqlock. Then you could have this_cpu_read_seqcount_begin and
> friends:
> 

> Then you can do
> 
> this_cpu_read_seqcount_begin(&bla)
> 

Which to me seems to be exactly what Eric proposed.
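
For reference, a minimal userspace sketch of the begin/retry pattern
being discussed, written with C11 atomics. Every name here
(seqcount_sketch, sketch_read_begin() and so on) is hypothetical; this
is not the kernel's seqcount API and not the proposed
this_cpu_read_seqcount_begin(). The fences are stand-ins for the smp
barriers mentioned earlier.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical sequence counter: odd means a write is in progress. */
struct seqcount_sketch {
	atomic_uint sequence;
};

/* Hypothetical protected data: one counter plus its seqcount. */
struct counter_sketch {
	struct seqcount_sketch seq;
	_Atomic uint64_t value;
};

static inline void sketch_write_begin(struct seqcount_sketch *s)
{
	atomic_fetch_add_explicit(&s->sequence, 1, memory_order_relaxed);
	/* Order the sequence bump before the data stores that follow. */
	atomic_thread_fence(memory_order_release);
}

static inline void sketch_write_end(struct seqcount_sketch *s)
{
	/* Release: data stores become visible before the even sequence. */
	atomic_fetch_add_explicit(&s->sequence, 1, memory_order_release);
}

static inline unsigned sketch_read_begin(struct seqcount_sketch *s)
{
	unsigned seq;

	/* Wait until no writer is active (sequence is even). */
	do {
		seq = atomic_load_explicit(&s->sequence, memory_order_acquire);
	} while (seq & 1);
	return seq;
}

static inline int sketch_read_retry(struct seqcount_sketch *s, unsigned start)
{
	/* Order the data loads before re-reading the sequence. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->sequence, memory_order_relaxed) != start;
}

void sketch_update(struct counter_sketch *c, uint64_t v)
{
	sketch_write_begin(&c->seq);
	atomic_store_explicit(&c->value, v, memory_order_relaxed);
	sketch_write_end(&c->seq);
}

uint64_t sketch_read(struct counter_sketch *c)
{
	unsigned start;
	uint64_t v;

	/* Retry if a writer changed the sequence while we were reading. */
	do {
		start = sketch_read_begin(&c->seq);
		v = atomic_load_explicit(&c->value, memory_order_relaxed);
	} while (sketch_read_retry(&c->seq, start));
	return v;
}
```

If a given instance were only ever touched from one CPU, the thread
fences could in principle shrink to compiler-only barriers
(atomic_signal_fence() in C11 terms), which is the barrier() point made
above. With readers on other CPUs the real fences are still needed.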

> But then this seemed to be a discussion related to ARM. ARM does not have
> optimized per cpu accesses.

Nah, there's multiple issues all nicely mangled into one thread ;-)

