Message-ID: <20121210152518.GO14363@n2100.arm.linux.org.uk>
Date:	Mon, 10 Dec 2012 15:25:18 +0000
From:	Russell King - ARM Linux <linux@....linux.org.uk>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	Will Deacon <will.deacon@....com>,
	"Jon Medhurst (Tixy)" <tixy@...aro.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Rabin Vincent <rabin@....in>, Ingo Molnar <mingo@...hat.com>,
	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH] ARM: ftrace: Ensure code modifications are
	synchronised across all cpus

On Mon, Dec 10, 2012 at 09:46:41AM -0500, Steven Rostedt wrote:
> Again, you and I are having a disconnect. I'm not a HW expert. I'm
> trying to get a total understanding of what you, Will, Jon and others
> are trying to say.

Well, there are people who think that you're intentionally trying to wind
me up (I'm not alone in this opinion; believe me, I checked with someone
else taking part in this thread and they said as much...)

> > ... which, if it's misaligned to a 32-bit boundary, which can happen with
> > Thumb-2 code, will require the replacement to be done atomically; you will
> > need to use stop_machine() to ensure that other CPUs don't try to execute
> > the instruction mid-way through modification... as I have already
> > explained in my previous mails.
> 
> I'm confused as to what is wrong with "misaligned to a 32-bit boundary".
> Isn't it best if it is on a 32-bit boundary? Or do you mean that it's
> misaligned across a 32-bit boundary? I guess I just read it wrong.

What I mean is a store of 32-bit size to an address which is not
numerically an integer multiple of four.

To see why this is a problem, take a moment to think about how you'd
update a misaligned 32-bit value on a 32-bit bus with byte enables.
You need to do it as two transactions.
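
As a rough illustration of the two-transaction point, here is a minimal
sketch in C of how a store might be split into beats on a 32-bit bus with
byte enables. The struct bus_beat type and the split_store() helper are
hypothetical names made up for this example; they model the idea only and
do not describe any particular ARM bus protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One beat on a hypothetical 32-bit bus: a word-aligned address plus
 * four byte-enable bits (bit n set means byte lane n is written). */
struct bus_beat {
	uint32_t word_addr;	/* always a multiple of 4 */
	uint8_t byte_enables;	/* only the low 4 bits are used */
};

/* Split a store of 1..4 bytes at addr into beats on a 32-bit bus.
 * Returns the number of beats written to out (at most max).
 * Address wrap-around at the top of the address space is ignored. */
size_t split_store(uint32_t addr, uint32_t size,
		   struct bus_beat *out, size_t max)
{
	size_t n = 0;
	uint32_t end = addr + size;	/* one past the last byte */

	assert(size >= 1 && size <= 4);

	while (addr < end && n < max) {
		uint32_t word = addr & ~3u;
		uint32_t word_end = word + 4;
		uint32_t stop = end < word_end ? end : word_end;
		uint8_t be = 0;

		/* Enable one byte lane per byte that lands in this word. */
		for (uint32_t a = addr; a < stop; a++)
			be |= (uint8_t)(1u << (a & 3u));

		out[n].word_addr = word;
		out[n].byte_enables = be;
		n++;
		addr = stop;
	}
	return n;
}
```

With this model, a 4-byte store to 0x1000 is a single beat with all four
lanes enabled, while a 4-byte store to 0x1002 becomes two beats: lanes 2-3
of the word at 0x1000, then lanes 0-1 of the word at 0x1004.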

If your bus is 64-bits wide, then the problem potentially becomes one
where there's an issue if it crosses a 64-bit boundary.  Continue for
larger bus widths...

Now add in the effect of caching with its cache line boundaries, and
what the effects are if a write crosses the cache line boundary (which
means it ends up with two separate validity bits etc.)
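
The same boundary question applies at every level (bus width, cache line),
so it can be sketched as one check. This is only an illustration: the
crosses_boundary() helper is a hypothetical name, and the 64-byte cache
line used in the note below is an assumed size, not a property of any
specific CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if the byte range [addr, addr + size) spans more than one
 * naturally aligned block of `block` bytes. `block` must be a power of
 * two, e.g. 4 for a 32-bit bus, 8 for a 64-bit bus, or the cache line
 * size in bytes. */
bool crosses_boundary(uint64_t addr, uint64_t size, uint64_t block)
{
	assert(size > 0);
	assert(block != 0 && (block & (block - 1)) == 0);

	/* Compare the block index of the first and last byte touched. */
	return (addr / block) != ((addr + size - 1) / block);
}
```

For a 4-byte store at 0x1002, this reports a crossing for a 4-byte block
but not for an 8-byte block; the same store at 0x1006 crosses an 8-byte
boundary, and at 0x103e it crosses an (assumed) 64-byte cache line.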

Lastly, remember that ARM CPUs have a Harvard cache architecture; that
means that the data paths are entirely separate from the instruction
paths - and in some cases that goes all the way to the memory controller,
but that's not relevant.  The relevant point here is that the point in
the pathways where the instruction and data paths unite can be quite
some distance _outside_ of the CPU.

What this all means is that a misaligned 32-bit store can ultimately
appear as two separate 16-bit stores, which may be interleaved by
other bus activity.  Whether that is visible to other CPUs in a SMP
system as two separate 16-bit stores or not isn't well defined.
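
To make the worst case concrete, the sketch below lists every 32-bit value
an observer could fetch if the store really were split into two 16-bit
halves that become visible independently. It assumes a little-endian
layout, so the halfword at the lower address is the low 16 bits of the
value; torn_values() is a hypothetical helper for illustration, and
whether real hardware exposes the mixed values is exactly the part that
isn't well defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill out[] with the values that could be observed while `old` is being
 * overwritten by `updated` as two independent 16-bit halves.
 * Assumes little-endian: low 16 bits live at the lower address.
 * Always returns 4, the number of entries written. */
size_t torn_values(uint32_t old, uint32_t updated, uint32_t out[4])
{
	uint32_t lo_old = old & 0x0000ffffu;
	uint32_t hi_old = old & 0xffff0000u;
	uint32_t lo_new = updated & 0x0000ffffu;
	uint32_t hi_new = updated & 0xffff0000u;

	out[0] = old;			/* neither half visible yet */
	out[1] = hi_old | lo_new;	/* only the lower-address half updated */
	out[2] = hi_new | lo_old;	/* only the higher-address half updated */
	out[3] = updated;		/* both halves visible */
	return 4;
}
```

With placeholder values old = 0x11112222 and updated = 0xaaaabbbb (not
real instruction encodings), the two mixed results are 0x1111bbbb and
0xaaaa2222, neither of which is the old or the new instruction.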

x86 in this regard is beautiful; it's fully coherent with everything.
It enforces correctness for almost every situation.  It manages this
by using a hell of a lot of logic to do interlocking and ensure
correct ordering.  If you want that from an ARM CPU then you'd probably
need a comparable amount of logic - and power - to be able to do that.

> Either way, I said there's probably no guarantee that the 32-bit calls
> to mcount that gcc has inserted (or the tracepoints) are going to be
> aligned to 32-bit boundaries.

Correct; there is no guarantee of that whatsoever when building for
Thumb-2.

> But I'm wondering if that's still a
> problem. Let's look at the ways another CPU could get the 32-bit
> instruction if it is misaligned, and across two different cache lines,
> or even two different pages:
> 
> 
> 1) the CPU gets the full 32bits as it was on the other CPU, or how it
> will be.
> 
> 2) The CPU gets the first 16bits as it was on the other CPU and the
> second 16bits with the update.
> 
> 3) The CPU gets the first 16bits with the update and the second 16bits
> as it used to be.
> 
> 
> The first case isn't interesting, so lets jump to the 2 and 3rd cases.
> 
> On an update of a 32bit nop to a 16bit breakpoint or branch (jump over
> second half).

Err.  Let me remind you what you said in the message which I replied to
earlier today:

   We are replacing a 32bit call with a nop. That nop must also         
                      ^^^^^
   be 32bits, because we could eventually replace the nop(s) with a 32bit
      ^^^^^^          
   call.

Maybe that's sloppy language, but I tend to read what's written and
interpret it as written... so to now talk about 16-bit breakpoint or
branch instructions sounds to me like changing the point of discussion.