Message-ID: <1354903568.17101.65.camel@gandalf.local.home>
Date:	Fri, 07 Dec 2012 13:06:08 -0500
From:	Steven Rostedt <rostedt@...dmis.org>
To:	"Jon Medhurst (Tixy)" <tixy@...aro.org>
Cc:	Russell King - ARM Linux <linux@....linux.org.uk>,
	linux-arm-kernel@...ts.infradead.org,
	Ingo Molnar <mingo@...hat.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Rabin Vincent <rabin@....in>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ARM: ftrace: Ensure code modifications are synchronised
 across all cpus

On Fri, 2012-12-07 at 17:45 +0000, Jon Medhurst (Tixy) wrote:
> On Fri, 2012-12-07 at 12:13 -0500, Steven Rostedt wrote:
> > I'll make my question more general:
> > 
> > If I have a nop, that is the size of a call (branch and link), which is
> > near the beginning of a function and not part of any conditional, and I
> > want to convert it into a call (branch and link), would adding a
> > breakpoint to it, modifying it to the call, and then removing the
> > breakpoint be possible? Of course it would require syncing in between
> > steps, but my question is whether the above is possible on a thumb2 ARM
> > processor.
> 
> I believe so. The details are (repeating your earlier explanation) ...
> 
> 1. Replace first half of nop with 16bit 'breakpoint' instruction.
> 
> 2. Sync (cache flush to PoU + IPIs to make other cores invalidate the
> icache for the changed part of the nop instruction).
> 
> 3. Replace second half of nop with second half of the call instruction.
> 
> 4. Sync.
> 
> 5. Replace the breakpoint with the first half of the call instruction.
> 
> 6. Sync.
> 
> And if any core executes the breakpoint instruction, then the handler
> ensures execution continues at the instruction after the nop we're trying
> to replace.

Exactly!
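
To make the ordering concrete, here is a rough userspace sketch of those
six steps. It is only a model under stated assumptions: BKPT16 is the
Thumb "BKPT #0" encoding used as a stand-in for whatever 16-bit breakpoint
would really be used, and sync_all_cpus(), patch_insn() and
resume_after_slot() are hypothetical names, not existing kernel functions.
The sync stub only counts calls; a real one would do the cache flush and
IPIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: Thumb "BKPT #0" stands in for the 16-bit breakpoint. */
#define BKPT16 0xBE00u

/* Hypothetical stand-in for "cache flush to PoU + IPIs"; only counts calls. */
static int sync_count;
static void sync_all_cpus(void) { sync_count++; }

/*
 * Rewrite a 32-bit Thumb-2 instruction stored as two halfwords.
 * While the second half is being changed, the first half is always
 * the breakpoint, so no core can execute a half-old, half-new mix.
 */
static void patch_insn(volatile uint16_t insn[2], const uint16_t new_insn[2])
{
    assert(insn && new_insn);

    insn[0] = BKPT16;       /* 1. first half of nop -> breakpoint        */
    sync_all_cpus();        /* 2. sync                                    */
    insn[1] = new_insn[1];  /* 3. second half -> second half of the call  */
    sync_all_cpus();        /* 4. sync                                    */
    insn[0] = new_insn[0];  /* 5. breakpoint -> first half of the call    */
    sync_all_cpus();        /* 6. sync                                    */
}

/* What the breakpoint handler does: resume after the 4-byte slot. */
static uintptr_t resume_after_slot(uintptr_t bkpt_addr)
{
    return bkpt_addr + 4;
}
```

A core that hits the breakpoint during steps 1 to 5 simply behaves as if
the nop were still there, which is why the intermediate states are safe.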

> 
> However, why would we need any of this breakpoint malarkey? Why not
> just use a 16-bit branch instruction which branches over the second half
> of the nop? :-)

If you can get away with that, sure. Or better yet, if the arch supports
it, you can do what I did with powerpc. That was to just replace the nop
with the 32bit branch, and the 32bit branch with a 32bit nop. No
breakpoints. No multiple steps in between. I just did the swap of all
function tracepoints in one fell swoop, and then did the icache sync.
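
As a rough illustration of that single-step approach (a sketch, not the
actual powerpc ftrace code): every site gets one aligned 32-bit store, and
the icache sync happens once at the end. swap_all_sites() and
icache_sync_all() are hypothetical names, and the C11 atomic store is just
my way of modelling "a 32-bit write that other cores see all or nothing".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real icache sync; only counts calls. */
static int icache_syncs;
static void icache_sync_all(void) { icache_syncs++; }

/*
 * Overwrite every patch site with new_insn using one aligned 32-bit
 * store each, then sync the icache a single time for all of them.
 */
static void swap_all_sites(_Atomic uint32_t *sites[], size_t n,
                           uint32_t new_insn)
{
    assert(sites || n == 0);

    for (size_t i = 0; i < n; i++)
        atomic_store_explicit(sites[i], new_insn, memory_order_relaxed);

    icache_sync_all();
}
```

This only works if the arch guarantees a core fetches either the whole old
word or the whole new word, never a mix.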

Now that's if the arch doesn't have issues with swapping code like this.
Can a 32bit branch-and-link be spread across cache lines? On x86 the
call is 5 bytes and can be. Thus, we were forced to do the breakpoint
because we don't know how the instructions are laid out on the cache
lines.
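
The cache line question boils down to a simple address check. A minimal
sketch, where crosses_cache_line() is a made-up helper and the line size
is whatever the CPU in question uses (64 below is only an example value):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True if an instruction of 'len' bytes starting at 'addr' occupies
 * more than one cache line of 'line_size' bytes.
 */
static bool crosses_cache_line(uintptr_t addr, size_t len, size_t line_size)
{
    assert(len > 0 && line_size > 0);

    /* Compare the line index of the first and last byte. */
    return (addr / line_size) != ((addr + len - 1) / line_size);
}
```

With 64-byte lines, a 5-byte call at 0x103c crosses a line, and so does a
4-byte instruction at the 2-byte-aligned address 0x103e. A 2-byte
instruction at 0x103e does not.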

If 32bit can't be swapped but 16bit never crosses cache lines, then your
approach may also work.
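
For reference, a sketch of how the 16-bit branch for that approach could
be encoded, assuming the Thumb unconditional "B" encoding (0xE000 plus an
11-bit halfword offset, relative to the instruction address plus 4).
thumb_b16_encode() is a hypothetical helper, not an existing kernel
function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode a 16-bit Thumb unconditional branch located at 'from' that
 * jumps to 'to'. Returns 0 and stores the halfword in *out, or -1 if
 * the target is misaligned or out of range.
 */
static int thumb_b16_encode(uint32_t from, uint32_t to, uint16_t *out)
{
    /* In Thumb state the PC reads as the instruction address + 4. */
    int32_t off = (int32_t)(to - (from + 4));

    assert(out);

    if ((off & 1) || off < -2048 || off > 2046)
        return -1;

    *out = (uint16_t)(0xE000u | (((uint32_t)off >> 1) & 0x7FFu));
    return 0;
}
```

Branching over the second half of a 32-bit nop means a target of from + 4,
so the offset is zero and the halfword is 0xE000.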

-- Steve


