Date: Fri, 20 Nov 2009 19:06:13 -0500
From: Masami Hiramatsu <mhiramat@...hat.com>
To: "H. Peter Anvin" <hpa@...or.com>
CC: Jason Baron <jbaron@...hat.com>, linux-kernel@...r.kernel.org,
    mingo@...e.hu, mathieu.desnoyers@...ymtl.ca, tglx@...utronix.de,
    rostedt@...dmis.org, andi@...stfloor.org, roland@...hat.com,
    rth@...hat.com
Subject: Re: [RFC PATCH 2/6] jump label v3 - x86: Introduce generic jump patching without stop_machine

Hi Peter,

H. Peter Anvin wrote:
> On 11/18/2009 02:43 PM, Jason Baron wrote:
>> Add text_poke_fixup() which takes a fixup address to where a processor
>> jumps if it hits the modifying address while code modifying.
>> text_poke_fixup() does following steps for this purpose.
>>
>>  1. Setup int3 handler for fixup.
>>  2. Put a breakpoint (int3) on the first byte of modifying region,
>>     and synchronize code on all CPUs.
>>  3. Modify other bytes of modifying region, and synchronize code on all CPUs.
>>  4. Modify the first byte of modifying region, and synchronize code
>>     on all CPUs.
>>  5. Clear int3 handler.
>>
>> Thus, if some other processor execute modifying address when step2 to step4,
>> it will be jumped to fixup code.
>>
>> This still has many limitations for modifying multi-instructions at once.
>> However, it is enough for 'a 5 bytes nop replacing with a jump' patching,
>> because;
>>  - Replaced instruction is just one instruction, which is executed atomically.
>>  - Replacing instruction is a jump, so we can set fixup address where the jump
>>    goes to.
>>
>
> I just had a thought about this... regardless of if this is safe or not
> (which still remains to be determined)... I have a bit more of a
> fundamental question about it:
>
> This code ends up taking *two* global IPIs for each instruction
> modification. Each of those requires whole-system synchronization.

As Mathieu and I discussed, the first IPI is for synchronizing the code, and
the second is for waiting until all int3 handling is done.
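For readers following the thread, steps 2 to 4 of the quoted description can be sketched in user space as below. This is only a model on a plain byte buffer, not the patch's code: `text_poke_fixup_model()` and `sync_all_cpus_model()` are hypothetical names, the "synchronize" step is reduced to a counter, and steps 1 and 5 (installing and clearing the int3 handler) are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INT3_OPCODE 0xCC /* x86 one-byte breakpoint instruction */

/* Hypothetical stand-in for "synchronize code on all CPUs"; it only counts calls. */
static int sync_calls;

static void sync_all_cpus_model(void)
{
    sync_calls++;
}

/*
 * Model of steps 2-4 on an ordinary byte buffer (assumption: not live
 * kernel text, no real cross-CPU synchronization takes place).
 */
static void text_poke_fixup_model(uint8_t *site, const uint8_t *new_insn,
                                  size_t len)
{
    if (len == 0)
        return;

    /* Step 2: put int3 on the first byte, so a CPU arriving here would trap. */
    site[0] = INT3_OPCODE;
    sync_all_cpus_model();

    /* Step 3: rewrite the remaining bytes while the first byte is still int3. */
    if (len > 1)
        memcpy(site + 1, new_insn + 1, len - 1);
    sync_all_cpus_model();

    /* Step 4: replace int3 with the first byte of the new instruction. */
    site[0] = new_insn[0];
    sync_all_cpus_model();
}
```

The point of the ordering is that, in this model, the buffer never holds a mix of old and new bytes without int3 in the first position, which is what lets the fixup handler catch a CPU that hits the site mid-update.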
> How
> is this better than taking one IPI and having the other CPUs wait until
> the modification is complete before returning?

Do you mean using stop_machine()? :-)

If we don't care about NMI, we can use stop_machine() (for this reason,
kprobe-jump-optimization can use stop_machine(), because kprobes can't
probe NMI code), but tracepoints have to support NMI.

Actually, it might be possible, even though it will be complicated. If
one-byte modification (int3 injection/removal) is always synchronized,
I assume the timechart below can work (and it can support NMI/SMI too).

----
<CPU0>                        <CPU1>
flag = 0
setup int3 handler
int3 injection[sync]
other-bytes modifying
smp_call_function(func)       func()
wait_until(flag==1)           irq_disable()
                              sync_core() for other-bytes modifying
                              flag = 1
first-byte modifying[sync]    wait_until(flag==2)
flag = 2
wait_until(flag==3)           irq_enable()
                              flag = 3
cleanup int3 handler          return
return
----

I'm not so sure that this flag-based step-by-step code can work faster
than 2 IPIs :-(

Any comments are welcome! :-)

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhiramat@...hat.com
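The flag handshake in the timechart can be checked for ordering with a small single-threaded model, sketched below under several assumptions: the two CPUs are modelled as step functions driven by a round-robin loop, `wait_until()` becomes "make no progress this round", `sync_core()` and the int3 handler are not modelled, and all names (`struct patch_model`, `cpu0_step()`, `cpu1_step()`, `run_model()`) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define INT3_OPCODE 0xCC /* x86 one-byte breakpoint instruction */
#define INSN_LEN 5

struct patch_model {
    int flag;                   /* shared handshake flag from the timechart */
    uint8_t site[INSN_LEN];     /* modelled instruction bytes being patched */
    uint8_t new_insn[INSN_LEN]; /* replacement instruction */
    int pc0, pc1;               /* next step of the modelled CPU0 / CPU1 */
    bool cpu1_started;          /* set by the modelled smp_call_function() */
    bool cpu1_irqs_off;
};

/* One step of the patching CPU; returns false when blocked or finished. */
static bool cpu0_step(struct patch_model *m)
{
    switch (m->pc0) {
    case 0: /* flag = 0, int3 injection */
        m->flag = 0;
        m->site[0] = INT3_OPCODE;
        break;
    case 1: /* other-bytes modifying, then smp_call_function(func) */
        memcpy(m->site + 1, m->new_insn + 1, INSN_LEN - 1);
        m->cpu1_started = true;
        break;
    case 2: /* wait_until(flag==1), first-byte modifying, flag = 2 */
        if (m->flag != 1)
            return false;
        m->site[0] = m->new_insn[0];
        m->flag = 2;
        break;
    case 3: /* wait_until(flag==3), then cleanup and return */
        if (m->flag != 3)
            return false;
        break;
    default:
        return false;
    }
    m->pc0++;
    return true;
}

/* One step of the CPU running func(); returns false when blocked or finished. */
static bool cpu1_step(struct patch_model *m)
{
    if (!m->cpu1_started)
        return false;
    switch (m->pc1) {
    case 0: /* irq_disable(), sync_core() (not modelled), flag = 1 */
        m->cpu1_irqs_off = true;
        m->flag = 1;
        break;
    case 1: /* wait_until(flag==2), irq_enable(), flag = 3 */
        if (m->flag != 2)
            return false;
        m->cpu1_irqs_off = false;
        m->flag = 3;
        break;
    default:
        return false;
    }
    m->pc1++;
    return true;
}

/* Alternate the two CPUs until neither can make progress; returns steps taken. */
static int run_model(struct patch_model *m)
{
    int steps = 0;
    for (;;) {
        bool a = cpu0_step(m);
        bool b = cpu1_step(m);
        if (!a && !b)
            break;
        steps += a + b;
    }
    return steps;
}
```

This model says nothing about speed, which is the open question in the mail; it only shows that the flag sequence 0, 1, 2, 3 lets both sides finish without deadlock and that CPU1 re-enables interrupts only after the first byte has been rewritten.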