linux-kernel - Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps

Message-ID: <20111007185214.GD2978@redhat.com>
Date:	Fri, 7 Oct 2011 14:52:15 -0400
From:	Jason Baron <jbaron@...hat.com>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	Jeremy Fitzhardinge <jeremy@...p.org>,
	Richard Henderson <rth@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	"David S. Miller" <davem@...emloft.net>,
	David Daney <david.daney@...ium.com>,
	Michael Ellerman <michael@...erman.id.au>,
	Jan Glauber <jang@...ux.vnet.ibm.com>,
	the arch/x86 maintainers <x86@...nel.org>,
	Xen Devel <xen-devel@...ts.xensource.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Jeremy Fitzhardinge <jeremy.fitzhardinge@...rix.com>,
	peterz@...radead.org
Subject: Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps

On Fri, Oct 07, 2011 at 01:09:32PM -0400, Steven Rostedt wrote:
> Note, this is just hacked together and needs to be cleaned up. Please do
> not comment on formatting or other sloppiness of this patch. I know it's
> sloppy and I left debug statements in. I want the comments to be on the
> idea of the patch.
> 
> I created a new file called scripts/update_jump_label.[ch] based on some
> of the work of recordmcount.c. This is executed at build time on all
> object files just like recordmcount is. But it does not add any new
> sections, it just modifies the code at build time to convert all jump
> labels into nops.
> 
> The idea is in arch/x86/include/asm/jump_label.h to not place a nop, but
> instead to insert a jmp to the label. Depending on how gcc optimizes the
> code, the jmp will end up being either a 2 byte or 5 byte jump.
> 
> After an object is compiled, update_jump_label is executed on this file
> and it reads the ELF relocation table to find the jump label locations
> and examines what jump was used. It then converts the jump into either a
> 2 byte or 5 byte nop that is appropriate.
> 
> At boot time, the jump labels no longer need to be converted (although
> we may do so in the future to use better nops depending on the machine
> that is used). When jump labels are enabled, the code is examined to see
> if a two byte or 5 byte version was used, and the appropriate update is
> made.
> 
> I just booted this patch and it worked. I was able to enable and disable
> trace points using jump labels. Benchmarks are welcomed :)
> 
> Comments and thoughts?
> 

Generally, I really like it, I guess because I suggested it :) I'll try to
run some workloads on it. A really simple one I used recently was putting
a single jump label in 'getppid()' and then calling it in a loop. I
wonder if the short nop vs. long nop would show up there, as a baseline
test. (FWIW, jump label vs. no jump label for this test showed anywhere
between a 1% and 5% improvement.)
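
For anyone who wants to reproduce that kind of baseline, a minimal
user-space sketch of the loop might look like the following. It only
covers the calling side: the jump label itself would still have to be
added to getppid() in the kernel under test. The function name
time_getppid_loop() is made up for illustration, and CLOCK_MONOTONIC is
just one reasonable choice of clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: call getppid() `iters` times and return the
 * elapsed wall-clock time in nanoseconds.
 */
int64_t time_getppid_loop(long iters)
{
	struct timespec start, end;
	volatile pid_t sink = 0;	/* keeps the compiler from dropping the calls */
	long i;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iters; i++)
		sink = getppid();
	clock_gettime(CLOCK_MONOTONIC, &end);
	(void)sink;

	return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
	       + (end.tv_nsec - start.tv_nsec);
}
```

Dividing the result by the iteration count gives a per-call figure that
can be compared across kernels built with the short nop, the long nop,
and no jump label at all.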

Anyways, some comments below.  

> -- Steve
> 
> Sloppy-signed-off-by: Steven Rostedt <rostedt@...dmis.org>
> 
> diff --git a/Makefile b/Makefile
> index 31f967c..8368f42 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -245,7 +245,7 @@ CONFIG_SHELL := $(shell if [ -x "$$BASH" ]; then echo $$BASH; \
>  
>  HOSTCC       = gcc
>  HOSTCXX      = g++
> -HOSTCFLAGS   = -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer
> +HOSTCFLAGS   = -Wall -Wmissing-prototypes -Wstrict-prototypes -g -fomit-frame-pointer
>  HOSTCXXFLAGS = -O2
>  
>  # Decide whether to build built-in, modular, or both.
> @@ -611,6 +611,13 @@ ifdef CONFIG_DYNAMIC_FTRACE
>  endif
>  endif
>  
> +ifdef CONFIG_JUMP_LABEL
> +	ifdef CONFIG_HAVE_BUILD_TIME_JUMP_LABEL
> +		BUILD_UPDATE_JUMP_LABEL := y
> +		export BUILD_UPDATE_JUMP_LABEL
> +	endif
> +endif
> +
>  # We trigger additional mismatches with less inlining
>  ifdef CONFIG_DEBUG_SECTION_MISMATCH
>  KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions-called-once)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 4b0669c..8fa6934 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -169,6 +169,12 @@ config HAVE_PERF_EVENTS_NMI
>  	  subsystem.  Also has support for calculating CPU cycle events
>  	  to determine how many clock cycles in a given period.
>  
> +config HAVE_BUILD_TIME_JUMP_LABEL
> +       bool
> +       help
> +	If an arch uses scripts/update_jump_label to patch in jump nops
> +	at build time, then it must enable this option.
> +
>  config HAVE_ARCH_JUMP_LABEL
>  	bool
>  
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 6a47bb2..6de726a 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -61,6 +61,7 @@ config X86
>  	select HAVE_ARCH_KMEMCHECK
>  	select HAVE_USER_RETURN_NOTIFIER
>  	select HAVE_ARCH_JUMP_LABEL
> +	select HAVE_BUILD_TIME_JUMP_LABEL
>  	select HAVE_TEXT_POKE_SMP
>  	select HAVE_GENERIC_HARDIRQS
>  	select HAVE_SPARSE_IRQ
> diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
> index a32b18c..872b3e1 100644
> --- a/arch/x86/include/asm/jump_label.h
> +++ b/arch/x86/include/asm/jump_label.h
> @@ -14,7 +14,7 @@
>  static __always_inline bool arch_static_branch(struct jump_label_key *key)
>  {
>  	asm goto("1:"
> -		JUMP_LABEL_INITIAL_NOP
> +		"jmp %l[l_yes]\n"
>  		".pushsection __jump_table,  \"aw\" \n\t"
>  		_ASM_ALIGN "\n\t"
>  		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> index 3fee346..1f7f88f 100644
> --- a/arch/x86/kernel/jump_label.c
> +++ b/arch/x86/kernel/jump_label.c
> @@ -16,34 +16,75 @@
>  
>  #ifdef HAVE_JUMP_LABEL
>  
> +static unsigned char nop_short[] = { P6_NOP2 };
> +
>  union jump_code_union {
>  	char code[JUMP_LABEL_NOP_SIZE];
>  	struct {
>  		char jump;
>  		int offset;
>  	} __attribute__((packed));
> +	struct {
> +		char jump_short;
> +		char offset_short;
> +	} __attribute__((packed));
>  };
>  
>  void arch_jump_label_transform(struct jump_entry *entry,
>  			       enum jump_label_type type)
>  {
>  	union jump_code_union code;
> +	unsigned char op;
> +	unsigned size;
> +	unsigned char nop;
> +
> +	/* Use probe_kernel_read()? */
> +	op = *(unsigned char *)entry->code;
> +	nop = ideal_nops[NOP_ATOMIC5][0];
>  
>  	if (type == JUMP_LABEL_ENABLE) {
> -		code.jump = 0xe9;
> -		code.offset = entry->target -
> -				(entry->code + JUMP_LABEL_NOP_SIZE);
> -	} else
> -		memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
> +		if (op == 0xe9 || op == 0xeb)
> +			/* Already enabled. Warn? */
> +			return;
> +

Using the jump_label_inc/dec interface, this shouldn't happen; I would
make it a BUG().


> +		/* FIXME for all archs */
> +		if (op == nop_short[0]) {
> +			size = 2;
> +			code.jump_short = 0xeb;
> +			code.offset_short = entry->target -
> +				(entry->code + 2);
> +			/* Check for overflow ? */
> +		} else if (op == nop) {
> +			size = JUMP_LABEL_NOP_SIZE;
> +			code.jump = 0xe9;
> +			code.offset = entry->target - (entry->code + size);
> +		} else
> +			return; /* WARN ? */

same here, at least WARN, more likely BUG()
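
On the "Check for overflow ?" comment in the hunk above: a 2 byte jmp
only carries a signed 8-bit displacement, so the check is cheap to make
explicit even though gcc should only have emitted the short form when
the target was in range. Below is a minimal sketch of the encoding in
plain user-space C. encode_jmp() and the OP_* names are invented for
illustration and are not part of the patch or of any kernel API.

```c
#include <stdint.h>

#define OP_JMP_REL8	0xeb	/* 2 byte jmp: opcode + 8-bit displacement */
#define OP_JMP_REL32	0xe9	/* 5 byte jmp: opcode + 32-bit displacement */

/*
 * Hypothetical helper: encode a jmp located at address `code` that
 * lands on `target`, using `size` bytes (2 or 5). Returns 0 on success,
 * -1 if the size is unsupported or the displacement does not fit.
 */
int encode_jmp(uint8_t *buf, unsigned size, uint64_t code, uint64_t target)
{
	/* x86 displacements are relative to the end of the instruction. */
	int64_t disp = (int64_t)(target - (code + size));

	if (size == 2) {
		if (disp < INT8_MIN || disp > INT8_MAX)
			return -1;
		buf[0] = OP_JMP_REL8;
		buf[1] = (uint8_t)(int8_t)disp;
		return 0;
	}

	if (size == 5) {
		uint32_t u;

		if (disp < INT32_MIN || disp > INT32_MAX)
			return -1;
		u = (uint32_t)(int32_t)disp;
		buf[0] = OP_JMP_REL32;
		/* Store the displacement little endian, as x86 expects. */
		buf[1] = u & 0xff;
		buf[2] = (u >> 8) & 0xff;
		buf[3] = (u >> 16) & 0xff;
		buf[4] = (u >> 24) & 0xff;
		return 0;
	}

	return -1;
}
```

A caller in the spirit of the review comments would treat a -1 return
as the "should never happen" case rather than silently returning.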

> +
> +	} else {
> +		if (op == nop_short[0] || op == nop)
> +			/* Already disabled, warn? */
> +			return;
> +

same here.

> +		if (op == 0xe9) {
> +			size = JUMP_LABEL_NOP_SIZE;
> +			memcpy(&code, ideal_nops[NOP_ATOMIC5], size);
> +		} else if (op == 0xeb) {
> +			size = 2;
> +			memcpy(&code, nop_short, size);
> +		} else
> +			return; /* WARN ? */

same here

> +	}
>  	get_online_cpus();
>  	mutex_lock(&text_mutex);
> -	text_poke_smp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
> +	text_poke_smp((void *)entry->code, &code, size);
>  	mutex_unlock(&text_mutex);
>  	put_online_cpus();
>  }
>  
>  void arch_jump_label_text_poke_early(jump_label_t addr)
>  {
> +	return;
>  	text_poke_early((void *)addr, ideal_nops[NOP_ATOMIC5],
>  			JUMP_LABEL_NOP_SIZE);
>  }

Hmmm... we spent a bunch of time selecting the 'ideal' run-time nops; I
wouldn't want to drop that work.
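
One way to keep both might be to treat the build-time nops as defaults
and have the early pass swap each one for the preferred nop of the same
length. The sketch below shows the idea in user-space C under a few
assumptions: the 5 byte patterns are the ones listed in
scripts/update_jump_label.c, the 2 byte pattern is taken to be 0x66 0x90
(P6_NOP2), upgrade_nop() is a made-up name, and memcpy() stands in for
whatever text-poking primitive the kernel would actually use.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nops assumed to have been written at build time. */
static const uint8_t build_nop2[2]    = { 0x66, 0x90 };
static const uint8_t build_nop5_64[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
static const uint8_t build_nop5_32[5] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 };

/*
 * Hypothetical helper: if `site` holds a known build-time nop, overwrite
 * it with the preferred nop of the same length (`pref2` or `pref5`).
 * The caller must guarantee at least 5 readable bytes at `site`.
 * Returns the number of bytes rewritten (2 or 5), or 0 if the site did
 * not match and was left alone.
 */
size_t upgrade_nop(uint8_t *site, const uint8_t *pref2, const uint8_t *pref5)
{
	if (memcmp(site, build_nop5_64, 5) == 0 ||
	    memcmp(site, build_nop5_32, 5) == 0) {
		memcpy(site, pref5, 5);
		return 5;
	}

	if (memcmp(site, build_nop2, 2) == 0) {
		memcpy(site, pref2, 2);
		return 2;
	}

	return 0;
}
```

Matching on the full pattern rather than on the first byte avoids
confusing a 2 byte nop with a 5 byte nop that happens to start with the
same prefix byte.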

> diff --git a/scripts/Makefile b/scripts/Makefile
> index df7678f..738b65c 100644
> --- a/scripts/Makefile
> +++ b/scripts/Makefile
> @@ -13,6 +13,7 @@ hostprogs-$(CONFIG_LOGO)         += pnmtologo
>  hostprogs-$(CONFIG_VT)           += conmakehash
>  hostprogs-$(CONFIG_IKCONFIG)     += bin2c
>  hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
> +hostprogs-$(BUILD_UPDATE_JUMP_LABEL) += update_jump_label
>  
>  always		:= $(hostprogs-y) $(hostprogs-m)
>  
> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> index a0fd502..bc0d89b 100644
> --- a/scripts/Makefile.build
> +++ b/scripts/Makefile.build
> @@ -258,6 +258,15 @@ cmd_modversions =								\
>  	fi;
>  endif
>  
> +ifdef BUILD_UPDATE_JUMP_LABEL
> +update_jump_label_source := $(srctree)/scripts/update_jump_label.c \
> +			$(srctree)/scripts/update_jump_label.h
> +cmd_update_jump_label =						\
> +	if [ $(@) != "scripts/mod/empty.o" ]; then		\
> +		$(objtree)/scripts/update_jump_label "$(@)";	\
> +	fi;
> +endif
> +
>  ifdef CONFIG_FTRACE_MCOUNT_RECORD
>  ifdef BUILD_C_RECORDMCOUNT
>  ifeq ("$(origin RECORDMCOUNT_WARN)", "command line")
> @@ -294,6 +303,7 @@ define rule_cc_o_c
>  	$(cmd_modversions)						  \
>  	$(call echo-cmd,record_mcount)					  \
>  	$(cmd_record_mcount)						  \
> +	$(cmd_update_jump_label)					  \
>  	scripts/basic/fixdep $(depfile) $@ '$(call make-cmd,cc_o_c)' >    \
>  	                                              $(dot-target).tmp;  \
>  	rm -f $(depfile);						  \
> @@ -301,13 +311,14 @@ define rule_cc_o_c
>  endef
>  
>  # Built-in and composite module parts
> -$(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
> +$(obj)/%.o: $(src)/%.c $(recordmcount_source) $(update_jump_label_source) FORCE
>  	$(call cmd,force_checksrc)
>  	$(call if_changed_rule,cc_o_c)
>  
>  # Single-part modules are special since we need to mark them in $(MODVERDIR)
>  
> -$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
> +$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) \
> +		  $(update_jump_label_source) FORCE
>  	$(call cmd,force_checksrc)
>  	$(call if_changed_rule,cc_o_c)
>  	@{ echo $(@:.o=.ko); echo $@; } > $(MODVERDIR)/$(@F:.o=.mod)
> diff --git a/scripts/update_jump_label.c b/scripts/update_jump_label.c
> new file mode 100644
> index 0000000..86e17bc
> --- /dev/null
> +++ b/scripts/update_jump_label.c
> @@ -0,0 +1,349 @@
> +/*
> + * update_jump_label.c: replace jmps with nops at compile time.
> + * Copyright 2010 Steven Rostedt <srostedt@...hat.com>, Red Hat Inc.
> + *  Parsing of the elf file was influenced by recordmcount.c
> + *  originally written by and copyright to John F. Reiser <jreiser@...Wagon.com>.
> + */
> +
> +/*
> + * Note, this code is originally designed for x86, but may be used by
> + * other archs to do the nop updates at compile time instead of at boot time.
> + * X86 uses this as an optimization, as jmps can be either 2 bytes or 5 bytes.
> + * Inserting a 2 byte where possible helps with both CPU performance and
> + * icache strain.
> + */
> +#include <sys/types.h>
> +#include <sys/mman.h>
> +#include <sys/stat.h>
> +#include <getopt.h>
> +#include <elf.h>
> +#include <fcntl.h>
> +#include <setjmp.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <stdarg.h>
> +#include <string.h>
> +#include <unistd.h>
> +
> +static int fd_map;	/* File descriptor for file being modified. */
> +static struct stat sb;	/* Remember .st_size, etc. */
> +static int mmap_failed; /* Boolean flag. */
> +
> +static void die(const char *err, const char *fmt, ...)
> +{
> +	va_list ap;
> +
> +	if (err)
> +		perror(err);
> +
> +	if (fmt) {
> +		va_start(ap, fmt);
> +		fprintf(stderr, "Fatal error:  ");
> +		vfprintf(stderr, fmt, ap);
> +		fprintf(stderr, "\n");
> +		va_end(ap);
> +	}
> +
> +	exit(1);
> +}
> +
> +static void usage(char **argv)
> +{
> +	char *arg = argv[0];
> +	char *p = arg+strlen(arg);
> +
> +	while (p >= arg && *p != '/')
> +		p--;
> +	p++;
> +
> +	printf("usage: %s file\n"
> +	       "\n",p);
> +	exit(-1);
> +}
> +
> +/* w8rev, w8nat, ...: Handle endianness. */
> +
> +static uint64_t w8rev(uint64_t const x)
> +{
> +	return   ((0xff & (x >> (0 * 8))) << (7 * 8))
> +	       | ((0xff & (x >> (1 * 8))) << (6 * 8))
> +	       | ((0xff & (x >> (2 * 8))) << (5 * 8))
> +	       | ((0xff & (x >> (3 * 8))) << (4 * 8))
> +	       | ((0xff & (x >> (4 * 8))) << (3 * 8))
> +	       | ((0xff & (x >> (5 * 8))) << (2 * 8))
> +	       | ((0xff & (x >> (6 * 8))) << (1 * 8))
> +	       | ((0xff & (x >> (7 * 8))) << (0 * 8));
> +}
> +
> +static uint32_t w4rev(uint32_t const x)
> +{
> +	return   ((0xff & (x >> (0 * 8))) << (3 * 8))
> +	       | ((0xff & (x >> (1 * 8))) << (2 * 8))
> +	       | ((0xff & (x >> (2 * 8))) << (1 * 8))
> +	       | ((0xff & (x >> (3 * 8))) << (0 * 8));
> +}
> +
> +static uint32_t w2rev(uint16_t const x)
> +{
> +	return   ((0xff & (x >> (0 * 8))) << (1 * 8))
> +	       | ((0xff & (x >> (1 * 8))) << (0 * 8));
> +}
> +
> +static uint64_t w8nat(uint64_t const x)
> +{
> +	return x;
> +}
> +
> +static uint32_t w4nat(uint32_t const x)
> +{
> +	return x;
> +}
> +
> +static uint32_t w2nat(uint16_t const x)
> +{
> +	return x;
> +}
> +
> +static uint64_t (*w8)(uint64_t);
> +static uint32_t (*w)(uint32_t);
> +static uint32_t (*w2)(uint16_t);
> +
> +/* ulseek, uread, ...:  Check return value for errors. */
> +
> +static off_t
> +ulseek(int const fd, off_t const offset, int const whence)
> +{
> +	off_t const w = lseek(fd, offset, whence);
> +	if (w == (off_t)-1)
> +		die("lseek", NULL);
> +
> +	return w;
> +}
> +
> +static size_t
> +uread(int const fd, void *const buf, size_t const count)
> +{
> +	size_t const n = read(fd, buf, count);
> +	if (n != count)
> +		die("read", NULL);
> +
> +	return n;
> +}
> +
> +static size_t
> +uwrite(int const fd, void const *const buf, size_t const count)
> +{
> +	size_t const n = write(fd, buf, count);
> +	if (n != count)
> +		die("write", NULL);
> +
> +	return n;
> +}
> +
> +static void *
> +umalloc(size_t size)
> +{
> +	void *const addr = malloc(size);
> +	if (addr == 0)
> +		die("malloc", "malloc failed: %zu bytes\n", size);
> +
> +	return addr;
> +}
> +
> +/*
> + * Get the whole file as a programming convenience in order to avoid
> + * malloc+lseek+read+free of many pieces.  If successful, then mmap
> + * avoids copying unused pieces; else just read the whole file.
> + * Open for both read and write; new info will be appended to the file.
> + * Use MAP_PRIVATE so that a few changes to the in-memory ElfXX_Ehdr
> + * do not propagate to the file until an explicit overwrite at the last.
> + * This preserves most aspects of consistency (all except .st_size)
> + * for simultaneous readers of the file while we are appending to it.
> + * However, multiple writers still are bad.  We choose not to use
> + * locking because it is expensive and the use case of kernel build
> + * makes multiple writers unlikely.
> + */
> +static void *mmap_file(char const *fname)
> +{
> +	void *addr;
> +
> +	fd_map = open(fname, O_RDWR);
> +	if (fd_map < 0 || fstat(fd_map, &sb) < 0)
> +		die(fname, "failed to open file");
> +
> +	if (!S_ISREG(sb.st_mode))
> +		die(NULL, "not a regular file: %s\n", fname);
> +
> +	addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE,
> +		    fd_map, 0);
> +
> +	mmap_failed = 0;
> +	if (addr == MAP_FAILED) {
> +		mmap_failed = 1;
> +		addr = umalloc(sb.st_size);
> +		uread(fd_map, addr, sb.st_size);
> +	}
> +	return addr;
> +}
> +
> +static void munmap_file(void *addr)
> +{
> +	if (!mmap_failed)
> +		munmap(addr, sb.st_size);
> +	else
> +		free(addr);
> +	close(fd_map);
> +}
> +
> +static unsigned char ideal_nop5_x86_64[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
> +static unsigned char ideal_nop5_x86_32[5] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 };
> +static unsigned char ideal_nop2_x86[2] = { 0x66, 0x90 };
> +static unsigned char *ideal_nop;
> +
> +static int (*make_nop)(void *map, size_t const offset);
> +
> +static int make_nop_x86(void *map, size_t const offset)
> +{
> +	unsigned char *op;
> +	unsigned char *nop;
> +	int size;
> +
> +	/* Determine which type of jmp this is 2 byte or 5. */
> +	op = map + offset;
> +	switch (*op) {
> +	case 0xeb: /* 2 byte */
> +		size = 2;
> +		nop = ideal_nop2_x86;
> +		break;
> +	case 0xe9: /* 5 byte */
> +		size = 5;
> +		nop = ideal_nop;
> +		break;
> +	default:
> +		die(NULL, "Bad jump label section\n");
> +	}
> +
> +	/* convert to nop */
> +	ulseek(fd_map, offset, SEEK_SET);
> +	uwrite(fd_map, nop, size);
> +	return 0;
> +}
> +
> +/* 32 bit and 64 bit are very similar */
> +#include "update_jump_label.h"
> +#define UPDATE_JUMP_LABEL_64
> +#include "update_jump_label.h"
> +
> +static int do_file(const char *fname)
> +{
> +	Elf32_Ehdr *const ehdr = mmap_file(fname);
> +	unsigned int reltype = 0;
> +
> +	w = w4nat;
> +	w2 = w2nat;
> +	w8 = w8nat;
> +	switch (ehdr->e_ident[EI_DATA]) {
> +		static unsigned int const endian = 1;
> +	default:
> +		die(NULL, "unrecognized ELF data encoding %d: %s\n",
> +			ehdr->e_ident[EI_DATA], fname);
> +		break;
> +	case ELFDATA2LSB:
> +		if (*(unsigned char const *)&endian != 1) {
> +			/* main() is big endian, file.o is little endian. */
> +			w = w4rev;
> +			w2 = w2rev;
> +			w8 = w8rev;
> +		}
> +		break;
> +	case ELFDATA2MSB:
> +		if (*(unsigned char const *)&endian != 0) {
> +			/* main() is little endian, file.o is big endian. */
> +			w = w4rev;
> +			w2 = w2rev;
> +			w8 = w8rev;
> +		}
> +		break;
> +	}  /* end switch */
> +
> +	if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0 ||
> +	    w2(ehdr->e_type) != ET_REL ||
> +	    ehdr->e_ident[EI_VERSION] != EV_CURRENT)
> +		die(NULL, "unrecognized ET_REL file %s\n", fname);
> +
> +	switch (w2(ehdr->e_machine)) {
> +	default:
> +		die(NULL, "unrecognized e_machine %d %s\n",
> +		    w2(ehdr->e_machine), fname);
> +		break;
> +	case EM_386:
> +		reltype = R_386_32;
> +		make_nop = make_nop_x86;
> +		ideal_nop = ideal_nop5_x86_32;
> +		break;
> +	case EM_ARM:	 reltype = R_ARM_ABS32;
> +			 break;
> +	case EM_IA_64:	 reltype = R_IA64_IMM64; break;
> +	case EM_MIPS:	 /* reltype: e_class    */ break;
> +	case EM_PPC:	 reltype = R_PPC_ADDR32;   break;
> +	case EM_PPC64:	 reltype = R_PPC64_ADDR64; break;
> +	case EM_S390:    /* reltype: e_class    */ break;
> +	case EM_SH:	 reltype = R_SH_DIR32;                 break;
> +	case EM_SPARCV9: reltype = R_SPARC_64;     break;
> +	case EM_X86_64:
> +		make_nop = make_nop_x86;
> +		ideal_nop = ideal_nop5_x86_64;
> +		reltype = R_X86_64_64;
> +		break;
> +	}  /* end switch */
> +
> +	switch (ehdr->e_ident[EI_CLASS]) {
> +	default:
> +		die(NULL, "unrecognized ELF class %d %s\n",
> +		    ehdr->e_ident[EI_CLASS], fname);
> +		break;
> +	case ELFCLASS32:
> +		if (w2(ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
> +		||  w2(ehdr->e_shentsize) != sizeof(Elf32_Shdr))
> +			die(NULL, "unrecognized ET_REL file: %s\n", fname);
> +
> +		if (w2(ehdr->e_machine) == EM_S390) {
> +			reltype = R_390_32;
> +		}
> +		if (w2(ehdr->e_machine) == EM_MIPS) {
> +			reltype = R_MIPS_32;
> +		}
> +		do_func32(ehdr, fname, reltype);
> +		break;
> +	case ELFCLASS64: {
> +		Elf64_Ehdr *const ghdr = (Elf64_Ehdr *)ehdr;
> +		if (w2(ghdr->e_ehsize) != sizeof(Elf64_Ehdr)
> +		||  w2(ghdr->e_shentsize) != sizeof(Elf64_Shdr))
> +			die(NULL, "unrecognized ET_REL file: %s\n", fname);
> +
> +		if (w2(ghdr->e_machine) == EM_S390)
> +			reltype = R_390_64;
> +
> +#if 0
> +		if (w2(ghdr->e_machine) == EM_MIPS) {
> +			reltype = R_MIPS_64;
> +			Elf64_r_sym = MIPS64_r_sym;
> +		}
> +#endif
> +		do_func64(ghdr, fname, reltype);
> +		break;
> +	}
> +	}  /* end switch */
> +
> +	munmap_file(ehdr);
> +	return 0;
> +}
> +
> +int main (int argc, char **argv)
> +{
> +	if (argc != 2)
> +		usage(argv);
> +	
> +	return do_file(argv[1]);
> +}
> +
> diff --git a/scripts/update_jump_label.h b/scripts/update_jump_label.h
> new file mode 100644
> index 0000000..6ff9846
> --- /dev/null
> +++ b/scripts/update_jump_label.h
> @@ -0,0 +1,322 @@
> +/*
> + * update_jump_label.h
> + *
> + * This code was taken out of recordmcount.c written by
> + * Copyright 2009 John F. Reiser <jreiser@...Wagon.com>.  All rights reserved.
> + *
> + * The original code had the same algorithms for both 32bit
> + * and 64bit ELF files, but the code was duplicated to support
> + * the difference in structures that were used. This
> + * file creates a macro of everything that is different between
> + * the 64 and 32 bit code, such that by including this header
> + * twice we can create both sets of functions: first including it
> + * with UPDATE_JUMP_LABEL_64 undefined, and again with
> + * it defined.
> + *
> + * This conversion to macros was done by:
> + * Copyright 2010 Steven Rostedt <srostedt@...hat.com>, Red Hat Inc.
> + *
> + * Licensed under the GNU General Public License, version 2 (GPLv2).
> + */
> +
> +#undef EBITS
> +#undef _w
> +#undef _align
> +#undef _size
> +
> +#ifdef UPDATE_JUMP_LABEL_64
> +# define EBITS			64
> +# define _w			w8
> +# define _align			7u
> +# define _size			8
> +#else
> +# define EBITS			32
> +# define _w			w
> +# define _align			3u
> +# define _size			4
> +#endif
> +
> +#define _FBITS(x, e)	x##e
> +#define FBITS(x, e)	_FBITS(x,e)
> +#define FUNC(x)		FBITS(x,EBITS)
> +
> +#undef Elf_Addr
> +#undef Elf_Ehdr
> +#undef Elf_Shdr
> +#undef Elf_Rel
> +#undef Elf_Rela
> +#undef Elf_Sym
> +#undef ELF_R_SYM
> +#undef ELF_R_TYPE
> +
> +#define __ATTACH(x,y,z)	x##y##z
> +#define ATTACH(x,y,z)	__ATTACH(x,y,z)
> +
> +#define Elf_Addr	ATTACH(Elf,EBITS,_Addr)
> +#define Elf_Ehdr	ATTACH(Elf,EBITS,_Ehdr)
> +#define Elf_Shdr	ATTACH(Elf,EBITS,_Shdr)
> +#define Elf_Rel		ATTACH(Elf,EBITS,_Rel)
> +#define Elf_Rela	ATTACH(Elf,EBITS,_Rela)
> +#define Elf_Sym		ATTACH(Elf,EBITS,_Sym)
> +#define uint_t		ATTACH(uint,EBITS,_t)
> +#define ELF_R_SYM	ATTACH(ELF,EBITS,_R_SYM)
> +#define ELF_R_TYPE	ATTACH(ELF,EBITS,_R_TYPE)
> +
> +#undef get_shdr
> +#define get_shdr(ehdr) ((Elf_Shdr *)(_w((ehdr)->e_shoff) + (void *)(ehdr)))
> +
> +#undef get_section_loc
> +#define get_section_loc(ehdr, shdr)(_w((shdr)->sh_offset) + (void *)(ehdr))
> +
> +/* Functions and pointers that do_file() may override for specific e_machine. */
> +
> +#if 0
> +static uint_t FUNC(fn_ELF_R_SYM)(Elf_Rel const *rp)
> +{
> +	return ELF_R_SYM(_w(rp->r_info));
> +}
> +static uint_t (*FUNC(Elf_r_sym))(Elf_Rel const *rp) = FUNC(fn_ELF_R_SYM);
> +#endif
> +
> +static void FUNC(get_sym_str_and_relp)(Elf_Shdr const *const relhdr,
> +				 Elf_Ehdr const *const ehdr,
> +				 Elf_Sym const **sym0,
> +				 char const **str0,
> +				 Elf_Rel const **relp)
> +{
> +	Elf_Shdr *const shdr0 = get_shdr(ehdr);
> +	unsigned const symsec_sh_link = w(relhdr->sh_link);
> +	Elf_Shdr const *const symsec = &shdr0[symsec_sh_link];
> +	Elf_Shdr const *const strsec = &shdr0[w(symsec->sh_link)];
> +	Elf_Rel const *const rel0 =
> +		(Elf_Rel const *)get_section_loc(ehdr, relhdr);
> +
> +	*sym0 = (Elf_Sym const *)get_section_loc(ehdr, symsec);
> +
> +	*str0 = (char const *)get_section_loc(ehdr, strsec);
> +
> +	*relp = rel0;
> +}
> +
> +/*
> + * Walk the relocations for the __jump_table section, which come in
> + * groups of three per jump label. The jmp each group points at will
> + * be converted into a nop.
> + */
> +static void FUNC(nop_jump_label)(Elf_Shdr const *const relhdr,
> +		       Elf_Ehdr const *const ehdr,
> +		       const char *const txtname)
> +{
> +	Elf_Shdr *const shdr0 = get_shdr(ehdr);
> +	Elf_Sym const *sym0;
> +	char const *str0;
> +	Elf_Rel const *relp;
> +	Elf_Rela const *relap;
> +	Elf_Shdr const *const shdr = &shdr0[w(relhdr->sh_info)];
> +	unsigned rel_entsize = w(relhdr->sh_entsize);
> +	unsigned const nrel = _w(relhdr->sh_size) / rel_entsize;
> +	int t;
> +
> +	FUNC(get_sym_str_and_relp)(relhdr, ehdr, &sym0, &str0, &relp);
> +
> +	for (t = nrel; t > 0; t -= 3) {
> +		int ret = -1;
> +
> +		relap = (Elf_Rela const *)relp;
> +		printf("rel offset=%lx info=%lx sym=%lx type=%lx addend=%lx\n",
> +		       (long)relap->r_offset, (long)relap->r_info,
> +		       (long)ELF_R_SYM(relap->r_info),
> +		       (long)ELF_R_TYPE(relap->r_info),
> +		       (long)relap->r_addend);
> +
> +		if (0 && make_nop)
> +			ret = make_nop((void *)ehdr, shdr->sh_offset + relp->r_offset);
> +
> +		/* jump label sections are paired in threes */
> +		relp = (Elf_Rel const *)(rel_entsize * 3 + (void *)relp);
> +	}
> +}
> +
> +/* Evade ISO C restriction: no declaration after statement in has_rel_mcount. */
> +static char const *
> +FUNC(__has_rel_jump_table)(Elf_Shdr const *const relhdr,  /* is SHT_REL or SHT_RELA */
> +		 Elf_Shdr const *const shdr0,
> +		 char const *const shstrtab,
> +		 char const *const fname)
> +{
> +	/* .sh_info depends on .sh_type == SHT_REL[,A] */
> +	Elf_Shdr const *const txthdr = &shdr0[w(relhdr->sh_info)];
> +	char const *const txtname = &shstrtab[w(txthdr->sh_name)];
> +
> +	if (strcmp("__jump_table", txtname) == 0) {
> +		fprintf(stderr, "warning: __mcount_loc already exists: %s\n",
> +			fname);
> +//		succeed_file();
> +	}
> +	if (w(txthdr->sh_type) != SHT_PROGBITS ||
> +	    !(w(txthdr->sh_flags) & SHF_EXECINSTR))
> +		return NULL;
> +	return txtname;
> +}
> +
> +static char const *FUNC(has_rel_jump_table)(Elf_Shdr const *const relhdr,
> +				      Elf_Shdr const *const shdr0,
> +				      char const *const shstrtab,
> +				      char const *const fname)
> +{
> +	if (w(relhdr->sh_type) != SHT_REL && w(relhdr->sh_type) != SHT_RELA)
> +		return NULL;
> +	return FUNC(__has_rel_jump_table)(relhdr, shdr0, shstrtab, fname);
> +}
> +
> +/* Find relocation section hdr for a given section */
> +static const Elf_Shdr *
> +FUNC(find_relhdr)(const Elf_Ehdr *ehdr, const Elf_Shdr *shdr)
> +{
> +	const Elf_Shdr *shdr0 = get_shdr(ehdr);
> +	int nhdr = w2(ehdr->e_shnum);
> +	const Elf_Shdr *hdr;
> +	int i;
> +
> +	for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) {
> +		if (w(hdr->sh_type) != SHT_REL &&
> +		    w(hdr->sh_type) != SHT_RELA)
> +			continue;
> +
> +		/*
> +		 * The relocation section's info field holds
> +		 * the section index that it represents.
> +		 */
> +		if (shdr == &shdr0[w(hdr->sh_info)])
> +			return hdr;
> +	}
> +	return NULL;
> +}
> +
> +/* Find a section header based on name and type */
> +static const Elf_Shdr *
> +FUNC(find_shdr)(const Elf_Ehdr *ehdr, const char *name, uint_t type)
> +{
> +	const Elf_Shdr *shdr0 = get_shdr(ehdr);
> +	const Elf_Shdr *shstr = &shdr0[w2(ehdr->e_shstrndx)];
> +	const char *shstrtab = (char *)get_section_loc(ehdr, shstr);
> +	int nhdr = w2(ehdr->e_shnum);
> +	const Elf_Shdr *hdr;
> +	const char *hdrname;
> +	int i;
> +
> +	for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) {
> +		if (w(hdr->sh_type) != type)
> +			continue;
> +
> +		/* If we are just looking for a section by type (ie. SYMTAB) */
> +		if (!name)
> +			return hdr;
> +
> +		hdrname = &shstrtab[w(hdr->sh_name)];
> +		if (strcmp(hdrname, name) == 0)
> +			return hdr;
> +	}
> +	return NULL;
> +}
> +
> +static void
> +FUNC(section_update)(const Elf_Ehdr *ehdr, const Elf_Shdr *symhdr,
> +		     unsigned shtype, const Elf_Rel *rel, void *data)
> +{
> +	const Elf_Shdr *shdr0 = get_shdr(ehdr);
> +	const Elf_Shdr *targethdr;
> +	const Elf_Rela *rela;
> +	const Elf_Sym *syment;
> +	uint_t offset = _w(rel->r_offset);
> +	uint_t info = _w(rel->r_info);
> +	uint_t sym = ELF_R_SYM(info);
> +	uint_t type = ELF_R_TYPE(info);
> +	uint_t addend;
> +	uint_t targetloc;
> +
> +	if (shtype == SHT_RELA) {
> +		rela = (const Elf_Rela *)rel;
> +		addend = _w(rela->r_addend);
> +	} else
> +		addend = _w(*(unsigned short *)(data + offset));
> +
> +	syment = (const Elf_Sym *)get_section_loc(ehdr, symhdr);
> +	targethdr = &shdr0[w2(syment[sym].st_shndx)];
> +	targetloc = _w(targethdr->sh_offset);
> +
> +	/* TODO, need a separate function for all archs */
> +	if (type != R_386_32)
> +		die(NULL, "Arch relocation type %d not supported", type);
> +
> +	targetloc += addend;
> +
> +#if 1
> +	printf("offset=%x target=%x shoffset=%x add=%x\n",
> +	       offset, targetloc, _w(targethdr->sh_offset), addend);
> +#endif
> +	*(uint_t *)(data + offset) = targetloc;
> +}
> +
> +/* Overall supervision for Elf32 ET_REL file. */
> +static void
> +FUNC(do_func)(Elf_Ehdr *ehdr, char const *const fname, unsigned const reltype)
> +{
> +	const Elf_Shdr *jlshdr;
> +	const Elf_Shdr *jlrhdr;
> +	const Elf_Shdr *symhdr;
> +	const Elf_Rel *rel;
> +	unsigned size;
> +	unsigned cnt;
> +	unsigned i;
> +	uint_t type;
> +	void *jdata;
> +	void *data;
> +
> +	jlshdr = FUNC(find_shdr)(ehdr, "__jump_table", SHT_PROGBITS);
> +	if (!jlshdr)
> +		return;
> +
> +	jlrhdr = FUNC(find_relhdr)(ehdr, jlshdr);
> +	if (!jlrhdr)
> +		return;
> +
> +	/*
> +	 * Create and fill in the __jump_table section and use it to
> +	 * find the offsets into the text that we want to update.
> +	 * We create it so that we do not depend on the order of the
> +	 * relocations, and use the table directly, as it is broken
> +	 * up into sections.
> +	 */
> +	size = _w(jlshdr->sh_size);
> +	data = umalloc(size);
> +
> +	jdata = (void *)get_section_loc(ehdr, jlshdr);
> +	memcpy(data, jdata, size);
> +
> +	cnt = _w(jlrhdr->sh_size) / w(jlrhdr->sh_entsize);
> +
> +	rel = (const Elf_Rel *)get_section_loc(ehdr, jlrhdr);
> +
> +	/* Is this as Rel or Rela? */
> +	type = w(jlrhdr->sh_type);
> +
> +	symhdr = FUNC(find_shdr)(ehdr, NULL, SHT_SYMTAB);
> +
> +	for (i = 0; i < cnt; i++) {
> +		FUNC(section_update)(ehdr, symhdr, type, rel, data);
> +		rel = (void *)rel + w(jlrhdr->sh_entsize);
> +	}
> +
> +	/*
> +	 * This is specific to x86. The jump_table is stored in three
> +	 * long words. The first is the location of the jmp target we
> +	 * must update.
> +	 */
> +	cnt = size / sizeof(uint_t);
> +
> +	for (i = 0; i < cnt; i += 3)
> +		if (0)make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t)));
> +

Hmmmm, isn't this the line that actually writes in the no-ops? Why is it
disabled?
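
For reference, with the "if (0)" removed, that loop amounts to roughly
the following, sketched here against an in-memory buffer instead of the
object file. It assumes the jump table has already been resolved into
offsets into the text, which is the part section_update() is working
towards. The struct name, nop_out_jumps(), and the choice of the x86_64
5 byte nop are illustrative assumptions, not code from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One jump table entry: three pointer-sized words, as emitted by the asm. */
struct jump_entry64 {
	uint64_t code;		/* assumed: offset of the jmp within `text` */
	uint64_t target;	/* where the jmp lands when enabled */
	uint64_t key;		/* the jump_label_key this site belongs to */
};

static const uint8_t nop2[2] = { 0x66, 0x90 };
static const uint8_t nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/*
 * Hypothetical helper: replace the jmp at every table entry with a nop
 * of the same length. Returns the number of sites converted, or -1 on
 * the first entry that is out of bounds or does not point at a jmp.
 */
int nop_out_jumps(uint8_t *text, size_t text_len,
		  const struct jump_entry64 *tab, size_t n)
{
	int done = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t off = tab[i].code;
		const uint8_t *nop;
		size_t size;

		if (off >= text_len)
			return -1;

		switch (text[off]) {
		case 0xeb:	/* 2 byte jmp */
			size = 2;
			nop = nop2;
			break;
		case 0xe9:	/* 5 byte jmp */
			size = 5;
			nop = nop5;
			break;
		default:
			return -1;
		}

		if (size > text_len - off)
			return -1;

		memcpy(text + off, nop, size);
		done++;
	}

	return done;
}
```

The only field the conversion needs is the first word of each entry;
the target and key are carried along untouched for the run-time code.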

> +	free(data);
> +}
> 
> 

Thanks again for doing this...I was still understanding recordmcount.c ;)

-Jason