[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190710181227.GA9925@oracle.com>
Date: Wed, 10 Jul 2019 14:12:27 -0400
From: Kris Van Hees <kris.van.hees@...cle.com>
To: Kris Van Hees <kris.van.hees@...cle.com>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org,
dtrace-devel@....oracle.com, linux-kernel@...r.kernel.org,
rostedt@...dmis.org, mhiramat@...nel.org, acme@...nel.org,
ast@...nel.org, daniel@...earbox.net,
Peter Zijlstra <peterz@...radead.org>, Chris Mason <clm@...com>
Subject: Re: [PATCH V2 1/1 (was 0/1 by accident)] tools/dtrace: initial
implementation of DTrace
This patch's subject should of course be [PATCH V2 1/1] rather than 0/1.
Sorry about that.
On Wed, Jul 10, 2019 at 08:42:24AM -0700, Kris Van Hees wrote:
> This initial implementation of a tiny subset of DTrace functionality
> provides the following options:
>
> dtrace [-lvV] [-b bufsz] -s script
> -b set trace buffer size
> -l list probes (only works with '-s script' for now)
> -s enable or list probes for the specified BPF program
> -V report DTrace API version
>
> The patch comprises quite a bit of code due to DTrace requiring a few
> crucial components, even in its most basic form.
>
> The code is structured around the command line interface implemented in
> dtrace.c. It provides option parsing and drives the three modes of
> operation that are currently implemented:
>
> 1. Report DTrace API version information.
> Report the version information and terminate.
>
> 2. List probes in BPF programs.
> Initialize the list of probes that DTrace recognizes, load BPF
> programs, parse all BPF ELF section names, resolve them into
> known probes, and emit the probe names. Then terminate.
>
> 3. Load BPF programs and collect tracing data.
> Initialize the list of probes that DTrace recognizes, load BPF
> programs and attach them to their corresponding probes, set up
> perf event output buffers, and start processing tracing data.
>
> This implementation makes extensive use of BPF (handled by dt_bpf.c) and
> the perf event output ring buffer (handled by dt_buffer.c). DTrace-style
> probe handling (dt_probe.c) offers an interface to probes that hides the
> implementation details of the individual probe types by provider (dt_fbt.c
> and dt_syscall.c). Probe lookup by name uses a hashtable implementation
> (dt_hash.c). The dt_utils.c code populates a list of online CPU ids, so
> we know what CPUs we can obtain tracing data from.
>
> Building the tool is trivial because its only dependency (libbpf) is in
> the kernel tree under tools/lib/bpf. A simple 'make' in the tools/dtrace
> directory suffices.
>
> The 'dtrace' executable needs to run as root because BPF programs cannot
> be loaded by non-root users.
>
> Signed-off-by: Kris Van Hees <kris.van.hees@...cle.com>
> Reviewed-by: David Mc Lean <david.mclean@...cle.com>
> Reviewed-by: Eugene Loh <eugene.loh@...cle.com>
> ---
> Changes in v2:
> - Use ring_buffer_read_head() and ring_buffer_write_tail() to
> avoid use of volatile.
> - Handle perf events that wrap around the ring buffer boundary.
> - Remove unnecessary PERF_EVENT_IOC_ENABLE.
> - Remove -I$(srctree)/tools/perf from KBUILD_HOSTCFLAGS since it
> is not actually used.
> - Use PT_REGS_PARM1(x), etc instead of my own macros. Adding
> PT_REGS_PARM6(x) in bpf_sample.c because we need to be able to
> support up to 6 arguments passed by registers.
> ---
> MAINTAINERS | 6 +
> tools/dtrace/Makefile | 87 ++++++++++
> tools/dtrace/bpf_sample.c | 146 ++++++++++++++++
> tools/dtrace/dt_bpf.c | 185 ++++++++++++++++++++
> tools/dtrace/dt_buffer.c | 338 +++++++++++++++++++++++++++++++++++++
> tools/dtrace/dt_fbt.c | 201 ++++++++++++++++++++++
> tools/dtrace/dt_hash.c | 211 +++++++++++++++++++++++
> tools/dtrace/dt_probe.c | 230 +++++++++++++++++++++++++
> tools/dtrace/dt_syscall.c | 179 ++++++++++++++++++++
> tools/dtrace/dt_utils.c | 132 +++++++++++++++
> tools/dtrace/dtrace.c | 249 +++++++++++++++++++++++++++
> tools/dtrace/dtrace.h | 13 ++
> tools/dtrace/dtrace_impl.h | 101 +++++++++++
> 13 files changed, 2078 insertions(+)
> create mode 100644 tools/dtrace/Makefile
> create mode 100644 tools/dtrace/bpf_sample.c
> create mode 100644 tools/dtrace/dt_bpf.c
> create mode 100644 tools/dtrace/dt_buffer.c
> create mode 100644 tools/dtrace/dt_fbt.c
> create mode 100644 tools/dtrace/dt_hash.c
> create mode 100644 tools/dtrace/dt_probe.c
> create mode 100644 tools/dtrace/dt_syscall.c
> create mode 100644 tools/dtrace/dt_utils.c
> create mode 100644 tools/dtrace/dtrace.c
> create mode 100644 tools/dtrace/dtrace.h
> create mode 100644 tools/dtrace/dtrace_impl.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index cfa9ed89c031..410240732d55 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5485,6 +5485,12 @@ W: https://linuxtv.org
> S: Odd Fixes
> F: drivers/media/pci/dt3155/
>
> +DTRACE
> +M: Kris Van Hees <kris.van.hees@...cle.com>
> +L: dtrace-devel@....oracle.com
> +S: Maintained
> +F: tools/dtrace/
> +
> DVB_USB_AF9015 MEDIA DRIVER
> M: Antti Palosaari <crope@....fi>
> L: linux-media@...r.kernel.org
> diff --git a/tools/dtrace/Makefile b/tools/dtrace/Makefile
> new file mode 100644
> index 000000000000..03ae498d1429
> --- /dev/null
> +++ b/tools/dtrace/Makefile
> @@ -0,0 +1,87 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# This Makefile is based on samples/bpf.
> +#
> +# Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
> +
> +DT_VERSION := 2.0.0
> +DT_GIT_VERSION := $(shell git rev-parse HEAD 2>/dev/null || \
> + echo Unknown)
> +
> +DTRACE_PATH ?= $(abspath $(srctree)/$(src))
> +TOOLS_PATH := $(DTRACE_PATH)/..
> +SAMPLES_PATH := $(DTRACE_PATH)/../../samples
> +
> +hostprogs-y := dtrace
> +
> +LIBBPF := $(TOOLS_PATH)/lib/bpf/libbpf.a
> +OBJS := dt_bpf.o dt_buffer.o dt_utils.o dt_probe.o \
> + dt_hash.o \
> + dt_fbt.o dt_syscall.o
> +
> +dtrace-objs := $(OBJS) dtrace.o
> +
> +always := $(hostprogs-y)
> +always += bpf_sample.o
> +
> +KBUILD_HOSTCFLAGS += -DDT_VERSION=\"$(DT_VERSION)\"
> +KBUILD_HOSTCFLAGS += -DDT_GIT_VERSION=\"$(DT_GIT_VERSION)\"
> +KBUILD_HOSTCFLAGS += -I$(srctree)/tools/lib
> +KBUILD_HOSTCFLAGS += -I$(srctree)/tools/include/uapi
> +KBUILD_HOSTCFLAGS += -I$(srctree)/tools/include/
> +KBUILD_HOSTCFLAGS += -I$(srctree)/usr/include
> +
> +KBUILD_HOSTLDLIBS := $(LIBBPF) -lelf
> +
> +LLC ?= llc
> +CLANG ?= clang
> +LLVM_OBJCOPY ?= llvm-objcopy
> +
> +ifdef CROSS_COMPILE
> +HOSTCC = $(CROSS_COMPILE)gcc
> +CLANG_ARCH_ARGS = -target $(ARCH)
> +endif
> +
> +all:
> + $(MAKE) -C ../../ $(CURDIR)/ DTRACE_PATH=$(CURDIR)
> +
> +clean:
> + $(MAKE) -C ../../ M=$(CURDIR) clean
> + @rm -f *~
> +
> +$(LIBBPF): FORCE
> + $(MAKE) -C $(dir $@) RM='rm -rf' LDFLAGS= srctree=$(DTRACE_PATH)/../../ O=
> +
> +FORCE:
> +
> +.PHONY: verify_cmds verify_target_bpf $(CLANG) $(LLC)
> +
> +verify_cmds: $(CLANG) $(LLC)
> + @for TOOL in $^ ; do \
> + if ! (which -- "$${TOOL}" > /dev/null 2>&1); then \
> + echo "*** ERROR: Cannot find LLVM tool $${TOOL}" ;\
> + exit 1; \
> + else true; fi; \
> + done
> +
> +verify_target_bpf: verify_cmds
> + @if ! (${LLC} -march=bpf -mattr=help > /dev/null 2>&1); then \
> + echo "*** ERROR: LLVM (${LLC}) does not support 'bpf' target" ;\
> + echo " NOTICE: LLVM version >= 3.7.1 required" ;\
> + exit 2; \
> + else true; fi
> +
> +$(DTRACE_PATH)/*.c: verify_target_bpf $(LIBBPF)
> +$(src)/*.c: verify_target_bpf $(LIBBPF)
> +
> +$(obj)/%.o: $(src)/%.c
> + @echo " CLANG-bpf " $@
> + $(Q)$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
> + -I$(srctree)/tools/testing/selftests/bpf/ \
> + -D__KERNEL__ -D__BPF_TRACING__ -Wno-unused-value -Wno-pointer-sign \
> + -D__TARGET_ARCH_$(ARCH) -Wno-compare-distinct-pointer-types \
> + -Wno-gnu-variable-sized-type-not-at-end \
> + -Wno-address-of-packed-member -Wno-tautological-compare \
> + -Wno-unknown-warning-option $(CLANG_ARCH_ARGS) \
> + -I$(srctree)/samples/bpf/ -include asm_goto_workaround.h \
> + -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf $(LLC_FLAGS) -filetype=obj -o $@
> diff --git a/tools/dtrace/bpf_sample.c b/tools/dtrace/bpf_sample.c
> new file mode 100644
> index 000000000000..9862f75f92d3
> --- /dev/null
> +++ b/tools/dtrace/bpf_sample.c
> @@ -0,0 +1,146 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * This sample DTrace BPF tracing program demonstrates how actions can be
> + * associated with different probe types.
> + *
> + * The kprobe/ksys_write probe is a Function Boundary Tracing (FBT) entry probe
> + * on the ksys_write(fd, buf, count) function in the kernel. Arguments to the
> + * function can be retrieved from the CPU registers (struct pt_regs).
> + *
> + * The tracepoint/syscalls/sys_enter_write probe is a System Call entry probe
> + * for the write(d, buf, count) system call. Arguments to the system call can
> + * be retrieved from the tracepoint data passed to the BPF program as context
> + * struct syscall_data) when the probe fires.
> + *
> + * The BPF program associated with each probe prepares a DTrace BPF context
> + * (struct dt_bpf_context) that stores the probe ID and up to 10 arguments.
> + * Only 3 arguments are used in this sample. Then the prorgams call a shared
> + * BPF function (bpf_action) that implements the actual action to be taken when
> + * a probe fires. It prepares a data record to be stored in the tracing buffer
> + * and submits it to the buffer. The data in the data record is obtained from
> + * the DTrace BPF context.
> + *
> + * Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
> + */
> +#include <uapi/linux/bpf.h>
> +#include <linux/ptrace.h>
> +#include <linux/version.h>
> +#include <uapi/linux/unistd.h>
> +#include "bpf_helpers.h"
> +
> +#include "dtrace.h"
> +
> +struct syscall_data {
> + struct pt_regs *regs;
> + long syscall_nr;
> + long arg[6];
> +};
> +
> +struct bpf_map_def SEC("maps") buffers = {
> + .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
> + .key_size = sizeof(u32),
> + .value_size = sizeof(u32),
> + .max_entries = NR_CPUS,
> +};
> +
> +#if defined(bpf_target_x86)
> +# define PT_REGS_PARM6(x) ((x)->r9)
> +#elif defined(bpf_target_s390x)
> +# define PT_REGS_PARM6(x) ((x)->gprs[7])
> +#elif defined(bpf_target_arm)
> +# define PT_REGS_PARM6(x) ((x)->uregs[5])
> +#elif defined(bpf_target_arm64)
> +# define PT_REGS_PARM6(x) ((x)->regs[5])
> +#elif defined(bpf_target_mips)
> +# define PT_REGS_PARM6(x) ((x)->regs[9])
> +#elif defined(bpf_target_powerpc)
> +# define PT_REGS_PARM6(x) ((x)->gpr[8])
> +#elif defined(bpf_target_sparc)
> +# define PT_REGS_PARM6(x) ((x)->u_regs[UREG_I5])
> +#else
> +# error Argument retrieval from pt_regs is not supported yet on this arch.
> +#endif
> +
> +/*
> + * We must pass a valid BPF context pointer because the bpf_perf_event_output()
> + * helper requires a BPF context pointer as first argument (and the verifier is
> + * validating that we pass a value that is known to be a context pointer).
> + *
> + * This BPF function implements the following D action:
> + * {
> + * trace(curthread);
> + * trace(arg0);
> + * trace(arg1);
> + * trace(arg2);
> + * }
> + *
> + * Expected output will look like:
> + * CPU ID
> + * 15 70423 0xffff8c0968bf8ec0 0x00000000000001 0x0055e019eb3f60 0x0000000000002c
> + * 15 18876 0xffff8c0968bf8ec0 0x00000000000001 0x0055e019eb3f60 0x0000000000002c
> + * | | +-- curthread +--> arg0 (fd) +--> arg1 (buf) +-- arg2 (count)
> + * | |
> + * | +--> probe ID
> + * |
> + * +--> CPU the probe fired on
> + */
> +static noinline int bpf_action(void *bpf_ctx, struct dt_bpf_context *ctx)
> +{
> + int cpu = bpf_get_smp_processor_id();
> + struct data {
> + u32 probe_id; /* mandatory */
> +
> + u64 task; /* first data item (current task) */
> + u64 arg0; /* 2nd data item (arg0, fd) */
> + u64 arg1; /* 3rd data item (arg1, buf) */
> + u64 arg2; /* 4th data item (arg2, count) */
> + } rec;
> +
> + memset(&rec, 0, sizeof(rec));
> +
> + rec.probe_id = ctx->probe_id;
> + rec.task = bpf_get_current_task();
> + rec.arg0 = ctx->argv[0];
> + rec.arg1 = ctx->argv[1];
> + rec.arg2 = ctx->argv[2];
> +
> + bpf_perf_event_output(bpf_ctx, &buffers, cpu, &rec, sizeof(rec));
> +
> + return 0;
> +}
> +
> +SEC("kprobe/ksys_write")
> +int bpf_kprobe(struct pt_regs *regs)
> +{
> + struct dt_bpf_context ctx;
> +
> + memset(&ctx, 0, sizeof(ctx));
> +
> + ctx.probe_id = 18876;
> + ctx.argv[0] = PT_REGS_PARM1(regs);
> + ctx.argv[1] = PT_REGS_PARM2(regs);
> + ctx.argv[2] = PT_REGS_PARM3(regs);
> + ctx.argv[3] = PT_REGS_PARM4(regs);
> + ctx.argv[4] = PT_REGS_PARM5(regs);
> + ctx.argv[5] = PT_REGS_PARM6(regs);
> +
> + return bpf_action(regs, &ctx);
> +}
> +
> +SEC("tracepoint/syscalls/sys_enter_write")
> +int bpf_tp(struct syscall_data *scd)
> +{
> + struct dt_bpf_context ctx;
> +
> + memset(&ctx, 0, sizeof(ctx));
> +
> + ctx.probe_id = 70423;
> + ctx.argv[0] = scd->arg[0];
> + ctx.argv[1] = scd->arg[1];
> + ctx.argv[2] = scd->arg[2];
> +
> + return bpf_action(scd, &ctx);
> +}
> +
> +char _license[] SEC("license") = "GPL";
> +u32 _version SEC("version") = LINUX_VERSION_CODE;
> diff --git a/tools/dtrace/dt_bpf.c b/tools/dtrace/dt_bpf.c
> new file mode 100644
> index 000000000000..78c90de016c6
> --- /dev/null
> +++ b/tools/dtrace/dt_bpf.c
> @@ -0,0 +1,185 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * This file provides the interface for handling BPF. It uses the bpf library
> + * to interact with BPF ELF object files.
> + *
> + * Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
> + */
> +#include <errno.h>
> +#include <stdarg.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <bpf/libbpf.h>
> +#include <linux/kernel.h>
> +#include <linux/perf_event.h>
> +#include <sys/ioctl.h>
> +
> +#include "dtrace_impl.h"
> +
> +/*
> + * Validate the output buffer map that is specified in the BPF ELF object. It
> + * must match the following definition to be valid:
> + *
> + * struct bpf_map_def SEC("maps") buffers = {
> + * .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
> + * .key_size = sizeof(u32),
> + * .value_size = sizeof(u32),
> + * .max_entries = num,
> + * };
> + * where num is greater than dt_maxcpuid.
> + */
> +static int is_valid_buffers(const struct bpf_map_def *mdef)
> +{
> + return mdef->type == BPF_MAP_TYPE_PERF_EVENT_ARRAY &&
> + mdef->key_size == sizeof(u32) &&
> + mdef->value_size == sizeof(u32) &&
> + mdef->max_entries > dt_maxcpuid;
> +}
> +
> +/*
> + * List the probes specified in the given BPF ELF object file.
> + */
> +int dt_bpf_list_probes(const char *fn)
> +{
> + struct bpf_object *obj;
> + struct bpf_program *prog;
> + int rc, fd;
> +
> + libbpf_set_print(NULL);
> +
> + /*
> + * Listing probes is done before the DTrace command line utility loads
> + * the supplied programs. We load them here without attaching them to
> + * probes so that we can retrieve the ELF section names for each BPF
> + * program. The section name indicates the probe that the program is
> + * associated with.
> + */
> + rc = bpf_prog_load(fn, BPF_PROG_TYPE_UNSPEC, &obj, &fd);
> + if (rc)
> + return rc;
> +
> + /*
> + * Loop through the programs in the BPF ELF object, and try to resolve
> + * the section names into probes. Use the supplied callback function
> + * to emit the probe description.
> + */
> + for (prog = bpf_program__next(NULL, obj); prog != NULL;
> + prog = bpf_program__next(prog, obj)) {
> + struct dt_probe *probe;
> +
> + probe = dt_probe_resolve_event(bpf_program__title(prog, false));
> +
> + printf("%5d %10s %17s %33s %s\n", probe->id,
> + probe->prv_name ? probe->prv_name : "",
> + probe->mod_name ? probe->mod_name : "",
> + probe->fun_name ? probe->fun_name : "",
> + probe->prb_name ? probe->prb_name : "");
> + }
> +
> +
> + /* Done with the BPF ELF object. */
> + bpf_object__close(obj);
> +
> + return 0;
> +}
> +
> +/*
> + * Load the given BPF ELF object file.
> + */
> +int dt_bpf_load_file(const char *fn)
> +{
> + struct bpf_object *obj;
> + struct bpf_map *map;
> + struct bpf_program *prog;
> + int rc, fd;
> +
> + libbpf_set_print(NULL);
> +
> + /* Load the BPF ELF object file. */
> + rc = bpf_prog_load(fn, BPF_PROG_TYPE_UNSPEC, &obj, &fd);
> + if (rc)
> + return rc;
> +
> + /* Validate buffers map. */
> + map = bpf_object__find_map_by_name(obj, "buffers");
> + if (map && is_valid_buffers(bpf_map__def(map)))
> + dt_bufmap_fd = bpf_map__fd(map);
> + else
> + goto fail;
> +
> + /*
> + * Loop through the programs and resolve each into the matching probe.
> + * Attach the program to the probe.
> + */
> + for (prog = bpf_program__next(NULL, obj); prog != NULL;
> + prog = bpf_program__next(prog, obj)) {
> + struct dt_probe *probe;
> +
> + probe = dt_probe_resolve_event(bpf_program__title(prog, false));
> + if (!probe)
> + return -ENOENT;
> + if (probe->prov && probe->prov->attach)
> + probe->prov->attach(bpf_program__title(prog, false),
> + bpf_program__fd(prog));
> + }
> +
> + return 0;
> +
> +fail:
> + bpf_object__close(obj);
> + return -EINVAL;
> +}
> +
> +/*
> + * Store the (key, value) pair in the map referenced by the given fd.
> + */
> +int dt_bpf_map_update(int fd, const void *key, const void *val)
> +{
> + union bpf_attr attr;
> +
> + memset(&attr, 0, sizeof(attr));
> +
> + attr.map_fd = fd;
> + attr.key = (u64)(unsigned long)key;
> + attr.value = (u64)(unsigned long)val;
> + attr.flags = 0;
> +
> + return bpf(BPF_MAP_UPDATE_ELEM, &attr);
> +}
> +
> +/*
> + * Attach a trace event and associate a BPF program with it.
> + */
> +int dt_bpf_attach(int event_id, int bpf_fd)
> +{
> + int event_fd;
> + int rc;
> + struct perf_event_attr attr = {};
> +
> + attr.type = PERF_TYPE_TRACEPOINT;
> + attr.sample_type = PERF_SAMPLE_RAW;
> + attr.sample_period = 1;
> + attr.wakeup_events = 1;
> + attr.config = event_id;
> +
> + /*
> + * Register the event (based on its id), and obtain a fd. It gets
> + * created as an enabled probe, so we don't have to explicitly enable
> + * it.
> + */
> + event_fd = perf_event_open(&attr, -1, 0, -1, 0);
> + if (event_fd < 0) {
> + perror("sys_perf_event_open");
> + return -1;
> + }
> +
> + /* Associate the BPF program with the event. */
> + rc = ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, bpf_fd);
> + if (rc < 0) {
> + perror("PERF_EVENT_IOC_SET_BPF");
> + return -1;
> + }
> +
> + return 0;
> +}
> diff --git a/tools/dtrace/dt_buffer.c b/tools/dtrace/dt_buffer.c
> new file mode 100644
> index 000000000000..19bb7e4cfc92
> --- /dev/null
> +++ b/tools/dtrace/dt_buffer.c
> @@ -0,0 +1,338 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * This file provides the tracing buffer handling for DTrace. It makes use of
> + * the perf event output ring buffers that can be written to from BPF programs.
> + *
> + * Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
> + */
> +#include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <syscall.h>
> +#include <unistd.h>
> +#include <sys/epoll.h>
> +#include <sys/ioctl.h>
> +#include <sys/mman.h>
> +#include <linux/bpf.h>
> +#include <linux/perf_event.h>
> +#include <linux/ring_buffer.h>
> +
> +#include "dtrace_impl.h"
> +
> +/*
> + * Probe data is recorded in per-CPU perf ring buffers.
> + */
> +struct dtrace_buffer {
> + int cpu; /* ID of CPU that uses this buffer */
> + int fd; /* fd of perf output buffer */
> + size_t page_size; /* size of each page in buffer */
> + size_t data_size; /* total buffer size */
> + u8 *base; /* address of buffer */
> + u8 *endp; /* address of end of buffer */
> + u8 *tmp; /* temporary event buffer */
> + u32 tmp_len; /* length of temporary event buffer */
> +};
> +
> +static struct dtrace_buffer *dt_buffers;
> +
> +/*
> + * File descriptor for the BPF map that holds the buffers for the online CPUs.
> + * The map is a bpf_array indexed by CPU id, and it stores a file descriptor as
> + * value (the fd for the perf_event that represents the CPU buffer).
> + */
> +int dt_bufmap_fd = -1;
> +
> +/*
> + * Create a perf_event buffer for the given DTrace buffer. This will create
> + * a perf_event ring_buffer, mmap it, and enable the perf_event that owns the
> + * buffer.
> + */
> +static int perf_buffer_open(struct dtrace_buffer *buf)
> +{
> + int pefd;
> + struct perf_event_attr attr = {};
> +
> + /*
> + * Event configuration for BPF-generated output in perf_event ring
> + * buffers. The event is created in enabled state.
> + */
> + attr.config = PERF_COUNT_SW_BPF_OUTPUT;
> + attr.type = PERF_TYPE_SOFTWARE;
> + attr.sample_type = PERF_SAMPLE_RAW;
> + attr.sample_period = 1;
> + attr.wakeup_events = 1;
> + pefd = perf_event_open(&attr, -1, buf->cpu, -1, PERF_FLAG_FD_CLOEXEC);
> + if (pefd < 0) {
> + fprintf(stderr, "perf_event_open(cpu %d): %s\n", buf->cpu,
> + strerror(errno));
> + goto fail;
> + }
> +
> + /*
> + * We add buf->page_size to the buf->data_size, because perf maintains
> + * a meta-data page at the beginning of the memory region. That page
> + * is used for reader/writer symchronization.
> + */
> + buf->fd = pefd;
> + buf->base = mmap(NULL, buf->page_size + buf->data_size,
> + PROT_READ | PROT_WRITE, MAP_SHARED, buf->fd, 0);
> + buf->endp = buf->base + buf->page_size + buf->data_size - 1;
> + if (!buf->base)
> + goto fail;
> +
> + return 0;
> +
> +fail:
> + if (buf->base) {
> + munmap(buf->base, buf->page_size + buf->data_size);
> + buf->base = NULL;
> + buf->endp = NULL;
> + }
> + if (buf->fd) {
> + close(buf->fd);
> + buf->fd = -1;
> + }
> +
> + return -1;
> +}
> +
> +/*
> + * Close the given DTrace buffer. This function disables the perf_event that
> + * owns the buffer, munmaps the memory space, and closes the perf buffer fd.
> + */
> +static void perf_buffer_close(struct dtrace_buffer *buf)
> +{
> + /*
> + * If the perf buffer failed to open, there is no need to close it.
> + */
> + if (buf->fd < 0)
> + return;
> +
> + if (ioctl(buf->fd, PERF_EVENT_IOC_DISABLE, 0) < 0)
> + fprintf(stderr, "PERF_EVENT_IOC_DISABLE(cpu %d): %s\n",
> + buf->cpu, strerror(errno));
> +
> + munmap(buf->base, buf->page_size + buf->data_size);
> +
> + if (close(buf->fd))
> + fprintf(stderr, "perf buffer close(cpu %d): %s\n",
> + buf->cpu, strerror(errno));
> +
> + buf->base = NULL;
> + buf->fd = -1;
> +}
> +
> +/*
> + * Initialize the probe data buffers (one per online CPU). Each buffer will
> + * contain the given number of pages (i.e. total size of each buffer will be
> + * num_pages * getpagesize()). This function also sets up an event polling
> + * descriptor that monitors all CPU buffers at once.
> + */
> +int dt_buffer_init(int num_pages)
> +{
> + int i;
> + int epoll_fd;
> +
> + if (dt_bufmap_fd < 0)
> + return -EINVAL;
> +
> + /* Allocate the per-CPU buffer structs. */
> + dt_buffers = calloc(dt_numcpus, sizeof(struct dtrace_buffer));
> + if (dt_buffers == NULL)
> + return -ENOMEM;
> +
> + /* Set up the event polling file descriptor. */
> + epoll_fd = epoll_create1(EPOLL_CLOEXEC);
> + if (epoll_fd < 0) {
> + free(dt_buffers);
> + return -errno;
> + }
> +
> + for (i = 0; i < dt_numcpus; i++) {
> + int cpu = dt_cpuids[i];
> + struct epoll_event ev;
> + struct dtrace_buffer *buf = &dt_buffers[i];
> +
> + buf->cpu = cpu;
> + buf->page_size = getpagesize();
> + buf->data_size = num_pages * buf->page_size;
> + buf->tmp = NULL;
> + buf->tmp_len = 0;
> +
> + /* Try to create the perf buffer for this DTrace buffer. */
> + if (perf_buffer_open(buf) == -1)
> + continue;
> +
> + /* Store the perf buffer fd in the buffer map. */
> + dt_bpf_map_update(dt_bufmap_fd, &cpu, &buf->fd);
> +
> + /* Add the buffer to the event polling descriptor. */
> + ev.events = EPOLLIN;
> + ev.data.ptr = buf;
> + if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, buf->fd, &ev) == -1) {
> + fprintf(stderr, "EPOLL_CTL_ADD(cpu %d): %s\n",
> + buf->cpu, strerror(errno));
> + continue;
> + }
> + }
> +
> + return epoll_fd;
> +}
> +
> +/*
> + * Clean up the buffers.
> + */
> +void dt_buffer_exit(int epoll_fd)
> +{
> + int i;
> +
> + for (i = 0; i < dt_numcpus; i++)
> + perf_buffer_close(&dt_buffers[i]);
> +
> + free(dt_buffers);
> + close(epoll_fd);
> +}
> +
> +/*
> + * Process and output the probe data at the supplied address.
> + */
> +static void output_event(int cpu, u64 *buf)
> +{
> + u8 *data = (u8 *)buf;
> + struct perf_event_header *hdr;
> +
> + hdr = (struct perf_event_header *)data;
> + data += sizeof(struct perf_event_header);
> +
> + if (hdr->type == PERF_RECORD_SAMPLE) {
> + u8 *ptr = data;
> + u32 i, size, probe_id;
> +
> + /*
> + * struct {
> + * struct perf_event_header header;
> + * u32 size;
> + * u32 probe_id;
> + * u32 gap;
> + * u64 data[n];
> + * }
> + * and data points to the 'size' member at this point.
> + */
> + if (ptr > (u8 *)buf + hdr->size) {
> + fprintf(stderr, "BAD: corrupted sample header\n");
> + return;
> + }
> +
> + size = *(u32 *)data;
> + data += sizeof(size);
> + ptr += sizeof(size) + size;
> + if (ptr != (u8 *)buf + hdr->size) {
> + fprintf(stderr, "BAD: invalid sample size\n");
> + return;
> + }
> +
> + probe_id = *(u32 *)data;
> + data += sizeof(probe_id);
> + size -= sizeof(probe_id);
> + data += sizeof(u32); /* skip 32-bit gap */
> + size -= sizeof(u32);
> + buf = (u64 *)data;
> +
> + printf("%3d %6d ", cpu, probe_id);
> + for (i = 0, size /= sizeof(u64); i < size; i++)
> + printf("%#016lx ", buf[i]);
> + printf("\n");
> + } else if (hdr->type == PERF_RECORD_LOST) {
> + u64 lost;
> +
> + /*
> + * struct {
> + * struct perf_event_header header;
> + * u64 id;
> + * u64 lost;
> + * }
> + * and data points to the 'id' member at this point.
> + */
> + lost = *(u64 *)(data + sizeof(u64));
> +
> + printf("[%ld probes dropped]\n", lost);
> + } else
> + fprintf(stderr, "UNKNOWN: record type %d\n", hdr->type);
> +}
> +
> +/*
> + * Process the available probe data in the given buffer.
> + */
> +static void process_data(struct dtrace_buffer *buf)
> +{
> + struct perf_event_mmap_page *rb_page = (void *)buf->base;
> + struct perf_event_header *hdr;
> + u8 *base;
> + u64 head, tail;
> +
> + /* Set base to be the start of the buffer data. */
> + base = buf->base + buf->page_size;
> +
> + for (;;) {
> + head = ring_buffer_read_head(rb_page);
> + tail = rb_page->data_tail;
> +
> + if (tail == head)
> + break;
> +
> + do {
> + u8 *event = base + tail % buf->data_size;
> + u32 len;
> +
> + hdr = (struct perf_event_header *)event;
> + len = hdr->size;
> +
> + /*
> + * If the perf event data wraps around the boundary of
> + * the buffer, we make a copy in contiguous memory.
> + */
> + if (event + len > buf->endp) {
> + u8 *dst;
> + u32 num;
> +
> + /* Increase buffer as needed. */
> + if (buf->tmp_len < len) {
> + buf->tmp = realloc(buf->tmp, len);
> + buf->tmp_len = len;
> + }
> +
> + dst = buf->tmp;
> + num = buf->endp - event + 1;
> + memcpy(dst, event, num);
> + memcpy(dst + num, base, len - num);
> +
> + event = dst;
> + }
> +
> + output_event(buf->cpu, (u64 *)event);
> +
> + tail += hdr->size;
> + } while (tail != head);
> +
> + ring_buffer_write_tail(rb_page, tail);
> + }
> +}
> +
> +/*
> + * Wait for data to become available in any of the buffers.
> + */
> +int dt_buffer_poll(int epoll_fd, int timeout)
> +{
> + struct epoll_event events[dt_numcpus];
> + int i, cnt;
> +
> + cnt = epoll_wait(epoll_fd, events, dt_numcpus, timeout);
> + if (cnt < 0)
> + return -errno;
> +
> + for (i = 0; i < cnt; i++)
> + process_data((struct dtrace_buffer *)events[i].data.ptr);
> +
> + return cnt;
> +}
> diff --git a/tools/dtrace/dt_fbt.c b/tools/dtrace/dt_fbt.c
> new file mode 100644
> index 000000000000..fcf95243bf97
> --- /dev/null
> +++ b/tools/dtrace/dt_fbt.c
> @@ -0,0 +1,201 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * The Function Boundary Tracing (FBT) provider for DTrace.
> + *
> + * FBT probes are exposed by the kernel as kprobes. They are listed in the
> + * TRACEFS/available_filter_functions file. Some kprobes are associated with
> + * a specific kernel module, while most are in the core kernel.
> + *
> + * Mapping from event name to DTrace probe name:
> + *
> + * <name> fbt:vmlinux:<name>:entry
> + * fbt:vmlinux:<name>:return
> + * or
> + * <name> [<modname>] fbt:<modname>:<name>:entry
> + * fbt:<modname>:<name>:return
> + *
> + * Mapping from BPF section name to DTrace probe name:
> + *
> + * kprobe/<name> fbt:vmlinux:<name>:entry
> + * kretprobe/<name> fbt:vmlinux:<name>:return
> + *
> + * (Note that the BPF section does not carry information about the module that
> + * the function is found in. This means that BPF section name cannot be used
> + * to distinguish between functions with the same name occurring in different
> + * modules.)
> + *
> + * Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
> + */
> +#include <fcntl.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <linux/bpf.h>
> +#include <sys/stat.h>
> +#include <sys/types.h>
> +
> +#include "dtrace_impl.h"
> +
> +#define KPROBE_EVENTS TRACEFS "kprobe_events"
> +#define PROBE_LIST TRACEFS "available_filter_functions"
> +
> +static const char provname[] = "fbt";
> +static const char modname[] = "vmlinux";
> +
> +/*
> + * Scan the PROBE_LIST file and add entry and return probes for every function
> + * that is listed.
> + */
> +static int fbt_populate(void)
> +{
> + FILE *f;
> + char buf[256];
> + char *p;
> +
> + f = fopen(PROBE_LIST, "r");
> + if (f == NULL)
> + return -1;
> +
> + while (fgets(buf, sizeof(buf), f)) {
> + /*
> + * Here buf is either "funcname\n" or "funcname [modname]\n".
> + */
> + p = strchr(buf, '\n');
> + if (p) {
> + *p = '\0';
> + if (p > buf && *(--p) == ']')
> + *p = '\0';
> + } else {
> + /* If we didn't see a newline, the line was too long.
> + * Report it, and continue until the end of the line.
> + */
> + fprintf(stderr, "%s: Line too long: %s\n",
> + PROBE_LIST, buf);
> + do
> + fgets(buf, sizeof(buf), f);
> + while (strchr(buf, '\n') == NULL);
> + continue;
> + }
> +
> + /*
> + * Now buf is either "funcname" or "funcname [modname". If
> + * there is no module name provided, we will use the default.
> + */
> + p = strchr(buf, ' ');
> + if (p) {
> + *p++ = '\0';
> + if (*p == '[')
> + p++;
> + }
> +
> + dt_probe_new(&dt_fbt, provname, p ? p : modname, buf, "entry");
> + dt_probe_new(&dt_fbt, provname, p ? p : modname, buf, "return");
> + }
> +
> + fclose(f);
> +
> + return 0;
> +}
> +
> +#define ENTRY_PREFIX "kprobe/"
> +#define EXIT_PREFIX "kretprobe/"
> +
> +/*
> + * Perform a probe lookup based on an event name (BPF ELF section name).
> + */
> +static struct dt_probe *fbt_resolve_event(const char *name)
> +{
> + const char *prbname;
> + struct dt_probe tmpl;
> + struct dt_probe *probe;
> +
> + if (!name)
> + return NULL;
> +
> + if (strncmp(name, ENTRY_PREFIX, sizeof(ENTRY_PREFIX) - 1) == 0) {
> + name += sizeof(ENTRY_PREFIX) - 1;
> + prbname = "entry";
> + } else if (strncmp(name, EXIT_PREFIX, sizeof(EXIT_PREFIX) - 1) == 0) {
> + name += sizeof(EXIT_PREFIX) - 1;
> + prbname = "return";
> + } else
> + return NULL;
> +
> + memset(&tmpl, 0, sizeof(tmpl));
> + tmpl.prv_name = provname;
> + tmpl.mod_name = modname;
> + tmpl.fun_name = name;
> + tmpl.prb_name = prbname;
> +
> + probe = dt_probe_by_name(&tmpl);
> +
> + return probe;
> +}
> +
> +/*
> + * Attach the given BPF program (identified by its file descriptor) to the
> + * kprobe identified by the given section name.
> + */
> +static int fbt_attach(const char *name, int bpf_fd)
> +{
> + char efn[256];
> + char buf[256];
> + int event_id, fd, rc;
> +
> + name += 7; /* skip "kprobe/" */
> + snprintf(buf, sizeof(buf), "p:%s %s\n", name, name);
> +
> + /*
> + * Register the kprobe with the tracing subsystem. This will create
> + * a tracepoint event.
> + */
> + fd = open(KPROBE_EVENTS, O_WRONLY | O_APPEND);
> + if (fd < 0) {
> + perror(KPROBE_EVENTS);
> + return -1;
> + }
> + rc = write(fd, buf, strlen(buf));
> + if (rc < 0) {
> + perror(KPROBE_EVENTS);
> + close(fd);
> + return -1;
> + }
> + close(fd);
> +
> + /*
> + * Read the tracepoint event id for the kprobe we just registered.
> + */
> + strcpy(efn, EVENTSFS);
> + strcat(efn, "kprobes/");
> + strcat(efn, name);
> + strcat(efn, "/id");
> +
> + fd = open(efn, O_RDONLY);
> + if (fd < 0) {
> + perror(efn);
> + return -1;
> + }
> + rc = read(fd, buf, sizeof(buf));
> + if (rc < 0 || rc >= sizeof(buf)) {
> + perror(efn);
> + close(fd);
> + return -1;
> + }
> + close(fd);
> + buf[rc] = '\0';
> + event_id = atoi(buf);
> +
> + /*
> + * Attaching a BPF program (by file descriptor) to an event (by ID) is
> + * a generic operation provided by the BPF interface code.
> + */
> + return dt_bpf_attach(event_id, bpf_fd);
> +}
> +
> +struct dt_provider dt_fbt = {
> + .name = "fbt",
> + .populate = &fbt_populate,
> + .resolve_event = &fbt_resolve_event,
> + .attach = &fbt_attach,
> +};
> diff --git a/tools/dtrace/dt_hash.c b/tools/dtrace/dt_hash.c
> new file mode 100644
> index 000000000000..b1f563bc0773
> --- /dev/null
> +++ b/tools/dtrace/dt_hash.c
> @@ -0,0 +1,211 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * This file provides a generic hashtable implementation for probes.
> + *
> + * The hashtable is created with 4 user-provided functions:
> + * hval(probe) - calculate a hash value for the given probe
> + * cmp(probe1, probe2) - compare two probes
> + * add(head, probe) - add a probe to a list of probes
> + * del(head, probe) - delete a probe from a list of probes
> + *
> + * Probes are hashed into a hashtable slot based on the return value of
> + * hval(probe). Each hashtable slot holds a list of buckets, with each
> + * bucket storing probes that are equal under the cmp(probe1, probe2)
> + * function. Probes are added to the list of probes in a bucket using the
> + * add(head, probe) function, and they are deleted using a call to the
> + * del(head, probe) function.
> + *
> + * Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
> + */
> +#include <errno.h>
> +#include <stdint.h>
> +#include <stdlib.h>
> +
> +#include "dtrace_impl.h"
> +
> +/*
> + * Hashtable implementation for probes.
> + */
> +struct dt_hbucket {
> + u32 hval;
> + struct dt_hbucket *next;
> + struct dt_probe *head;
> + int nprobes;
> +};
> +
> +struct dt_htab {
> + struct dt_hbucket **tab;
> + int size;
> + int mask;
> + int nbuckets;
> + dt_hval_fn hval; /* calculate hash value */
> + dt_cmp_fn cmp; /* compare 2 probes */
> + dt_add_fn add; /* add probe to list */
> + dt_del_fn del; /* delete probe from list */
> +};
> +
> +/*
> + * Create a new (empty) hashtable.
> + */
> +struct dt_htab *dt_htab_new(dt_hval_fn hval, dt_cmp_fn cmp, dt_add_fn add,
> + dt_del_fn del)
> +{
> + struct dt_htab *htab = malloc(sizeof(struct dt_htab));
> +
> + if (!htab)
> + return NULL;
> +
> + htab->size = 1;
> + htab->mask = htab->size - 1;
> + htab->nbuckets = 0;
> + htab->hval = hval;
> + htab->cmp = cmp;
> + htab->add = add;
> + htab->del = del;
> +
> + htab->tab = calloc(htab->size, sizeof(struct dt_hbucket *));
> + if (!htab->tab) {
> + free(htab);
> + return NULL;
> + }
> +
> + return htab;
> +}
> +
> +/*
> + * Resize the hashtable by doubling the number of slots.
> + */
> +static int resize(struct dt_htab *htab)
> +{
> + int i;
> + int osize = htab->size;
> + int nsize = osize << 1;
> + int nmask = nsize - 1;
> + struct dt_hbucket **ntab;
> +
> + ntab = calloc(nsize, sizeof(struct dt_hbucket *));
> + if (!ntab)
> + return -ENOMEM;
> +
> + for (i = 0; i < osize; i++) {
> + struct dt_hbucket *bucket, *next;
> +
> + for (bucket = htab->tab[i]; bucket; bucket = next) {
> + int idx = bucket->hval & nmask;
> +
> + next = bucket->next;
> + bucket->next = ntab[idx];
> + ntab[idx] = bucket;
> + }
> + }
> +
> + free(htab->tab);
> + htab->tab = ntab;
> + htab->size = nsize;
> + htab->mask = nmask;
> +
> + return 0;
> +}
> +
> +/*
> + * Add a probe to the hashtable. Resize if necessary, and allocate a new
> + * bucket if necessary.
> + */
> +int dt_htab_add(struct dt_htab *htab, struct dt_probe *probe)
> +{
> + u32 hval = htab->hval(probe);
> + int idx;
> + struct dt_hbucket *bucket;
> +
> +retry:
> + idx = hval & htab->mask;
> + for (bucket = htab->tab[idx]; bucket; bucket = bucket->next) {
> + if (htab->cmp(bucket->head, probe) == 0)
> + goto add;
> + }
> +
> + if ((htab->nbuckets >> 1) > htab->size) {
> + int err;
> +
> + err = resize(htab);
> + if (err)
> + return err;
> +
> + goto retry;
> + }
> +
> + bucket = malloc(sizeof(struct dt_hbucket));
> + if (!bucket)
> + return -ENOMEM;
> +
> + bucket->hval = hval;
> + bucket->next = htab->tab[idx];
> + bucket->head = NULL;
> + bucket->nprobes = 0;
> + htab->tab[idx] = bucket;
> + htab->nbuckets++;
> +
> +add:
> + bucket->head = htab->add(bucket->head, probe);
> + bucket->nprobes++;
> +
> + return 0;
> +}
> +
> +/*
> + * Find a probe in the hashtable.
> + */
> +struct dt_probe *dt_htab_lookup(const struct dt_htab *htab,
> + const struct dt_probe *probe)
> +{
> + u32 hval = htab->hval(probe);
> + int idx = hval & htab->mask;
> + struct dt_hbucket *bucket;
> +
> + for (bucket = htab->tab[idx]; bucket; bucket = bucket->next) {
> + if (htab->cmp(bucket->head, probe) == 0)
> + return bucket->head;
> + }
> +
> + return NULL;
> +}
> +
> +/*
> + * Remove a probe from the hashtable. If we are deleting the last probe in a
> + * bucket, get rid of the bucket.
> + */
> +int dt_htab_del(struct dt_htab *htab, struct dt_probe *probe)
> +{
> + u32 hval = htab->hval(probe);
> + int idx = hval & htab->mask;
> + struct dt_hbucket *bucket;
> + struct dt_probe *head;
> +
> + for (bucket = htab->tab[idx]; bucket; bucket = bucket->next) {
> + if (htab->cmp(bucket->head, probe) == 0)
> + break;
> + }
> +
> + if (bucket == NULL)
> + return -ENOENT;
> +
> + head = htab->del(bucket->head, probe);
> + if (!head) {
> + struct dt_hbucket *b = htab->tab[idx];
> +
> + if (bucket == b)
> + htab->tab[idx] = bucket->next;
> + else {
> + while (b->next != bucket)
> + b = b->next;
> +
> + b->next = bucket->next;
> + }
> +
> + htab->nbuckets--;
> + free(bucket);
> + } else
> + bucket->head = head;
> +
> + return 0;
> +}
> diff --git a/tools/dtrace/dt_probe.c b/tools/dtrace/dt_probe.c
> new file mode 100644
> index 000000000000..0b6228eaff29
> --- /dev/null
> +++ b/tools/dtrace/dt_probe.c
> @@ -0,0 +1,230 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * This file implements the interface to probes grouped by provider.
> + *
> + * Probes are named by a set of 4 identifiers:
> + * - provider name
> + * - module name
> + * - function name
> + * - probe name
> + *
> + * The Fully Qualified Name (FQN) is "provider:module:function:name".
> + *
> + * Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
> + */
> +#include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <linux/bpf.h>
> +#include <linux/kernel.h>
> +
> +#include "dtrace_impl.h"
> +
> +static struct dt_provider *dt_providers[] = {
> + &dt_fbt,
> + &dt_syscall,
> + };
> +
> +static struct dt_htab *ht_byfqn;
> +
> +static u32 next_probe_id;
> +
> +/*
> + * Calculate a hash value based on a given string and an initial value. The
> + * initial value is used to calculate compound hash values, e.g.
> + *
> + * u32 hval;
> + *
> + * hval = str2hval(str1, 0);
> + * hval = str2hval(str2, hval);
> + */
> +static u32 str2hval(const char *p, u32 hval)
> +{
> + u32 g;
> +
> + if (!p)
> + return hval;
> +
> + while (*p) {
> + hval = (hval << 4) + *p++;
> + g = hval & 0xf0000000;
> + if (g != 0)
> + hval ^= g >> 24;
> +
> + hval &= ~g;
> + }
> +
> + return hval;
> +}
> +
> +/*
> + * String compare function that can handle either or both strings being NULL.
> + */
> +static int safe_strcmp(const char *p, const char *q)
> +{
> + return (!p) ? (!q) ? 0
> + : -1
> + : (!q) ? 1
> + : strcmp(p, q);
> +}
> +
> +/*
> + * Calculate the hash value of a probe as the cummulative hash value of the
> + * FQN.
> + */
> +static u32 fqn_hval(const struct dt_probe *probe)
> +{
> + u32 hval = 0;
> +
> + hval = str2hval(probe->prv_name, hval);
> + hval = str2hval(":", hval);
> + hval = str2hval(probe->mod_name, hval);
> + hval = str2hval(":", hval);
> + hval = str2hval(probe->fun_name, hval);
> + hval = str2hval(":", hval);
> + hval = str2hval(probe->prb_name, hval);
> +
> + return hval;
> +}
> +
> +/*
> + * Compare two probes based on the FQN.
> + */
> +static int fqn_cmp(const struct dt_probe *p, const struct dt_probe *q)
> +{
> + int rc;
> +
> + rc = safe_strcmp(p->prv_name, q->prv_name);
> + if (rc)
> + return rc;
> + rc = safe_strcmp(p->mod_name, q->mod_name);
> + if (rc)
> + return rc;
> + rc = safe_strcmp(p->fun_name, q->fun_name);
> + if (rc)
> + return rc;
> + rc = safe_strcmp(p->prb_name, q->prb_name);
> + if (rc)
> + return rc;
> +
> + return 0;
> +}
> +
> +/*
> + * Add the given probe 'new' to the double-linked probe list 'head'. Probe
> + * 'new' becomes the new list head.
> + */
> +static struct dt_probe *fqn_add(struct dt_probe *head, struct dt_probe *new)
> +{
> + if (!head)
> + return new;
> +
> + new->he_fqn.next = head;
> + head->he_fqn.prev = new;
> +
> + return new;
> +}
> +
> +/*
> + * Remove the given probe 'probe' from the double-linked probe list 'head'.
> + * If we are deleting the current head, the next probe in the list is returned
> + * as the new head. If that value is NULL, the list is now empty.
> + */
> +static struct dt_probe *fqn_del(struct dt_probe *head, struct dt_probe *probe)
> +{
> + if (head == probe) {
> + if (!probe->he_fqn.next)
> + return NULL;
> +
> + head = probe->he_fqn.next;
> + head->he_fqn.prev = NULL;
> + probe->he_fqn.next = NULL;
> +
> + return head;
> + }
> +
> + if (!probe->he_fqn.next) {
> + probe->he_fqn.prev->he_fqn.next = NULL;
> + probe->he_fqn.prev = NULL;
> +
> + return head;
> + }
> +
> + probe->he_fqn.prev->he_fqn.next = probe->he_fqn.next;
> + probe->he_fqn.next->he_fqn.prev = probe->he_fqn.prev;
> + probe->he_fqn.prev = probe->he_fqn.next = NULL;
> +
> + return head;
> +}
> +
> +/*
> + * Initialize the probe handling by populating the FQN hashtable with probes
> + * from all providers.
> + */
> +int dt_probe_init(void)
> +{
> + int i;
> +
> + ht_byfqn = dt_htab_new(fqn_hval, fqn_cmp, fqn_add, fqn_del);
> +
> + for (i = 0; i < ARRAY_SIZE(dt_providers); i++) {
> + if (dt_providers[i]->populate() < 0)
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * Allocate a new probe and add it to the FQN hashtable.
> + */
> +int dt_probe_new(const struct dt_provider *prov, const char *pname,
> + const char *mname, const char *fname, const char *name)
> +{
> + struct dt_probe *probe;
> +
> + probe = malloc(sizeof(struct dt_probe));
> + if (!probe)
> + return -ENOMEM;
> +
> + memset(probe, 0, sizeof(struct dt_probe));
> + probe->id = next_probe_id++;
> + probe->prov = prov;
> + probe->prv_name = pname ? strdup(pname) : NULL;
> + probe->mod_name = mname ? strdup(mname) : NULL;
> + probe->fun_name = fname ? strdup(fname) : NULL;
> + probe->prb_name = name ? strdup(name) : NULL;
> +
> + dt_htab_add(ht_byfqn, probe);
> +
> + return 0;
> +}
> +
> +/*
> + * Perform a probe lookup based on FQN.
> + */
> +struct dt_probe *dt_probe_by_name(const struct dt_probe *tmpl)
> +{
> + return dt_htab_lookup(ht_byfqn, tmpl);
> +}
> +
> +/*
> + * Resolve an event name (BPF ELF section name) into a probe. We query each
> + * provider, and as soon as we get a hit, we return the result.
> + */
> +struct dt_probe *dt_probe_resolve_event(const char *name)
> +{
> + int i;
> + struct dt_probe *probe;
> +
> + for (i = 0; i < ARRAY_SIZE(dt_providers); i++) {
> + if (!dt_providers[i]->resolve_event)
> + continue;
> + probe = dt_providers[i]->resolve_event(name);
> + if (probe)
> + return probe;
> + }
> +
> + return NULL;
> +}
> diff --git a/tools/dtrace/dt_syscall.c b/tools/dtrace/dt_syscall.c
> new file mode 100644
> index 000000000000..6695a4a1c701
> --- /dev/null
> +++ b/tools/dtrace/dt_syscall.c
> @@ -0,0 +1,179 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * The syscall provider for DTrace.
> + *
> + * System call probes are exposed by the kernel as tracepoint events in the
> + * "syscalls" group. Entry probe names start with "sys_enter_" and exit probes
> + * start with "sys_exit_".
> + *
> + * Mapping from event name to DTrace probe name:
> + *
> + * syscalls:sys_enter_<name> syscall:vmlinux:<name>:entry
> + * syscalls:sys_exit_<name> syscall:vmlinux:<name>:return
> + *
> + * Mapping from BPF section name to DTrace probe name:
> + *
> + * tracepoint/syscalls/sys_enter_<name> syscall:vmlinux:<name>:entry
> + * tracepoint/syscalls/sys_exit_<name> syscall:vmlinux:<name>:return
> + *
> + * Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
> + */
> +#include <ctype.h>
> +#include <fcntl.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <linux/bpf.h>
> +#include <sys/stat.h>
> +#include <sys/types.h>
> +
> +#include "dtrace_impl.h"
> +
> +static const char provname[] = "syscall";
> +static const char modname[] = "vmlinux";
> +
> +#define PROBE_LIST TRACEFS "available_events"
> +
> +#define PROV_PREFIX "syscalls:"
> +#define ENTRY_PREFIX "sys_enter_"
> +#define EXIT_PREFIX "sys_exit_"
> +
> +/*
> + * Scan the PROBE_LIST file and add probes for any syscalls events.
> + */
> +static int syscall_populate(void)
> +{
> + FILE *f;
> + char buf[256];
> +
> + f = fopen(PROBE_LIST, "r");
> + if (f == NULL)
> + return -1;
> +
> + while (fgets(buf, sizeof(buf), f)) {
> + char *p;
> +
> + /* * Here buf is "group:event". */
> + p = strchr(buf, '\n');
> + if (p)
> + *p = '\0';
> + else {
> + /*
> + * If we didn't see a newline, the line was too long.
> + * Report it, and continue until the end of the line.
> + */
> + fprintf(stderr, "%s: Line too long: %s\n",
> + PROBE_LIST, buf);
> + do
> + fgets(buf, sizeof(buf), f);
> + while (strchr(buf, '\n') == NULL);
> + continue;
> + }
> +
> + /* We need "group:" to match "syscalls:". */
> + p = buf;
> + if (memcmp(p, PROV_PREFIX, sizeof(PROV_PREFIX) - 1) != 0)
> + continue;
> +
> + p += sizeof(PROV_PREFIX) - 1;
> + /*
> + * Now p will be just "event", and we are only interested in
> + * events that match "sys_enter_*" or "sys_exit_*".
> + */
> + if (!memcmp(p, ENTRY_PREFIX, sizeof(ENTRY_PREFIX) - 1)) {
> + p += sizeof(ENTRY_PREFIX) - 1;
> + dt_probe_new(&dt_syscall, provname, modname, p,
> + "entry");
> + } else if (!memcmp(p, EXIT_PREFIX, sizeof(EXIT_PREFIX) - 1)) {
> + p += sizeof(EXIT_PREFIX) - 1;
> + dt_probe_new(&dt_syscall, provname, modname, p,
> + "return");
> + }
> + }
> +
> + fclose(f);
> +
> + return 0;
> +}
> +
> +#define EVENT_PREFIX "tracepoint/syscalls/"
> +
> +/*
> + * Perform a probe lookup based on an event name (BPF ELF section name).
> + */
> +static struct dt_probe *systrace_resolve_event(const char *name)
> +{
> + const char *prbname;
> + struct dt_probe tmpl;
> + struct dt_probe *probe;
> +
> + if (!name)
> + return NULL;
> +
> + /* Exclude anything that is not a syscalls tracepoint */
> + if (strncmp(name, EVENT_PREFIX, sizeof(EVENT_PREFIX) - 1) != 0)
> + return NULL;
> + name += sizeof(EVENT_PREFIX) - 1;
> +
> + if (strncmp(name, ENTRY_PREFIX, sizeof(ENTRY_PREFIX) - 1) == 0) {
> + name += sizeof(ENTRY_PREFIX) - 1;
> + prbname = "entry";
> + } else if (strncmp(name, EXIT_PREFIX, sizeof(EXIT_PREFIX) - 1) == 0) {
> + name += sizeof(EXIT_PREFIX) - 1;
> + prbname = "return";
> + } else
> + return NULL;
> +
> + memset(&tmpl, 0, sizeof(tmpl));
> + tmpl.prv_name = provname;
> + tmpl.mod_name = modname;
> + tmpl.fun_name = name;
> + tmpl.prb_name = prbname;
> +
> + probe = dt_probe_by_name(&tmpl);
> +
> + return probe;
> +}
> +
> +#define SYSCALLSFS EVENTSFS "syscalls/"
> +
> +/*
> + * Attach the given BPF program (identified by its file descriptor) to the
> + * event identified by the given section name.
> + */
> +static int syscall_attach(const char *name, int bpf_fd)
> +{
> + char efn[256];
> + char buf[256];
> + int event_id, fd, rc;
> +
> + name += sizeof(EVENT_PREFIX) - 1;
> + strcpy(efn, SYSCALLSFS);
> + strcat(efn, name);
> + strcat(efn, "/id");
> +
> + fd = open(efn, O_RDONLY);
> + if (fd < 0) {
> + perror(efn);
> + return -1;
> + }
> + rc = read(fd, buf, sizeof(buf));
> + if (rc < 0 || rc >= sizeof(buf)) {
> + perror(efn);
> + close(fd);
> + return -1;
> + }
> + close(fd);
> + buf[rc] = '\0';
> + event_id = atoi(buf);
> +
> + return dt_bpf_attach(event_id, bpf_fd);
> +}
> +
> +struct dt_provider dt_syscall = {
> + .name = "syscall",
> + .populate = &syscall_populate,
> + .resolve_event = &systrace_resolve_event,
> + .attach = &syscall_attach,
> +};
> diff --git a/tools/dtrace/dt_utils.c b/tools/dtrace/dt_utils.c
> new file mode 100644
> index 000000000000..55d51bae1d97
> --- /dev/null
> +++ b/tools/dtrace/dt_utils.c
> @@ -0,0 +1,132 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
> + */
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <fcntl.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +
> +#include "dtrace_impl.h"
> +
> +#define BUF_SIZE 1024 /* max size for online cpu data */
> +
> +int dt_numcpus; /* number of online CPUs */
> +int dt_maxcpuid; /* highest CPU id */
> +int *dt_cpuids; /* list of CPU ids */
> +
> +/*
> + * Populate the online CPU id information from sysfs data. We only do this
> + * once because we do not care about CPUs coming online after we started
> + * tracing. If a CPU goes offline during tracing, we do not care either
> + * because that simply means that it won't be writing any new probe data into
> + * its buffer.
> + */
> +void cpu_list_populate(void)
> +{
> + char buf[BUF_SIZE];
> + int fd, cnt, start, end, i;
> + int *cpu;
> + char *p, *q;
> +
> + fd = open("/sys/devices/system/cpu/online", O_RDONLY);
> + if (fd < 0)
> + goto fail;
> + cnt = read(fd, buf, sizeof(buf));
> + close(fd);
> + if (cnt <= 0)
> + goto fail;
> +
> + /*
> + * The string should always end with a newline, but let's make sure.
> + */
> + if (buf[cnt - 1] == '\n')
> + buf[--cnt] = 0;
> +
> + /*
> + * Count how many CPUs we have.
> + */
> + dt_numcpus = 0;
> + p = buf;
> + do {
> + start = (int)strtol(p, &q, 10);
> + switch (*q) {
> + case '-': /* range */
> + p = q + 1;
> + end = (int)strtol(p, &q, 10);
> + dt_numcpus += end - start + 1;
> + if (*q == 0) { /* end of string */
> + p = q;
> + break;
> + }
> + if (*q != ',')
> + goto fail;
> + p = q + 1;
> + break;
> + case 0: /* end of string */
> + dt_numcpus++;
> + p = q;
> + break;
> + case ',': /* gap */
> + dt_numcpus++;
> + p = q + 1;
> + break;
> + }
> + } while (*p != 0);
> +
> + dt_cpuids = calloc(dt_numcpus, sizeof(int));
> + cpu = dt_cpuids;
> +
> + /*
> + * Fill in the CPU ids.
> + */
> + p = buf;
> + do {
> + start = (int)strtol(p, &q, 10);
> + switch (*q) {
> + case '-': /* range */
> + p = q + 1;
> + end = (int)strtol(p, &q, 10);
> + for (i = start; i <= end; i++)
> + *cpu++ = i;
> + if (*q == 0) { /* end of string */
> + p = q;
> + break;
> + }
> + if (*q != ',')
> + goto fail;
> + p = q + 1;
> + break;
> + case 0: /* end of string */
> + *cpu = start;
> + p = q;
> + break;
> + case ',': /* gap */
> + *cpu++ = start;
> + p = q + 1;
> + break;
> + }
> + } while (*p != 0);
> +
> + /* Record the highest CPU id of the set of online CPUs. */
> + dt_maxcpuid = *(cpu - 1);
> +
> + return;
> +fail:
> + if (dt_cpuids)
> + free(dt_cpuids);
> +
> + dt_numcpus = 0;
> + dt_maxcpuid = 0;
> + dt_cpuids = NULL;
> +}
> +
> +void cpu_list_free(void)
> +{
> + free(dt_cpuids);
> + dt_numcpus = 0;
> + dt_maxcpuid = 0;
> + dt_cpuids = NULL;
> +}
> diff --git a/tools/dtrace/dtrace.c b/tools/dtrace/dtrace.c
> new file mode 100644
> index 000000000000..36ad526c1cd4
> --- /dev/null
> +++ b/tools/dtrace/dtrace.c
> @@ -0,0 +1,249 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
> + */
> +#include <errno.h>
> +#include <libgen.h>
> +#include <stdarg.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <linux/log2.h>
> +
> +#include "dtrace_impl.h"
> +
> +#define DTRACE_BUFSIZE 32 /* default buffer size (in pages) */
> +
> +#define DMODE_VERS 0 /* display version information (-V) */
> +#define DMODE_LIST 1 /* list probes (-l) */
> +#define DMODE_EXEC 2 /* compile program and start tracing */
> +
> +#define E_SUCCESS 0
> +#define E_ERROR 1
> +#define E_USAGE 2
> +
> +#define NUM_PAGES(sz) (((sz) + getpagesize() - 1) / getpagesize())
> +
> +static const char *dtrace_options = "+b:ls:V";
> +
> +static char *g_pname;
> +static int g_mode = DMODE_EXEC;
> +
> +static int usage(void)
> +{
> + fprintf(stderr, "Usage: %s [-lV] [-b bufsz] -s script\n", g_pname);
> + fprintf(stderr,
> + "\t-b set trace buffer size\n"
> + "\t-l list probes matching specified criteria\n"
> + "\t-s enable or list probes for the specified BPF program\n"
> + "\t-V report DTrace API version\n");
> +
> + return E_USAGE;
> +}
> +
> +static u64 parse_size(const char *arg)
> +{
> + long long mul = 1;
> + long long neg, val;
> + size_t len;
> + char *end;
> +
> + if (!arg)
> + return -1;
> +
> + len = strlen(arg);
> + if (!len)
> + return -1;
> +
> + switch (arg[len - 1]) {
> + case 't':
> + case 'T':
> + mul *= 1024;
> + /* fall-through */
> + case 'g':
> + case 'G':
> + mul *= 1024;
> + /* fall-through */
> + case 'm':
> + case 'M':
> + mul *= 1024;
> + /* fall-through */
> + case 'k':
> + case 'K':
> + mul *= 1024;
> + /* fall-through */
> + default:
> + break;
> + }
> +
> + neg = strtoll(arg, NULL, 0);
> + errno = 0;
> + val = strtoull(arg, &end, 0) * mul;
> +
> + if ((mul > 1 && end != &arg[len - 1]) || (mul == 1 && *end != '\0') ||
> + val < 0 || neg < 0 || errno != 0)
> + return -1;
> +
> + return val;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + int i;
> + int modec = 0;
> + int bufsize = DTRACE_BUFSIZE;
> + int epoll_fd;
> + int cnt;
> + char **prgv;
> + int prgc;
> +
> + g_pname = basename(argv[0]);
> +
> + if (argc == 1)
> + return usage();
> +
> + prgc = 0;
> + prgv = calloc(argc, sizeof(char *));
> + if (!prgv) {
> + fprintf(stderr, "failed to allocate memory for arguments: %s\n",
> + strerror(errno));
> + return E_ERROR;
> + }
> +
> + argv[0] = g_pname; /* argv[0] for getopt errors */
> +
> + for (optind = 1; optind < argc; optind++) {
> + int opt;
> +
> + while ((opt = getopt(argc, argv, dtrace_options)) != EOF) {
> + u64 val;
> +
> + switch (opt) {
> + case 'b':
> + val = parse_size(optarg);
> + if (val < 0) {
> + fprintf(stderr, "invalid: -b %s\n",
> + optarg);
> + return E_ERROR;
> + }
> +
> + /*
> + * Bufsize needs to be a number of pages, and
> + * must be a power of 2. This is required by
> + * the perf event buffer code.
> + */
> + bufsize = roundup_pow_of_two(NUM_PAGES(val));
> + if ((u64)bufsize * getpagesize() > val)
> + fprintf(stderr,
> + "bufsize increased to %ld\n",
> + (u64)bufsize * getpagesize());
> +
> + break;
> + case 'l':
> + g_mode = DMODE_LIST;
> + modec++;
> + break;
> + case 's':
> + prgv[prgc++] = optarg;
> + break;
> + case 'V':
> + g_mode = DMODE_VERS;
> + modec++;
> + break;
> + default:
> + if (strchr(dtrace_options, opt) == NULL)
> + return usage();
> + }
> + }
> +
> + if (optind < argc) {
> + fprintf(stderr, "unknown option '%s'\n", argv[optind]);
> + return E_ERROR;
> + }
> + }
> +
> + if (modec > 1) {
> + fprintf(stderr,
> + "only one of [-lV] can be specified at a time\n");
> + return E_USAGE;
> + }
> +
> + /*
> + * We handle requests for version information first because we do not
> + * need probe information for it.
> + */
> + if (g_mode == DMODE_VERS) {
> + printf("%s\n"
> + "This is DTrace %s\n"
> + "dtrace(1) version-control ID: %s\n",
> + DT_VERS_STRING, DT_VERSION, DT_GIT_VERSION);
> +
> + return E_SUCCESS;
> + }
> +
> + /* Initialize probes. */
> + if (dt_probe_init() < 0) {
> + fprintf(stderr, "failed to initialize probes: %s\n",
> + strerror(errno));
> + return E_ERROR;
> + }
> +
> + /*
> + * We handle requests to list probes next.
> + */
> + if (g_mode == DMODE_LIST) {
> + int rc = 0;
> +
> + printf("%5s %10s %17s %33s %s\n",
> + "ID", "PROVIDER", "MODULE", "FUNCTION", "NAME");
> + for (i = 0; i < prgc; i++) {
> + rc = dt_bpf_list_probes(prgv[i]);
> + if (rc < 0)
> + fprintf(stderr, "failed to load %s: %s\n",
> + prgv[i], strerror(errno));
> + }
> +
> + return rc ? E_ERROR : E_SUCCESS;
> + }
> +
> + if (!prgc) {
> + fprintf(stderr, "missing BPF program(s)\n");
> + return E_ERROR;
> + }
> +
> + /* Process the BPF program. */
> + for (i = 0; i < prgc; i++) {
> + int err;
> +
> + err = dt_bpf_load_file(prgv[i]);
> + if (err) {
> + errno = -err;
> + fprintf(stderr, "failed to load %s: %s\n",
> + prgv[i], strerror(errno));
> + return E_ERROR;
> + }
> + }
> +
> + /* Get the list of online CPUs. */
> + cpu_list_populate();
> +
> + /* Initialize buffers. */
> + epoll_fd = dt_buffer_init(bufsize);
> + if (epoll_fd < 0) {
> + errno = -epoll_fd;
> + fprintf(stderr, "failed to allocate buffers: %s\n",
> + strerror(errno));
> + return E_ERROR;
> + }
> +
> + /* Process probe data. */
> + printf("%3s %6s\n", "CPU", "ID");
> + do {
> + cnt = dt_buffer_poll(epoll_fd, 100);
> + } while (cnt >= 0);
> +
> + dt_buffer_exit(epoll_fd);
> +
> + return E_SUCCESS;
> +}
> diff --git a/tools/dtrace/dtrace.h b/tools/dtrace/dtrace.h
> new file mode 100644
> index 000000000000..c79398432d17
> --- /dev/null
> +++ b/tools/dtrace/dtrace.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
> + */
> +#ifndef _UAPI_LINUX_DTRACE_H
> +#define _UAPI_LINUX_DTRACE_H
> +
> +struct dt_bpf_context {
> + u32 probe_id;
> + u64 argv[10];
> +};
> +
> +#endif /* _UAPI_LINUX_DTRACE_H */
> diff --git a/tools/dtrace/dtrace_impl.h b/tools/dtrace/dtrace_impl.h
> new file mode 100644
> index 000000000000..9aa51b4c4aee
> --- /dev/null
> +++ b/tools/dtrace/dtrace_impl.h
> @@ -0,0 +1,101 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
> + */
> +#ifndef _DTRACE_H
> +#define _DTRACE_H
> +
> +#include <unistd.h>
> +#include <bpf/libbpf.h>
> +#include <linux/types.h>
> +#include <linux/ptrace.h>
> +#include <linux/perf_event.h>
> +#include <sys/syscall.h>
> +
> +#include "dtrace.h"
> +
> +#define DT_DEBUG
> +
> +#define DT_VERS_STRING "Oracle D 2.0.0"
> +
> +#define TRACEFS "/sys/kernel/debug/tracing/"
> +#define EVENTSFS TRACEFS "events/"
> +
> +extern int dt_numcpus;
> +extern int dt_maxcpuid;
> +extern int *dt_cpuids;
> +
> +extern void cpu_list_populate(void);
> +extern void cpu_list_free(void);
> +
> +struct dt_provider {
> + char *name;
> + int (*populate)(void);
> + struct dt_probe *(*resolve_event)(const char *name);
> + int (*attach)(const char *name, int bpf_fd);
> +};
> +
> +extern struct dt_provider dt_fbt;
> +extern struct dt_provider dt_syscall;
> +
> +struct dt_hentry {
> + struct dt_probe *next;
> + struct dt_probe *prev;
> +};
> +
> +struct dt_htab;
> +
> +typedef u32 (*dt_hval_fn)(const struct dt_probe *);
> +typedef int (*dt_cmp_fn)(const struct dt_probe *, const struct dt_probe *);
> +typedef struct dt_probe *(*dt_add_fn)(struct dt_probe *, struct dt_probe *);
> +typedef struct dt_probe *(*dt_del_fn)(struct dt_probe *, struct dt_probe *);
> +
> +extern struct dt_htab *dt_htab_new(dt_hval_fn hval, dt_cmp_fn cmp,
> + dt_add_fn add, dt_del_fn del);
> +extern int dt_htab_add(struct dt_htab *htab, struct dt_probe *probe);
> +extern struct dt_probe *dt_htab_lookup(const struct dt_htab *htab,
> + const struct dt_probe *probe);
> +extern int dt_htab_del(struct dt_htab *htab, struct dt_probe *probe);
> +
> +struct dt_probe {
> + u32 id;
> + int event_fd;
> + const struct dt_provider *prov;
> + const char *prv_name; /* provider name */
> + const char *mod_name; /* module name */
> + const char *fun_name; /* function name */
> + const char *prb_name; /* probe name */
> + struct dt_hentry he_fqn;
> +};
> +
> +typedef void (*dt_probe_fn)(const struct dt_probe *probe);
> +
> +extern int dt_probe_init(void);
> +extern int dt_probe_new(const struct dt_provider *prov, const char *pname,
> + const char *mname, const char *fname, const char *name);
> +extern struct dt_probe *dt_probe_by_name(const struct dt_probe *tmpl);
> +extern struct dt_probe *dt_probe_resolve_event(const char *name);
> +
> +extern int dt_bpf_list_probes(const char *fn);
> +extern int dt_bpf_load_file(const char *fn);
> +extern int dt_bpf_map_update(int fd, const void *key, const void *val);
> +extern int dt_bpf_attach(int event_id, int bpf_fd);
> +
> +extern int dt_bufmap_fd;
> +
> +extern int dt_buffer_init(int num_pages);
> +extern int dt_buffer_poll(int epoll_fd, int timeout);
> +extern void dt_buffer_exit(int epoll_fd);
> +
> +static inline int perf_event_open(struct perf_event_attr *attr, pid_t pid,
> + int cpu, int group_fd, unsigned long flags)
> +{
> + return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
> +}
> +
> +extern inline int bpf(enum bpf_cmd cmd, union bpf_attr *attr)
> +{
> + return syscall(__NR_bpf, cmd, attr, sizeof(union bpf_attr));
> +}
> +
> +#endif /* _DTRACE_H */
> --
> 2.20.1
Powered by blists - more mailing lists