linux-kernel - Re: latency histogram with BPF

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <557A06D7.7030304@plumgrid.com>
Date:	Thu, 11 Jun 2015 15:08:23 -0700
From:	Alexei Starovoitov <ast@...mgrid.com>
To:	Daniel Wagner <daniel.wagner@...-carit.de>,
	Steven Rostedt <rostedt@...dmis.org>,
	Tom Zanussi <tom.zanussi@...ux.intel.com>
CC:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Wang Nan <wangnan0@...wei.com>
Subject: Re: latency histogram with BPF

On 6/11/15 12:25 AM, Daniel Wagner wrote:
>
> In both cases BPF or based on Tom's 'hist' triggers' patches, there is
> some trickery necessary to get it working. While the first approach
> has more flexibility what you want to measure or how you want to
> present it, I suspect it will be harder to get it working/accepted.

why? Out of your patch only 2 lines are for kernel, which I wanted
to add anyway. The sample code also looks good.

> Obviously, adding kprobes to trace_preempt_[on|off] is
> asking for problems:

thanks for reporting.
I reproduced it with hackbench, but my stack trace looks completely
different. Looks like some memory corruption is happening.
Not clear where. I'm still debugging.

> CPU 3
>        latency        : count     distribution
>         1 -> 1        : 0        |                                                  |
>         2 -> 3        : 0        |                                                  |
>         4 -> 7        : 0        |                                                  |
>         8 -> 15       : 25182    |************************************************* |
>        16 -> 31       : 1675     |**                                                |
>        32 -> 63       : 0        |                                                  |
>        64 -> 127      : 0        |                                                  |
>       128 -> 255      : 0        |                                                  |
>       256 -> 511      : 0        |                                                  |
>       512 -> 1023     : 0        |                                                  |
>
> The numbers look a bit too nice and low. I suspect something is going
> wrong.

I see similar numbers. 25k over 5 sec = 5k preempt on/off per second
which sounds about right for idle.

> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 2d56ce5..e4d3e76 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -172,6 +172,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
>   		return &bpf_probe_read_proto;
>   	case BPF_FUNC_ktime_get_ns:
>   		return &bpf_ktime_get_ns_proto;
> +	case BPF_FUNC_get_smp_processor_id:
> +		return &bpf_get_smp_processor_id_proto;

This part I'll take into my set of patches.

> --- a/samples/bpf/bpf_helpers.h
> +++ b/samples/bpf/bpf_helpers.h
> @@ -21,6 +21,8 @@ static unsigned long long (*bpf_ktime_get_ns)(void) =
>   	(void *) BPF_FUNC_ktime_get_ns;
>   static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
>   	(void *) BPF_FUNC_trace_printk;
> +static unsigned int (*bpf_get_smp_processor_id)(void) =
> +	(void *) BPF_FUNC_get_smp_processor_id;

this part is already there in net-next.

> diff --git a/samples/bpf/hwlathist_kern.c b/samples/bpf/hwlathist_kern.c
> new file mode 100644
> index 0000000..7a97e7e
> --- /dev/null
> +++ b/samples/bpf/hwlathist_kern.c

looks good and useful, but I would like to find and fix the bug first
before exposing this bit as a 'sample', since right now it's
'crash me if hackbench is running' test :)
If you have any suggestions on where to look, I'm all ears.
My stack traces look like:
Running with 10*40 (== 400) tasks.
[   12.032571] kernel BUG at ../mm/slub.c:3413!
[   12.036004]  [<ffffffff810c1913>] rcu_process_callbacks+0x283/0x680
[   12.036004]  [<ffffffff810632d0>] __do_softirq+0xf0/0x2b0
[   12.036004]  [<ffffffff8107dff9>] ? kthread+0xc9/0xe0
[   12.036004]  [<ffffffff810634b1>] run_ksoftirqd+0x21/0x50
or
[   25.788074] kernel tried to execute NX-protected page - exploit 
attempt? (uid: 0)
[   25.788801] BUG: unable to handle kernel paging request at 
ffff88000b1ea2a0
[   25.800085]  [<ffffffff810c18ba>] ? rcu_process_callbacks+0x22a/0x680
[   25.800085]  [<ffffffff810632d0>] __do_softirq+0xf0/0x2b0
[   25.800085]  [<ffffffff816d6d7c>] do_softirq_own_stack+0x1c/0x30
[   25.800085]  <EOI>
[   25.800085]  [<ffffffff8106351d>] do_softirq+0x3d/0x40
[   25.800085]  [<ffffffff810635ba>] __local_bh_enable_ip+0x9a/0xb0
[   25.800085]  [<ffffffff816d4c1b>] _raw_spin_unlock_bh+0x1b/0x20
[   25.800085]  [<ffffffff811f3e42>] bdi_writeback_workfn+0x1b2/0x470

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/