netdev - Re: [PATCH net-next] bridge: add tracepoint in br_fdb

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Thu, 31 Aug 2017 18:20:12 +0200
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     David Ahern <dsahern@...il.com>
Cc:     Roopa Prabhu <roopa@...ulusnetworks.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Nikolay Aleksandrov <nikolay@...ulusnetworks.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        Andrew Lunn <andrew@...n.ch>,
        bridge@...ts.linux-foundation.org, brouer@...hat.com,
        Arnaldo Carvalho de Melo <acme@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH net-next] bridge: add tracepoint in br_fdb_update

On Thu, 31 Aug 2017 09:30:05 -0600
David Ahern <dsahern@...il.com> wrote:

> On 8/31/17 9:21 AM, Roopa Prabhu wrote:
> > On Thu, Aug 31, 2017 at 5:38 AM, Jesper Dangaard Brouer
> > <brouer@...hat.com> wrote:  
> >> On Wed, 30 Aug 2017 22:18:13 -0700
> >> Roopa Prabhu <roopa@...ulusnetworks.com> wrote:
> >>  
> >>> From: Roopa Prabhu <roopa@...ulusnetworks.com>
> >>>
> >>> This extends bridge fdb table tracepoints to also cover
> >>> learned fdb entries in the br_fdb_update path. Note that
> >>> unlike other tracepoints I have moved this to when the fdb
> >>> is modified because this is in the datapath and can generate
> >>> a lot of noise in the trace output. br_fdb_update is also called
> >>> from added_by_user context in the NTF_USE case which is already
> >>> traced ..hence the !added_by_user check.
> >>>
> >>> Signed-off-by: Roopa Prabhu <roopa@...ulusnetworks.com>
> >>> ---
> >>>  include/trace/events/bridge.h | 31 +++++++++++++++++++++++++++++++
> >>>  net/bridge/br_fdb.c           |  5 ++++-
> >>>  net/core/net-traces.c         |  1 +
> >>>  3 files changed, 36 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/include/trace/events/bridge.h b/include/trace/events/bridge.h
> >>> index 0f1cde0..1bee3e7 100644
> >>> --- a/include/trace/events/bridge.h
> >>> +++ b/include/trace/events/bridge.h
> >>> @@ -92,6 +92,37 @@ TRACE_EVENT(fdb_delete,
> >>>                 __entry->addr[4], __entry->addr[5], __entry->vid)
> >>>  );
> >>>
> >>> +TRACE_EVENT(br_fdb_update,
> >>> +
> >>> +     TP_PROTO(struct net_bridge *br, struct net_bridge_port *source,
> >>> +              const unsigned char *addr, u16 vid, bool added_by_user),
> >>> +
> >>> +     TP_ARGS(br, source, addr, vid, added_by_user),
> >>> +
> >>> +     TP_STRUCT__entry(
> >>> +             __string(br_dev, br->dev->name)
> >>> +             __string(dev, source->dev->name)  
> >>
> >> I have found that using the device string name is
> >>
> >> (1) slow as it involves strcpy+strlen
> >>
> >>  See [1]+[2] where a single dev-name costed me 16 ns, and the base
> >>  overhead of a bpf attached tracepoint is 25 ns (see [3]).
> >>
> >>  [1] https://git.kernel.org/davem/net-next/c/e7d12ce121a
> >>  [2] https://git.kernel.org/davem/net-next/c/315ec3990ef
> >>  [3] https://git.kernel.org/davem/net-next/c/25d4dae1a64
> >>
> >> (2) strings are also harder to work-with/extract when attaching a bpf_prog
> >>
> >> See the trouble I'm in accessing a dev string here napi:napi_poll here:
> >>  https://github.com/netoptimizer/prototype-kernel/blob/103b955a080/kernel/samples/bpf/napi_monitor_kern.c#L52-L58
> >>
> >> Using ifindex'es in userspace is fairly easy see man if_indextoname(3).
> >>  
> > 
> > Jesper thanks for the data!. GTK. Looking at include/trace/events,
> > currently almost all tracepoints use dev->name.

True, but with my recent experience and benchmarking, I consider this
generally a bad choice we have made for all these tracepoints.  In your
case with 2 strings, 2x16=32ns, you basically introduced a overhead
that is larger that to invocation cost.

> > These bridge tracepoints in context are primarily for debugging fdb
> > updates only, not for every packet and hence not in the performance
> > path.
> > In large scale deployments with thousands of bridge ports and fdb
> > entries, dev->name will definately make it easier to trouble-shoot.
> > So, I did like to leave these with dev->name unless there are strong objections.  
> 
> +1 for user friendliness for debugging tracepoints. The device name is
> also more user friendly when adding filters to the data collection.
>
> Being able to add bpf everywhere certainly changes the game a bit, but
> we should not relinquish ease of use and understanding for the potential
> that someone might want to put a bpf program on the tracepoint and want
> to maintain high performance.

(Cc. Acme and Peterz)
I wonder if we can create a special perf-tracepoint type for ifindex'es
and the tool reading (e.g. perf-script) can perform the name lookup in
userspace (calling if_indextoname(3)) ?

I don't know the perf tools well enough to know if this is possible?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer