netdev - Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM_iQpVJ-MT0t1qtvWBp2=twPq6GWsn5-sAW6=QVf4Gc97Mmeg@mail.gmail.com>
Date:   Fri, 22 May 2020 12:33:17 -0700
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     Vlad Buslov <vladbu@...lanox.com>
Cc:     Edward Cree <ecree@...arflare.com>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        David Miller <davem@...emloft.net>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Jiri Pirko <jiri@...nulli.us>,
        Davide Caratti <dcaratti@...hat.com>,
        Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
        Jakub Kicinski <kuba@...nel.org>
Subject: Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode

On Wed, May 20, 2020 at 12:24 AM Vlad Buslov <vladbu@...lanox.com> wrote:
>
>
> On Tue 19 May 2020 at 21:58, Cong Wang <xiyou.wangcong@...il.com> wrote:
> > On Tue, May 19, 2020 at 2:04 AM Vlad Buslov <vladbu@...lanox.com> wrote:
> >> I considered that approach initially but decided against it for
> >> following reasons:
> >>
> >> - Generic data is covered by current terse dump implementation.
> >>   Everything else will be act or cls specific which would result long
> >>   list of flag values like: TCA_DUMP_FLOWER_KEY_ETH_DST,
> >>   TCA_DUMP_FLOWER_KEY_ETH_DST, TCA_DUMP_FLOWER_KEY_VLAN_ID, ...,
> >>   TCA_DUMP_TUNNEL_KEY_ENC_KEY_ID, TCA_DUMP_TUNNEL_KEY_ENC_TOS. All of
> >>   these would require a lot of dedicated logic in act and cls dump
> >>   callbacks. Also, it would be quite a challenge to test all possible
> >>   combinations.
> >
> > Well, if you consider netlink dump as a database query, what Edward
> > proposed is merely "select COLUMN1 COLUMN2 from cls_db" rather
> > than "select * from cls_db".
> >
> > No one said it is easy to implement, it is just more elegant than you
> > select a hardcoded set of columns for the user.
>
> As I explained to Edward, having denser netlink packets with more
> filters per packet is only part of optimization. Another part is not
> executing some code at all. Consider fl_dump_key() which is 200 lines
> function with bunch of conditionals like that:
>
> static int fl_dump_key(struct sk_buff *skb, struct net *net,
>                        struct fl_flow_key *key, struct fl_flow_key *mask)
> {
>         if (mask->meta.ingress_ifindex) {
>                 struct net_device *dev;
>
>                 dev = __dev_get_by_index(net, key->meta.ingress_ifindex);
>                 if (dev && nla_put_string(skb, TCA_FLOWER_INDEV, dev->name))
>                         goto nla_put_failure;
>         }
>
>         if (fl_dump_key_val(skb, key->eth.dst, TCA_FLOWER_KEY_ETH_DST,
>                             mask->eth.dst, TCA_FLOWER_KEY_ETH_DST_MASK,
>                             sizeof(key->eth.dst)) ||
>             fl_dump_key_val(skb, key->eth.src, TCA_FLOWER_KEY_ETH_SRC,
>                             mask->eth.src, TCA_FLOWER_KEY_ETH_SRC_MASK,
>                             sizeof(key->eth.src)) ||
>             fl_dump_key_val(skb, &key->basic.n_proto, TCA_FLOWER_KEY_ETH_TYPE,
>                             &mask->basic.n_proto, TCA_FLOWER_UNSPEC,
>                             sizeof(key->basic.n_proto)))
>                 goto nla_put_failure;
>
>         if (fl_dump_key_mpls(skb, &key->mpls, &mask->mpls))
>                 goto nla_put_failure;
>
>         if (fl_dump_key_vlan(skb, TCA_FLOWER_KEY_VLAN_ID,
>                              TCA_FLOWER_KEY_VLAN_PRIO, &key->vlan, &mask->vlan))
>                 goto nla_put_failure;
>     ...
>
>
> Now imagine all of these are extended with additional if (flags &
> TCA_DUMP_XXX). All gains from not outputting some other minor stuff into
> netlink packet will be negated by it.

Interesting, are you saying a bit test is as expensive as appending
an actual netlink attribution to the dumping? I am surprised.


>
>
> >
> > Think about it, what if another user wants a less terse dump but still
> > not a full dump? Would you implement ops->terse_dump2()? Or
> > what if people still think your terse dump is not as terse as she wants?
> > ops->mini_dump()? How many ops's we would end having?
>
> User can discard whatever he doesn't need in user land code. The goal of
> this change is performance optimization, not designing a generic
> kernel-space data filtering mechanism.

You optimize the performance by reducing the dump size, which is
already effectively a data filtering. This doesn't have to be your goal,
you are implementing it anyway.


>
> >
> >
> >>
> >> - It is hard to come up with proper validation for such implementation.
> >>   In case of terse dump I just return an error if classifier doesn't
> >>   implement the callback (and since current implementation only outputs
> >>   generic action info, it doesn't even require support from
> >>   action-specific dump callbacks). But, for example, how do we validate
> >>   a case where user sets some flower and tunnel_key act dump flags from
> >>   previous paragraph, but Qdisc contains some other classifier? Or
> >>   flower classifier points to other types of actions? Or when flower
> >>   classifier has and tunnel_key actions but also mirred? Should the
> >
> > Each action should be able to dump selectively too. If you think it
> > as a database, it is just a different table with different schemas.
>
> How is designing custom SQL-like query language (according to your
> example at the beginning of the mail) for filter dump is going to
> improve performance? If there is a way to do it in fast a generic manner
> with BPF, then I'm very interested to hear the details. But adding
> hundred more hardcoded conditionals is just not a solution considering
> main motivations for this change is performance.

I still wonder how a bit test is as expensive as you claim, it does
not look like you actually measure it. This of course depends on the
size of the dump, but if you look at other netlink dump in kernel,
not just tc filters, we already dump a lot of attributes per record.

Thanks.