Message-ID: <CADg4-L8qauZSuC4=a-Ut4CSmUeyZNT4sprmSxbwWkQ9q-TrRqA@mail.gmail.com>
Date: Fri, 25 Jul 2025 10:47:35 -0700
From: Christoph Paasch <cpaasch@...nai.com>
To: Ido Schimmel <idosch@...sch.org>
Cc: David Ahern <dsahern@...nel.org>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>, netdev@...r.kernel.org
Subject: Re: [PATCH net-next] net: Make nexthop-dumps scale linearly with the number of nexthops

On Fri, Jul 25, 2025 at 7:05 AM Ido Schimmel <idosch@...sch.org> wrote:
>
> On Thu, Jul 24, 2025 at 05:10:36PM -0700, Christoph Paasch via B4 Relay wrote:
> > From: Christoph Paasch <cpaasch@...nai.com>
> >
> > When we have a (very) large number of nexthops, they do not fit within a
> > single message. rtm_dump_walk_nexthops() thus will be called repeatedly
> > and ctx->idx is used to avoid dumping the same nexthops again.
> >
> > The approach in which we avoid dumpint the same nexthops is by basically
>
> s/dumpint/dumping/
>
> > walking the entire nexthop rb-tree from the left-most node until we find
> > a node whose id is >= s_idx. That does not scale well.
> >
> > Instead of this non-efficient approach, rather go directly through the
> ^ double space
> s/non-efficient/inefficient/ ?
>
> > tree to the nexthop that should be dumped (the one whose nh_id >=
> > s_idx). This allows us to find the relevant node in O(log(n)).
> >
> > We have quite a nice improvement with this:
> >
> > Before:
> > =======
> >
> > --> ~1M nexthops:
> > $ time ~/libnl/src/nl-nh-list | wc -l
> > 1050624
> >
> > real 0m21.080s
> > user 0m0.666s
> > sys 0m20.384s
> >
> > --> ~2M nexthops:
> > $ time ~/libnl/src/nl-nh-list | wc -l
> > 2101248
> >
> > real 1m51.649s
> > user 0m1.540s
> > sys 1m49.908s
> >
> > After:
> > ======
> >
> > --> ~1M nexthops:
> > $ time ~/libnl/src/nl-nh-list | wc -l
> > 1050624
> >
> > real 0m1.157s
> > user 0m0.926s
> > sys 0m0.259s
> >
> > --> ~2M nexthops:
> > $ time ~/libnl/src/nl-nh-list | wc -l
> > 2101248
> >
> > real 0m2.763s
> > user 0m2.042s
> > sys 0m0.776s
>
> I was able to reproduce these results.
>
> >
> > Signed-off-by: Christoph Paasch <cpaasch@...nai.com>
> > ---
> > net/ipv4/nexthop.c | 34 +++++++++++++++++++++++++++++++++-
> > 1 file changed, 33 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> > index 29118c43ebf5f1e91292fe227d4afde313e564bb..226447b1c17d22eab9121bed88c0c2b9148884ac 100644
> > --- a/net/ipv4/nexthop.c
> > +++ b/net/ipv4/nexthop.c
> > @@ -3511,7 +3511,39 @@ static int rtm_dump_walk_nexthops(struct sk_buff *skb,
> > int err;
> >
> > s_idx = ctx->idx;
> > - for (node = rb_first(root); node; node = rb_next(node)) {
> > +
> > + /*
> > + * If this is not the first invocation, ctx->idx will contain the id of
> > + * the last nexthop we processed. Instead of starting from the very first
> > + * element of the red/black tree again and linearly skipping the
> > + * (potentially large) set of nodes with an id smaller than s_idx, walk the
> > + * tree and find the left-most node whose id is >= s_idx. This provides an
> > + * efficient O(log n) starting point for the dump continuation.
> > + */
>
> Please try to keep lines at 80 characters.
>
> > + if (s_idx != 0) {
> > + struct rb_node *tmp = root->rb_node;
> > +
> > + node = NULL;
> > + while (tmp) {
> > + struct nexthop *nh;
> > +
> > + nh = rb_entry(tmp, struct nexthop, rb_node);
> > + if (nh->id < s_idx) {
> > + tmp = tmp->rb_right;
> > + } else {
> > + /* Track current candidate and keep looking on
> > + * the left side to find the left-most
> > + * (smallest id) that is still >= s_idx.
> > + */
>
> I'm aware that netdev now accepts both comment styles, but it's a bit
> weird to mix both in the same commit and at the same function.
>
> > + node = tmp;
> > + tmp = tmp->rb_left;
> > + }
> > + }
> > + } else {
> > + node = rb_first(root);
> > + }
> > +
> > + for (; node; node = rb_next(node)) {
> > struct nexthop *nh;
> >
> > nh = rb_entry(node, struct nexthop, rb_node);
>
> The code below is:
>
> if (nh->id < s_idx)
> continue;
>
> Can't it be removed given the above code means we start at a nexthop
> whose identifier is at least s_idx ?

Yes, we can drop this check.

Thanks for all your feedback. Will resubmit when net-next reopens.

Christoph
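
For readers following the thread, here is a minimal userspace sketch of the "left-most node whose id is >= s_idx" search discussed above. It is not the kernel code: struct demo_node, demo_lower_bound() and demo_insert() are hypothetical stand-ins for struct nexthop, the rb-tree descent in the patch, and rb-tree insertion, and the tree here is a plain unbalanced binary search tree rather than a red/black tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct nexthop embedded in an rb-tree. */
struct demo_node {
	uint32_t id;
	struct demo_node *left;
	struct demo_node *right;
};

/*
 * Return the node with the smallest id that is still >= s_idx, or NULL
 * if every id in the tree is smaller. Each step descends one level, so
 * the cost is bounded by the tree height (O(log n) when balanced).
 */
static struct demo_node *demo_lower_bound(struct demo_node *root,
					  uint32_t s_idx)
{
	struct demo_node *best = NULL;

	while (root) {
		if (root->id < s_idx) {
			/* Too small: any candidate is in the right subtree. */
			root = root->right;
		} else {
			/* Candidate: remember it, look for a smaller one. */
			best = root;
			root = root->left;
		}
	}
	return best;
}

/* Plain (unbalanced) BST insert, only here to build a demo tree. */
static struct demo_node *demo_insert(struct demo_node *root,
				     struct demo_node *n)
{
	struct demo_node **link = &root;

	assert(n != NULL);
	while (*link)
		link = n->id < (*link)->id ? &(*link)->left : &(*link)->right;
	n->left = NULL;
	n->right = NULL;
	*link = n;
	return root;
}
```

Because the search already returns a node whose id is at least s_idx, and in-order successors only have larger ids, a per-node "id < s_idx" skip in the subsequent walk has nothing left to skip, which matches the review comment above.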