Message-ID: <1492609668.10587.164.camel@edumazet-glaptop3.roam.corp.google.com>
Date: Wed, 19 Apr 2017 06:47:48 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: davem@...emloft.net, netdev@...r.kernel.org, jiri@...nulli.us,
xiyou.wangcong@...il.com
Subject: Re: [PATCH net-next 1/2 v2] net sched actions: dump more than
TCA_ACT_MAX_PRIO actions per batch

On Wed, 2017-04-19 at 07:24 -0400, Jamal Hadi Salim wrote:
> On 17-04-18 11:17 PM, Eric Dumazet wrote:
> > On Tue, 2017-04-18 at 22:32 -0400, Jamal Hadi Salim wrote:
> >> On 17-04-18 09:49 PM, Eric Dumazet wrote:
> >>> On Tue, 2017-04-18 at 21:14 -0400, Jamal Hadi Salim wrote:
>
> >>
> >> Make sense?
> >
> > What if we have 1024 actions, and user provides a 4KB buffer ?
> >
>
> No problem - we will fit as many per batch to consume 4KB
> and will send them as long as user calls recvmsg.
>
> > Normally multiple recvmsg() calls would be needed, but I do not see how
> > the nla_put_u32(skb, TCAA_ACT_COUNT, cb->args[1]) can always succeed.
>
> Oh, I see the cross-talk Eric;->
> We don't pack the actions in TCAA_ACT_COUNT - we put them in
> TCAA_ACT_TAB.
>
> Here's some strace capture that best describes what happened before
> and after which i hope will make sense. Granted tc uses 32KB from
> user space and not 4KB you mention.
>
> We have 400 actions in the kernel at this point:
>
> tc with no changes (doesnt for this large dump):
> ---
> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000},
> msg_iov(1)=[{"\240\16\0\0002\0\2\0^6\367XE\3\0\0\0\0\0\0\204\16\1\0\200\0\0\0\t\0\1\0"...,
> 32768}], msg_controllen=0, msg_flags=0}, 0) = 3744
>
> ===> total actions dumped 29
>
> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000},
> msg_iov(1)=[{" \20\0\0002\0\2\0s6\367XI\3\0\0\0\0\0\0\4\20\1\0\200\0\0\0\t\0\1\0"...,
> 32768}], msg_controllen=0, msg_flags=0}, 0) = 4128
>
> ==> total actions dumped 32
>
> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000},
> msg_iov(1)=[{" \20\0\0002\0\2\0s6\367XI\3\0\0\0\0\0\0\4\20\1\0\200\0\0\0\t\0\1\0"...,
> 32768}], msg_controllen=0, msg_flags=0}, 0) = 4128
>
> ==> total actions dumped 32
> ....
> .....
> .........
> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000},
> msg_iov(1)=[{"\24\0\0\0\3\0\2\0s6\367XI\3\0\0\0\0\0\0", 32768}],
> msg_controllen=0, msg_flags=0}, 0) = 20
> -----
>
> Goes on a few times until we get all 400 entries - last recvmsg (with
> 20B) has no actions and indicates dump is complete.
> Issue: The kernel is refusing to add more than 32 entries in the skb
> even though we get allocated 32KB for the skb.
>
> So now lets see what happens with this change:
> ------
> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000},
> msg_iov(1)=[{"\240\16\0\0002\0\2\0^6\367XE\3\0\0\0\0\0\0\204\16\1\0\200\0\0\0\t\0\1\0"...,
> 32768}], msg_controllen=0, msg_flags=0}, 0) = 3744
>
> ==> total actions dumped 29
>
> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000},
> msg_iov(1)=[{"\240~\0\0002\0\2\0^6\367XE\3\0\0\0\0\0\0\204~\1\0\200\0\0\0\t\0\1\0"...,
> 32768}], msg_controllen=0, msg_flags=0}, 0) = 32416
>
> ==> total actions dumped 253
>
> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000},
> msg_iov(1)=[{" ;\0\0002\0\2\0^6\367XE\3\0\0\0\0\0\0\4;\1\0\200\0\0\0\t\0\1\0"...,
> 32768}], msg_controllen=0, msg_flags=0}, 0) = 15136
>
> ==> total actions dumped 118
>
> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000},
> msg_iov(1)=[{"\24\0\0\0\3\0\2\0s6\367XI\3\0\0\0\0\0\0", 32768}],
> msg_controllen=0, msg_flags=0}, 0) = 20
>
> --------
>
> We got all 400 in 3 requests. Imagine what we have to deal with
> when we have 2M actions (the improvement is about 10x).

Just because your strace works with a size of 32768 does not mean it
will work in the future.

Please read my question again.

Try _not_ to use 32768 bytes for the recvmsg() size, but 4KB.

You pack XXX actions until the 4KB skb is full.

Then the code does:

    nla_put_u32(skb, TCAA_ACT_COUNT, cb->args[1])

This might fail, and then you

    goto out_module_put;

Then we are stuck?

What am I missing?
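
To make the concern concrete, here is a small user-space model of the
situation, not kernel code and not taken from the patch. All names
(struct buf, put_attr, dump_naive, dump_reserving, ATTR_ACTION,
ATTR_COUNT) are hypothetical, and the 4-byte attribute header with 4-byte
alignment only loosely imitates netlink attributes. dump_naive packs
actions until the buffer is full and only then tries to append the count,
which is the failure mode asked about above. dump_reserving holds back
room for the count before packing, which is one possible way to avoid
it, assuming the count really has to be the trailing attribute.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ATTR_HDR    4u                      /* hypothetical header: u16 len + u16 type */
#define ALIGN4(n)   (((n) + 3u) & ~3u)
#define ATTR_ACTION 1                       /* hypothetical attribute types */
#define ATTR_COUNT  2

struct buf {
    uint8_t data[4096];
    size_t  len;                            /* bytes used so far */
    size_t  cap;                            /* bytes we are allowed to use */
};

/* Append one attribute. Returns 0 on success, -1 when it does not fit
 * (standing in for nla_put() failing on a full skb). */
static int put_attr(struct buf *b, uint16_t type, const void *payload,
                    uint16_t plen)
{
    size_t need = ALIGN4(ATTR_HDR + plen);
    uint16_t hdr[2] = { (uint16_t)(ATTR_HDR + plen), type };

    if (b->len + need > b->cap)
        return -1;
    memcpy(b->data + b->len, hdr, ATTR_HDR);
    memcpy(b->data + b->len + ATTR_HDR, payload, plen);
    memset(b->data + b->len + ATTR_HDR + plen, 0, need - ATTR_HDR - plen);
    b->len += need;
    return 0;
}

/* Pack actions until the buffer is full, then try to append the count. */
static int dump_naive(struct buf *b, uint16_t action_len, uint32_t *packed)
{
    uint8_t action[128] = { 0 };

    assert(action_len <= sizeof(action));
    *packed = 0;
    while (put_attr(b, ATTR_ACTION, action, action_len) == 0)
        (*packed)++;
    /* May fail: the loop above can consume every remaining byte. */
    return put_attr(b, ATTR_COUNT, packed, sizeof(*packed));
}

/* Same, but hold back room for the count attribute before packing. */
static int dump_reserving(struct buf *b, uint16_t action_len, uint32_t *packed)
{
    size_t reserve = ALIGN4(ATTR_HDR + sizeof(uint32_t));
    uint8_t action[128] = { 0 };

    assert(action_len <= sizeof(action) && b->cap >= reserve);
    *packed = 0;
    b->cap -= reserve;                      /* actions may not use this space */
    while (put_attr(b, ATTR_ACTION, action, action_len) == 0)
        (*packed)++;
    b->cap += reserve;                      /* release it for the count */
    return put_attr(b, ATTR_COUNT, packed, sizeof(*packed));
}
```

With cap set to 4096 and a 124-byte action payload (128 bytes per
attribute), dump_naive fits exactly 32 actions and then fails to add the
count, while dump_reserving stops at 31 actions and the count fits.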