[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <m2a5lpha4m.fsf@gmail.com>
Date: Fri, 19 Apr 2024 12:20:25 +0100
From: Donald Hunter <donald.hunter@...il.com>
To: Pablo Neira Ayuso <pablo@...filter.org>
Cc: netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>, "David S.
Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Paolo
Abeni <pabeni@...hat.com>, Jiri Pirko <jiri@...nulli.us>, Jacob Keller
<jacob.e.keller@...el.com>, Jozsef Kadlecsik <kadlec@...filter.org>,
netfilter-devel@...r.kernel.org, coreteam@...filter.org,
donald.hunter@...hat.com
Subject: Re: [PATCH net-next v4 4/4] netfilter: nfnetlink: Handle ACK flags
for batch messages
Pablo Neira Ayuso <pablo@...filter.org> writes:
> Hi Donald,
>
> Apologies for a bit late jumping back on this, it took me a while.
>
> On Thu, Apr 18, 2024 at 11:47:37AM +0100, Donald Hunter wrote:
>> The NLM_F_ACK flag is ignored for nfnetlink batch begin and end
>> messages. This is a problem for ynl which wants to receive an ack for
>> every message it sends, not just the commands in between the begin/end
>> messages.
>
> Just a side note: Turning on NLM_F_ACK for every message fills up the
> receiver buffer very quickly, leading to ENOBUFS. Netlink, in general,
> supports batching (with non-atomic semantics), I did not look at ynl
> in detail, if it does send() + recv() for each message for other
> subsystem then fine, but if it uses batching to amortize the cost of
> the syscall then this explicit ACK could be an issue with very large
> batches.
ynl is batching the messages for send() and will accept batches from
recv() but nfnetlink is sending each ack separately. It is using
netlink_ack() which uses a new skb for each message, for example:
sudo strace -e sendto,recvfrom ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/nftables.yaml \
--multi batch-begin '{"res-id": 10}' \
--multi newtable '{"name": "test", "nfgen-family": 1}' \
--multi newchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \
--multi batch-end '{"res-id": 10}'
sendto(3, [[{nlmsg_len=20, nlmsg_type=0x10 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28254, nlmsg_pid=0}, "\x00\x00\x00\x0a"], [{nlmsg_len=32, nlmsg_type=0xa00 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28255, nlmsg_pid=0}, "\x01\x00\x00\x00\x09\x00\x01\x00\x74\x65\x73\x74\x00\x00\x00\x00"], [{nlmsg_len=44, nlmsg_type=0xa03 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28256, nlmsg_pid=0}, "\x01\x00\x00\x00\x0a\x00\x03\x00\x63\x68\x61\x69\x6e\x00\x00\x00\x09\x00\x01\x00\x74\x65\x73\x74\x00\x00\x00\x00"], [{nlmsg_len=20, nlmsg_type=0x11 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28257, nlmsg_pid=0}, "\x00\x00\x00\x0a"]], 116, 0, NULL, 0) = 116
recvfrom(3, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=28254, nlmsg_pid=997}, {error=0, msg={nlmsg_len=20, nlmsg_type=NFNL_MSG_BATCH_BEGIN, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28254, nlmsg_pid=0}}], 131072, 0, NULL, NULL) = 36
recvfrom(3, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=28255, nlmsg_pid=997}, {error=0, msg={nlmsg_len=32, nlmsg_type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWTABLE, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28255, nlmsg_pid=0}}], 131072, 0, NULL, NULL) = 36
recvfrom(3, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=28256, nlmsg_pid=997}, {error=0, msg={nlmsg_len=44, nlmsg_type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWCHAIN, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28256, nlmsg_pid=0}}], 131072, 0, NULL, NULL) = 36
recvfrom(3, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=28257, nlmsg_pid=997}, {error=0, msg={nlmsg_len=20, nlmsg_type=NFNL_MSG_BATCH_END, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28257, nlmsg_pid=0}}], 131072, 0, NULL, NULL) = 36
> Out of curiosity: Why does the tool need an explicit ack for each
> command? As mentioned above, this consumes a lot netlink bandwidth.
For the ynl python library, I guess it was a design choice to request an
ack for each command.
Since the Netlink API allows a user to request acks, it seems necessary
to be consistent about providing them. Otherwise we'd need to extend the
netlink message specs to say which messages are ack capable and which
are not.
>> Add processing for ACKs for begin/end messages and provide responses
>> when requested.
>>
>> I have checked that iproute2, pyroute2 and systemd are unaffected by
>> this change since none of them use NLM_F_ACK for batch begin/end.
>
> nitpick: Quick grep shows me iproute2 does not use the nfnetlink
> subsystem? If I am correct, maybe remove this.
Yeah, my mistake. I did check iproute2 but didn't mean to add it to the
list. For nft, NFNL_MSG_BATCH_* usage is contained in libnftnl from what
I can see. I'll update the patch.
> Thanks!
>
> One more comment below.
Did you miss adding a comment?
>
>> Signed-off-by: Donald Hunter <donald.hunter@...il.com>
>> ---
>> net/netfilter/nfnetlink.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
>> index c9fbe0f707b5..4abf660c7baf 100644
>> --- a/net/netfilter/nfnetlink.c
>> +++ b/net/netfilter/nfnetlink.c
>> @@ -427,6 +427,9 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
>>
>> nfnl_unlock(subsys_id);
>>
>> + if (nlh->nlmsg_flags & NLM_F_ACK)
>> + nfnl_err_add(&err_list, nlh, 0, &extack);
>> +
>> while (skb->len >= nlmsg_total_size(0)) {
>> int msglen, type;
>>
>> @@ -573,6 +576,8 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
>> } else if (err) {
>> ss->abort(net, oskb, NFNL_ABORT_NONE);
>> netlink_ack(oskb, nlmsg_hdr(oskb), err, NULL);
>> + } else if (nlh->nlmsg_flags & NLM_F_ACK) {
>> + nfnl_err_add(&err_list, nlh, 0, &extack);
>> }
>> } else {
>> enum nfnl_abort_action abort_action;
>> --
>> 2.44.0
>>
Powered by blists - more mailing lists