lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iJ_rd_vUH1LPbby5vV=s=jWdpzvDKnm6H1YK=wRPWBiyw@mail.gmail.com>
Date: Fri, 31 May 2024 19:32:40 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Yan Zhai <yan@...udflare.com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	David Ahern <dsahern@...nel.org>, Abhishek Chauhan <quic_abchauha@...cinc.com>, 
	Mina Almasry <almasrymina@...gle.com>, Florian Westphal <fw@...len.de>, 
	Alexander Lobakin <aleksander.lobakin@...el.com>, David Howells <dhowells@...hat.com>, 
	Jiri Pirko <jiri@...nulli.us>, Daniel Borkmann <daniel@...earbox.net>, 
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Lorenzo Bianconi <lorenzo@...nel.org>, 
	Pavel Begunkov <asml.silence@...il.com>, linux-kernel@...r.kernel.org, 
	kernel-team@...udflare.com, Jesper Dangaard Brouer <hawk@...nel.org>
Subject: Re: [RFC net-next 1/6] net: add kfree_skb_for_sk function

On Fri, May 31, 2024 at 6:58 PM Yan Zhai <yan@...udflare.com> wrote:
>
> Hi Eric,
>
>  Thanks for the feedback.
>
> On Fri, May 31, 2024 at 1:51 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Thu, May 30, 2024 at 11:46 PM Yan Zhai <yan@...udflare.com> wrote:
> > >
> > > Implement a new kfree_skb_for_sk to replace kfree_skb_reason on a few
> > > local receive path. The function accepts an extra receiving socket
> > > argument, which will be set in skb->cb for kfree_skb/consume_skb
> > > tracepoint consumption. With this extra bit of information, it will be
> > > easier to attribute dropped packets to netns/containers and
> > > sockets/services for performance and error monitoring purposes.
> >
> > This is a lot of code churn...
> >
> > I have to ask : Why not simply adding an sk parameter to an existing
> > trace point ?
> >
> Modifying a signature of the current tracepoint seems like a breaking
> change, that's why I was saving the context inside skb->cb, hoping to
> not impact any existing programs watching this tracepoint. But
> thinking it twice, it might not cause a problem if the signature
> becomes:
>
>  trace_kfree_skb(const struct sk_buff *skb, void *location, enum
> skb_drop_reason reason, const struct sock *sk)
>
> As return values are usually not a thing for tracepoints, it is
> probably still compatible. The cons is that the last "sk" still breaks
> the integrity of naming. How about making a "kfree_skb_context"
> internal struct and putting it as the last argument to "hide" the
> naming confusion?
>
> > If this not possible, I would rather add new tracepoints, adding new classes,
> > because it will ease your debugging :
> >
> > When looking for TCP drops, simply use a tcp_event_sk_skb_reason instance,
> > and voila, no distractions caused by RAW/ICMP/ICMPv6/af_packet drops.
> >
> > DECLARE_EVENT_CLASS(tcp_event_sk_skb_reason,
> >
> >      TP_PROTO(const struct sock *sk, const struct sk_buff *skb, enum
> > skb_drop_reason reason),
> > ...
> > );
>
> The alternative of adding another tracepoint could indeed work, we had
> a few cases like that in the past, e.g.
>
> https://lore.kernel.org/lkml/20230711043453.64095-1-ivan@cloudflare.com/
> https://lore.kernel.org/netdev/20230707043923.35578-1-ivan@cloudflare.com/
>
> But it does feel like a whack-a-mole thing. The problems are solvable
> if we extend the kfree_skb tracepoint, so I would prefer to not add a
> new tracepoint.

Solvable with many future merge conflicts for stable teams.


>
> >
> > Also, the name ( kfree_skb_for_sk) and order of parameters is confusing.
> >
> > I always prefer this kind of ordering/names :
> >
> > void sk_skb_reason_drop( [struct net *net ] // not relevant here, but
> > to expand the rationale
> >               struct sock *sk, struct sk_buff *skb, enum skb_drop_reason reason)
> >
> > Looking at the name, we immediately see the parameter order.
> >
> > The consume one (no @reason there) would be called
> >
> > void sk_skb_consume(struct sock *sk, struct sk_buff *skb);
>
> I was intending to keep the "kfree_skb" prefix initially since it
> would appear less surprising to kernel developers who used kfree_skb
> and kfree_skb_reason. But your points do make good sense. How about
> "kfree_sk_skb_reason" and "consume_sk_skb" here?
>

IMO kfree_skb() and consume_skb() were a wrong choice. We have to live
with them.

It should have been skb_free(), skb_consume(), skb_alloc(),
to be consistent.

Following (partial) list was much better:

skb_add_rx_frag_netmem, skb_coalesce_rx_frag, skb_pp_cow_data,
skb_cow_data_for_xdp,
skb_dump, skb_tx_error, skb_morph, skb_zerocopy_iter_stream, skb_copy_ubufs,
skb_clone, skb_headers_offset_update, skb_copy_header, skb_copy,
skb_realloc_headroom, skb_expand_head, skb_copy_expand, skb_put,
skb_push, skb_pull, skb_pull_data, skb_trim, skb_copy_bits,
skb_splice_bits, skb_send_sock_locked, skb_store_bits,
skb_checksum, skb_copy_and_csum_bits, skb_zerocopy_headlen,
skb_zerocopy, skb_copy_and_csum_dev, skb_dequeue,
skb_dequeue_tail, skb_queue_purge_reason, skb_errqueue_purge,
skb_queue_head, skb_queue_tail, skb_unlink, skb_append,
skb_split, skb_prepare_seq_read, skb_seq_read, skb_abort_seq_read,
skb_find_text, skb_append_pagefrags, skb_pull_rcsum, skb_segment_list,
skb_segment, skb_to_sgvec, skb_to_sgvec_nomark, skb_cow_data, skb_clone_sk,
skb_complete_tx_timestamp, skb_tstamp_tx, skb_complete_wifi_ack,
skb_partial_csum_set, skb_checksum_setup, skb_checksum_trimmed,
skb_try_coalesce, skb_scrub_packet, skb_vlan_untag, skb_ensure_writable,
skb_ensure_writable_head_tail, skb_vlan_pop, skb_vlan_push, skb_eth_pop,
skb_eth_push, skb_mpls_push, skb_mpls_pop, skb_mpls_update_lse,
skb_mpls_dec_ttl, skb_condense, skb_ext_add, skb_splice_from_iter

(just to make my point very very clear)

Instead we have a myriad of functions with illogical parameter
ordering vs their names.

I see no reason to add more confusion for new helpers.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ