Date: Fri, 26 Oct 2012 13:11:54 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: David Miller <davem@...emloft.net>
Cc: netdev@...r.kernel.org,
	Stephen Hemminger <shemminger@...tta.com>,
	Patrick McHardy <kaber@...sh.net>,
	Paolo Valente <paolo.valente@...more.it>,
	Jamal Hadi Salim <jhs@...atatu.com>,
	Herbert Xu <herbert@...dor.apana.org.au>
Subject: Re: [PATCH net-next] net_sched: more precise pkt_len computation

On Fri, 2012-10-26 at 11:32 +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@...gle.com>
>
> One long-standing problem with TSO/GSO/GRO packets is that skb->len
> doesn't represent a precise amount of bytes on the wire.
>
> Headers are only accounted for in the first segment.
> For TCP, that's typically 66 bytes missing per 1448-byte segment,
> an error of 4.5%.
>
> As consequences:
>
> 1) TBF/CBQ/HTB/NETEM/... can send more bytes than their assigned limits.
> 2) Device stats are slightly underestimated as well.
>
> Fix this by taking headers into account in the
> qdisc_skb_cb(skb)->pkt_len computation.
>
> Packet schedulers should use pkt_len instead of skb->len for their
> bandwidth limitations, and TSO-enabled device drivers could use pkt_len
> if their statistics are not hardware assisted, and if they don't
> scratch skb->cb[]'s first word.
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> Cc: Jamal Hadi Salim <jhs@...atatu.com>
> Cc: Stephen Hemminger <shemminger@...tta.com>
> Cc: Paolo Valente <paolo.valente@...more.it>
> Cc: Herbert Xu <herbert@...dor.apana.org.au>
> ---
>  net/core/dev.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index b4978e2..cf2b5f4 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2442,7 +2442,6 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>  	bool contended;
>  	int rc;
>
> -	qdisc_skb_cb(skb)->pkt_len = skb->len;
>  	qdisc_calculate_pkt_len(skb, q);
>  	/*
>  	 * Heuristic to force contended enqueues to serialize on a
> @@ -2579,6 +2578,16 @@ int dev_queue_xmit(struct sk_buff *skb)
>  	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
>  #endif
>  	trace_net_dev_queue(skb);
> +	qdisc_skb_cb(skb)->pkt_len = skb->len;
> +	if (skb_is_gso(skb)) {
> +		unsigned int hdr_len = skb_transport_offset(skb);
> +
> +		if (skb_shinfo(skb)->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))
> +			hdr_len += tcp_hdrlen(skb);
> +		else
> +			hdr_len += sizeof(struct udphdr);
> +		qdisc_skb_cb(skb)->pkt_len += (skb_shinfo(skb)->gso_segs - 1) * hdr_len;
> +	}
>  	if (q->enqueue) {
>  		rc = __dev_xmit_skb(skb, q, dev, txq);
>  		goto out;

Hmm, this doesn't quite work for GRO (ingress), as we call
skb_reset_transport_header(skb); in __netif_receive_skb() before
handle_ing().

So skb_transport_offset(skb) is 14 here, instead of 14 + (IP header).

This skb_reset_transport_header(skb) call looks like defensive
programming, for the !NET_SKBUFF_DATA_USES_OFFSET case?
We could either:

1) remove this skb_reset_transport_header(skb) call

or

2) use the following helper instead:

#ifdef NET_SKBUFF_DATA_USES_OFFSET
static inline void skb_sanitize_transport_header(struct sk_buff *skb)
{
	skb->transport_header = max_t(sk_buff_data_t,
				      skb->data - skb->head,
				      skb->transport_header);
}
#else
static inline void skb_sanitize_transport_header(struct sk_buff *skb)
{
	skb->transport_header = max_t(sk_buff_data_t,
				      skb->data,
				      skb->transport_header);
}
#endif