netdev - Re: tcp_shift_skb_data uses wrong mss in non-gso case?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Tue, 24 Aug 2010 11:59:07 +0300 (EEST)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	Jesse Brandeburg <jesse.brandeburg@...il.com>
cc:	Robert Evans <robert.evans@...daqomx.com>,
	Herbert Xu <herbert@...dor.hengli.com.au>,
	Eric Dumazet <eric.dumazet@...il.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: tcp_shift_skb_data uses wrong mss in non-gso case?

On Mon, 23 Aug 2010, Jesse Brandeburg wrote:

> On Tue, Aug 17, 2010 at 3:35 PM, Robert Evans
> <robert.evans@...daqomx.com> wrote:
> > Hello all,
> >
> > In reading through the latest SACK code introduced by 832d11c, I have noticed
> > that for the in_sack case tcp_shift_skb will take mss = tcp_skb_seglen(skb).
> > This seems to be wrong since the queue might contain small packets (f.e.
> > TCP_NODELAY). If the collapse succeeds, the resulting skb will have an
> > arbitrarily small gso_size equal to the original skb length.
> 
> yeah, gso_size really should never be == skb->length, because then it
> implies you're offloading a frame to be segmented with no segmentable
> data.

What makes you think we'd get such segments? Point here is verify that the 
segments to be combined into a single skb share segment size, small or 
large doesn't make any difference in that respect as both were counted as 
a packet (or pcount worth of packets) w.r.t. cwnd consumption.

> > 8ed88d4:net/ipv4/tcp_input.c
> >   1506         in_sack = !after(start_seq, TCP_SKB_CB(skb)->seq) &&
> >   1507                   !before(end_seq, TCP_SKB_CB(skb)->end_seq);
> >   1508
> >   1509         if (in_sack) {
> >   1510                 len = skb->len;
> >   1511                 pcount = tcp_skb_pcount(skb);
> >   1512                 mss = tcp_skb_seglen(skb);     /* possibly wrong? */
> >   1513
> >   1514                 /* TODO: Fix DSACKs to not fragment already SACKed and w
> >   1514 e can
> >   1515                  * drop this restriction as unnecessary
> >   1516                  */
> >   1517                 if (mss != tcp_skb_seglen(prev))
> >   1518                         goto fallback;
> >   1519         } else {
> >
> > This ends up being troublesome if the segment is later retransmitted and the
> > device driver has trouble with very small gso_size (e1000e seems to be an
> > example).
> 
> I bet lots of other drivers will have issue with this too.
>
> > Is the small gso_size the correct and/or desired behavior?  Or am I missing
> > something else that prevents this from being a problem?
> 
> I believe that this is invalid for the stack to do, Ilpo, Herbert?
> what do you think?

The point of this code is to work with already SACKed segments (or to 
become SACKed from the current ACK), only TCP code is interested in them 
anymore. They should not be sent again to wire unless SACK reneging occurs 
but then retransmission code should deal with this just fine:

        if (skb->len > cur_mss) {
                if (tcp_fragment(sk, skb, cur_mss, cur_mss))
                        return -ENOMEM; /* We'll try again later. */
        } else {
                int oldpcount = tcp_skb_pcount(skb);

                if (unlikely(oldpcount > 1)) {
                        tcp_init_tso_segs(sk, skb, cur_mss);
                        tcp_adjust_pcount(sk, skb, oldpcount - tcp_skb_pcount(skb));
                }
        }


-- 
 i.