linux-kernel - Re: ip_queue_xmit(): Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28

Message-Id: <200811182012.03386.nickpiggin@yahoo.com.au>
Date:	Tue, 18 Nov 2008 20:12:02 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Eric Dumazet <dada1@...mosbay.com>,
	David Miller <davem@...emloft.net>, rjw@...k.pl,
	linux-kernel@...r.kernel.org, kernel-testers@...r.kernel.org,
	cl@...ux-foundation.org, efault@....de, a.p.zijlstra@...llo.nl,
	Stephen Hemminger <shemminger@...tta.com>
Subject: Re: ip_queue_xmit(): Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28

On Tuesday 18 November 2008 07:32, Ingo Molnar wrote:
> * Ingo Molnar <mingo@...e.hu> wrote:
> > 100.000000 total
> > ................
> >   3.356152 ip_queue_xmit

> 30% of the overhead of this function comes from:
>
> ffffffff804b7203:        0 	66 c7 43 06 00 00    	movw   $0x0,0x6(%rbx)
> ffffffff804b7209:      118 	0f bf 85 40 02 00 00 	movswl 0x240(%rbp),%eax
> ffffffff804b7210:    10867 	48 8b 54 24 58       	mov    0x58(%rsp),%rdx
> ffffffff804b7215:      340 	85 c0                	test   %eax,%eax
> ffffffff804b7217:        0 	79 06                	jns    ffffffff804b721f <ip_queue_xmit+0x1da>
> ffffffff804b7219:   107464 	8b 82 9c 00 00 00    	mov    0x9c(%rdx),%eax
> ffffffff804b721f:     4963 	88 43 08             	mov    %al,0x8(%rbx)
>
> the 16-bit movw looks a bit weird. It comes from line 372:
>
>  0xffffffff804b7203 is in ip_queue_xmit (net/ipv4/ip_output.c:372).
>  367		iph = ip_hdr(skb);
>  368		*((__be16 *)iph) = htons((4 << 12) | (5 << 8) | (inet->tos & 0xff));
>  369		if (ip_dont_fragment(sk, &rt->u.dst) && !ipfragok)
>  370			iph->frag_off = htons(IP_DF);
>  371		else
>  372			iph->frag_off = 0;
>  373		iph->ttl      = ip_select_ttl(inet, &rt->u.dst);
>  374		iph->protocol = sk->sk_protocol;
>  375		iph->saddr    = rt->rt_src;
>  376		iph->daddr    = rt->rt_dst;
>
> the ip-header fragment flag setting to zero.
>
> 16-bit ops are an on-off love/hate affair on x86 CPUs. The trend is
> towards eliminating them as much as possible.
>
> _But_, the real overhead probably comes from:
>
>  ffffffff804b7210:    10867 	48 8b 54 24 58       	mov    0x58(%rsp),%rdx
>
> which is the next line, the ttl field:
>
>  373             iph->ttl      = ip_select_ttl(inet, &rt->u.dst);
>
> this shows that we are doing a hard cachemiss on the net-localhost
> route dst structure cacheline. We do a plain load instruction from it
> here and get a hefty cachemiss. (because 16 CPUs are banging on that
> single route)

Why would that show up right there, though? An instruction like this should
be non-blocking. Shouldn't the cost show up at some point where the CPU
executes an instruction that depends on rdx? (And good luck working out
when that happens!)
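
To make the dependency concrete, here is a rough C sketch of what that
stretch of disassembly appears to compute. The struct names (fake_inet,
fake_dst), the hoplimit field and the select_ttl() helper are made up for
illustration; uc_ttl is borrowed from the kernel's inet_sock, and the
mapping of instructions to C lines is only my reading of the listing, not
something checked against the actual build.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. The layouts are NOT
 * the real ones; they only model the two fields involved. */
struct fake_dst {
	int hoplimit;   /* assumed: stands in for the route's hop-limit metric */
};

struct fake_inet {
	short uc_ttl;   /* per-socket unicast TTL; negative means "ask the route" */
};

/* Assumed shape of the ttl selection, annotated with the instructions
 * from the quoted listing that seem to correspond to each step. */
static int select_ttl(const struct fake_inet *inet, const struct fake_dst *dst)
{
	int ttl;

	assert(inet != NULL && dst != NULL);

	ttl = inet->uc_ttl;          /* movswl 0x240(%rbp),%eax              */
	                             /* mov 0x58(%rsp),%rdx: presumably the  */
	                             /* dst pointer reloaded from the stack  */
	if (ttl < 0)                 /* test %eax,%eax ; jns                 */
		ttl = dst->hoplimit; /* mov 0x9c(%rdx),%eax: first use of rdx */
	return ttl;                  /* caller stores it: mov %al,0x8(%rbx)  */
}
```

If that reading is right, the load from 0x58(%rsp) only fetches a pointer
off the stack, and the first instruction that actually needs %rdx is the
load through it at 0x9c(%rdx).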
