Message-ID: <5227FDFC.5090007@redhat.com>
Date:	Thu, 05 Sep 2013 11:43:56 +0800
From:	Jason Wang <jasowang@...hat.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	David Miller <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>,
	Yuchung Cheng <ycheng@...gle.com>,
	Neal Cardwell <ncardwell@...gle.com>,
	"Michael S. Tsirkin" <mst@...hat.com>
Subject: Re: [PATCH v2 net-next] pkt_sched: fq: Fair Queue packet scheduler

On 09/05/2013 08:50 AM, Eric Dumazet wrote:
> On Wed, 2013-09-04 at 04:27 -0700, Eric Dumazet wrote:
>> On Wed, 2013-09-04 at 03:30 -0700, Eric Dumazet wrote:
>>> On Wed, 2013-09-04 at 14:30 +0800, Jason Wang wrote:
>>>
>>>>> And tcpdump would certainly help ;)
>>>> See attachment.
>>>>
>>> Nothing obvious in the tcpdump capture (only that a lot of frames are missing)
>>>
>>> 1) Are you capturing only part of the payload (like tcpdump -s 128)?
>>>
>>> 2) What is the setup?
>>>
>>> 3) tc -s -d qdisc
>> If you use FQ in the guest, could it be that high-resolution timers
>> there have high latency?
>>
>> So FQ arms short timers, but their effective duration could be much longer.
>>
>> Here I get a smoothed latency of ~3 us:
>>
>> lpq83:~# ./netperf -H lpq84 ; ./tc -s -d qd ; dmesg | tail -n1
>> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
>> Recv   Send    Send                          
>> Socket Socket  Message  Elapsed              
>> Size   Size    Size     Time     Throughput  
>> bytes  bytes   bytes    secs.    10^6bits/sec  
>>
>>  87380  16384  16384    10.00    9410.82   
>> qdisc fq 8005: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 1024 quantum 3028 initial_quantum 15140 
>>  Sent 50545633991 bytes 33385894 pkt (dropped 0, overlimits 0 requeues 19) 
>>  rate 9258Mbit 764335pps backlog 0b 0p requeues 19 
>>   117 flow, 115 inactive, 0 throttled
>>   0 gc, 0 highprio, 0 retrans, 96861 throttled, 0 flows_plimit
>> [  572.551664] latency = 3035 ns
>>
>>
>> What do you get with this debugging patch ?
>>
>> diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
>> index 32ad015..c1312a0 100644
>> --- a/net/sched/sch_fq.c
>> +++ b/net/sched/sch_fq.c
>> @@ -103,6 +103,7 @@ struct fq_sched_data {
>>  	u64		stat_internal_packets;
>>  	u64		stat_tcp_retrans;
>>  	u64		stat_throttled;
>> +	s64		slatency;
>>  	u64		stat_flows_plimit;
>>  	u64		stat_pkts_too_long;
>>  	u64		stat_allocation_errors;
>> @@ -393,6 +394,7 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>>  static void fq_check_throttled(struct fq_sched_data *q, u64 now)
>>  {
>>  	struct rb_node *p;
>> +	bool first = true;
>>  
>>  	if (q->time_next_delayed_flow > now)
>>  		return;
>> @@ -405,6 +407,13 @@ static void fq_check_throttled(struct fq_sched_data *q, u64 now)
>>  			q->time_next_delayed_flow = f->time_next_packet;
>>  			break;
>>  		}
>> +		if (first) {
>> +			s64 delay = now - f->time_next_packet;
>> +
>> +			first = false;
>> +			delay -= q->slatency >> 3;
>> +			q->slatency += delay;
>> +		}
>>  		rb_erase(p, &q->delayed);
>>  		q->throttled_flows--;
>>  		fq_flow_add_tail(&q->old_flows, f);
>> @@ -711,6 +720,7 @@ static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
>>  	if (opts == NULL)
>>  		goto nla_put_failure;
>>  
>> +	pr_err("latency = %lld ns\n", q->slatency >> 3);
>>  	if (nla_put_u32(skb, TCA_FQ_PLIMIT, sch->limit) ||
>>  	    nla_put_u32(skb, TCA_FQ_FLOW_PLIMIT, q->flow_plimit) ||
>>  	    nla_put_u32(skb, TCA_FQ_QUANTUM, q->quantum) ||
>>
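The two lines added in fq_check_throttled() are the classic 1/8-gain EWMA
that TCP also uses for srtt: q->slatency accumulates eight times the running
average, which is why the pr_err() reports slatency >> 3. A minimal userspace
sketch of the same filter; the function name and sample values below are
illustrative, not taken from the patch:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Smoothed timer latency, kept scaled by 8 like q->slatency. */
static int64_t slatency8;

/* Feed one sample (ns): new_avg = 7/8 * old_avg + 1/8 * sample */
static void slatency_update(int64_t sample_ns)
{
	int64_t delta = sample_ns - (slatency8 >> 3);

	slatency8 += delta;
}

int main(void)
{
	const int64_t samples[] = { 3000, 3100, 2900, 250000, 3000, 3050 };
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		slatency_update(samples[i]);
		printf("sample %8lld ns -> smoothed %lld ns\n",
		       (long long)samples[i],
		       (long long)(slatency8 >> 3));
	}
	return 0;
}

A single outlier (the 250000 ns sample) moves the average by only an eighth
of the error, so sustained hrtimer latency shows up in the reported figure
while isolated spikes mostly wash out.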
>
> BTW, what is your HZ value?

Guest HZ is 1000.
>
> We have a problem in the TCP stack, because srtt is measured in jiffies (HZ units).
>
> Before we change it to microsecond units, I guess tcp_update_pacing_rate()
> should be adjusted a bit when HZ=250.
>
>
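To put numbers on the HZ concern: srtt has one-jiffy granularity, so the
smallest RTT the pacing-rate computation can observe is 1 ms at HZ=1000 but
4 ms at HZ=250, and any sub-jiffy RTT inflates srtt to that floor. A rough
sketch of the effect; the rate ~ 2 * cwnd * mss / srtt approximation and all
constants are illustrative, not the kernel's exact computation:

#include <stdio.h>
#include <stdint.h>

/* Roughly sk_pacing_rate ~= 2 * cwnd * mss / srtt, in bytes/sec. */
static uint64_t pacing_rate(unsigned int cwnd, unsigned int mss,
			    double srtt_sec)
{
	return (uint64_t)(2.0 * cwnd * mss / srtt_sec);
}

int main(void)
{
	const unsigned int cwnd = 80, mss = 1448;
	const double real_rtt = 100e-6;  /* 100 us LAN RTT          */
	const double jiffy_1000 = 1e-3;  /* srtt floor at HZ=1000   */
	const double jiffy_250 = 4e-3;   /* srtt floor at HZ=250    */

	printf("true rate : %llu bytes/s\n",
	       (unsigned long long)pacing_rate(cwnd, mss, real_rtt));
	printf("HZ=1000   : %llu bytes/s\n",
	       (unsigned long long)pacing_rate(cwnd, mss, jiffy_1000));
	printf("HZ=250    : %llu bytes/s\n",
	       (unsigned long long)pacing_rate(cwnd, mss, jiffy_250));
	return 0;
}

With a real RTT of 100 us, the jiffy floor understates the pacing rate by
roughly 10x at HZ=1000 and 40x at HZ=250, which is the kind of adjustment
being hinted at for tcp_update_pacing_rate().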

