Message-ID: <CANn89iLoVRcO1quHnKLFhvt56Jwk3Rht+v0x7pKVoE=MXFJN-w@mail.gmail.com>
Date: Fri, 17 Jun 2016 07:25:18 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Frederic Weisbecker <fweisbec@...il.com>,
Chris Mason <clm@...com>,
Arjan van de Ven <arjan@...radead.org>, rt@...utronix.de,
Rik van Riel <riel@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
George Spelvin <linux@...encehorizons.net>,
Len Brown <lenb@...nel.org>
Subject: Re: [patch V2 00/20] timer: Refactor the timer wheel
On Fri, Jun 17, 2016 at 6:57 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> On Fri, 17 Jun 2016, Eric Dumazet wrote:
>> >
>> > To achieve this capacity with HZ=1000 without increasing the storage size
>> > by another level, we reduced the granularity of the first wheel level from
>> > 1ms to 4ms. According to our data, there is no user which relies on that
>> > 1ms granularity and 99% of those timers are canceled before expiry.
>> >
>>
>> Ah... This might be a problem for people using small TCP RTO timers in
>> datacenters (order of 5 ms)
>> (and small delay ack timers as well, in the order of 4 ms)
>>
>> TCP/pacing uses high resolution timer in sch_fq.c so no problem there.
>>
>> If we arm a timer for 5 ms, what are the exact consequences ?
>
> The worst case expiry time is 8ms on HZ=1000 as it is on HZ=250
>
>> I fear we might trigger lot more of spurious retransmits.
>>
>> Or maybe I should read the patch series. I'll take some time today.
>
> Maybe just throw it at such a workload and see what happens :)

Well, when network congestion happens in a cluster and hundreds of
millions of RTO timers fire, adding fuel to the fire, it is already
a nightmare ;)

To avoid increasing the probability of such events, we would need
at least 4 ms of difference between the RTO timer and the delack timer.
That means we would have to increase both of them, which increases
P99 latencies of RPC workloads.

Maybe a switch to hrtimer would be less risky.
But I do not know yet if it is doable without a big performance penalty.
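
The arithmetic behind the 8 ms worst case and the 4 ms gap can be sketched roughly as follows. This is only a toy model, not the wheel code from the patch series: it assumes a timer simply expires at the next multiple of the first-level granularity at or after its requested expiry, and all function names here are hypothetical.

```c
/*
 * Toy model (assumption, not the kernel implementation): a timer armed
 * at time `phase` (ms within one granule) with a timeout of `timeout` ms
 * expires at the next multiple of `gran` at or after phase + timeout.
 */

/* Round t up to the next multiple of gran (gran must be > 0). */
static unsigned int round_up_ms(unsigned int t, unsigned int gran)
{
	return ((t + gran - 1) / gran) * gran;
}

/* Delay actually observed by a timer armed at the given phase. */
static unsigned int observed_delay_ms(unsigned int phase,
				      unsigned int timeout,
				      unsigned int gran)
{
	return round_up_ms(phase + timeout, gran) - phase;
}

/* Longest observed delay over all arming phases within one granule. */
static unsigned int worst_case_delay_ms(unsigned int timeout,
					unsigned int gran)
{
	unsigned int worst = 0;

	for (unsigned int phase = 0; phase < gran; phase++) {
		unsigned int d = observed_delay_ms(phase, timeout, gran);

		if (d > worst)
			worst = d;
	}
	return worst;
}

/* Shortest observed delay over all arming phases within one granule. */
static unsigned int best_case_delay_ms(unsigned int timeout,
				       unsigned int gran)
{
	unsigned int best = observed_delay_ms(0, timeout, gran);

	for (unsigned int phase = 1; phase < gran; phase++) {
		unsigned int d = observed_delay_ms(phase, timeout, gran);

		if (d < best)
			best = d;
	}
	return best;
}

/*
 * Returns 1 if the earliest possible RTO expiry is not strictly later
 * than the latest possible delack expiry, i.e. the RTO can fire before
 * the delayed ACK has been sent. The two timers run on different hosts,
 * so their arming phases are treated as independent.
 */
static int rto_can_race_delack(unsigned int rto, unsigned int delack,
			       unsigned int gran)
{
	return best_case_delay_ms(rto, gran) <=
	       worst_case_delay_ms(delack, gran);
}
```

Under this model, worst_case_delay_ms(5, 4) is 8, matching the worst case quoted above. A 4 ms delack can expire as late as 7 ms, so a 5 ms RTO can race it, and the race only disappears once the RTO is at least 4 ms larger than the delack timer (for example 8 ms against 4 ms).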