linux-kernel - Re: [patch V2 00/20] timer: Refactor the timer wheel

Message-ID: <CANn89iLoVRcO1quHnKLFhvt56Jwk3Rht+v0x7pKVoE=MXFJN-w@mail.gmail.com>
Date:	Fri, 17 Jun 2016 07:25:18 -0700
From:	Eric Dumazet <edumazet@...gle.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Chris Mason <clm@...com>,
	Arjan van de Ven <arjan@...radead.org>, rt@...utronix.de,
	Rik van Riel <riel@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	George Spelvin <linux@...encehorizons.net>,
	Len Brown <lenb@...nel.org>
Subject: Re: [patch V2 00/20] timer: Refactor the timer wheel

On Fri, Jun 17, 2016 at 6:57 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> On Fri, 17 Jun 2016, Eric Dumazet wrote:
>> >
>> >    To achieve this capacity with HZ=1000 without increasing the storage size
>> >    by another level, we reduced the granularity of the first wheel level from
>> >    1ms to 4ms. According to our data, there is no user which relies on that
>> >    1ms granularity and 99% of those timers are canceled before expiry.
>> >
>>
>> Ah... This might be a problem for people using small TCP RTO timers in
>> datacenters (on the order of 5 ms), and small delayed-ACK timers as
>> well (on the order of 4 ms).
>>
>> TCP pacing uses a high-resolution timer in sch_fq.c, so there is no
>> problem there.
>>
>> If we arm a timer for 5 ms, what are the exact consequences?
>
> The worst-case expiry time is 8ms on HZ=1000, as it is on HZ=250.
>
>> I fear we might trigger a lot more spurious retransmits.
>>
>> Or maybe I should read the patch series. I'll take some time today.
>
> Maybe just throw it at such a workload and see what happens :)

Well, when network congestion happens in a cluster and hundreds of
millions of RTO timers fire, adding fuel to the fire, it is already a
nightmare ;)

To avoid increasing the probability of such events, we would need at
least a 4 ms difference between the RTO timer and the delayed-ACK timer.

That means increasing both of them, which would increase the P99
latencies of RPC workloads.

Maybe a switch to hrtimer would be less risky, but I do not know yet
whether it is doable without a big performance penalty.