Message-ID: <218fd4946208411b90ac77cfcf7aa643@AcuMS.aculab.com>
Date: Tue, 8 Mar 2022 22:12:06 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Eric Dumazet' <edumazet@...gle.com>
CC: Jakub Kicinski <kuba@...nel.org>, netdev <netdev@...r.kernel.org>,
"Willem de Bruijn" <willemb@...gle.com>,
Neal Cardwell <ncardwell@...gle.com>,
"Yuchung Cheng" <ycheng@...gle.com>
Subject: RE: [RFC net-next] tcp: allow larger TSO to be built under overload
From: Eric Dumazet
> Sent: 08 March 2022 19:54
..
> > Which is the common side of that max_t() ?
> > If it is min_tso_segs it might be worth avoiding the
> > divide by coding as:
> >
> > return bytes > mss_now * min_tso_segs ? bytes / mss_now : min_tso_segs;
> >
>
> I think the common case is when the divide must happen.
> Not sure if this really matters with current cpus.
The last document I looked at still quoted considerable latency
for integer divide on x86-64.
If you get a cmov then all the dependent instructions will just get
queued waiting for the divide to complete.
But a branch could easily get mispredicted.
That is likely to hit ppc - which I don't think has a cmov?
OTOH if the divide is in the ?: bit nothing probably depends
on it for a while - so the latency won't matter.
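
For reference, a minimal sketch of the two forms being compared. The
function names are made up for illustration and are not the kernel's, the
(uint64_t) cast is only there so the product cannot wrap in this standalone
version, and mss_now is assumed to be non-zero. Whether the compiler emits
a cmov or a branch for either form is up to the compiler and target.

```c
#include <assert.h>
#include <stdint.h>

/* max_t() style: the divide is always executed, then the larger value wins. */
static uint32_t segs_always_divide(uint32_t bytes, uint32_t mss_now,
				   uint32_t min_tso_segs)
{
	uint32_t segs = bytes / mss_now;

	return segs > min_tso_segs ? segs : min_tso_segs;
}

/* Suggested form: compare first, divide only when the quotient can win. */
static uint32_t segs_divide_if_needed(uint32_t bytes, uint32_t mss_now,
				      uint32_t min_tso_segs)
{
	/* 64-bit product so the comparison cannot wrap (assumption of this sketch). */
	if (bytes > (uint64_t)mss_now * min_tso_segs)
		return bytes / mss_now;
	return min_tso_segs;
}

/* Both forms should agree for any non-zero mss_now. */
static void check_same(uint32_t bytes, uint32_t mss_now, uint32_t min_tso_segs)
{
	assert(segs_always_divide(bytes, mss_now, min_tso_segs) ==
	       segs_divide_if_needed(bytes, mss_now, min_tso_segs));
}
```

The second form only pays for the divide when bytes exceeds
mss_now * min_tso_segs, which helps only if that is the uncommon case.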
Latest figures I have are for SkylakeX:
              u-ops                  latency   1/throughput
DIV  r8       10 10  p0 p1 p5 p6     23        6
DIV  r16      10 10  p0 p1 p5 p6     23        6
DIV  r32      10 10  p0 p1 p5 p6     26        6
DIV  r64      36 36  p0 p1 p5 p6     35-88     21-83
IDIV r8       11 11  p0 p1 p5 p6     24        6
IDIV r16      10 10  p0 p1 p5 p6     23        6
IDIV r32      10 10  p0 p1 p5 p6     26        6
IDIV r64      57 57  p0 p1 p5 p6     42-95     24-90
Broadwell is a bit slower.
Note that 64bit divide is really horrid.
I think the divide in question will be 32bit - so 'only' 26 clocks
latency.
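
As an aside, when the operands are held in 64bit variables but usually fit
in 32 bits, the cheaper divide can be selected explicitly. This is only a
sketch: div_u64_narrow() is a hypothetical helper name, the divisor is
assumed to be non-zero, and whether the compiler really emits a 32bit DIV
for the narrowed case depends on the compiler and target.

```c
#include <assert.h>
#include <stdint.h>

/* Use a 32-bit divide when both operands fit in 32 bits. */
static uint64_t div_u64_narrow(uint64_t dividend, uint64_t divisor)
{
	/* No bits above bit 31 in either operand: the 32-bit quotient is identical. */
	if (((dividend | divisor) >> 32) == 0)
		return (uint32_t)dividend / (uint32_t)divisor;

	/* Otherwise fall back to the full 64-bit divide. */
	return dividend / divisor;
}

/* The narrowed result should always match the plain 64-bit divide. */
static void check_narrow(uint64_t dividend, uint64_t divisor)
{
	assert(div_u64_narrow(dividend, divisor) == dividend / divisor);
}
```

The extra test and branch cost a clock or two, so this only makes sense
where the 64bit divide is as slow as in the Skylake figures above.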
AMD Ryzen is a lot better for 64bit divides:
                     latency   1/throughput
DIV  r8/m8      1    13-16     13-16
DIV  r16/m16    2    14-21     14-21
DIV  r32/m32    2    14-30     14-30
DIV  r64/m64    2    14-46     14-45
IDIV r8/m8      1    13-16     13-16
IDIV r16/m16    2    13-21     14-22
IDIV r32/m32    2    14-30     14-30
IDIV r64/m64    2    14-47     14-45
But less pipelining for 32bit ones.
Quite how those tables actually affect real code
is another matter - but they are guidelines about
what is possible (if you can get the u-ops executed
on the right ports).
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)