Message-ID: <4dc65899-e599-43e3-8f95-585d3489b424@uwaterloo.ca>
Date: Sun, 18 Aug 2024 10:51:04 -0400
From: Martin Karsten <mkarsten@...terloo.ca>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Joe Damato <jdamato@...tly.com>
Cc: Samiullah Khawaja <skhawaja@...gle.com>,
Stanislav Fomichev <sdf@...ichev.me>, netdev@...r.kernel.org,
amritha.nambiar@...el.com, sridhar.samudrala@...el.com,
Alexander Lobakin <aleksander.lobakin@...el.com>,
Alexander Viro <viro@...iv.linux.org.uk>, Breno Leitao <leitao@...ian.org>,
Christian Brauner <brauner@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, "David S. Miller"
<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Jan Kara <jack@...e.cz>,
Jiri Pirko <jiri@...nulli.us>, Johannes Berg <johannes.berg@...el.com>,
Jonathan Corbet <corbet@....net>,
"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
"open list:FILESYSTEMS (VFS and infrastructure)"
<linux-fsdevel@...r.kernel.org>, open list <linux-kernel@...r.kernel.org>,
Lorenzo Bianconi <lorenzo@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [RFC net-next 0/5] Suspend IRQs during preferred busy poll
On 2024-08-18 08:55, Willem de Bruijn wrote:
>>>>>> The value may not be obvious, but guidance (in the form of
>>>>>> documentation) can be provided.
>>>>>
>>>>> Okay. Could you share a stab at what that would look like?
>>>>
>>>> The timeout needs to be large enough that an application can get a
>>>> meaningful number of incoming requests processed without softirq
>>>> interference. At the same time, the timeout value determines the
>>>> worst-case delivery delay that a concurrent application using the same
>>>> queue(s) might experience. Please also see my response to Samiullah
>>>> quoted above. The specific circumstances and trade-offs might vary,
>>>> that's why a simple constant likely won't do.
>>>
>>> Thanks. I really do mean this as an exercise of what documentation in
>>> Documentation/networking/napi.rst will look like. That helps make the
>>> case that the interface is reasonably easy to use (even if only
>>> targeting advanced users).
>>>
>>> How does a user measure how much time a process will spend on
>>> processing a meaningful number of incoming requests, for instance?
>>> In practice, probably just a hunch?
>>
>> As an example, we measure around 1M QPS in our experiments, fully
>> utilizing 8 cores and knowing that memcached is quite scalable. Thus we
>> can conclude a single request takes about 8 us processing time on
>> average. That has led us to a 20 us small timeout (gro_flush_timeout),
>> enough to make sure that a single request is likely not interfered with,
>> but otherwise as small as possible. If multiple requests arrive, the
>> system will quickly switch back to polling mode.
>>
>> At the other end, we have picked a very large irq_suspend_timeout of
>> 20,000 us to demonstrate that it does not negatively impact latency.
>> This would cover 2,500 requests, which is likely excessive, but was
>> chosen for demonstration purposes. One can easily measure the
>> distribution of epoll_wait batch sizes; batch sizes as low as 64 are
>> already very efficient, even in high-load situations.
>
> Overall Ack on both your and Joe's responses.
>
> epoll_wait disables the suspend if no events are found and ep_poll
> would go to sleep. As the paper also hints, the timeout is only there
> for misbehaving applications that stop calling epoll_wait, correct?
> If so, then picking a value is not that critical, as long as it is not
> so low that no meaningful work gets done.
Correct.
>> Also see next paragraph.
>>
>>> Playing devil's advocate some more: given that ethtool usecs have to
>>> be chosen with a similar trade-off between latency and efficiency,
>>> could a multiplicative factor of this (or gro_flush_timeout, same
>>> thing) be sufficient and easier to choose? The documentation does
>>> state that the value chosen must be >= gro_flush_timeout.
>>
>> I believe this would take away flexibility without gaining much. You'd
>> still want some sort of admin-controlled 'enable' flag, so you'd still
>> need some kind of parameter.
>>
>> When using our scheme, the factor between gro_flush_timeout and
>> irq_suspend_timeout should *roughly* correspond to the maximum batch
>> size that an application would process in one go (orders of magnitude,
>> see above). This determines both the target application's worst-case
>> latency as well as the worst-case latency of concurrent applications, if
>> any, as mentioned previously.
>
> Oh is concurrent applications the argument against a very high
> timeout?
Only in the error case. If irq_suspend_timeout is large enough, as you
point out above, then as long as the target application behaves well,
its batching settings are the determining factor.
Thanks,
Martin