Message-Id: <2E1DF9B2-0CE3-4C4E-8803-0DC145BFE530@gmail.com>
Date: Thu, 3 Dec 2020 21:43:54 +0200
From: Andrei Popa <andreipopad@...il.com>
To: "Rafael J. Wysocki" <rjw@...ysocki.net>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
linux-kernel@...r.kernel.org, peterz@...radead.org,
Linux PM <linux-pm@...r.kernel.org>
Subject: Re: high number of dropped packets/rx_missed_errors from 4.17 kernel
Hi,
On what kernel version should I try the patch? I tried it on 5.9 and it doesn't build.
> On 18 Nov 2020, at 20:47, Rafael J. Wysocki <rjw@...ysocki.net> wrote:
>
> On Tuesday, November 17, 2020 7:31:29 PM CET Rafael J. Wysocki wrote:
>> On 11/16/2020 8:11 AM, Andrei Popa wrote:
>>> Hello,
>>>
>>> After an update from vmlinuz-4.15.0-106-generic to vmlinuz-5.4.0-37-generic, a number of our servers show a very high count of rx_missed_errors and dropped packets, but only on the uplink 10G interface. Another 10G downlink interface on the same servers has no problems.
>>>
>>> The affected servers have the following mainboards:
>>> S5520HC ver E26045-455
>>> S5520UR ver E22554-751
>>> S5520UR ver E22554-753
>>> S5000VSA
>>>
>>> On 30 other servers with similar mainboards and/or configs there are no dropped packets with vmlinuz-5.4.0-37-generic.
>>>
>>> We’ve installed vanilla 4.16 and there were no dropped packets.
>>> Vanilla 4.17 had a very high number of dropped packets like the following:
>>>
>>> root@...per:~# cat test
>>> #!/bin/bash
>>> while true
>>> do
>>> ethtool -S ens6f1|grep "missed_errors"
>>> ifconfig ens6f1|grep RX|grep dropped
>>> sleep 1
>>> done
>>>
>>> root@...per:~# ./test
>>> rx_missed_errors: 2418845
>>> RX errors 0 dropped 2418888 overruns 0 frame 0
>>> rx_missed_errors: 2426175
>>> RX errors 0 dropped 2426218 overruns 0 frame 0
>>> rx_missed_errors: 2431910
>>> RX errors 0 dropped 2431953 overruns 0 frame 0
>>> rx_missed_errors: 2437266
>>> RX errors 0 dropped 2437309 overruns 0 frame 0
>>> rx_missed_errors: 2443305
>>> RX errors 0 dropped 2443348 overruns 0 frame 0
>>> rx_missed_errors: 2448357
>>> RX errors 0 dropped 2448400 overruns 0 frame 0
>>> rx_missed_errors: 2452539
>>> RX errors 0 dropped 2452582 overruns 0 frame 0
>>>
>>> We did a git bisect and found that the following commit introduces the high number of dropped packets:
>>>
>>> Author: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>>> Date: Thu Apr 5 19:12:43 2018 +0200
>>>
>>> cpuidle: menu: Avoid selecting shallow states with stopped tick
>>>
>>> If the scheduler tick has been stopped already and the governor
>>> selects a shallow idle state, the CPU can spend a long time in that
>>> state if the selection is based on an inaccurate prediction of idle
>>> time. That effect turns out to be relevant, so it needs to be
>>> mitigated.
>>>
>>> To that end, modify the menu governor to discard the result of the
>>> idle time prediction if the tick is stopped and the predicted idle
>>> time is less than the tick period length, unless the tick timer is
>>> going to expire soon.
>>>
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>>> Acked-by: Peter Zijlstra (Intel) <peterz@...radead.org>
>>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>>> index 267982e471e0..1bfe03ceb236 100644
>>> --- a/drivers/cpuidle/governors/menu.c
>>> +++ b/drivers/cpuidle/governors/menu.c
>>> @@ -352,13 +352,28 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>>> */
>>> data->predicted_us = min(data->predicted_us, expected_interval);
>>> - /*
>>> - * Use the performance multiplier and the user-configurable
>>> - * latency_req to determine the maximum exit latency.
>>> - */
>>> - interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
>>> - if (latency_req > interactivity_req)
>>> - latency_req = interactivity_req;
>>
>> The tick_nohz_tick_stopped() check may be done after the above and it
>> may be reworked a bit.
>>
>> I'll send a test patch to you shortly.
>
> The patch is appended, but please note that it has been rebased by hand and
> not tested.
>
> Please let me know if it makes any difference.
>
> And in the future, please avoid pasting the entire kernel config into your
> reports; that's problematic.
>
> ---
> drivers/cpuidle/governors/menu.c | 23 ++++++++++++-----------
> 1 file changed, 12 insertions(+), 11 deletions(-)
>
> Index: linux-pm/drivers/cpuidle/governors/menu.c
> ===================================================================
> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> +++ linux-pm/drivers/cpuidle/governors/menu.c
> @@ -308,18 +308,18 @@ static int menu_select(struct cpuidle_dr
> get_typical_interval(data, predicted_us)) *
> NSEC_PER_USEC;
>
> - if (tick_nohz_tick_stopped()) {
> - /*
> - * If the tick is already stopped, the cost of possible short
> - * idle duration misprediction is much higher, because the CPU
> - * may be stuck in a shallow idle state for a long time as a
> - * result of it. In that case say we might mispredict and use
> - * the known time till the closest timer event for the idle
> - * state selection.
> - */
> - if (data->predicted_us < TICK_USEC)
> - data->predicted_us = min_t(unsigned int, TICK_USEC,
> - ktime_to_us(delta_next));
> + /*
> + * If the tick is already stopped, the cost of possible short idle
> + * duration misprediction is much higher, because the CPU may be stuck
> + * in a shallow idle state for a long time as a result of it. In that
> + * case, say we might mispredict and use the known time till the closest
> + * timer event for the idle state selection, unless that event is going
> + * to occur within the tick time frame (in which case the CPU will be
> + * woken up from whatever idle state it gets into soon enough anyway).
> + */
> + if (tick_nohz_tick_stopped() && data->predicted_us < TICK_USEC &&
> + delta_next >= TICK_NSEC) {
> + data->predicted_us = ktime_to_us(delta_next);
> } else {
> /*
> * Use the performance multiplier and the user-configurable
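
For readers without the surrounding menu.c context, the decision that the reworked hunk makes can be boiled down to the sketch below. This is only a simplified, untested illustration with flattened types (plain integer microsecond/nanosecond values instead of the governor's data structures and ktime_t helpers), not the actual kernel code; the performance-multiplier branch that the patch otherwise falls back to is deliberately left out.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TICK_USEC 4000ULL               /* tick period in us (HZ=250 here) */
#define TICK_NSEC (TICK_USEC * 1000ULL) /* same period in ns */

/*
 * If the tick is already stopped, a short idle-duration prediction is
 * risky: a mispredicted shallow state can hold the CPU for a long time.
 * In that case fall back to the known time until the next timer event,
 * unless that event is within one tick anyway (the CPU will be woken up
 * soon enough regardless).  Otherwise keep the prediction; the real
 * governor then goes on to apply the performance multiplier (not shown).
 */
static uint64_t adjust_predicted_us(bool tick_stopped, uint64_t predicted_us,
                                    uint64_t delta_next_ns)
{
	if (tick_stopped && predicted_us < TICK_USEC &&
	    delta_next_ns >= TICK_NSEC)
		return delta_next_ns / 1000;

	return predicted_us;
}

int main(void)
{
	/* tick stopped, 150 us predicted, next timer 9 ms away -> 9000 us */
	printf("%llu\n",
	       (unsigned long long)adjust_predicted_us(true, 150, 9000000));
	return 0;
}

With the tick still running, or with the next timer event less than a tick away, the prediction is left untouched, which is why the change only affects CPUs that have already entered nohz idle.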