Date:   Thu, 3 Dec 2020 21:43:54 +0200
From:   Andrei Popa <andreipopad@...il.com>
To:     "Rafael J. Wysocki" <rjw@...ysocki.net>
Cc:     "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        linux-kernel@...r.kernel.org, peterz@...radead.org,
        Linux PM <linux-pm@...r.kernel.org>
Subject: Re: high number of dropped packets/rx_missed_errors from 4.17 kernel

Hi,

Which kernel version should I try the patch on? I tried 5.9 and it doesn't build.

> On 18 Nov 2020, at 20:47, Rafael J. Wysocki <rjw@...ysocki.net> wrote:
> 
> On Tuesday, November 17, 2020 7:31:29 PM CET Rafael J. Wysocki wrote:
>> On 11/16/2020 8:11 AM, Andrei Popa wrote:
>>> Hello,
>>> 
>>> After an update from vmlinuz-4.15.0-106-generic to vmlinuz-5.4.0-37-generic, we experience, on a number of servers, a very high number of rx_missed_errors and dropped packets, but only on the uplink 10G interface. We have another 10G downlink interface with no problems.
>>> 
>>> The affected servers have the following mainboards:
>>> S5520HC ver E26045-455
>>> S5520UR ver E22554-751
>>> S5520UR ver E22554-753
>>> S5000VSA
>>> 
>>> On 30 other servers with similar mainboards and/or configs, there are no dropped packets with vmlinuz-5.4.0-37-generic.
>>> 
>>> We’ve installed vanilla 4.16 and there were no dropped packets.
>>> Vanilla 4.17 had a very high number of dropped packets like the following:
>>> 
>>> root@...per:~# cat test
>>> #!/bin/bash
>>> while true
>>> do
>>> ethtool -S ens6f1|grep "missed_errors"
>>> ifconfig ens6f1|grep RX|grep dropped
>>> sleep 1
>>> done
>>> 
>>> root@...per:~# ./test
>>>      rx_missed_errors: 2418845
>>>         RX errors 0  dropped 2418888  overruns 0  frame 0
>>>      rx_missed_errors: 2426175
>>>         RX errors 0  dropped 2426218  overruns 0  frame 0
>>>      rx_missed_errors: 2431910
>>>         RX errors 0  dropped 2431953  overruns 0  frame 0
>>>      rx_missed_errors: 2437266
>>>         RX errors 0  dropped 2437309  overruns 0  frame 0
>>>      rx_missed_errors: 2443305
>>>         RX errors 0  dropped 2443348  overruns 0  frame 0
>>>      rx_missed_errors: 2448357
>>>         RX errors 0  dropped 2448400  overruns 0  frame 0
>>>      rx_missed_errors: 2452539
>>>         RX errors 0  dropped 2452582  overruns 0  frame 0
>>> 
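[Editor's note: the counters above are cumulative, so the raw values don't show how fast the drops accumulate. A small variation on the script above that prints the per-second delta instead (a sketch; the interface name and the `ethtool -S` output format are assumed to match the run above) might look like:

```shell
#!/bin/bash
# Print how many rx_missed_errors accumulate per second on IFACE.
# Assumes ethtool -S prints a line like "rx_missed_errors: <count>".
IFACE=ens6f1

read_missed() {
    ethtool -S "$IFACE" | awk '/rx_missed_errors/ {print $2}'
}

prev=$(read_missed)
while true
do
    sleep 1
    cur=$(read_missed)
    echo "missed/s: $((cur - prev))"
    prev=$cur
done
```

With the numbers quoted above (~2.42M to ~2.45M over six seconds), this would show roughly 5000-7000 missed packets per second.]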
>>> We did a git bisect and found that the following commit introduces the high number of dropped packets:
>>> 
>>> Author: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>>> Date:   Thu Apr 5 19:12:43 2018 +0200
>>> 
>>>     cpuidle: menu: Avoid selecting shallow states with stopped tick
>>> 
>>>     If the scheduler tick has been stopped already and the governor
>>>     selects a shallow idle state, the CPU can spend a long time in that
>>>     state if the selection is based on an inaccurate prediction of idle
>>>     time.  That effect turns out to be relevant, so it needs to be
>>>     mitigated.
>>> 
>>>     To that end, modify the menu governor to discard the result of the
>>>     idle time prediction if the tick is stopped and the predicted idle
>>>     time is less than the tick period length, unless the tick timer is
>>>     going to expire soon.
>>> 
>>>     Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>>>     Acked-by: Peter Zijlstra (Intel) <peterz@...radead.org>
>>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>>> index 267982e471e0..1bfe03ceb236 100644
>>> --- a/drivers/cpuidle/governors/menu.c
>>> +++ b/drivers/cpuidle/governors/menu.c
>>> @@ -352,13 +352,28 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>>>          */
>>>         data->predicted_us = min(data->predicted_us, expected_interval);
>>> -       /*
>>> -        * Use the performance multiplier and the user-configurable
>>> -        * latency_req to determine the maximum exit latency.
>>> -        */
>>> -       interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
>>> -       if (latency_req > interactivity_req)
>>> -               latency_req = interactivity_req;
>> 
>> The tick_nohz_tick_stopped() check may be done after the above and it 
>> may be reworked a bit.
>> 
>> I'll send a test patch to you shortly.
> 
> The patch is appended, but please note that it has been rebased by hand and
> not tested.
> 
> Please let me know if it makes any difference.
> 
> And in the future, please avoid pasting the entire kernel config into your
> reports; that's problematic.
> 
> ---
> drivers/cpuidle/governors/menu.c |   23 ++++++++++++-----------
> 1 file changed, 12 insertions(+), 11 deletions(-)
> 
> Index: linux-pm/drivers/cpuidle/governors/menu.c
> ===================================================================
> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> +++ linux-pm/drivers/cpuidle/governors/menu.c
> @@ -308,18 +308,18 @@ static int menu_select(struct cpuidle_dr
> 				get_typical_interval(data, predicted_us)) *
> 				NSEC_PER_USEC;
> 
> -	if (tick_nohz_tick_stopped()) {
> -		/*
> -		 * If the tick is already stopped, the cost of possible short
> -		 * idle duration misprediction is much higher, because the CPU
> -		 * may be stuck in a shallow idle state for a long time as a
> -		 * result of it.  In that case say we might mispredict and use
> -		 * the known time till the closest timer event for the idle
> -		 * state selection.
> -		 */
> -		if (data->predicted_us < TICK_USEC)
> -			data->predicted_us = min_t(unsigned int, TICK_USEC,
> -						   ktime_to_us(delta_next));
> +	/*
> +	 * If the tick is already stopped, the cost of possible short idle
> +	 * duration misprediction is much higher, because the CPU may be stuck
> +	 * in a shallow idle state for a long time as a result of it.  In that
> +	 * case, say we might mispredict and use the known time till the closest
> +	 * timer event for the idle state selection, unless that event is going
> +	 * to occur within the tick time frame (in which case the CPU will be
> +	 * woken up from whatever idle state it gets into soon enough anyway).
> +	 */
> +	if (tick_nohz_tick_stopped() && data->predicted_us < TICK_USEC &&
> +	    delta_next >= TICK_NSEC) {
> +		data->predicted_us = ktime_to_us(delta_next);
> 	} else {
> 		/*
> 		 * Use the performance multiplier and the user-configurable
