Date:   Sun, 11 Nov 2018 09:03:50 +0100
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Paweł Staszewski <pstaszewski@...are.pl>
Cc:     Saeed Mahameed <saeedm@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        brouer@...hat.com
Subject: Re: Kernel 4.19 network performance - forwarding/routing normal
 users traffic

On Sat, 10 Nov 2018 23:19:50 +0100
Paweł Staszewski <pstaszewski@...are.pl> wrote:

> > On 10.11.2018 at 23:06, Jesper Dangaard Brouer wrote:
> > On Sat, 10 Nov 2018 20:56:02 +0100
> > Paweł Staszewski <pstaszewski@...are.pl> wrote:
> >  
> >> On 10.11.2018 at 20:49, Paweł Staszewski wrote:
> >>>
> >>>> On 10.11.2018 at 20:34, Jesper Dangaard Brouer wrote:
> >>>> On Fri, 9 Nov 2018 23:20:38 +0100 Paweł Staszewski
> >>>> <pstaszewski@...are.pl> wrote:
> >>>>     
> >>>>> On 08.11.2018 at 20:12, Paweł Staszewski wrote:

[...]
> > Do notice that the per-CPU squeeze is not too large.
>
> Yes - but I'm searching for an invisible thing now :) something
> invisible is slowing down packet processing :)
> So I'm trying to find any counter that has something to do with
> packet processing.

NOTICE: I have given you the counters you need (below).

> >
> > [...]  
> >>>>> Remember, those tests are now on two separate ConnectX-5 NICs
> >>>>> connected to two separate PCIe x16 gen 3.0 slots.
> >>>> That is strange... I still suspect some HW NIC issue. Can you
> >>>> provide ethtool stats info via this tool:
> >>>>
> >>>> https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
> >>>>
> >>>> $ ethtool_stats.pl --dev enp175s0 --dev enp216s0
> >>>>
> >>>> The tool removes zero-stat counters and reports per-second stats.
> >>>> That makes it easier to spot what is relevant for the given workload.
> >>> yes, the Mellanox NIC just has too many counters that are always 0 in my case :)
> >>> Will try this also
> >>>     
> >> But still a lot of non-zero counters:
> >> Show adapter(s) (enp175s0 enp216s0) statistics (ONLY that changed!)
> >> Ethtool(enp175s0) stat:         8891 (          8,891) <= ch0_arm /sec  
> > [...]
> >
> > I have copied the stats over into another document so I can look at
> > them more closely... and I've found some interesting stats.
> >
> > E.g. we can see that the NIC hardware is dropping packets.
> >
> > RX-drops on enp175s0:
> >
> >   (enp175s0) stat: 4850734036 ( 4,850,734,036) <= rx_bytes /sec
> >   (enp175s0) stat: 5069043007 ( 5,069,043,007) <= rx_bytes_phy /sec
> >                    -218308971 (  -218,308,971) Dropped bytes /sec
> >   
> >   (enp175s0) stat: 139602 ( 139,602) <= rx_discards_phy /sec
> >
> >   (enp175s0) stat: 3717148 ( 3,717,148) <= rx_packets /sec
> >   (enp175s0) stat: 3862420 ( 3,862,420) <= rx_packets_phy /sec
> >                    -145272 (  -145,272) Dropped packets /sec
> >
> >
> > RX-drops on enp216s0 are lower:
> >
> >   (enp216s0) stat: 2592286809 ( 2,592,286,809) <= rx_bytes /sec
> >   (enp216s0) stat: 2633575771 ( 2,633,575,771) <= rx_bytes_phy /sec
> >                     -41288962 (   -41,288,962) Dropped bytes /sec
> >
> >   (enp216s0) stat:   464 (464) <= rx_discards_phy /sec
> >
> >   (enp216s0) stat: 4971677 ( 4,971,677) <= rx_packets /sec
> >   (enp216s0) stat: 4975563 ( 4,975,563) <= rx_packets_phy /sec
> >                      -3886 (    -3,886) Dropped packets /sec
> >  
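
For reference, the NIC-drop arithmetic above can be reproduced with plain
ethtool -S, without the Perl tool. A minimal shell sketch (device name and
counter names are the mlx5 ones from the stats above; it takes two samples,
1 second apart, so the numbers are per-second rates):

  #!/bin/sh
  # Sample rx_packets (delivered by the driver) and rx_packets_phy (seen
  # on the wire) twice, 1 sec apart; the difference of the two rates is
  # the packets/sec lost inside the NIC.
  DEV=enp175s0
  snap() {
      ethtool -S "$DEV" | awk '/^ *rx_packets:/     {s=$2}
                               /^ *rx_packets_phy:/ {p=$2}
                               END {print s, p}'
  }
  set -- $(snap); sw1=$1 phy1=$2
  sleep 1
  set -- $(snap); sw2=$1 phy2=$2
  echo "rx_packets/sec:     $((sw2 - sw1))"
  echo "rx_packets_phy/sec: $((phy2 - phy1))"
  echo "dropped in NIC/sec: $(( (phy2 - phy1) - (sw2 - sw1) ))"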
 
I would recommend that you use ethtool stats and monitor rx_discards_phy.
The _phy counters come from the hardware, and they show that packets are
getting dropped at the HW level.  This can happen when software is not
fast enough to empty the RX queue, but in this case, where the CPUs are
mostly idle, I don't think that is the cause.
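
A trivial watcher for that counter, as a sketch (run one copy per device;
the device name is just an example from this thread):

  #!/bin/sh
  # Print the per-second increase of rx_discards_phy for one device.
  DEV=${1:-enp175s0}
  prev=$(ethtool -S "$DEV" | awk '/^ *rx_discards_phy:/ {print $2}')
  while sleep 1; do
      cur=$(ethtool -S "$DEV" | awk '/^ *rx_discards_phy:/ {print $2}')
      echo "$(date +%T) $DEV rx_discards_phy: $((cur - prev)) /sec"
      prev=$cur
  done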

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
