lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 21 May 2019 15:01:01 +0200
From:   Andrew Lunn <andrew@...n.ch>
To:     Rafał Miłecki <zajec5@...il.com>
Cc:     Network Development <netdev@...r.kernel.org>,
        linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-block@...r.kernel.org, John Crispin <john@...ozen.org>,
        Jonas Gorski <jonas.gorski@...il.com>,
        Jo-Philipp Wich <jo@...n.io>, Felix Fietkau <nbd@....name>
Subject: Re: ARM router NAT performance affected by random/unrelated commits

> I also tried running cachestat but didn't get anything interesting:
> Counting cache functions... Output every 1 seconds.
> TIME         HITS   MISSES  DIRTIES    RATIO   BUFFERS_MB   CACHE_MB
> 10:06:59     1020        5        0    99.5%            0          2
> 10:07:00     1029        0        0   100.0%            0          2
> 10:07:01     1013        0        0   100.0%            0          2
> 10:07:02     1029        0        0   100.0%            0          2
> 10:07:03     1029        0        0   100.0%            0          2
> 10:07:04      997        0        0   100.0%            0          2
> 10:07:05     1013        0        0   100.0%            0          2
> (I started iperf at 10:07:00).

Try looking at the L1 cache performance. For this class of device, the
L1 code cache is probably too small to contain the active parts of the
network stack. The less cache thrashing you have, the faster the stack
will go.

Maybe try compiling with -Os so it optimises for size.

Build a custom kernel with everything you don't need turned off.

Look at the work being done to batch process packets. Rather than
passing one packet at a time through the network stack, it passes a
linked list of packets to each stage in the stack. That should result
in less cache misses per packet. But not all layers in the stack
support this batching. See if you can find out where it is being
unbatched, and why. Can you influence this, disable build options, or
work on the code to pass batches further along the stack.

     Andrew

Powered by blists - more mailing lists