netdev - Re: ARM router NAT performance affected by random/unrelated commits


Message-ID: <20190521130101.GC6577@lunn.ch>
Date:   Tue, 21 May 2019 15:01:01 +0200
From:   Andrew Lunn <andrew@...n.ch>
To:     Rafał Miłecki <zajec5@...il.com>
Cc:     Network Development <netdev@...r.kernel.org>,
        linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-block@...r.kernel.org, John Crispin <john@...ozen.org>,
        Jonas Gorski <jonas.gorski@...il.com>,
        Jo-Philipp Wich <jo@...n.io>, Felix Fietkau <nbd@....name>
Subject: Re: ARM router NAT performance affected by random/unrelated commits

> I also tried running cachestat but didn't get anything interesting:
> Counting cache functions... Output every 1 seconds.
> TIME         HITS   MISSES  DIRTIES    RATIO   BUFFERS_MB   CACHE_MB
> 10:06:59     1020        5        0    99.5%            0          2
> 10:07:00     1029        0        0   100.0%            0          2
> 10:07:01     1013        0        0   100.0%            0          2
> 10:07:02     1029        0        0   100.0%            0          2
> 10:07:03     1029        0        0   100.0%            0          2
> 10:07:04      997        0        0   100.0%            0          2
> 10:07:05     1013        0        0   100.0%            0          2
> (I started iperf at 10:07:00).

Try looking at the L1 cache performance. For this class of device, the
L1 code cache is probably too small to contain the active parts of the
network stack. The less cache thrashing you have, the faster the stack
will go.
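
As a rough illustration of why a small change in code size can matter so
much, here is a toy model, not a measurement. It assumes a fully
associative LRU cache (real L1 caches are set associative) and a
hypothetical capacity of 512 lines; the function name and all numbers are
made up for the sketch. It counts misses when a loop walks the same code
lines over and over, as the forwarding path does for every packet.

```python
from collections import OrderedDict


def icache_misses(code_lines, cache_lines, passes):
    """Count misses when `code_lines` distinct lines are touched in order,
    `passes` times, through a fully associative LRU cache that holds
    `cache_lines` lines. This is a toy model, not real hardware."""
    cache = OrderedDict()
    misses = 0
    for _ in range(passes):
        for line in range(code_lines):
            if line in cache:
                cache.move_to_end(line)        # hit: mark as most recently used
            else:
                misses += 1
                cache[line] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict the least recently used line
    return misses


CACHE_LINES = 512  # hypothetical: 512 lines of 64 bytes would be 32 KiB

fits = icache_misses(code_lines=512, cache_lines=CACHE_LINES, passes=10)
too_big = icache_misses(code_lines=513, cache_lines=CACHE_LINES, passes=10)
print("hot path fits in cache:  ", fits, "misses")
print("hot path one line larger:", too_big, "misses")
```

In this model, a hot path that fits only pays the cold misses on the
first pass, while a hot path one line larger misses on every access. Real
caches degrade less abruptly, but the cliff is the reason shrinking the
active code is worth trying.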

Maybe try compiling with -Os so it optimises for size.

Build a custom kernel with everything you don't need turned off.

Look at the work being done to batch process packets. Rather than
passing one packet at a time through the network stack, it passes a
linked list of packets to each stage in the stack. That should result
in fewer cache misses per packet. But not all layers in the stack
support this batching. See if you can find out where it is being
unbatched, and why. Can you influence this by disabling build options,
or by working on the code to pass batches further along the stack?
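
To make the batching argument concrete, here is a minimal sketch under a
strong simplifying assumption: the cache holds the code of only one stage
at a time, so every switch between stages means reloading code. The stage
names, the batch size of 64 and the helper function are hypothetical and
do not describe the real kernel path.

```python
def stage_code_loads(order):
    """Count how many times the executing stage changes in `order`, a
    sequence of (stage, packet) steps. Under the assumption that only one
    stage's code fits in the cache, each change is one reload."""
    loads = 0
    current = None
    for stage, _packet in order:
        if stage != current:
            loads += 1
            current = stage
    return loads


STAGES = ["rx", "ip", "nat", "forward", "tx"]  # illustrative labels only
packets = range(64)                            # hypothetical batch size

# One packet at a time: every packet walks through all stages in turn.
per_packet = [(s, p) for p in packets for s in STAGES]

# Fully batched: each stage handles the whole list before the next runs.
batched = [(s, p) for s in STAGES for p in packets]

# Batched for the first two stages, then unbatched for the rest.
partial = ([(s, p) for s in STAGES[:2] for p in packets]
           + [(s, p) for p in packets for s in STAGES[2:]])

print("per packet:      ", stage_code_loads(per_packet))
print("fully batched:   ", stage_code_loads(batched))
print("unbatched midway:", stage_code_loads(partial))
```

With these made-up numbers the per-packet order reloads stage code 320
times, the fully batched order 5 times, and the order that unbatches
after the second stage 194 times, which is why the point where the list
gets split up is worth finding.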

     Andrew