Message-ID: <16fa03d5-c110-75d6-9181-d239578db0a2@gmail.com>
Date: Thu, 13 Jul 2023 07:59:32 +0200
From: Heiner Kallweit <hkallweit1@...il.com>
To: Anuj Gupta <anuj20.g@...sung.com>, davem@...emloft.net
Cc: holger@...lied-asynchrony.com, kai.heng.feng@...onical.com,
simon.horman@...igine.com, nic_swsd@...ltek.com, netdev@...r.kernel.org,
linux-nvme@...ts.infradead.org
Subject: Re: Performance Regression due to ASPM disable patch
On 12.07.2023 17:55, Anuj Gupta wrote:
> Hi,
>
> I see a performance regression for read/write workloads on our NVMe over
> Fabrics setup that uses TCP as the transport.
> IOPS drop by 23% for 4k-randread [1] and by 18% for 4k-randwrite [2].
>
> I bisected and found that the commit
> e1ed3e4d91112027b90c7ee61479141b3f948e6a ("r8169: disable ASPM during
> NAPI poll") is the trigger.
> When I revert this commit, the performance drop goes away.
>
> The target machine uses a Realtek Ethernet controller:
> root@...tpc:/home/test# lspci | grep -i eth
> 29:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 2600
> (rev 21)
> 2a:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Killer
> E3000 2.5GbE Controller (rev 03)
>
> I tried to disable ASPM by passing "pcie_aspm=off" as a boot parameter
> and by setting the PCIe ASPM policy to performance, but neither improved
> the performance.
> I wonder if this is already known, and whether something different
> should be done to handle the original issue?
>
> [1] fio randread
> fio -direct=1 -iodepth=1 -rw=randread -ioengine=psync -bs=4k -numjobs=1
> -runtime=30 -group_reporting -filename=/dev/nvme1n1 -name=psync_read
> -output=psync_read
> [2] fio randwrite
> fio -direct=1 -iodepth=1 -rw=randwrite -ioengine=psync -bs=4k -numjobs=1
> -runtime=30 -group_reporting -filename=/dev/nvme1n1 -name=psync_write
> -output=psync_write
>
>
I can imagine this commit having a certain performance impact if lots of
small packets are handled by individual NAPI polls.
It may also be chip-version-specific.
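
Regarding your pcie_aspm=off attempt: as a quick sanity check (assuming
the Killer E3000 at 2a:00.0 is the port under test), something like the
following should show whether ASPM is actually disabled on the link and
which policy is in effect:

  # LnkCap/LnkCtl lines list the supported and currently enabled ASPM states
  lspci -s 2a:00.0 -vvv | grep -i aspm
  # current kernel-wide ASPM policy
  cat /sys/module/pcie_aspm/parameters/policy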
You have two NICs; do you see the issue with both of them?
Related: what's your line speed, 1 Gbps or 2.5 Gbps?
Can you reproduce the performance impact with iperf?
Do you use any network settings tuned for latency vs. throughput?
Is interrupt coalescing configured, and is TSO(6) enabled?
The output of ethtool -k may provide further insight.
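
For reference, the kind of commands I have in mind (eth0 stands in for
the actual interface name, and <target-ip> for the fabrics peer):

  ethtool eth0                       # negotiated link speed
  iperf3 -c <target-ip> -t 30       # single-stream TCP throughput
  ethtool -c eth0                    # interrupt coalescing settings
  ethtool -k eth0 | grep -E 'tcp-segmentation|generic-receive'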