Message-ID: <20220329104806.00000126@tom.com>
Date: Tue, 29 Mar 2022 10:48:06 +0800
From: Mingbao Sun <sunmingbao@....com>
To: Sagi Grimberg <sagi@...mberg.me>
Cc: Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...com>,
Christoph Hellwig <hch@....de>,
Chaitanya Kulkarni <kch@...dia.com>,
linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org,
Eric Dumazet <edumazet@...gle.com>,
"David S . Miller" <davem@...emloft.net>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
David Ahern <dsahern@...nel.org>,
Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org,
tyler.sun@...l.com, ping.gan@...l.com, yanxiu.cai@...l.com,
libin.zhang@...l.com, ao.sun@...l.com
Subject: Re: [PATCH v2 2/3] nvme-tcp: support specifying the
congestion-control
> As I said, TCP can be tuned in various ways, congestion being just one
> of them. I'm sure you can find a workload where rmem/wmem will make
> a difference.
Agreed.
But the difference with the rmem/wmem knobs is that
we could enlarge rmem/wmem for NVMe/TCP via sysctl
without any downside for other sockets whose
rmem/wmem are not explicitly specified.
> In addition, based on my knowledge, application specific TCP level
> tuning (like congestion) is not really a common thing to do. So why in
> nvme-tcp?
>
> So to me at least, it is not clear why we should add it to the driver.
As mentioned in the commit message, although we can specify the
congestion control for NVMe/TCP via sysctl or by writing to
'/proc/sys/net/ipv4/tcp_congestion_control', this also changes
the congestion control of all future TCP sockets on the same host
that have not been explicitly assigned one, which can affect
their performance.
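To make the distinction concrete, the following is a minimal userspace sketch (Linux only) of the per-socket alternative: the TCP_CONGESTION socket option changes the algorithm of one socket, whereas the sysctl changes the default for every socket created afterwards. It is an illustration, not the patch itself (which works inside the nvme-tcp driver); the helper names get_socket_cc() and set_socket_cc() are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_CONGESTION */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: read the congestion-control algorithm attached to
 * a TCP socket. A fresh socket inherits the host-wide default
 * (net.ipv4.tcp_congestion_control). Returns 0 or -errno. */
static int get_socket_cc(int fd, char *buf, socklen_t len)
{
    if (len < 2)
        return -EINVAL;
    memset(buf, 0, len);
    socklen_t optlen = len - 1;   /* keep the last byte as a NUL terminator */
    if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &optlen) < 0)
        return -errno;
    return 0;
}

/* Hypothetical helper: override the algorithm for this socket only; other
 * sockets on the host keep the sysctl default. Returns 0 or -errno
 * (e.g. -ENOENT if the algorithm is unavailable, -EPERM if an unprivileged
 * caller picks one outside net.ipv4.tcp_allowed_congestion_control). */
static int set_socket_cc(int fd, const char *name)
{
    if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                   name, (socklen_t)strlen(name)) < 0)
        return -errno;
    return 0;
}
```

A freshly created socket reports the host default (for example cubic); calling set_socket_cc(fd, "dctcp") would switch only that socket, provided the tcp_dctcp module is available and, for unprivileged callers, dctcp is listed in net.ipv4.tcp_allowed_congestion_control.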
For example:
A server in a data center with the following two NICs:
- NIC_front-end, for interacting with clients over the WAN
(high latency, ms-level)
- NIC_back-end, for interacting with the NVMe/TCP target over the LAN
(low latency, ECN-enabled, ideal for dctcp)
This server handles client requests via the front-end network
and accesses the NVMe/TCP storage via the back-end network.
This is a common use case, right?
We cannot choose the congestion control used by the client devices,
but it is normally cubic by default (per CONFIG_DEFAULT_TCP_CONG).
So if we change the server's default congestion control to dctcp
for the sake of the NVMe/TCP traffic on the LAN side, it also
changes the congestion control of the front-end sockets to dctcp,
while the clients use cubic. dctcp is designed for ECN-enabled
data-center networks, not for high-latency WAN paths, so this is
an unintended side effect.
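The two-NIC example boils down to a per-socket decision that a single global sysctl cannot express. Below is a hypothetical userspace sketch of such a policy: pin dctcp only on sockets whose peer is on the back-end subnet, and return NULL (keep the host default, e.g. cubic) for front-end peers. The struct cc_rule, pick_cc() and the 10.0.0.0/24 back-end subnet are illustrative assumptions, not part of nvme-tcp.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule: sockets whose peer lies in this IPv4 prefix
 * (the back-end storage LAN) get `cc`; everything else keeps the
 * host-wide default. */
struct cc_rule {
    const char *net;   /* network address, e.g. "10.0.0.0" */
    int prefix_len;    /* 0..32 */
    const char *cc;    /* algorithm name, e.g. "dctcp" */
};

/* Return the algorithm to pin on a socket talking to `peer`,
 * or NULL to leave the host default untouched. */
static const char *pick_cc(const char *peer, const struct cc_rule *rule)
{
    struct in_addr p, n;

    if (rule->prefix_len < 0 || rule->prefix_len > 32)
        return NULL;
    if (inet_pton(AF_INET, peer, &p) != 1 ||
        inet_pton(AF_INET, rule->net, &n) != 1)
        return NULL;
    /* Build the netmask in host order, then convert it to network order
     * to match s_addr; prefix 0 is special-cased because shifting a
     * 32-bit value by 32 is undefined behavior. */
    uint32_t mask = rule->prefix_len == 0
        ? 0u
        : htonl(UINT32_MAX << (32 - rule->prefix_len));
    return ((p.s_addr & mask) == (n.s_addr & mask)) ? rule->cc : NULL;
}
```

A non-NULL result would then be applied to that one socket with setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, strlen(name)); front-end sockets are never touched, so they stay on the default algorithm.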
In addition, distributed storage products like the following
have the same problem:
- The product consists of a cluster of servers.
- Each server serves clients via its front-end NIC
(WAN, high latency).
- All servers communicate with each other over NVMe/TCP via their
back-end NICs (LAN, low latency, ECN-enabled, ideal for dctcp).