Date:   Tue, 29 Mar 2022 10:46:08 +0300
From:   Sagi Grimberg
To:     Mingbao Sun
Cc:     Keith Busch, Jens Axboe, Christoph Hellwig,
        Chaitanya Kulkarni, Eric Dumazet, "David S. Miller",
        Hideaki YOSHIFUJI, David Ahern, Jakub Kicinski
Subject: Re: [PATCH v2 2/3] nvme-tcp: support specifying the

>> As I said, TCP can be tuned in various ways, congestion being just one
>> of them. I'm sure you can find a workload where rmem/wmem will make
>> a difference.
> Agreed.
> But the difference with the rmem/wmem knob is this:
> we could enlarge rmem/wmem for NVMe/TCP via sysctl,
> and it would bring no downside to any other sockets whose
> rmem/wmem are not explicitly specified.

It can most certainly affect them, positively or negatively, depending
on the use case.
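
To make the "explicitly specified" distinction concrete, here is a
minimal Python sketch. It is not from the patch or the thread; the helper
name set_socket_buffers and the sizes are hypothetical. It shows a socket
that requests its own buffer sizes with SO_RCVBUF/SO_SNDBUF. On Linux, a
TCP socket that never makes these calls starts from the
net.ipv4.tcp_rmem/tcp_wmem defaults, so a sysctl change reaches it.

```python
import socket


def set_socket_buffers(sock, rcv_bytes, snd_bytes):
    """Request explicit per-socket buffer sizes and return what the kernel granted.

    Hypothetical helper for illustration only. The granted values may differ
    from the request: Linux doubles the requested size for bookkeeping and
    caps it at net.core.rmem_max / net.core.wmem_max.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcv_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, snd_bytes)
    granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return granted_rcv, granted_snd


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print(set_socket_buffers(s, 256 * 1024, 256 * 1024))
    finally:
        s.close()
```

Every socket that skips such calls is exposed to the system-wide values,
whichever application owns it.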

>> In addition, as far as I know, application-specific TCP-level
>> tuning (like congestion control) is not really a common thing to do.
>> So why in nvme-tcp?
>> So to me at least, it is not clear why we should add it to the driver.
> As mentioned in the commit message, we can specify the
> congestion-control of NVMe_over_TCP via sysctl or by writing
> '/proc/sys/net/ipv4/tcp_congestion_control', but this also
> changes the congestion-control of all future TCP sockets on
> the same host that have not been explicitly assigned a
> congestion-control, and so may affect their
> performance.
> For example:
> A server in a data-center with the following 2 NICs:
>      - NIC_front-end, for interacting with clients through WAN
>        (high latency, ms-level)
>      - NIC_back-end, for interacting with NVMe/TCP target through LAN
>        (low latency, ECN-enabled, ideal for dctcp)
> This server interacts with clients (handling requests) via the front-end
> network and accesses the NVMe/TCP storage via the back-end network.
> This is a normal use case, right?
> For the client devices, we can’t determine their congestion-control.
> But normally it’s cubic by default (per the CONFIG_DEFAULT_TCP_CONG).
> So if we change the default congestion control on the server to dctcp
> for the sake of the NVMe/TCP traffic on the LAN side, it would at the
> same time change the congestion-control of the front-end sockets
> to dctcp while the congestion-control of the client side is cubic.
> So this is an unexpected scenario.
> In addition, distributed storage products like the following also have
> the above problem:
>      - The product consists of a cluster of servers.
>      - Each server serves clients via its front-end NIC
>       (WAN, high latency).
>      - All servers interact with each other via NVMe/TCP via back-end NIC
>       (LAN, low latency, ECN-enabled, ideal for dctcp).

Separate networks are still not application (nvme-tcp) specific and as
mentioned, we have a way to control that. IMO, this still does not
qualify as solid justification to add this to nvme-tcp.
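
For readers following the argument, the system-wide default and a
per-socket override can be compared from userspace. The Python sketch
below is an illustration under assumptions, not the patch's mechanism:
the helper names are hypothetical, it relies on the Linux-only
TCP_CONGESTION socket option, and the requested algorithm (dctcp here)
may be unavailable or not permitted on a given host, in which case the
socket keeps the default.

```python
import socket

# Path quoted in the thread; writing to it changes the default for all
# future TCP sockets that do not set TCP_CONGESTION themselves.
DEFAULT_CC_PATH = "/proc/sys/net/ipv4/tcp_congestion_control"


def read_default_congestion_control(path=DEFAULT_CC_PATH):
    """Return the system-wide default algorithm name, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def set_congestion_control(sock, name):
    """Try to select `name` for this socket only; return the algorithm in effect.

    Hypothetical helper. Returns None where TCP_CONGESTION is not supported.
    """
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return None
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, name.encode())
    except OSError:
        pass  # algorithm not loaded or not allowed; the default stays in effect
    try:
        raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)
    except OSError:
        return None
    return raw.split(b"\x00", 1)[0].decode()


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print("system default:", read_default_congestion_control())
        print("this socket:  ", set_congestion_control(s, "dctcp"))
    finally:
        s.close()
```

Only the one socket is affected by the override; sockets created
afterwards still start from the system-wide default.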

What do others think?
