linux-kernel - Re: [PATCH v2] RDMA/cma: Make CM response timeout and # CM retries configurable

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <08deb91f7516b4c8911211499a00a58392d65a8b.camel@redhat.com>
Date:   Thu, 13 Jun 2019 12:32:24 -0400
From:   Doug Ledford <dledford@...hat.com>
To:     Bart Van Assche <bvanassche@....org>,
        Håkon Bugge <haakon.bugge@...cle.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Leon Romanovsky <leon@...nel.org>,
        Parav Pandit <parav@...lanox.com>,
        Steve Wise <swise@...ngridcomputing.com>
Cc:     linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] RDMA/cma: Make CM response timeout and # CM retries
 configurable

On Thu, 2019-06-13 at 08:28 -0700, Bart Van Assche wrote:
> On 6/13/19 7:25 AM, Doug Ledford wrote:
> > So, to revive this patch, what I'd like to see is some attempt to
> > actually quantify a reasonable timeout for the default backlog
> > depth,
> > then the patch should actually change the default to that
> > reasonable
> > timeout, and then put in the ability to adjust the timeout with
> > some
> > sort of doc guidance on how to calculate a reasonable timeout based
> > on
> > configured backlog depth.
> 
> How about following the approach of the SRP initiator driver? It
> derives 
> the CM timeout from the subnet manager timeout. The assumption
> behind 
> this is that in large networks the subnet manager timeout has to be
> set 
> higher than its default to make communication work. See also 
> srp_get_subnet_timeout().

Theoretically, the subnet manager needs a longer timeout in a bigger
network because it's handling more data as a single point of lookup for
the entire subnet.  Individual machines, on the other hand, have the
same backlog size (by default) regardless of the size of the network,
and there is no guarantee that if the admin increased the subnet
manager timeout, that they also increased the backlog queue depth size.
So, while I like things that auto-tune like you are suggesting, the
problem is that the one item does not directly correlate with the
other.

-- 
Doug Ledford <dledford@...hat.com>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57
2FDD

Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)