Message-ID: <95d566af-30dc-fecc-9a1b-3c8c7d69b880@oracle.com>
Date: Mon, 1 Jul 2019 13:53:23 -0700
From: santosh.shilimkar@...cle.com
To: Gerd Rausch <gerd.rausch@...cle.com>,
David Miller <davem@...emloft.net>
Cc: netdev@...r.kernel.org
Subject: Re: [PATCH net-next 1/7] net/rds: Give fr_state a chance to
transition to FRMR_IS_FREE
On 7/1/19 1:50 PM, Gerd Rausch wrote:
> Hi David,
>
> On 01/07/2019 11.27, David Miller wrote:
>> From: Gerd Rausch <gerd.rausch@...cle.com>
>> Date: Mon, 1 Jul 2019 09:39:44 -0700
>>
>>> + /* Memory regions make it onto the "clean_list" via
>>> + * "rds_ib_flush_mr_pool", after the memory region has
>>> + * been posted for invalidation via "rds_ib_post_inv".
>>> + *
>>> + * At that point in time, "fr_state" may still be
>>> + * in state "FRMR_IS_INUSE", since the only place where
>>> + * "fr_state" transitions to "FRMR_IS_FREE" is
>>> + * in "rds_ib_mr_cqe_handler", which is
>>> + * triggered by a tasklet.
>>> + *
>>> + * So in case we notice that
>>> + * "fr_state != FRMR_IS_FREE" (see below), we wait for
>>> + * "fr_inv_done" to trigger with a maximum of 10msec.
>>> + * Then we check again, and only put the memory region
>>> + * onto the drop_list (via "rds_ib_free_frmr")
>>> + * in case the situation remains unchanged.
>>> + *
>>> + * This avoids the problem of memory-regions bouncing
>>> + * between "clean_list" and "drop_list" before they
>>> + * even have a chance to be properly invalidated.
>>> + */
>>> + frmr = &ibmr->u.frmr;
>>> + wait_event_timeout(frmr->fr_inv_done,
>>> + frmr->fr_state == FRMR_IS_FREE,
>>> + msecs_to_jiffies(10));
>>> + if (frmr->fr_state == FRMR_IS_FREE)
>>> + break;
>>
>> If we see FRMR_IS_FREE after the timeout, what cleans this up?
>>
>
> In that case, the memory-region is subjected to the
> "rds_ib_free_frmr(ibmr, true)" call that follows:
> In essence making it onto the "drop_list".
>
> It's the same as if it hadn't transitioned to FRMR_IS_FREE at all.
> In both cases, the memory region should get dropped, and the application
> would merely have been penalized by an extra 10 msec of wait-time (for having tried to invalidate it).
>
>> Also, why 10msec?
>
> It's empirical.
> I had added some debugging code (not part of this submission) that traced
> the return value of "wait_event_timeout" in order to see the outliers in terms
> of processing the "IB_WR_LOCAL_INV" request.
>
> On my test-systems the majority of requests were done in less than 1msec.
> I saw an outlier at almost 2msec once.
> So I gave it an extra order-of-magnitude multiplier for extra buffer / paranoia.
>
>> Why that specific value and not some other value?
>
> I looked around to find what Mellanox or any other reference material had
> to say about what the expected turn-around time of an "IB_WR_LOCAL_INV" ought to be.
> I wasn't able to find any.
>
LOCAL_INV/REG etc. all end up being HCA commands, and the command
timeouts are large: 60 seconds is what the CX3 HCA has, for example.
That's the worst-case timeout before the HCA marks the command as
timed out and hence the operation as failed.
Regards,
Santosh