[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a4834749-4aa2-7e79-dbf8-004580364a39@oracle.com>
Date: Mon, 1 Jul 2019 13:50:18 -0700
From: Gerd Rausch <gerd.rausch@...cle.com>
To: David Miller <davem@...emloft.net>
Cc: santosh.shilimkar@...cle.com, netdev@...r.kernel.org
Subject: Re: [PATCH net-next 1/7] net/rds: Give fr_state a chance to
transition to FRMR_IS_FREE
Hi David,
On 01/07/2019 11.27, David Miller wrote:
> From: Gerd Rausch <gerd.rausch@...cle.com>
> Date: Mon, 1 Jul 2019 09:39:44 -0700
>
>> + /* Memory regions make it onto the "clean_list" via
>> + * "rds_ib_flush_mr_pool", after the memory region has
>> + * been posted for invalidation via "rds_ib_post_inv".
>> + *
>> + * At that point in time, "fr_state" may still be
>> + * in state "FRMR_IS_INUSE", since the only place where
>> + * "fr_state" transitions to "FRMR_IS_FREE" is in
>> + * is in "rds_ib_mr_cqe_handler", which is
>> + * triggered by a tasklet.
>> + *
>> + * So in case we notice that
>> + * "fr_state != FRMR_IS_FREE" (see below), * we wait for
>> + * "fr_inv_done" to trigger with a maximum of 10msec.
>> + * Then we check again, and only put the memory region
>> + * onto the drop_list (via "rds_ib_free_frmr")
>> + * in case the situation remains unchanged.
>> + *
>> + * This avoids the problem of memory-regions bouncing
>> + * between "clean_list" and "drop_list" before they
>> + * even have a chance to be properly invalidated.
>> + */
>> + frmr = &ibmr->u.frmr;
>> + wait_event_timeout(frmr->fr_inv_done,
>> + frmr->fr_state == FRMR_IS_FREE,
>> + msecs_to_jiffies(10));
>> + if (frmr->fr_state == FRMR_IS_FREE)
>> + break;
>
> If we see FRMR_IS_FREE after the timeout, what cleans this up?
>
In that case, the memory-region is subjected to the
"rds_ib_free_frmr(ibmr, true)" call that follows:
In essence making it onto the "drop_list".
It's the same as if it wouldn't transition to FRMR_IS_FREE at all.
In both cases, the memory region should get dropped, and the application
would have been penalized by an extra 10msec wait-time (for having tried to invalidate it).
> Also, why 10msec?
It's empirical.
I had added some debugging code (not part of this submission) that traced
the return value of "wait_event_timeout" in order to see the out-lier in terms
of processing the "IB_WR_LOCAL_INV" request.
On my test-systems the majority of requests were done in less than 1msec.
I saw an outlier at almost 2msec once.
So I gave it an extra order-of-magnitude multiplier for extra buffer / paranoia.
> Why that specific value and not some other value?
I looked around to find what Mellanox or any other reference material had
so say about the expected turn--around time of an "IB_WR_LOCAL_INV" ought to be.
I wasn't able to find any.
Please note that even if there was an upper-bound specified, such as minutes:
It wouldn't necessarily be a good idea to penalize an application by wait-times
up to one minute, if the alternative is to just put this memory region on a
drop-list and pick another one (which is suggested here).
> Why not wait for however long it takes for the tasklet to run and clean it up?
Two reasons I can think of:
1) The penalty of long wait-times would go to the application
2) If there were a firmware-bug, the "wait_event" would not terminate
Thanks,
Gerd
Powered by blists - more mailing lists