netdev - Re: [PATCH net-next 3/7] net/rds: Wait for the FRMR_IS_FREE (or FRMR_IS_STALE) transition after posting IB_WR_LOCAL

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <697adfba-ac8b-db4d-5819-f4c22ec6c76a@oracle.com>
Date:   Tue, 2 Jul 2019 14:05:59 -0700
From:   Gerd Rausch <gerd.rausch@...cle.com>
To:     santosh.shilimkar@...cle.com, netdev@...r.kernel.org
Cc:     David Miller <davem@...emloft.net>
Subject: Re: [PATCH net-next 3/7] net/rds: Wait for the FRMR_IS_FREE (or
 FRMR_IS_STALE) transition after posting IB_WR_LOCAL_INV

On 02/07/2019 09.49, santosh.shilimkar@...cle.com wrote:
> On 7/1/19 10:11 PM, Gerd Rausch wrote:
>> For the registration work-requests there is a benefit to wait a short
>> amount of time only (the trade-off described in patch #1 of this series).
>>
> Actually we should just switch this code to what Avinash has
> finally made in downstream code. That keeps the RDS_GET_MR
> semantics and makes sure MR is really valid before handing over
> the key to userland. There is no need for any timeout
> for registration case.
> 

What do you call "RDS_GET_MR" semantics?

The purpose of waiting for a IB_WR_REG_MR request to complete
(inside rds_ib_post_reg_frmr) is in fact to make sure
the memory region is valid.

Regardless of this being true after a specific time-out,
or an infinite timeout.

For the non-infinite time-out case, there is a check if the request
was handled by the firmware.

And if a time-out occurred and the firmware didn't handle the request,
function "rds_ib_post_reg_frmr" will return -EBUSY.

>> Actually, no:
>> Socket option RDS_GET_MR wasn't even in the code-path of the
>> tests I performed:
>>
>> It were there RDS_CMSG_RDMA_MAP / RDS_CMSG_RDMA_DEST control
>> messages that ended up calling '__rds_rdma_map".
>>
> What option did you use ? Default option with rds-stress is
> RDS_GET_MR and hence the question.
> 

Not true!:

Socket option RDS_GET_MR is only used by "rds-stress"
if it is invoked with "--rdma-use-get-mr <non-zero-value>".

Verify this with the following patch applied to rds-tools-2.0.7-1.18:
--- a/rds-stress.c
+++ b/rds-stress.c
@@ -705,6 +705,7 @@ static uint64_t get_rdma_key(int fd, uint64_t addr, uint32_t size)
        if (opt.rdma_use_once)
                mr_args.flags |= RDS_RDMA_USE_ONCE;

+       abort();
        if (setsockopt(fd, sol, RDS_GET_MR, &mr_args, sizeof(mr_args)))
                die_errno("setsockopt(RDS_GET_MR) failed (%u allocated)", mrs_allocated);

And why is socket option RDS_GET_MR a subject of this discussion?

The proposed patch suggests to wait for a firmware response in handling
_all_ cases that end up in "rds_ib_post_reg_frmr", issuing a "IB_WR_REG_MR" request.

It doesn't matter where they came from:
Whether they came from an RDS_GET_MR socket option, or a  RDS_CMSG_RDMA_MAP:
They all go through "__rds_rdma_map" and end up calling "rds_ib_get_mr".

How is socket option RDS_GET_MR special with regards to this proposed fix?

>> I don't understand, please elaborate:
>> a) Are you saying this issue should not be fixed?
>> b) Or are you suggesting to replace this fix with a different fix?
>>     If it's the later, please point out what you have in mind.
>> c) ???
>>
> All am saying is the code got changed for good reason and that changed
> code makes some of these race conditions possibly not applicable.

I don't understand this. Please elaborate.

> So instead of these timeout fixes, am suggesting to use that
> code as fix. At least test it with those changes and see whats
> the behavior.
> 

Are you suggesting to
a) Not fix this bug right now and wait until some later point in time
b) Use a different fix. If you've got a different fix, please share.

And besides these options, is there anything wrong with this fix
(other than the discussion of what the timeout value ought to be,
 which we can address)?

Thanks,

  Gerd