lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1759bca6-4511-6cd9-ab5d-8c9c30e5db67@oracle.com>
Date:   Tue, 2 Jul 2019 14:18:14 -0700
From:   santosh.shilimkar@...cle.com
To:     Gerd Rausch <gerd.rausch@...cle.com>, netdev@...r.kernel.org
Cc:     David Miller <davem@...emloft.net>
Subject: Re: [PATCH net-next 3/7] net/rds: Wait for the FRMR_IS_FREE (or
 FRMR_IS_STALE) transition after posting IB_WR_LOCAL_INV

On 7/2/19 2:05 PM, Gerd Rausch wrote:
> On 02/07/2019 09.49, santosh.shilimkar@...cle.com wrote:
>> On 7/1/19 10:11 PM, Gerd Rausch wrote:
>>> For the registration work-requests there is a benefit to wait a short
>>> amount of time only (the trade-off described in patch #1 of this series).
>>>
>> Actually we should just switch this code to what Avinash has
>> finally made in downstream code. That keeps the RDS_GET_MR
>> semantics and makes sure MR is really valid before handing over
>> the key to userland. There is no need for any timeout
>> for registration case.
>>
> 
> What do you call "RDS_GET_MR" semantics?
> 
Its a blocking socket call. Meaning after this call return to the
user, the key must be valid. With async registration that can't be
guaranteed.

> The purpose of waiting for a IB_WR_REG_MR request to complete
> (inside rds_ib_post_reg_frmr) is in fact to make sure
> the memory region is valid.
> 
> Regardless of this being true after a specific time-out,
> or an infinite timeout.
> 
> For the non-infinite time-out case, there is a check if the request
> was handled by the firmware.
> 
> And if a time-out occurred and the firmware didn't handle the request,
> function "rds_ib_post_reg_frmr" will return -EBUSY.
> 
>>> Actually, no:
>>> Socket option RDS_GET_MR wasn't even in the code-path of the
>>> tests I performed:
>>>
>>> It were there RDS_CMSG_RDMA_MAP / RDS_CMSG_RDMA_DEST control
>>> messages that ended up calling '__rds_rdma_map".
>>>
>> What option did you use ? Default option with rds-stress is
>> RDS_GET_MR and hence the question.
>>
> 
> Not true!:
Its other way round. Thanks for info so default its using inline
registration instead of explicit call.

> How is socket option RDS_GET_MR special with regards to this proposed fix?
> 
>>> I don't understand, please elaborate:
>>> a) Are you saying this issue should not be fixed?
>>> b) Or are you suggesting to replace this fix with a different fix?
>>>      If it's the later, please point out what you have in mind.
>>> c) ???
>>>
>> All am saying is the code got changed for good reason and that changed
>> code makes some of these race conditions possibly not applicable.
> 
> I don't understand this. Please elaborate.
> 
>> So instead of these timeout fixes, am suggesting to use that
>> code as fix. At least test it with those changes and see whats
>> the behavior.
>>
> 
> Are you suggesting to
> a) Not fix this bug right now and wait until some later point in time
When did I say that ? I said have you explored alternate approach to
fix the issue and if not could you try it out.

> b) Use a different fix. If you've got a different fix, please share.
>
I don't but its a review of the fix and possible alternate needs to
be discussed. It is not like take my fix or provide an alternate fix.

> And besides these options, is there anything wrong with this fix
> (other than the discussion of what the timeout value ought to be,
>   which we can address)?
> 
That timeout is a problem because it doesn't guarantee the failure
of operation since its an asyn operation for registration. Instead
of timing out if you poll the CQ for that operation completion, it
makes it full proof. That is the change Avinash has done iirc and
am requesting to look at that fix.

Other 5 fixes from the series looks fine.

Regards,
Santosh

Regards,
Santosh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ