Message-ID: <79d25e7c-ad9e-f6d8-b0fe-4ce04c658e1e@oracle.com>
Date:   Mon, 1 Jul 2019 19:28:44 -0700
From:   santosh.shilimkar@...cle.com
To:     Gerd Rausch <gerd.rausch@...cle.com>, netdev@...r.kernel.org
Cc:     David Miller <davem@...emloft.net>
Subject: Re: [PATCH net-next 3/7] net/rds: Wait for the FRMR_IS_FREE (or
 FRMR_IS_STALE) transition after posting IB_WR_LOCAL_INV



On 7/1/19 2:06 PM, Gerd Rausch wrote:
> Hi Santosh,
> 
> On 01/07/2019 14.00, santosh.shilimkar@...cle.com wrote:
>>>
>> Look for command timeout in CX3 sources. 60 second is upper bound in
>> CX3. Its not standard in specs(at least not that I know) though
>> and may vary from vendor to vendor.
>>
> 
> I am not seeing it. Can you point me to the right place?
>
Below. All command timeouts are 60 seconds.

enum {
         MLX4_CMD_TIME_CLASS_A   = 60000,
         MLX4_CMD_TIME_CLASS_B   = 60000,
         MLX4_CMD_TIME_CLASS_C   = 60000,
};

But having said that, I looked again at the code you are patching,
and that's actually only FRWR code, which is purely work-request
based, so this command timeout shouldn't matter.

If the work request fails, then it will lead to flush errors and
the MRs will be marked as STALE. So this wait may not be necessary.
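
To make the state reasoning above concrete, here is a minimal standalone
sketch of the transition being described. It is not the RDS code: the
state names follow the FRMR_IS_* names in the subject line, while
struct sketch_frmr and the sketch_* helpers are hypothetical names made
up for illustration, and the completion status is reduced to a single
boolean.

```c
#include <stdbool.h>

/* State names mirror the FRMR_IS_* names from the subject line.
 * Everything prefixed "sketch_" is hypothetical, not kernel API. */
enum frmr_state { FRMR_IS_FREE, FRMR_IS_INUSE, FRMR_IS_STALE };

struct sketch_frmr {
	enum frmr_state state;
	bool inv_posted;	/* a LOCAL_INV work request is outstanding */
};

/* Simplified completion handling: wc_ok is false for any failed or
 * flushed work request (assumption: all errors are treated alike). */
static void sketch_handle_completion(struct sketch_frmr *mr, bool wc_ok)
{
	if (!wc_ok)
		mr->state = FRMR_IS_STALE;	/* failure or flush error */
	else if (mr->inv_posted)
		mr->state = FRMR_IS_FREE;	/* invalidation completed */
	mr->inv_posted = false;
}

/* Only a FREE MR can be handed out again; a STALE one must be
 * cleaned up first. */
static bool sketch_frmr_reusable(const struct sketch_frmr *mr)
{
	return mr->state == FRMR_IS_FREE;
}
```

In this model every posted work request ends in a completion, either
successful or flushed, so the MR always leaves the INUSE state without
a separate timeout, which is the point being made above.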

There is a socket call, RDS_GET_MR, which needs to be synchronous,
and Avinash has actually fixed that by making this MR registration
process synchronous. Inline registration is still kept async.
The RDS_GET_MR case is what actually shows the issue you saw,
and Avinash has the fix for it in the production kernel.

I believe with that change, registration issue becomes non-issue
already.

And as far as invalidation is concerned, with the proxy qp it no
longer races with the data path qp.

Maybe you can try those changes, if you have not already, to see if
they address the couple of cases where you ended up adding
timeouts.

Regards,
Santosh
