lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 18 Jun 2024 05:50:15 +0000
From: Omer Shpigelman <oshpigelman@...ana.ai>
To: Leon Romanovsky <leon@...nel.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
        "ogabbay@...nel.org" <ogabbay@...nel.org>,
        Zvika Yehudai <zyehudai@...ana.ai>
Subject: Re: [PATCH 04/15] net: hbl_cn: QP state machine

On 6/17/24 16:18, Leon Romanovsky wrote:
> [Some people who received this message don't often get email from leon@...nel.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> On Thu, Jun 13, 2024 at 11:21:57AM +0300, Omer Shpigelman wrote:
>> Add a common QP state machine which handles the moving for a QP from one
>> state to another including performing necessary checks, draining
>> in-flight transactions, invalidating caches and error reporting.
>>
>> Signed-off-by: Omer Shpigelman <oshpigelman@...ana.ai>
>> Co-developed-by: Abhilash K V <kvabhilash@...ana.ai>
>> Signed-off-by: Abhilash K V <kvabhilash@...ana.ai>
>> Co-developed-by: Andrey Agranovich <aagranovich@...ana.ai>
>> Signed-off-by: Andrey Agranovich <aagranovich@...ana.ai>
>> Co-developed-by: Bharat Jauhari <bjauhari@...ana.ai>
>> Signed-off-by: Bharat Jauhari <bjauhari@...ana.ai>
>> Co-developed-by: David Meriin <dmeriin@...ana.ai>
>> Signed-off-by: David Meriin <dmeriin@...ana.ai>
>> Co-developed-by: Sagiv Ozeri <sozeri@...ana.ai>
>> Signed-off-by: Sagiv Ozeri <sozeri@...ana.ai>
>> Co-developed-by: Zvika Yehudai <zyehudai@...ana.ai>
>> Signed-off-by: Zvika Yehudai <zyehudai@...ana.ai>
>> ---
>>  .../ethernet/intel/hbl_cn/common/hbl_cn_qp.c  | 480 +++++++++++++++++-
>>  1 file changed, 479 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
>> index 9ddc23bf8194..26ebdf448193 100644
>> --- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
>> +++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
>> @@ -6,8 +6,486 @@
> 
> <...>
> 
>> +/* The following table represents the (valid) operations that can be performed on
>> + * a QP in order to move it from one state to another
>> + * For example: a QP in RTR state can be moved to RTS state using the CN_QP_OP_RTR_2RTS
>> + * operation.
>> + */
>> +static const enum hbl_cn_qp_state_op qp_valid_state_op[CN_QP_NUM_STATE][CN_QP_NUM_STATE] = {
>> +     [CN_QP_STATE_RESET] = {
>> +             [CN_QP_STATE_RESET]     = CN_QP_OP_2RESET,
>> +             [CN_QP_STATE_INIT]      = CN_QP_OP_RST_2INIT,
>> +             [CN_QP_STATE_SQD]       = CN_QP_OP_NOP,
>> +             [CN_QP_STATE_QPD]       = CN_QP_OP_NOP,
>> +     },
>> +     [CN_QP_STATE_INIT] = {
>> +             [CN_QP_STATE_RESET]     = CN_QP_OP_2RESET,
>> +             [CN_QP_STATE_ERR]       = CN_QP_OP_2ERR,
>> +             [CN_QP_STATE_INIT]      = CN_QP_OP_NOP,
>> +             [CN_QP_STATE_RTR]       = CN_QP_OP_INIT_2RTR,
>> +             [CN_QP_STATE_SQD]       = CN_QP_OP_NOP,
>> +             [CN_QP_STATE_QPD]       = CN_QP_OP_NOP,
>> +     },
>> +     [CN_QP_STATE_RTR] = {
>> +             [CN_QP_STATE_RESET]     = CN_QP_OP_2RESET,
>> +             [CN_QP_STATE_ERR]       = CN_QP_OP_2ERR,
>> +             [CN_QP_STATE_RTR]       = CN_QP_OP_RTR_2RTR,
>> +             [CN_QP_STATE_RTS]       = CN_QP_OP_RTR_2RTS,
>> +             [CN_QP_STATE_SQD]       = CN_QP_OP_NOP,
>> +             [CN_QP_STATE_QPD]       = CN_QP_OP_RTR_2QPD,
>> +     },
>> +     [CN_QP_STATE_RTS] = {
>> +             [CN_QP_STATE_RESET]     = CN_QP_OP_2RESET,
>> +             [CN_QP_STATE_ERR]       = CN_QP_OP_2ERR,
>> +             [CN_QP_STATE_RTS]       = CN_QP_OP_RTS_2RTS,
>> +             [CN_QP_STATE_SQD]       = CN_QP_OP_RTS_2SQD,
>> +             [CN_QP_STATE_QPD]       = CN_QP_OP_RTS_2QPD,
>> +             [CN_QP_STATE_SQERR]     = CN_QP_OP_RTS_2SQERR,
>> +     },
>> +     [CN_QP_STATE_SQD] = {
>> +             [CN_QP_STATE_RESET]     = CN_QP_OP_2RESET,
>> +             [CN_QP_STATE_ERR]       = CN_QP_OP_2ERR,
>> +             [CN_QP_STATE_SQD]       = CN_QP_OP_SQD_2SQD,
>> +             [CN_QP_STATE_RTS]       = CN_QP_OP_SQD_2RTS,
>> +             [CN_QP_STATE_QPD]       = CN_QP_OP_SQD_2QPD,
>> +             [CN_QP_STATE_SQERR]     = CN_QP_OP_SQD_2SQ_ERR,
>> +     },
>> +     [CN_QP_STATE_QPD] = {
>> +             [CN_QP_STATE_RESET]     = CN_QP_OP_2RESET,
>> +             [CN_QP_STATE_ERR]       = CN_QP_OP_2ERR,
>> +             [CN_QP_STATE_SQD]       = CN_QP_OP_NOP,
>> +             [CN_QP_STATE_QPD]       = CN_QP_OP_NOP,
>> +             [CN_QP_STATE_RTR]       = CN_QP_OP_QPD_2RTR,
>> +     },
>> +     [CN_QP_STATE_SQERR] = {
>> +             [CN_QP_STATE_RESET]     = CN_QP_OP_2RESET,
>> +             [CN_QP_STATE_ERR]       = CN_QP_OP_2ERR,
>> +             [CN_QP_STATE_SQD]       = CN_QP_OP_SQ_ERR_2SQD,
>> +             [CN_QP_STATE_SQERR]     = CN_QP_OP_NOP,
>> +     },
>> +     [CN_QP_STATE_ERR] = {
>> +             [CN_QP_STATE_RESET]     = CN_QP_OP_2RESET,
>> +             [CN_QP_STATE_ERR]       = CN_QP_OP_2ERR,
>> +     }
>> +};
> 
> I don't understand why IBTA QP state machine is declared in ETH driver
> and not in IB driver.
> 

Implementing the actual transitions between the states requires full
knowledge of the HW e.g. when to flush, cache invalidation, timeouts.
Our IB driver is agnostic to the ASIC type by design. Note that more ASIC
generations are planned to be added and the IB driver should not be aware
of these additional HWs.
Hence we implemeted the QP state machine in the CN driver which is aware
of the actual HW.

>> +
> 
> <...>
> 
>> +             /* Release lock while we wait before retry.
>> +              * Note, we can assert that we are already locked.
>> +              */
>> +             port_funcs->cfg_unlock(cn_port);
>> +
>> +             msleep(20);
>> +
>> +             port_funcs->cfg_lock(cn_port);
> 
> lock/unlock through ops pointer doesn't look like a good idea.
> 

More ASIC generations will be added once we merge the current Gaudi2 code.
On other ASICs the locking granularity is different because some of the HW
resources are shared between different logical ports.
Hence it is was logical for us to implement it with a function pointer so
each ASIC specific code can implemnet the locking properly.

> Thanks

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ