[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220408193035.2uplgfjnfjo4s4f2@sx1>
Date: Fri, 8 Apr 2022 12:30:35 -0700
From: Saeed Mahameed <saeedm@...dia.com>
To: Leon Romanovsky <leon@...nel.org>
Cc: Saeed Mahameed <saeed@...nel.org>,
Jason Gunthorpe <jgg@...dia.com>,
Patrisious Haddad <phaddad@...dia.com>,
Jakub Kicinski <kuba@...nel.org>, linux-rdma@...r.kernel.org,
netdev@...r.kernel.org, Paolo Abeni <pabeni@...hat.com>,
Yishai Hadas <yishaih@...dia.com>
Subject: Re: [PATCH mlx5-next 1/3] net/mlx5: Nullify eq->dbg and qp->dbg
pointers post destruction
On 06 Apr 10:55, Leon Romanovsky wrote:
>On Tue, Apr 05, 2022 at 12:48:45PM -0700, Saeed Mahameed wrote:
>> On 05 Apr 11:12, Leon Romanovsky wrote:
>> > From: Patrisious Haddad <phaddad@...dia.com>
>> >
>> > Prior to this patch in the case that destroy_unmap_eq()
>> > failed and was called again, it triggered an additional call of
>>
>> Where is it being failed and called again ? this shouldn't even be an
>> option, we try to keep mlx5 symmetrical, constructors and destructors are
>> supposed to be called only once in their respective positions.
>> the callers must be fixed to avoid re-entry, or change destructors to clear
>> up all resources even on failures, no matter what do not invent a reentry
>> protocols to mlx5 destructors.
>
>It can happen when QP is exposed through DEVX interface. In that flow,
>only FW knows about it and reference count all users. This means that
>attempt to destroy such QP will fail, but mlx5_core is structured in
>such way that all cleanup was done before calling to FW to get
>success/fail response.
I wasn't talking about destroy_qp, actually destroy_qp is implemented the
way i am asking you to implement destroy_eq(); remove debugfs on first call
to destroy EQ, and drop the reentry logic from from mlx5_eq_destroy_generic
and destroy_async_eq.
EQ is a core/mlx5_ib resources, it's not exposed to user nor DEVX, it
shouldn't be subject to DEVX limitations.
Also looking at the destroy_qp implementation, it removes the debugfs
unconditionally even if the QP has ref count and removal will fail in FW.
just FYI.
For EQ I don't even understand why devx can cause ODP EQ removal to fail..
you must fix this at mlx5_ib layer, but for this patch, please drop the
re-entry and remove debugfs in destroy_eq, unconditionally.
>
>For more detailed information, see this cover letter:
>https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2F20200907120921.476363-1-leon%40kernel.org%2F&data=04%7C01%7Csaeedm%40nvidia.com%7Cee8a0add0a154e055f8508da17a2d6fd%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637848285407413801%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=xD54MMVFSeONDeQyPHOinh93CPWjp2rUEL7F3izc210%3D&reserved=0
>
><...>
>
>> > int mlx5_eq_destroy_generic(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
>> > {
>> > + struct mlx5_eq_table *eq_table = dev->priv.eq_table;
>> > int err;
>> >
>> > if (IS_ERR(eq))
>> > return -EINVAL;
>> >
>> > - err = destroy_async_eq(dev, eq);
>> > + mutex_lock(&eq_table->lock);
>>
>> Here you are inventing the re-entry. Please drop this and fix properly. And
>> avoid boolean parameters to mlx5 core
>> functions as much as possible, let's keep mlx5_core simple.
>
>If after reading the link above, you were not convinced, let's take it offline.
>
I am not convinced, see above.
>Thanks
Powered by blists - more mailing lists