[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZSvJBV2dWf1dTlWp@nanopsycho>
Date: Sun, 15 Oct 2023 13:12:05 +0200
From: Jiri Pirko <jiri@...nulli.us>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, pabeni@...hat.com, davem@...emloft.net,
edumazet@...gle.com, gal@...dia.com
Subject: Re: [patch net-next] devlink: don't take instance lock for nested
handle put
Fri, Oct 13, 2023 at 10:01:00PM CEST, kuba@...nel.org wrote:
>On Fri, 13 Oct 2023 19:07:05 +0200 Jiri Pirko wrote:
>> >> Not sure what obvious bug you mean. If you mean the parent-child
>> >> lifetime change, I don't know how that would help here. I don't see how.
>> >>
>> >> Plus it has performance implications. When user removes SF port under
>> >> instance lock, the SF itself is removed asynchonously out of the lock.
>> >> You suggest to remove it synchronously holding the instance lock,
>> >> correct?
>> >
>> >The SF is deleted by calling ->port_del() on the PF instance, correct?
>>
>> That or setting opstate "inactive".
>
>The opstate also set on the port (i.e. from the PF), right?
Correct. But the problem is elsewehere. The actual SF auxdev lifecycle,
see below.
>
>> >> SF removal does not need that lock. Removing thousands of SFs
>> >> would take much longer as currently, they are removed in parallel.
>> >> You would serialize the removals for no good reason.
>> >
>> >First of all IDK what the removal rate you're targeting is, and what
>> >is achievable under PF's lock. Handwaving "we need parallelism" without
>> >data is not a serious argument.
>>
>> Oh there are data and there is a need. My colleagues are working
>> on parallel creation/removal within mlx5 driver as we speak. What you
>> suggest would be huge setback :/
>
>The only part that needs to be synchronous is un-linking.
>Once the SF is designated for destruction we can live without the link,
>it's just waiting to be garbage-collected.
Yeah. Another flow to consider is SF unbind. When user unbinds the SF
manually, PF lock is not involved in at all (can't be really). SF needs
to be unlisted as well as the SF devlink instance goes away. That was
another reason for the rel infrastructure.
>
>> >> Not sure what you mean by that. Locking is quite clear. Why weird?
>> >> What's weird exactly? What do you mean by "random dependencies"?
>> >>
>> >> I have to say I feel we got a bit lost in the conversation.
>> >
>> >You have a rel object, which is refcounted, xarray with a lock, and
>> >an async work for notifications.
>>
>> Yes. The async work for notification is something you would need anyway,
>> even with object lifetime change you suggest. It's about locking order.
>
>I don't think I would. If linking is always done under PF's lock we can
>safely send any ntf.
If is not. Manual bind called over sysfs of the SF auxiliary device is
called without any lock held.
There are multiple race conditions to consider the probing/removing the
SF auxiliary device in parallel operations like PF devlink reload, port
funcion removal etc. Rel infra is considering and resolving them all.
>
>> Please see the patchset I sent today (v3), I did put in a documentation
>> describing that (3 last patches). That should make it clear.
>
>It's unnecessarily complicated, but whatever, I'm not touching it.
Powered by blists - more mailing lists