lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZSvJBV2dWf1dTlWp@nanopsycho>
Date: Sun, 15 Oct 2023 13:12:05 +0200
From: Jiri Pirko <jiri@...nulli.us>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, pabeni@...hat.com, davem@...emloft.net,
	edumazet@...gle.com, gal@...dia.com
Subject: Re: [patch net-next] devlink: don't take instance lock for nested
 handle put

Fri, Oct 13, 2023 at 10:01:00PM CEST, kuba@...nel.org wrote:
>On Fri, 13 Oct 2023 19:07:05 +0200 Jiri Pirko wrote:
>> >> Not sure what obvious bug you mean. If you mean the parent-child
>> >> lifetime change, I don't know how that would help here. I don't see how.
>> >> 
>> >> Plus it has performance implications. When user removes SF port under
>> >> instance lock, the SF itself is removed asynchonously out of the lock.
>> >> You suggest to remove it synchronously holding the instance lock,
>> >> correct?   
>> >
>> >The SF is deleted by calling ->port_del() on the PF instance, correct?  
>> 
>> That or setting opstate "inactive".
>
>The opstate also set on the port (i.e. from the PF), right?

Correct. But the problem is elsewehere. The actual SF auxdev lifecycle,
see below.


>
>> >> SF removal does not need that lock. Removing thousands of SFs
>> >> would take much longer as currently, they are removed in parallel.
>> >> You would serialize the removals for no good reason.  
>> >
>> >First of all IDK what the removal rate you're targeting is, and what
>> >is achievable under PF's lock. Handwaving "we need parallelism" without
>> >data is not a serious argument.  
>> 
>> Oh there are data and there is a need. My colleagues are working
>> on parallel creation/removal within mlx5 driver as we speak. What you
>> suggest would be huge setback :/
>
>The only part that needs to be synchronous is un-linking.
>Once the SF is designated for destruction we can live without the link,
>it's just waiting to be garbage-collected.

Yeah. Another flow to consider is SF unbind. When user unbinds the SF
manually, PF lock is not involved in at all (can't be really). SF needs
to be unlisted as well as the SF devlink instance goes away. That was
another reason for the rel infrastructure.


>
>> >> Not sure what you mean by that. Locking is quite clear. Why weird?
>> >> What's weird exactly? What do you mean by "random dependencies"?
>> >> 
>> >> I have to say I feel we got a bit lost in the conversation.  
>> >
>> >You have a rel object, which is refcounted, xarray with a lock, and 
>> >an async work for notifications.  
>> 
>> Yes. The async work for notification is something you would need anyway,
>> even with object lifetime change you suggest. It's about locking order.
>
>I don't think I would. If linking is always done under PF's lock we can
>safely send any ntf.

If is not. Manual bind called over sysfs of the SF auxiliary device is
called without any lock held.

There are multiple race conditions to consider the probing/removing the
SF auxiliary device in parallel operations like PF devlink reload, port
funcion removal etc. Rel infra is considering and resolving them all.


>
>> Please see the patchset I sent today (v3), I did put in a documentation
>> describing that (3 last patches). That should make it clear.
>
>It's unnecessarily complicated, but whatever, I'm not touching it.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ