[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YXgMK2NKiiVYJhLl@shredder>
Date: Tue, 26 Oct 2021 17:09:47 +0300
From: Ido Schimmel <idosch@...sch.org>
To: Leon Romanovsky <leon@...nel.org>
Cc: "David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Ido Schimmel <idosch@...lanox.com>,
Jiri Pirko <jiri@...lanox.com>, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org,
syzbot+93d5accfaefceedf43c1@...kaller.appspotmail.com
Subject: Re: [PATCH net-next] netdevsim: Register and unregister devlink
traps on probe/remove device
On Tue, Oct 26, 2021 at 10:18:12AM +0300, Leon Romanovsky wrote:
> On Tue, Oct 26, 2021 at 09:51:13AM +0300, Ido Schimmel wrote:
>
> <...>
>
> > >
> > > Can you please explain why is it so important to touch devlink SW
> > > objects, reallocate them again and again on every reload in mlxsw?
> >
> > Because that's how reload was defined and implemented. A complete
> > reload. We are not changing the semantics 4 years later.
>
> Please put your emotions aside and explain me technically why are you
> must to do it?
Already did. The current semantics are "devlink-reload provides
mechanism to reinit driver entities, applying devlink-params and
devlink-resources new values. It also provides mechanism to activate
firmware."
And this is exactly what netdevsim and mlxsw are doing. Driver entities
are re-initialized. Your patch breaks that as entities are not
re-initialized, which results in user space breakage. You simply cannot
introduce such regressions.
>
> The proposed semantics was broken for last 4 years, it can even seen as
> dead on arrival,
Again with the bombastic statements. It was "dead on arrival" like the
notifications were "impossible"?
> because it never worked for us in real production.
Who is "us"? mlx5 that apparently decided to do its own thing?
We are using reload in mlxsw on a daily basis and users are using it to
re-partition ASIC resources and activate firmware. There are tests over
netdevsim implementation that anyone can run for testing purposes. We
also made sure to integrate it into syzkaller:
https://github.com/google/syzkaller/commit/5b49e1f605a770e8f8fcdcbd1a8ff85591fc0c8e
https://github.com/google/syzkaller/commit/04ca72cd45348daab9d896bbec8ea4c2d13455ac
https://github.com/google/syzkaller/commit/6930bbef3b671ae21f74007f9e59efb9b236b93f
https://github.com/google/syzkaller/commit/d45a4d69d83f40579e74fb561e1583db1be0e294
https://github.com/google/syzkaller/commit/510951950dc0ee69cfdaf746061d3dbe31b49fd8
Which is why the regressions you introduced were discovered so quickly.
>
> So I'm fixing bugs without relation to when they were introduced.
We all do
>
> For example, this fix from Jiri [1] for basic design flow was merged almost
> two years later after devlink reload was introduced [2], or this patch from
> Parav [3] that fixed an issue introduced year before [4].
What is your point? That code has bugs?
By now I have spent more time arguing with you than you spent testing
your patches and it's clear this discussion is not going anywhere.
Are you going to send a revert or I will? This is the fourth time I'm
asking you.
Powered by blists - more mailing lists