linux-kernel - Re: [PATCH net-next] netdevsim: Register and unregister devlink traps on probe/remove device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YXgMK2NKiiVYJhLl@shredder>
Date:   Tue, 26 Oct 2021 17:09:47 +0300
From:   Ido Schimmel <idosch@...sch.org>
To:     Leon Romanovsky <leon@...nel.org>
Cc:     "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Ido Schimmel <idosch@...lanox.com>,
        Jiri Pirko <jiri@...lanox.com>, linux-kernel@...r.kernel.org,
        netdev@...r.kernel.org,
        syzbot+93d5accfaefceedf43c1@...kaller.appspotmail.com
Subject: Re: [PATCH net-next] netdevsim: Register and unregister devlink
 traps on probe/remove device

On Tue, Oct 26, 2021 at 10:18:12AM +0300, Leon Romanovsky wrote:
> On Tue, Oct 26, 2021 at 09:51:13AM +0300, Ido Schimmel wrote:
> 
> <...>
> 
> > > 
> > > Can you please explain why is it so important to touch devlink SW
> > > objects, reallocate them again and again on every reload in mlxsw?
> > 
> > Because that's how reload was defined and implemented. A complete
> > reload. We are not changing the semantics 4 years later.
> 
> Please put your emotions aside and explain me technically why are you
> must to do it?

Already did. The current semantics are "devlink-reload provides
mechanism to reinit driver entities, applying devlink-params and
devlink-resources new values. It also provides mechanism to activate
firmware."

And this is exactly what netdevsim and mlxsw are doing. Driver entities
are re-initialized. Your patch breaks that as entities are not
re-initialized, which results in user space breakage. You simply cannot
introduce such regressions.

> 
> The proposed semantics was broken for last 4 years, it can even seen as
> dead on arrival,

Again with the bombastic statements. It was "dead on arrival" like the
notifications were "impossible"?

> because it never worked for us in real production.

Who is "us"? mlx5 that apparently decided to do its own thing?

We are using reload in mlxsw on a daily basis and users are using it to
re-partition ASIC resources and activate firmware. There are tests over
netdevsim implementation that anyone can run for testing purposes. We
also made sure to integrate it into syzkaller:

https://github.com/google/syzkaller/commit/5b49e1f605a770e8f8fcdcbd1a8ff85591fc0c8e
https://github.com/google/syzkaller/commit/04ca72cd45348daab9d896bbec8ea4c2d13455ac
https://github.com/google/syzkaller/commit/6930bbef3b671ae21f74007f9e59efb9b236b93f
https://github.com/google/syzkaller/commit/d45a4d69d83f40579e74fb561e1583db1be0e294
https://github.com/google/syzkaller/commit/510951950dc0ee69cfdaf746061d3dbe31b49fd8

Which is why the regressions you introduced were discovered so quickly.

> 
> So I'm fixing bugs without relation to when they were introduced.

We all do

> 
> For example, this fix from Jiri [1] for basic design flow was merged almost
> two years later after devlink reload was introduced [2], or this patch from
> Parav [3] that fixed an issue introduced year before [4].

What is your point? That code has bugs?

By now I have spent more time arguing with you than you spent testing
your patches and it's clear this discussion is not going anywhere.

Are you going to send a revert or I will? This is the fourth time I'm
asking you.