netdev - Re: [PATCH v3 net 5/6] mpls: Handle error of rtnl_register

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8734l7d0a4.fsf@email.froward.int.ebiederm.org>
Date: Mon, 07 Oct 2024 17:18:59 -0500
From: "Eric W. Biederman" <ebiederm@...ssion.com>
To: Kuniyuki Iwashima <kuniyu@...zon.com>
Cc: <davem@...emloft.net>,  <edumazet@...gle.com>,  <kuba@...nel.org>,
  <kuni1840@...il.com>,  <netdev@...r.kernel.org>,  <pabeni@...hat.com>
Subject: Re: [PATCH v3 net 5/6] mpls: Handle error of rtnl_register_module().

Kuniyuki Iwashima <kuniyu@...zon.com> writes:

> From: "Eric W. Biederman" <ebiederm@...ssion.com>
> Date: Mon, 07 Oct 2024 11:28:11 -0500
>> Kuniyuki Iwashima <kuniyu@...zon.com> writes:
>> 
>> > From: "Eric W. Biederman" <ebiederm@...ssion.com>
>> > Date: Mon, 07 Oct 2024 09:56:44 -0500
>> >> Kuniyuki Iwashima <kuniyu@...zon.com> writes:
>> >> 
>> >> > Since introduced, mpls_init() has been ignoring the returned
>> >> > value of rtnl_register_module(), which could fail.
>> >> 
>> >> As I recall that was deliberate.  The module continues to work if the
>> >> rtnetlink handlers don't operate, just some functionality is lost.
>> >
>> > It's ok if it wasn't a module.  rtnl_register() logs an error message
>> > in syslog, but rtnl_register_module() doesn't.  That's why this series
>> > only changes some rtnl_register_module() calls.
>> 
>> You talk about the series.  Is there an introductory letter I should
>> lookup on netdev that explains things in more detail?
>> 
>> I have only seen the patch that is sent directly to me.
>
> Some context here.
> https://lore.kernel.org/netdev/20241007124459.5727-1-kuniyu@amazon.com/
>
> Before addf9b90de22, rtnl_register_module() didn't actually need
> error handling for some callers, but even after the commit, some
> modules copy-and-pasted the wrong code.

That is wrong. As far back as commit e284986385b6 ("[RTNL]: Message
handler registration interface") it was possible for rtnl_register to
error.  Of course that dates back to the time when small allocations
were guaranteed to succeed or panic the kernel.  So I expect it was
the change to small allocations that actually made it possible to
see failures from rtnl_register.

That said there is a difference between code that generates an error,
and callers that need to handle the error.  Even today I do not
see that MPLS needs to handle the error.

Other than for policy objectives such as making it harder for
syzkaller to get confused, and making the code more like other
callers I don't see a need for handling such an error in MPLS.

>> > What if the memory pressure happend to be relaxed soon after the module
>> > was loaded incompletely ?
>> 
>> Huh?  The module will load completely.  It will just won't have full
>> functionality.  The rtnetlink functionality simply won't work.
>> 
>> > Silent failure is much worse to me.
>> 
>> My point is from the point of view of the MPLS functionality it isn't
>> a __failure__.
>
> My point is it _is_ failure for those who use MPLS rtnetlink
> functionality, it's uAPI.  Not everyone uses the plain MPLS
> without rtnetlink.

No matter what the code does userspace attempting to use rtnetlink to
configure mpls will fail.  Either the MPLS module will fail to load, and
nothing works, or the module will load and do the best it can with what
it has.  Allowing the folks you don't need the rtnetlink uAPI to
succeed.

It isn't a uAPI failure.  It is a lack of uAPI availability.  There
is nothing else it can be.

In general the kernel tries to limp along the best it can without
completely failing.

> Also, I don't want to waste resource due to such an issue on QEMU w/ 2GB
> RAM where I'm running syzkaller and often see OOM.  syzkaller searches
> and loads modules and exercises various uAPIs randomly, and it handles
> module-load-failure properly.

Then please mention that use case in your change description.  That is
the only real motivation I see to for this change in behavior. Perhaps
something like:

    Handler errors from rtnetlink registration allowing syzkaller to view a
    module as an all or nothing thing in terms of the functionality it
    provides.  This prevents syzkaller from reporting spurious errors
    from it's tests.

That seems like a fine motivation and a fine reason.  But if you don't
say that, people will make changes that don't honor that, and people
won't be able to look into the history and figure out why such a change
was made.

Recording the real reasons for changes in the history really matters.
Because sometimes people need to revisit things, and if you don't
include those reasons people don't have a chance of taking your reaons
into account when the revisit a change for another set of reasons.
Unless somehow someone remembers something when it comes to code review
time.


>> The flip side is I tried very hard to minimize the amount of code in
>> af_mpls, to make maintenance simpler, and to reduce the chance of bugs.
>> You are busy introducing what appears to me to be an untested error
>> handling path which may result in something worse that not logging a
>> message.  Especially when someone comes along and makes another change.
>> 
>> It is all such a corner case and I really don't care, but I just don't
>> want this to be seen as a change that is obviously the right way to go
>> and that has no downside.
>
> I don't see how this small change has downside that affects maintenability.
> Someone who wants to add a new code there can just add a new function call
> and goto label, that's simple enough.

Strictly speaking when testing code both branches of every if statement
need to be tested.  It is the untested code where mistakes slip in.
People being human we don't get it right 100% of the time.

It isn't a large maintenance cost increase, but it is a maintenance cost
increase.

One if statement isn't a big deal but at the end of the day it adds up.

If code becomes simpler and there is noticeably less for it to do the
code winds up having measurably fewer bugs.  If code becomes more
complex, with more cases to handle and more branches after a time code
winds up having measurably more bugs.

When I look at any part of the linux kernel with which I am familiar I
can very much guarantee that most of it has become more complex over
time, leading to measurably more bugs.   Tools like syzkaller help, but
that don't completely compensate for more complex code.



In this case I expect in balance fewer spurious errors from syzkall
is seems like it would be a net win.  But I most definitely see a
tradeoff being made here.

Eric