netdev - Re: [PATCH v3 net 5/6] mpls: Handle error of rtnl_register

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20241007182106.39342-1-kuniyu@amazon.com>
Date: Mon, 7 Oct 2024 11:21:06 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <ebiederm@...ssion.com>
CC: <davem@...emloft.net>, <edumazet@...gle.com>, <kuba@...nel.org>,
	<kuni1840@...il.com>, <kuniyu@...zon.com>, <netdev@...r.kernel.org>,
	<pabeni@...hat.com>
Subject: Re: [PATCH v3 net 5/6] mpls: Handle error of rtnl_register_module().

From: "Eric W. Biederman" <ebiederm@...ssion.com>
Date: Mon, 07 Oct 2024 11:28:11 -0500
> Kuniyuki Iwashima <kuniyu@...zon.com> writes:
> 
> > From: "Eric W. Biederman" <ebiederm@...ssion.com>
> > Date: Mon, 07 Oct 2024 09:56:44 -0500
> >> Kuniyuki Iwashima <kuniyu@...zon.com> writes:
> >> 
> >> > Since introduced, mpls_init() has been ignoring the returned
> >> > value of rtnl_register_module(), which could fail.
> >> 
> >> As I recall that was deliberate.  The module continues to work if the
> >> rtnetlink handlers don't operate, just some functionality is lost.
> >
> > It's ok if it wasn't a module.  rtnl_register() logs an error message
> > in syslog, but rtnl_register_module() doesn't.  That's why this series
> > only changes some rtnl_register_module() calls.
> 
> You talk about the series.  Is there an introductory letter I should
> lookup on netdev that explains things in more detail?
> 
> I have only seen the patch that is sent directly to me.

Some context here.
https://lore.kernel.org/netdev/20241007124459.5727-1-kuniyu@amazon.com/

Before addf9b90de22, rtnl_register_module() didn't actually need
error handling for some callers, but even after the commit, some
modules copy-and-pasted the wrong code.


> 
> >> I don't strongly care either way, but I want to point out that bailing
> >> out due to a memory allocation failure actually makes the module
> >> initialization more brittle.
> >> 
> >> > Let's handle the errors by rtnl_register_many().
> >> 
> >> Can you describe what the benefit is from completely giving up in the
> >> face of a memory allocation failure versus having as much of the module
> >> function as possible?
> >
> > What if the memory pressure happend to be relaxed soon after the module
> > was loaded incompletely ?
> 
> Huh?  The module will load completely.  It will just won't have full
> functionality.  The rtnetlink functionality simply won't work.
> 
> > Silent failure is much worse to me.
> 
> My point is from the point of view of the MPLS functionality it isn't
> a __failure__.

My point is it _is_ failure for those who use MPLS rtnetlink
functionality, it's uAPI.  Not everyone uses the plain MPLS
without rtnetlink.

Also, I don't want to waste resource due to such an issue on QEMU w/ 2GB
RAM where I'm running syzkaller and often see OOM.  syzkaller searches
and loads modules and exercises various uAPIs randomly, and it handles
module-load-failure properly.


> 
> > rtnl_get_link() will return NULL and users will see -EOPNOTSUPP even
> > though the module was loaded "successfully".
> 
> Yes.  EOPNOTSUPP makes it clear that the rtnetlink functionality
> working.  In most cases modules are autoloaded these days, so the end
> user experience is likely to be EOPNOTSUPP in either case.
> 
> If you log a message, some time later someone will see that there is
> a message in the log that the kernel was very low on memory and could
> could not allocate enough memory for rtnetlink.
> 
> Short of rebooting or retrying to load the module I don't expect
> there is much someone can do in either case.  So this does not look
> to me like a case of silent failure or a broken module.
> 
> Has anyone actually had this happen and reported this as a problem?
> Otherwise we are all just arguing theoretical possibilities.
> 
> My only real point is that change is *not* a *fix*.
> This change is a *cleanup* to make mpls like other modules.
> 
> I am fine with a cleanup, but I really don't think we should describe
> this as something it is not.
> 
> The flip side is I tried very hard to minimize the amount of code in
> af_mpls, to make maintenance simpler, and to reduce the chance of bugs.
> You are busy introducing what appears to me to be an untested error
> handling path which may result in something worse that not logging a
> message.  Especially when someone comes along and makes another change.
> 
> It is all such a corner case and I really don't care, but I just don't
> want this to be seen as a change that is obviously the right way to go
> and that has no downside.

I don't see how this small change has downside that affects maintenability.
Someone who wants to add a new code there can just add a new function call
and goto label, that's simple enough.