Message-ID: <20241007182106.39342-1-kuniyu@amazon.com>
Date: Mon, 7 Oct 2024 11:21:06 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <ebiederm@...ssion.com>
CC: <davem@...emloft.net>, <edumazet@...gle.com>, <kuba@...nel.org>,
<kuni1840@...il.com>, <kuniyu@...zon.com>, <netdev@...r.kernel.org>,
<pabeni@...hat.com>
Subject: Re: [PATCH v3 net 5/6] mpls: Handle error of rtnl_register_module().
From: "Eric W. Biederman" <ebiederm@...ssion.com>
Date: Mon, 07 Oct 2024 11:28:11 -0500
> Kuniyuki Iwashima <kuniyu@...zon.com> writes:
>
> > From: "Eric W. Biederman" <ebiederm@...ssion.com>
> > Date: Mon, 07 Oct 2024 09:56:44 -0500
> >> Kuniyuki Iwashima <kuniyu@...zon.com> writes:
> >>
> >> > Since introduced, mpls_init() has been ignoring the returned
> >> > value of rtnl_register_module(), which could fail.
> >>
> >> As I recall that was deliberate. The module continues to work if the
> >> rtnetlink handlers don't operate, just some functionality is lost.
> >
> > It's ok if it wasn't a module. rtnl_register() logs an error message
> > in syslog, but rtnl_register_module() doesn't. That's why this series
> > only changes some rtnl_register_module() calls.
>
> You talk about the series. Is there an introductory letter I should
> lookup on netdev that explains things in more detail?
>
> I have only seen the patch that is sent directly to me.
Some context here.
https://lore.kernel.org/netdev/20241007124459.5727-1-kuniyu@amazon.com/
Before addf9b90de22, rtnl_register_module() didn't actually need
error handling for some callers, but even after the commit, some
modules copy-and-pasted the wrong code.
>
> >> I don't strongly care either way, but I want to point out that bailing
> >> out due to a memory allocation failure actually makes the module
> >> initialization more brittle.
> >>
> >> > Let's handle the errors by rtnl_register_many().
> >>
> >> Can you describe what the benefit is from completely giving up in the
> >> face of a memory allocation failure versus having as much of the module
> >> function as possible?
> >
> > What if the memory pressure happened to be relaxed soon after the module
> > was loaded incompletely?
>
> Huh? The module will load completely. It just won't have full
> functionality. The rtnetlink functionality simply won't work.
>
> > Silent failure is much worse to me.
>
> My point is from the point of view of the MPLS functionality it isn't
> a __failure__.
My point is that it _is_ a failure for those who use the MPLS rtnetlink
functionality, which is uAPI. Not everyone uses plain MPLS
without rtnetlink.
Also, I don't want to waste resources on such an issue on QEMU w/ 2GB
RAM, where I'm running syzkaller and often see OOM. syzkaller searches
for and loads modules and exercises various uAPIs randomly, and it handles
module-load failure properly.
>
> > rtnl_get_link() will return NULL and users will see -EOPNOTSUPP even
> > though the module was loaded "successfully".
>
> Yes. EOPNOTSUPP makes it clear that the rtnetlink functionality is not
> working. In most cases modules are autoloaded these days, so the end
> user experience is likely to be EOPNOTSUPP in either case.
>
> If you log a message, some time later someone will see that there is
> a message in the log that the kernel was very low on memory and
> could not allocate enough memory for rtnetlink.
>
> Short of rebooting or retrying to load the module I don't expect
> there is much someone can do in either case. So this does not look
> to me like a case of silent failure or a broken module.
>
> Has anyone actually had this happen and reported this as a problem?
> Otherwise we are all just arguing theoretical possibilities.
>
> My only real point is that change is *not* a *fix*.
> This change is a *cleanup* to make mpls like other modules.
>
> I am fine with a cleanup, but I really don't think we should describe
> this as something it is not.
>
> The flip side is I tried very hard to minimize the amount of code in
> af_mpls, to make maintenance simpler, and to reduce the chance of bugs.
> You are busy introducing what appears to me to be an untested error
> handling path which may result in something worse than not logging a
> message. Especially when someone comes along and makes another change.
>
> It is all such a corner case and I really don't care, but I just don't
> want this to be seen as a change that is obviously the right way to go
> and that has no downside.
I don't see how this small change has a downside that affects maintainability.
Someone who wants to add new code there can just add a new function call
and a goto label; that's simple enough.
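
To illustrate the "function call plus goto label" shape, here is a minimal
userspace sketch of the unwind pattern. It is not the mpls_init() code from
the patch: register_step(), unregister_step(), example_init(), fail_step and
registered are all hypothetical stand-ins, and the -ENOMEM failure is only
simulated.

```c
#include <errno.h>

/* Hypothetical test knobs, not kernel state. */
static int fail_step;   /* step that should fail; 0 means none fails */
static int registered;  /* bitmask of steps currently registered */

/* Stand-in for a registration call that can fail with -ENOMEM. */
static int register_step(int step)
{
	if (step == fail_step)
		return -ENOMEM;
	registered |= 1 << step;
	return 0;
}

/* Stand-in for the matching unregistration call. */
static void unregister_step(int step)
{
	registered &= ~(1 << step);
}

/*
 * Each step that can fail jumps to a label that undoes only the
 * steps that already succeeded, in reverse order.
 */
static int example_init(void)
{
	int err;

	err = register_step(1);
	if (err)
		goto out;

	err = register_step(2);
	if (err)
		goto out_unregister_1;

	err = register_step(3);
	if (err)
		goto out_unregister_2;

	return 0;

out_unregister_2:
	unregister_step(2);
out_unregister_1:
	unregister_step(1);
out:
	return err;
}
```

Adding a fourth step in this sketch would mean one more call, one more
error check, and one more label above out_unregister_2.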