netdev - Re: [RFC 4/5] netlink: prepare validate extack setting for recursion

Message-ID: <1537384771.10305.68.camel@sipsolutions.net>
Date:   Wed, 19 Sep 2018 21:19:31 +0200
From:   Johannes Berg <johannes@...solutions.net>
To:     Marcelo Ricardo Leitner <marcelo.leitner@...il.com>
Cc:     netdev@...r.kernel.org
Subject: Re: [RFC 4/5] netlink: prepare validate extack setting for recursion

On Wed, 2018-09-19 at 15:46 -0300, Marcelo Ricardo Leitner wrote:

> > 	NL_SET_ERR_MSG(extack, "warning: deprecated command");
> > 	err = nla_parse(..., extack);
> > 	if (err)
> > 		return err;
> > 	/* do something */
> > 	return 0;
> > 
> > Here you could consider the message there a warning that's transported
> > out even if we return 0, but if we return with a failure from
> > nla_parse() (or nla_validate instead if you wish), then that failure
> > message "wins".
> 
> Agree. This is the core issue here, IMHO. Once out of the context that
> set the message, we have no way of knowing if we can or should
> overwrite the message that is already in there.

True.

I'm not really sure I'd go as far as calling it an issue though.

IMHO, we really get into this situation if the code is badly structured
to start with. Taking the above code, if you write it as

	err = nla_parse(...);
	if (err)
		return err;
	/* do something */
	NL_SET_ERR_MSG(extack, "warning: deprecated command");
	return 0;

instead, then you have no such problems.

Well, perhaps you do, if "/* do something */" might *also* set the
message, but basically you have to statically decide which one wins by
ordering the code correctly.
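
To make the ordering point concrete, here is a minimal userspace sketch. Everything in it (struct mock_extack, mock_set_msg(), mock_parse(), mock_handle_cmd()) is a hypothetical stand-in made up for illustration, not the kernel's extack API; it only assumes that setting a message overwrites whatever was there before.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's extack; not the real struct. */
struct mock_extack {
	const char *msg;
};

/* Last writer wins: a new message simply replaces the old one. */
static void mock_set_msg(struct mock_extack *extack, const char *msg)
{
	assert(msg != NULL);
	if (extack)
		extack->msg = msg;
}

/* Hypothetical parser: on failure it sets its own message. */
static int mock_parse(int fail, struct mock_extack *extack)
{
	if (fail) {
		mock_set_msg(extack, "parse error");
		return -1;
	}
	return 0;
}

/*
 * The warning is set only after parsing succeeded, so it can never
 * compete with (or be clobbered by) a parse error.
 */
static int mock_handle_cmd(int fail, struct mock_extack *extack)
{
	int err = mock_parse(fail, extack);

	if (err)
		return err;
	/* do something */
	mock_set_msg(extack, "warning: deprecated command");
	return 0;
}
```

With this ordering, a failing parse leaves only the parser's message in place, and a successful one leaves only the warning.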

I'm still not convinced, btw, that we actually have any instance in the
kernel today of the issue we've been discussing - namely that some code
does something like the original quoted code at the top of the email.

> > I suppose we could - technically - make that generic, in that we could
> > have both
> > 
> >   NLA_SET_WARN_MSG(extack, "...");
> >   NLA_SET_ERR_MSG(extack, "...");
> 
> I like this.

I'm not really sure what for though :-)

FWIW, if you do think that there's a need for distinguishing this, then
I'd argue that perhaps the right way to address this would be to extend
this all the way to userspace and have two separate attributes for
errors and warnings in the extended ACK message?

> > and keep track of warning vs. error; however, just like my first version
> > of the NLA_REJECT patch, that would break existing code.
> 
> Hm, I may have missed something but I think the discussion in there
> was for a different context. For an extack msg to be set when the
> validate_nla() call returns in nla_parse(), the previous message
> had to be a "warning" because otherwise the parsing wouldn't be even
> attempted. So in that case, we are safe to simply overwrite it.

Yes, arguably, if you really had an "error" then you'd probably have
never gotten to the parsing. It's possible - although sort of stupid -
to write this code too though:

	NL_SET_ERR_MSG(extack, "error: unsupported command");
	err = nla_parse(...);
	return err ? : -EOPNOTSUPP;

Which one should win in that case?

Again, in my opinion we've got enough flexibility if we let
nla_parse/nla_validate behave as today (mostly: nla_validate doesn't set
a message, which I'd like to change, though it does set the error
attribute pointer/offset) and let the calling code sort it out.

In many cases nla_parse() will be called by generic code (e.g.
genetlink) and you'd never get into this situation. Once you're past
that, you can do whatever you like.

> While for the situation you are describing here, it will set a generic
> error message in case the inner code didn't do it.

Yes, but that's not a change in semantics as far as the caller of
nla_parse/nla_validate is concerned - it'll continue to unconditionally
set a message if an error occurred, only the internal behaviour as to
which message seems more relevant is at issue, and the whole recursion
thing and avoiding an outer one overwriting an inner one is IMHO more
useful because that's the more specific error.
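
As a rough illustration of those semantics (not the actual patch), the sketch below uses made-up names throughout: struct nest_extack, struct nest_attr, nest_validate_one() and nest_validate() are all hypothetical, and the generic message text is a placeholder. The inner level's message survives the return through the outer level, while the entry point still unconditionally overwrites whatever the caller had set before.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel types. */
struct nest_extack {
	const char *msg;
};

struct nest_attr {
	int valid;                     /* 0: this attribute fails validation */
	const char *reject_msg;        /* specific message, may be NULL */
	const struct nest_attr *child; /* nested attribute, may be NULL */
};

/*
 * Internal recursion. On failure, *msg holds the message chosen by the
 * innermost failing level (or NULL if it had none); outer levels return
 * early and therefore never overwrite it.
 */
static int nest_validate_one(const struct nest_attr *attr, const char **msg)
{
	if (attr->child && nest_validate_one(attr->child, msg) < 0)
		return -1;
	if (!attr->valid) {
		*msg = attr->reject_msg;
		return -1;
	}
	return 0;
}

/*
 * Entry point as seen by callers: on error a message is always set,
 * replacing anything (e.g. an earlier "warning") already in extack.
 */
static int nest_validate(const struct nest_attr *attr,
			 struct nest_extack *extack)
{
	const char *msg = NULL;
	int err;

	assert(attr != NULL);
	err = nest_validate_one(attr, &msg);
	if (err && extack)
		extack->msg = msg ? msg : "generic validation error";
	return err;
}
```

The point of keeping the message in a local until the recursion unwinds is that the "who set it" question stays internal; the caller only ever sees one message per failed call.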

> Using the semantics of NLA_SET_WARN_MSG and ERR, then WARN would
> replace other WARNs but not ERRs, and ERR would replace other WARNs
> too but not other ERRs. All we need to handle this is a bit in
> extack saying if the message is considered a warning or not, or an
> error/fatal message or not.

I'm still not really sure what the use case for a warning is, so not
sure I can really comment on this.
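
Just so we're talking about the same thing, the precedence rule from the quoted text could be sketched like this in userspace. The names (struct lvl_extack, lvl_set_warn(), lvl_set_err()) are hypothetical, and this is only one reading of the proposal: a single extra bit marks the stored message as an error, warnings replace warnings, errors replace warnings, and nothing replaces an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical extack stand-in with the one extra "is an error" bit. */
struct lvl_extack {
	const char *msg;
	bool is_err;
};

/* A warning replaces an earlier warning, but never an error. */
static void lvl_set_warn(struct lvl_extack *extack, const char *msg)
{
	assert(msg != NULL);
	if (!extack || extack->is_err)
		return;
	extack->msg = msg;
}

/* An error replaces any warning, but never an earlier error. */
static void lvl_set_err(struct lvl_extack *extack, const char *msg)
{
	assert(msg != NULL);
	if (!extack || extack->is_err)
		return;
	extack->msg = msg;
	extack->is_err = true;
}
```

Note that under this rule the first error sticks, which is the opposite of today's last-writer-wins behaviour for error messages.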

> Okay but we have split parsing of netlink messages and this could be
> useful in there too:
> In cls_api.c, tc_new_tfilter() calls nlmsg_parse() and does some
> processing, and then hands it to a classifier. cls_flower, for
> example, will then do another parsing. If, for whatever reason, flower
> failed and did not set an extack msg, tc_new_tfilter() could set a
> default error message, but at this moment we can't tell if the msg
> already set was just a warning from the first parsing (or any other
> code before engaging flower) (which we can overwrite) or if it is a
> message that flower set (which we should not overwrite).
> 
> Hopefully my description is clear.. 8-)
> 
> I think this is the same situation as with the nested parsing you're
> proposing.

Yes, I admit that's the same, just not part of pure policy checking, but
open-coded (in a way).

> Currently it (tc_new_tfilter) doesn't set any default error message,
> so this issue is/was not noticed.

True.

Except that I'd still say that tc_new_tfilter() can very well assume
that nothing has set a message before it ran, it's invoked directly by
the core netlink code after all.

So IMHO we don't have an issue here: it's not as if there were arbitrary
callers of this, such that it couldn't know what the state is. It does
in fact know very well what the state is when it's called.

With nla_validate/nla_parse that have far more callers we can't be as
certain, although again - I doubt we actually have places today that
would run into this situation (given how all this stuff is still fairly
new).

> > From an external API POV though, nla_validate/nla_parse will continue to
> > unconditionally overwrite any existing "warning" messages with errors,
> > if such occurred. They just won't overwrite their own messages when
> > returning from a nested policy validation.
> 
> Yes, that's fine. I'm not objecting to the behavior. It just feels
> that extack handling is getting tricky by the day and we could seize
> the moment to improve it while we can.

Ok, I guess that's fair. I don't think this actually changes the
"trickiness" of extack handling though; it's purely an internal detail
of nla_validate/nla_parse due to me wanting to add recursion; from an
external POV nothing changed.

johannes