netdev - Re: [PATCH net-next 2/2] docs: netlink: basic introduction to Netlink

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6350516756628945f9cc1ee0248e92473521ed0b.camel@sipsolutions.net>
Date:   Thu, 18 Aug 2022 09:36:46 +0200
From:   Johannes Berg <johannes@...solutions.net>
To:     Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net
Cc:     netdev@...r.kernel.org, corbet@....net, stephen@...workplumber.org,
        sdf@...gle.com, ecree.xilinx@...il.com, benjamin.poirier@...il.com,
        idosch@...sch.org, f.fainelli@...il.com, jiri@...nulli.us,
        dsahern@...nel.org, fw@...len.de, linux-doc@...r.kernel.org,
        jhs@...atatu.com, tgraf@...g.ch, jacob.e.keller@...el.com,
        svinota.saveliev@...il.com
Subject: Re: [PATCH net-next 2/2] docs: netlink: basic introduction to
 Netlink

On Wed, 2022-08-17 at 19:35 -0700, Jakub Kicinski wrote:

> +To get information about the Generic Netlink family named for example
> +``"test1"`` we need to send a message on the previously opened Generic Netlink
> +socket. The message should target the Generic Netlink Family (1), be a
> +``do`` (2) call to ``CTRL_CMD_GETFAMILY`` (3). A ``dump`` version of this
> +call would make the kernel respond with information about *all* the families
> +it knows about. Last but not least the name of the family in question has
> +to be specified (4) as an attribute with the appropriate type::
> +
> +  struct nlmsghdr:
> +    __u32 nlmsg_len:	32
> +    __u16 nlmsg_type:	GENL_ID_CTRL               // (1)
> +    __u16 nlmsg_flags:	NLM_F_REQUEST | NLM_F_ACK  // (2)
> +    __u32 nlmsg_seq:	1
> +    __u32 nlmsg_pid:	0
> +
> +  struct genlmsghdr:
> +    __u8 cmd:		CTRL_CMD_GETFAMILY         // (3)
> +    __u8 version:	2 /* or 1, doesn't matter */
> +    __u16 reserved:	0
> +
> +  struct nlattr:                                   // (4)
> +    __u16 nla_len:	10
> +    __u16 nla_type:	CTRL_ATTR_FAMILY_NAME
> +    char data: 		test1\0
> +
> +  (padding:)
> +    char data:		\0\0
> +
> +The length fields in Netlink (:c:member:`nlmsghdr.nlmsg_len`
> +and :c:member:`nlattr.nla_len`) always *include* the header.
> +Headers in netlink must be aligned to 4 bytes from the start of the message,

s/Headers/Attribute headers/ perhaps?

> +hence the extra ``\0\0`` at the end of the message.
> 

And I think technically for the _last_ attribute it wouldn't be needed?

> +If the family is found kernel will reply with two messages, the response
> +with all the information about the family::
> +
> +  /* Message #1 - reply */
> +  struct nlmsghdr:
> +    __u32 nlmsg_len:	136
> +    __u16 nlmsg_type:	GENL_ID_CTRL
> +    __u16 nlmsg_flags:	0
> +    __u32 nlmsg_seq:	1    /* echoed from our request */
> +    __u32 nlmsg_pid:	5831 /* The PID of our user space process */

s/PID/netlink port ID/

It's actually whatever you choose, I think? Lots of libraries will
choose (something based on) the process ID, but that's not really
needed?

(autobind is different maybe?)


> +  /* Message #2 - the ACK */
> +  struct nlmsghdr:
> +    __u32 nlmsg_len:	36
> +    __u16 nlmsg_type:	NLMSG_ERROR
> +    __u16 nlmsg_flags:	NLM_F_CAPPED /* There won't be a payload */
> +    __u32 nlmsg_seq:	1    /* echoed from our request */
> +    __u32 nlmsg_pid:	5831 /* The PID of our user space process */

(same here of course)

> +``NLMSGERR_ATTR_MSG`` carries a message in English describing
> +the encountered problem. These messages are far more detailed
> +than what can be expressed thru standard UNIX error codes.

"through"?

> +Querying family information is useful in rare cases when user space needs

debatable if that's "rare", but yeah, today it's not done much :)

> +.. _nlmsg_pid:
> +
> +nlmsg_pid
> +---------
> +
> +:c:member:`nlmsghdr.nlmsg_pid` is called PID because the protocol predates
> +wide spread use of multi-threading and the initial recommendation was
> +to use process ID in this field. Process IDs start from 1 hence the use
> +of ``0`` to mean "allocate automatically".
> +
> +The field is still used today in rare cases when kernel needs to send
> +a unicast notification. User space application can use bind() to associate
> +its socket with a specific PID (similarly to binding to a UDP port),
> +it then communicates its PID to the kernel.
> +The kernel can now reach the user space process.
> +
> +This sort of communication is utilized in UMH (user mode helper)-like
> +scenarios when kernel needs to trigger user space logic or ask user
> +space for a policy decision.
> +
> +Kernel will automatically fill the field with process ID when responding
> +to a request sent with the :c:member:`nlmsghdr.nlmsg_pid` value of ``0``.


I think this could be written a bit better - we call this thing a "port
ID" internally now, and yes, it might default to a process ID (more
specifically task group ID) ... but it feels like this could explain
bind vs. autobind etc. a bit more? And IMHO it should focus less on the
process ID/PID than saying "port ID" with a (historical) default of
using the PID/TGID.

> +Strict checking
> +---------------
> +
> +The ``NETLINK_GET_STRICT_CHK`` socket option enables strict input checking
> +in ``NETLINK_ROUTE``. It was needed because historically kernel did not
> +validate the fields of structures it didn't process. This made it impossible
> +to start using those fields later without risking regressions in applications
> +which initialized them incorrectly or not at all.
> +
> +``NETLINK_GET_STRICT_CHK`` declares that the application is initializing
> +all fields correctly. It also opts into validating that message does not
> +contain trailing data and requests that kernel rejects attributes with
> +type higher than largest attribute type known to the kernel.
> +
> +``NETLINK_GET_STRICT_CHK`` is not used outside of ``NETLINK_ROUTE``.

However, there are also more generally strict checks in policy
validation ... maybe a discussion of all that would be worthwhile?

> +Unknown attributes
> +------------------
> +
> +Historically Netlink ignored all unknown attributes. The thinking was that
> +it would free the application from having to probe what kernel supports.
> +The application could make a request to change the state and check which
> +parts of the request "stuck".
> +
> +This is no longer the case for new Generic Netlink families and those opting
> +in to strict checking. See enum netlink_validation for validation types
> +performed.

OK some of that is this, but some of it is also the strict length checks
e.g. for Ethernet addresses.

> +Fixed metadata and structures
> +-----------------------------
> +
> +Classic Netlink made liberal use of fixed-format structures within
> +the messages. Messages would commonly have a structure with
> +a considerable number of fields after struct nlmsghdr. It was also
> +common to put structures with multiple members inside attributes,
> +without breaking each member into an attribute of its own.

That reads very descriptive and historic without making a recommendation
- I know it's in the section, but maybe do say something like "This is
discouraged now and attributes should be used instead"?


Either way, thanks for doing this, it's a great overview!

We might add:
 - availability of attribute policy introspection
   (you mention family introspection only I think)

 - do we want to bring in the whole "per operation" vs. "per genetlink
   family" attribute policy?
   (I'm firmly on the "single policy for the whole family" side ...)

 - maybe not the appropriate place here, but maybe some best practices
   for handling attributes, such as the multi-attribute array thing we
   discussed in the other thread?

 - maybe more userspace recommendations such as using different sockets
   for multicast listeners and requests, because otherwise it gets
   tricky to wait for the ACK of a request since you have to handle
   notifications that happen meanwhile?

 - maybe some mention of the fact that sometimes we now bind kernel
   object or state lifetime to a socket, e.g. in wireless you can
   connect and if your userspace crashes/closes the socket, the
   connection is automatically torn down (because you can't handle the
   things needed anymore)

 - maybe something about message sizes? we've had lots of trouble with
   that in nl80211, but tbh I'm not really sure what we should say about
   it other than making sure you use large enough buffers ...

johannes