lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 18 Aug 2022 21:01:43 +0000
From:   "Keller, Jacob E" <jacob.e.keller@...el.com>
To:     Jakub Kicinski <kuba@...nel.org>,
        "davem@...emloft.net" <davem@...emloft.net>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "corbet@....net" <corbet@....net>,
        "johannes@...solutions.net" <johannes@...solutions.net>,
        "stephen@...workplumber.org" <stephen@...workplumber.org>,
        "sdf@...gle.com" <sdf@...gle.com>,
        "ecree.xilinx@...il.com" <ecree.xilinx@...il.com>,
        "benjamin.poirier@...il.com" <benjamin.poirier@...il.com>,
        "idosch@...sch.org" <idosch@...sch.org>,
        "f.fainelli@...il.com" <f.fainelli@...il.com>,
        "jiri@...nulli.us" <jiri@...nulli.us>,
        "dsahern@...nel.org" <dsahern@...nel.org>,
        "fw@...len.de" <fw@...len.de>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        "jhs@...atatu.com" <jhs@...atatu.com>,
        "tgraf@...g.ch" <tgraf@...g.ch>,
        "svinota.saveliev@...il.com" <svinota.saveliev@...il.com>
Subject: RE: [PATCH net-next 2/2] docs: netlink: basic introduction to Netlink



> -----Original Message-----
> From: Jakub Kicinski <kuba@...nel.org>
> Sent: Wednesday, August 17, 2022 7:35 PM
> To: davem@...emloft.net
> Cc: netdev@...r.kernel.org; Jakub Kicinski <kuba@...nel.org>; corbet@....net;
> johannes@...solutions.net; stephen@...workplumber.org; sdf@...gle.com;
> ecree.xilinx@...il.com; benjamin.poirier@...il.com; idosch@...sch.org;
> f.fainelli@...il.com; jiri@...nulli.us; dsahern@...nel.org; fw@...len.de; linux-
> doc@...r.kernel.org; jhs@...atatu.com; tgraf@...g.ch; Keller, Jacob E
> <jacob.e.keller@...el.com>; svinota.saveliev@...il.com
> Subject: [PATCH net-next 2/2] docs: netlink: basic introduction to Netlink
> 
> Provide a bit of a brain dump of netlink related information
> as documentation. Hopefully this will be useful to people
> trying to navigate implementing YAML based parsing in languages
> we won't be able to help with.
> 
> I started writing this doc while trying to figure out what
> it'd take to widen the applicability of YAML to good old rtnl,
> but the doc grew beyond that as it usually happens.
> 
> In all honesty a lot of this information is new to me as I usually
> follow the "copy an existing example, drink to forget" process
> of writing netlink user space, so reviews will be much appreciated.
> 
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
> --
> Jon, I'm putting this in userspace-api/ I think it fits reasonably
> well there but please don't hesitate to suggest a better home.
> 

This is really great, thanks!

I read through it and didn't see any glaring flaws. I really appreciate calling out the differences between  generic netlink and classic netlink interfaces

Reviewed-by: Jacob Keller <jacob.e.keller@...el.com>

-Jake

> CC: corbet@....net
> CC: johannes@...solutions.net
> CC: stephen@...workplumber.org
> CC: sdf@...gle.com
> CC: ecree.xilinx@...il.com
> CC: benjamin.poirier@...il.com
> CC: idosch@...sch.org
> CC: f.fainelli@...il.com
> CC: jiri@...nulli.us
> CC: dsahern@...nel.org
> CC: fw@...len.de
> CC: linux-doc@...r.kernel.org
> CC: jhs@...atatu.com
> CC: tgraf@...g.ch
> CC: jacob.e.keller@...el.com
> CC: svinota.saveliev@...il.com
> ---
>  Documentation/userspace-api/netlink/index.rst |  12 +
>  Documentation/userspace-api/netlink/intro.rst | 538 ++++++++++++++++++
>  2 files changed, 550 insertions(+)
>  create mode 100644 Documentation/userspace-api/netlink/index.rst
>  create mode 100644 Documentation/userspace-api/netlink/intro.rst
> 
> diff --git a/Documentation/userspace-api/netlink/index.rst
> b/Documentation/userspace-api/netlink/index.rst
> new file mode 100644
> index 000000000000..c2ef21dce6e7
> --- /dev/null
> +++ b/Documentation/userspace-api/netlink/index.rst
> @@ -0,0 +1,12 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> +
> +================
> +Netlink Handbook
> +================
> +
> +Netlink documentation.
> +
> +.. toctree::
> +   :maxdepth: 2
> +
> +   intro
> diff --git a/Documentation/userspace-api/netlink/intro.rst
> b/Documentation/userspace-api/netlink/intro.rst
> new file mode 100644
> index 000000000000..1e6154e7bea6
> --- /dev/null
> +++ b/Documentation/userspace-api/netlink/intro.rst
> @@ -0,0 +1,538 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> +
> +=======================
> +Introduction to Netlink
> +=======================
> +
> +Netlink is often described as an ioctl() replacement.
> +It aims to replace fixed-format C structures as supplied
> +to ioctl() with a format which allows an easy way to add
> +or extended the arguments.
> +
> +To achieve this Netlink uses a minimal fixed-format metadata header
> +followed by multiple attributes in the TLV (type, length, value) format.
> +
> +Unfortunately the protocol has evolved over the years, in an organic
> +and undocumented fashion, making it hard to coherently explain.
> +To make the most practical sense this document starts by describing
> +netlink as it is used today and dives into more "historical" uses
> +in later sections.
> +
> +Opening a socket
> +================
> +
> +Netlink communication happens over sockets, a socket needs to be
> +opened first:
> +
> +.. code-block:: c
> +
> +  fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
> +
> +The use of sockets allows for a natural way of exchanging information
> +in both directions (to and from the kernel). The operations are still
> +performed synchronously when applications send() the request but
> +a separate recv() system call is needed to read the reply.
> +
> +A very simplified flow of a Netlink "call" will therefore look
> +something like:
> +
> +.. code-block:: c
> +
> +  fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
> +
> +  /* format the request */
> +  send(fd, &request, sizeof(request));
> +  n = recv(fd, &response, RSP_BUFFER_SIZE);
> +  /* interpret the response */
> +
> +Netlink also provides natural support for "dumping", i.e. communicating
> +to user space all objects of a certain type (e.g. dumping all network
> +interfaces).
> +
> +.. code-block:: c
> +
> +  fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
> +
> +  /* format the dump request */
> +  send(fd, &request, sizeof(request));
> +  while (1) {
> +    n = recv(fd, &buffer, RSP_BUFFER_SIZE);
> +    /* one recv() call can read multiple messages, hence the loop below */
> +    for (nl_msg in buffer) {
> +      if (nl_msg.nlmsg_type == NLMSG_DONE)
> +        goto dump_finished;
> +      /* process the object */
> +    }
> +  }
> +  dump_finished:
> +
> +The first two arguments of the socket() call require little explanation -
> +it is opening a Netlink socket, with all headers provided by the user
> +(hence NETLINK, RAW). The last argument is the protocol within Netlink.
> +This field used to identify the subsystem with which the socket will
> +communicate.
> +
> +Classic vs Generic Netlink
> +--------------------------
> +
> +Initial implementation of Netlink depended on a static allocation
> +of IDs to subsystems and provided little supporting infrastructure.
> +Let us refer to those protocols collectively as **Classic Netlink**.
> +The list of them is defined on top of the ``include/uapi/linux/netlink.h``
> +file, they include among others - general networking (NETLINK_ROUTE),
> +iSCSI (NETLINK_ISCSI), and audit (NETLINK_AUDIT).
> +
> +**Generic Netlink** (introduced in 2005) allows for dynamic registration of
> +subsystems (and subsystem ID allocation), introspection and simplifies
> +implementing the kernel side of the interface.
> +
> +The following section describes how to use Generic Netlink, as the
> +number of subsystems using Generic Netlink outnumbers the older
> +protocols by an order of magnitude. There are also no plans for adding
> +more Classic Netlink protocols to the kernel.
> +Basic information on how communicating with core networking parts of
> +the Linux kernel (or another of the 20 subsystems using Classic
> +Netlink) differs from Generic Netlink is provided later in this document.
> +
> +Generic Netlink
> +===============
> +
> +In addition to the Netlink fixed metadata header each Netlink protocol
> +defines its own fixed metadata header. (Similarly to how network
> +headers stack - Ethernet > IP > TCP we have Netlink > Generic N. > Family.)
> +
> +A Netlink message always starts with struct nlmsghdr, which is followed
> +by a protocol-specific header. In case of Generic Netlink the protocol
> +header is struct genlmsghdr.
> +
> +The practical meaning of the fields in case of Generic Netlink is as follows:
> +
> +.. code-block:: c
> +
> +  struct nlmsghdr {
> +	__u32	nlmsg_len;	/* Length of message including headers */
> +	__u16	nlmsg_type;	/* Generic Netlink Family (subsystem) ID */
> +	__u16	nlmsg_flags;	/* Flags - request or dump */
> +	__u32	nlmsg_seq;	/* Sequence number */
> +	__u32	nlmsg_pid;	/* Endpoint ID, set to 0 */
> +  };
> +  struct genlmsghdr {
> +	__u8	cmd;		/* Command, as defined by the Family */
> +	__u8	version;	/* Irrelevant, set to 1 */
> +	__u16	reserved;	/* Reserved, set to 0 */
> +  };
> +  /* TLV attributes follow... */
> +
> +In Classic Netlink :c:member:`nlmsghdr.nlmsg_type` used to identify
> +which operation within the subsystem the message was referring to
> +(e.g. get information about a netdev). Generic Netlink needs to mux
> +multiple subsystems in a single protocol so it uses this field to
> +identify the subsystem, and :c:member:`genlmsghdr.cmd` identifies
> +the operation instead. (See :ref:`res_fam` for
> +information on how to find the Family ID of the subsystem of interest.)
> +Note that the first 16 values (0 - 15) of this field are reserved for
> +control messages both in Classic Netlink and Generic Netlink.
> +See :ref:`nl_msg_type` for more details.
> +
> +There are 3 usual types of message exchanges on a Netlink socket:
> +
> + - performing a single action (``do``);
> + - dumping information (``dump``);
> + - getting asynchronous notifications (``multicast``).
> +
> +Classic Netlink is very flexible and presumably allows other types
> +of exchanges to happen, but in practice those are the three that get
> +used.
> +
> +Asynchronous notifications are sent by the kernel and received by
> +the user sockets which subscribed to them. ``do`` and ``dump`` requests
> +are initiated by the user. :c:member:`nlmsghdr.nlmsg_flags` should
> +be set as follows:
> +
> + - for ``do``: ``NLM_F_REQUEST | NLM_F_ACK``
> + - for ``dump``: ``NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP``
> +
> +:c:member:`nlmsghdr.nlmsg_seq` should be a set to a monotonically
> +increasing value. The value gets echoed back in responses and doesn't
> +matter in practice, but setting it to an increasing value for each
> +message sent is considered good hygiene. The purpose of the field is
> +matching responses to requests. Asynchronous notifications will have
> +:c:member:`nlmsghdr.nlmsg_seq` of ``0``.
> +
> +:c:member:`nlmsghdr.nlmsg_pid` is the Netlink equivalent of an address.
> +This field can be set to ``0`` when talking to the kernel.
> +See :ref:`nlmsg_pid` for the (uncommon) uses of the field.
> +
> +The expected use for :c:member:`genlmsghdr.version` was to allow
> +versioning of the APIs provided by the subsystems. No subsystem to
> +date made significant use of this field, so setting it to ``1`` seems
> +like a safe bet.
> +
> +.. _nl_msg_type:
> +
> +Netlink message types
> +---------------------
> +
> +As previously mentioned :c:member:`nlmsghdr.nlmsg_type` carries
> +protocol specific values but the first 16 identifiers are reserved
> +(first subsystem specific message type should be equal to
> +``NLMSG_MIN_TYPE`` which is ``0x10``).
> +
> +There are only 4 Netlink control messages defined:
> +
> + - ``NLMSG_NOOP`` - ignore the message, not used in practice;
> + - ``NLMSG_ERROR`` - carries the return code of an operation;
> + - ``NLMSG_DONE`` - marks the end of a dump;
> + - ``NLMSG_OVERRUN`` - socket buffer has overflown.
> +
> +``NLMSG_ERROR`` and ``NLMSG_DONE`` are of practical importance.
> +They carry return codes for operations. Note that unless
> +the ``NLM_F_ACK`` flag is set on the request Netlink will not respond
> +with ``NLMSG_ERROR`` if there is no error. To avoid having to special-case
> +this quirk it is recommended to always set ``NLM_F_ACK``.
> +
> +The format of ``NLMSG_ERROR`` is described by struct nlmsgerr::
> +
> +  ----------------------------------------------
> +  | struct nlmsghdr - response header          |
> +  ----------------------------------------------
> +  |    int error                               |
> +  ----------------------------------------------
> +  | struct nlmsghdr - original request header |
> +  ----------------------------------------------
> +  | ** optionally (1) payload of the request   |
> +  ----------------------------------------------
> +  | ** optionally (2) extended ACK             |
> +  ----------------------------------------------
> +
> +There are two instances of struct nlmsghdr here, first of the response
> +and second of the request. ``NLMSG_ERROR`` carries the information about
> +the request which led to the error. This could be useful when trying
> +to match requests to responses or re-parse the request to dump it into
> +logs.
> +
> +The payload of the request is not echoed in messages reporting success
> +(``error == 0``) or if ``NETLINK_CAP_ACK`` setsockopt() was set.
> +The latter is common
> +and perhaps recommended as having to read a copy of every request back
> +from the kernel is rather wasteful. The absence of request payload
> +is indicated by ``NLM_F_CAPPED`` in :c:member:`nlmsghdr.nlmsg_flags`.
> +
> +The second optional element of ``NLMSG_ERROR`` are the extended ACK
> +attributes. See :ref:`ext_ack` for more details. The presence
> +of extended ACK is indicated by ``NLM_F_ACK_TLVS`` in
> +:c:member:`nlmsghdr.nlmsg_flags`.
> +
> +``NLMSG_DONE`` is simpler, the request is never echoed but the extended
> +ACK attributes may be present::
> +
> +  ----------------------------------------------
> +  | struct nlmsghdr - response header          |
> +  ----------------------------------------------
> +  |    int error                               |
> +  ----------------------------------------------
> +  | ** optionally extended ACK                 |
> +  ----------------------------------------------
> +
> +.. _res_fam:
> +
> +Resolving the Family ID
> +-----------------------
> +
> +This sections explains how to find the Family ID of a subsystem.
> +It also serves as an example of Generic Netlink communication.
> +
> +Generic Netlink is itself a subsystem exposed via the Generic Netlink API.
> +To avoid a circular dependency Generic Netlink has a statically allocated
> +Family ID (``GENL_ID_CTRL`` which is equal to ``NLMSG_MIN_TYPE``).
> +The Generic Netlink family implements a command used to find out information
> +about other families (``CTRL_CMD_GETFAMILY``).
> +
> +To get information about the Generic Netlink family named for example
> +``"test1"`` we need to send a message on the previously opened Generic Netlink
> +socket. The message should target the Generic Netlink Family (1), be a
> +``do`` (2) call to ``CTRL_CMD_GETFAMILY`` (3). A ``dump`` version of this
> +call would make the kernel respond with information about *all* the families
> +it knows about. Last but not least the name of the family in question has
> +to be specified (4) as an attribute with the appropriate type::
> +
> +  struct nlmsghdr:
> +    __u32 nlmsg_len:	32
> +    __u16 nlmsg_type:	GENL_ID_CTRL               // (1)
> +    __u16 nlmsg_flags:	NLM_F_REQUEST | NLM_F_ACK  // (2)
> +    __u32 nlmsg_seq:	1
> +    __u32 nlmsg_pid:	0
> +
> +  struct genlmsghdr:
> +    __u8 cmd:		CTRL_CMD_GETFAMILY         // (3)
> +    __u8 version:	2 /* or 1, doesn't matter */
> +    __u16 reserved:	0
> +
> +  struct nlattr:                                   // (4)
> +    __u16 nla_len:	10
> +    __u16 nla_type:	CTRL_ATTR_FAMILY_NAME
> +    char data: 		test1\0
> +
> +  (padding:)
> +    char data:		\0\0
> +
> +The length fields in Netlink (:c:member:`nlmsghdr.nlmsg_len`
> +and :c:member:`nlattr.nla_len`) always *include* the header.
> +Headers in netlink must be aligned to 4 bytes from the start of the message,
> +hence the extra ``\0\0`` at the end of the message. The attribute lengths
> +*exclude* the padding.
> +
> +If the family is found kernel will reply with two messages, the response
> +with all the information about the family::
> +
> +  /* Message #1 - reply */
> +  struct nlmsghdr:
> +    __u32 nlmsg_len:	136
> +    __u16 nlmsg_type:	GENL_ID_CTRL
> +    __u16 nlmsg_flags:	0
> +    __u32 nlmsg_seq:	1    /* echoed from our request */
> +    __u32 nlmsg_pid:	5831 /* The PID of our user space process */
> +
> +  struct genlmsghdr:
> +    __u8 cmd:		CTRL_CMD_GETFAMILY
> +    __u8 version:	2
> +    __u16 reserved:	0
> +
> +  struct nlattr:
> +    __u16 nla_len:	10
> +    __u16 nla_type:	CTRL_ATTR_FAMILY_NAME
> +    char data: 		test1\0
> +
> +  (padding:)
> +    data:		\0\0
> +
> +  struct nlattr:
> +    __u16 nla_len:	6
> +    __u16 nla_type:	CTRL_ATTR_FAMILY_ID
> +    __u16: 		123  /* The Family ID we are after */
> +
> +  (padding:)
> +    char data:		\0\0
> +
> +  struct nlattr:
> +    __u16 nla_len:	9
> +    __u16 nla_type:	CTRL_ATTR_FAMILY_VERSION
> +    __u16: 		1
> +
> +  /* ... etc, more attributes will follow. */
> +
> +And the error code (success) since ``NLM_F_ACK`` had been set on the request::
> +
> +  /* Message #2 - the ACK */
> +  struct nlmsghdr:
> +    __u32 nlmsg_len:	36
> +    __u16 nlmsg_type:	NLMSG_ERROR
> +    __u16 nlmsg_flags:	NLM_F_CAPPED /* There won't be a payload */
> +    __u32 nlmsg_seq:	1    /* echoed from our request */
> +    __u32 nlmsg_pid:	5831 /* The PID of our user space process */
> +
> +  int error:		0
> +
> +  struct nlmsghdr: /* Copy of the request header as we sent it */
> +    __u32 nlmsg_len:	32
> +    __u16 nlmsg_type:	GENL_ID_CTRL
> +    __u16 nlmsg_flags:	NLM_F_REQUEST | NLM_F_ACK
> +    __u32 nlmsg_seq:	1
> +    __u32 nlmsg_pid:	0
> +
> +The order of attributes (struct nlattr) is not guaranteed so the user
> +has to walk the attributes and parse them.
> +
> +Note that Generic Netlink sockets are not associated or bound to a single
> +family. A socket can be used to exchange messages with many different
> +families, selecting the recipient family on message-by-message basis using
> +the :c:member:`nlmsghdr.nlmsg_type` field.
> +
> +.. _ext_ack:
> +
> +Extended ACK
> +------------
> +
> +Extended ACK controls reporting of additional error/warning TLVs
> +in ``NLMSG_ERROR`` and ``NLMSG_DONE`` messages. To maintain backward
> +compatibility this feature has to be explicitly enabled by setting
> +the ``NETLINK_EXT_ACK`` setsockopt() to ``1``.
> +
> +Types of extended ack attributes are defined in enum nlmsgerr_attrs.
> +The two most commonly used attributes are ``NLMSGERR_ATTR_MSG``
> +and ``NLMSGERR_ATTR_OFFS``.
> +
> +``NLMSGERR_ATTR_MSG`` carries a message in English describing
> +the encountered problem. These messages are far more detailed
> +than what can be expressed thru standard UNIX error codes.
> +
> +``NLMSGERR_ATTR_OFFS`` points to the attribute which caused the problem.
> +
> +Extended ACKs can be reported on errors as well as in case of success.
> +The latter should be treated as a warning.
> +
> +Extended ACKs greatly improve the usability of Netlink and should
> +always be enabled, appropriately parsed and reported to the user.
> +
> +Dump consistency
> +----------------
> +
> +Some of the data structures kernel uses for storing objects make
> +it hard to provide an atomic snapshot of all the objects in a dump
> +(without impacting the fast-paths updating them).
> +
> +Kernel may set the ``NLM_F_DUMP_INTR`` flag on any message in a dump
> +(including the ``NLMSG_DONE`` message) if the dump was interrupted and
> +may be inconsistent (e.g. missing objects). User space should retry
> +the dump if it sees the flag set.
> +
> +Introspection
> +-------------
> +
> +The basic introspection abilities are enabled by access to the Family
> +object as reported in :ref:`res_fam`. User can query information about
> +the Generic Netlink family, including which operations are supported
> +by the kernel and what attributes the kernel understands.
> +Family information includes the highest ID of an attribute kernel can parse,
> +a separate command (``CTRL_CMD_GETPOLICY``) provides detailed information
> +about supported attributes, including ranges of values the kernel accepts.
> +
> +Querying family information is useful in rare cases when user space needs
> +to make sure that the kernel has support for a feature before issuing
> +a request.
> +
> +.. _nlmsg_pid:
> +
> +nlmsg_pid
> +---------
> +
> +:c:member:`nlmsghdr.nlmsg_pid` is called PID because the protocol predates
> +wide spread use of multi-threading and the initial recommendation was
> +to use process ID in this field. Process IDs start from 1 hence the use
> +of ``0`` to mean "allocate automatically".
> +
> +The field is still used today in rare cases when kernel needs to send
> +a unicast notification. User space application can use bind() to associate
> +its socket with a specific PID (similarly to binding to a UDP port),
> +it then communicates its PID to the kernel.
> +The kernel can now reach the user space process.
> +
> +This sort of communication is utilized in UMH (user mode helper)-like
> +scenarios when kernel needs to trigger user space logic or ask user
> +space for a policy decision.
> +
> +Kernel will automatically fill the field with process ID when responding
> +to a request sent with the :c:member:`nlmsghdr.nlmsg_pid` value of ``0``.
> +
> +Classic Netlink
> +===============
> +
> +The main differences between Classic and Generic Netlink are the dynamic
> +allocation of subsystem identifiers and availability of introspection.
> +In theory the protocol does not differ significantly, however, in practice
> +Classic Netlink experimented with concepts which were abandoned in Generic
> +Netlink (really, they usually only found use in a small corner of a single
> +subsystem). This section is meant as an explainer of a few of such concepts,
> +with the explicit goal of giving the Generic Netlink
> +users the confidence to ignore them when reading the uAPI headers.
> +
> +Most of the concepts and examples here refer to the ``NETLINK_ROUTE`` family,
> +which covers much of the configuration of the Linux networking stack.
> +Real documentation of that family, deserves a chapter (or a book) of its own.
> +
> +Families
> +--------
> +
> +Netlink refers to subsystems as families. This is a remnant of using
> +sockets and the concept of protocol families, which are part of message
> +demultiplexing in ``NETLINK_ROUTE``.
> +
> +Sadly every layer of encapsulation likes to refer to whatever it's carrying
> +as "families" making the term very confusing:
> +
> + 1. AF_NETLINK is a bona fide socket protocol family
> + 2. AF_NETLINK's documentation refers to what comes after its own
> +    header (struct nlmsghdr) in a message as a "Family Header"
> + 3. Generic Netlink is a family for AF_NETLINK (struct genlmsghdr follows
> +    struct nlmsghdr), yet it also calls its users "Families".
> +
> +Note that the Generic Netlink Family IDs are in a different "ID space"
> +and overlap with Classic Netlink protocol numbers (e.g. ``NETLINK_CRYPTO``
> +has the Classic Netlink protocol ID of 21 which Generic Netlink will
> +happily allocate to one of its families as well).
> +
> +Strict checking
> +---------------
> +
> +The ``NETLINK_GET_STRICT_CHK`` socket option enables strict input checking
> +in ``NETLINK_ROUTE``. It was needed because historically kernel did not
> +validate the fields of structures it didn't process. This made it impossible
> +to start using those fields later without risking regressions in applications
> +which initialized them incorrectly or not at all.
> +
> +``NETLINK_GET_STRICT_CHK`` declares that the application is initializing
> +all fields correctly. It also opts into validating that message does not
> +contain trailing data and requests that kernel rejects attributes with
> +type higher than largest attribute type known to the kernel.
> +
> +``NETLINK_GET_STRICT_CHK`` is not used outside of ``NETLINK_ROUTE``.
> +
> +Unknown attributes
> +------------------
> +
> +Historically Netlink ignored all unknown attributes. The thinking was that
> +it would free the application from having to probe what kernel supports.
> +The application could make a request to change the state and check which
> +parts of the request "stuck".
> +
> +This is no longer the case for new Generic Netlink families and those opting
> +in to strict checking. See enum netlink_validation for validation types
> +performed.
> +
> +Fixed metadata and structures
> +-----------------------------
> +
> +Classic Netlink made liberal use of fixed-format structures within
> +the messages. Messages would commonly have a structure with
> +a considerable number of fields after struct nlmsghdr. It was also
> +common to put structures with multiple members inside attributes,
> +without breaking each member into an attribute of its own.
> +
> +Request types
> +-------------
> +
> +``NETLINK_ROUTE`` categorized requests into 4 types ``NEW``, ``DEL``, ``GET``,
> +and ``SET``. Each object can handle all or some of those requests
> +(objects being netdevs, routes, addresses, qdiscs etc.) Request type
> +is defined by the 2 lowest bits of the message type, so commands for
> +new objects would always be allocated with a stride of 4.
> +
> +Each object would also have it's own fixed metadata shared by all request
> +types (e.g. struct ifinfomsg for netdev requests, struct ifaddrmsg for address
> +requests, struct tcmsg for qdisc requests).
> +
> +Even though other protocols and Generic Netlink commands often use
> +the same verbs in their message names (``GET``, ``SET``) the concept
> +of request types did not find wider adoption.
> +
> +Message flags
> +-------------
> +
> +The earlier section has already covered the basic request flags
> +(``NLM_F_REQUEST``, ``NLM_F_ACK``, ``NLM_F_DUMP``) and the
> ``NLMSG_ERROR`` /
> +``NLMSG_DONE`` flags (``NLM_F_CAPPED``, ``NLM_F_ACK_TLVS``).
> +Dump flags were also mentioned (``NLM_F_MULTI``, ``NLM_F_DUMP_INTR``).
> +
> +Those are the main flags of note, with a small exception (of ``ieee802154``)
> +Generic Netlink does not make use of other flags. If the protocol needs
> +to communicate special constraints for a request it should use
> +an attribute, not the flags in struct nlmsghdr.
> +
> +Classic Netlink, however, defined various flags for its ``GET``, ``NEW``
> +and ``DEL`` requests. Since request types have not been generalized
> +the request type specific flags should not be used either.
> +
> +uAPI reference
> +==============
> +
> +.. kernel-doc:: include/uapi/linux/netlink.h
> --
> 2.37.2

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ