lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230922152749.244197-1-chris.riches@nutanix.com>
Date:   Fri, 22 Sep 2023 15:27:49 +0000
From:   Chris Riches <chris.riches@...anix.com>
To:     audit@...r.kernel.org, Paul Moore <paul@...l-moore.com>,
        Eric Paris <eparis@...hat.com>
Cc:     chris.riches@...anix.com, jonathan.davies@...anix.com,
        linux-kernel@...r.kernel.org
Subject: [RFC PATCH] audit: Send netlink ACK before setting connection in auditd_set

When auditd_set sets the auditd_conn pointer, audit messages can
immediately be put on the socket by other kernel threads. If the backlog
is large or the rate is high, this can immediately fill the socket
buffer. If the audit daemon requested an ACK for this operation, a full
socket buffer causes the ACK to get dropped, also setting ENOBUFS on the
socket.

To avoid this race and ensure ACKs get through, fast-track the ACK in
this specific case to ensure it is sent before auditd_conn is set.

Signed-off-by: Chris Riches <chris.riches@...anix.com>

---

This mail is more of an RFC than a patch, though the included patch is a
useful illustation and might even be suitable for inclusion.

We are experiencing strange failures where the audit daemon fails to
start, hitting an ENOBUFS error on its audit_set_pid() call. This can be
reproduced by repeatedly restarting the audit daemon (or just doing a
tight loop of audit_open(), audit_set_pid(pid), audit_set_pid(0),
audit_close()) while the system is under heavy audit load. This also
seems to be dependent on the number of CPUs - we can reproduce this with
2 CPUs but not with 48.

Tracing showed a race between the kernel setting auditd_conn and sending
the ACK, as described in the commit message. The included patch attempts
to fix this by ensuring the ACK is sent before any audit messages can be
put on the socket. This seems to fix the problem for auditd starting,
but strangely I still hit it when running my minimal repro (the tight
loop mentioned above), albeit less frequently. I'm not sure if the patch
or the repro is at fault.

We may also want to look at this from the userspace side in the audit
daemon itself (or more specifically, libaudit). The ACK handling is a
little odd - check_ack() will happily return success without seeing an
ACK if a non-ACK message is top of the socket queue, but will fail if no
message arrives within the timeout. It also of course fails if ENOBUFS
is set on the socket, but this failure only seems to matter when doing
audit_set_pid() - similar failures during main-loop message processing
are logged but otherwise ignored, as far as I can tell.

I'm not sure I quite understand the intentions of the code, but it seems
odd to let ENOBUFS be a fatal error here, given that it likely means the
socket buffer got flooded with audit messages, and thus audit_set_pid()
succeeded. Perhaps we should just ignore ENOBUFS (instead of or as well
as patching the kernel).

Finally, there is another oddity in audit_set_pid() that is tangential
to this discussion but worth highlighting: if the wmode parameter is
WAIT_YES (which it is for the audit daemon), then there is some
additional ACK-handling which waits for 100 milliseconds and eats the
top message of the socket queue if one arrives, without inspecting it.
This seems completely wrong as the ACK will have already been consumed
by check_ack() if there was one, and so the best this code can do is
nothing, and at worst it will swallow a genuine audit message without
ever recording it. If you agree with my assessment of this code, I'm
happy to submit a separate patch to fix this.

Thanks,
Chris

 kernel/audit.c | 30 ++++++++++++++++++++++++------
 kernel/audit.h |  5 +++++
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 9bc0b0301198..c48c778b6f89 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -488,12 +488,16 @@ static void auditd_conn_free(struct rcu_head *rcu)
  * @pid: auditd PID
  * @portid: auditd netlink portid
  * @net: auditd network namespace pointer
+ * @ack_status: if AUDIT_ACK_ALWAYS, send ACK before setting connection and
+ *              set to AUDIT_ACK_DONE
+ * @skb: socket buffer for sending ACK
  *
  * Description:
  * This function will obtain and drop network namespace references as
  * necessary.  Returns zero on success, negative values on failure.
  */
-static int auditd_set(struct pid *pid, u32 portid, struct net *net)
+static int auditd_set(struct pid *pid, u32 portid, struct net *net,
+                      int *ack_status, struct sk_buff *skb)
 {
 	unsigned long flags;
 	struct auditd_connection *ac_old, *ac_new;
@@ -508,6 +512,13 @@ static int auditd_set(struct pid *pid, u32 portid, struct net *net)
 	ac_new->portid = portid;
 	ac_new->net = get_net(net);
 
+	if (*ack_status == AUDIT_ACK_ALWAYS) {
+		// Send ACK before we set auditd_conn. Otherwise, the socket buffer may
+		// get filled with backlogged audit messages causing the ACK to be dropped.
+		netlink_ack(skb, nlmsg_hdr(skb), 0, NULL);
+		*ack_status = AUDIT_ACK_DONE;
+	}
+
 	spin_lock_irqsave(&auditd_conn_lock, flags);
 	ac_old = rcu_dereference_protected(auditd_conn,
 					   lockdep_is_held(&auditd_conn_lock));
@@ -1201,7 +1212,7 @@ static int audit_replace(struct pid *pid)
 	return auditd_send_unicast_skb(skb);
 }
 
-static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
+static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh, int *ack_status)
 {
 	u32			seq;
 	void			*data;
@@ -1294,7 +1305,9 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 				/* register a new auditd connection */
 				err = auditd_set(req_pid,
 						 NETLINK_CB(skb).portid,
-						 sock_net(NETLINK_CB(skb).sk));
+						 sock_net(NETLINK_CB(skb).sk),
+						 ack_status,
+						 skb);
 				if (audit_enabled != AUDIT_OFF)
 					audit_log_config_change("audit_pid",
 								new_pid,
@@ -1548,15 +1561,20 @@ static void audit_receive(struct sk_buff  *skb)
 	 */
 	int len;
 	int err;
+	int ack_status = AUDIT_ACK_ON_ERR;
 
 	nlh = nlmsg_hdr(skb);
 	len = skb->len;
 
 	audit_ctl_lock();
 	while (nlmsg_ok(nlh, len)) {
-		err = audit_receive_msg(skb, nlh);
-		/* if err or if this message says it wants a response */
-		if (err || (nlh->nlmsg_flags & NLM_F_ACK))
+		if (nlh->nlmsg_flags & NLM_F_ACK)
+			ack_status = AUDIT_ACK_ALWAYS;
+
+		err = audit_receive_msg(skb, nlh, &ack_status);
+
+		/* send ACK if err or the message asked for one (and not already sent) */
+		if (ack_status == AUDIT_ACK_ALWAYS || (ack_status == AUDIT_ACK_ON_ERR && err))
 			netlink_ack(skb, nlh, err, NULL);
 
 		nlh = nlmsg_next(nlh, &len);
diff --git a/kernel/audit.h b/kernel/audit.h
index 94738bce40b2..bc692f60567a 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -344,4 +344,9 @@ extern int audit_filter(int msgtype, unsigned int listtype);
 extern void audit_ctl_lock(void);
 extern void audit_ctl_unlock(void);
 
+/* Control netlink ACK behaviour in audit_receive. */
+#define AUDIT_ACK_ON_ERR 1
+#define AUDIT_ACK_ALWAYS 2
+#define AUDIT_ACK_DONE 3
+
 #endif
-- 
2.36.6

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ