Message-Id: <30ca274fb2de087359d47808bda1ca1a6bb63392.1441349871.git.rgb@redhat.com>
Date:	Fri,  4 Sep 2015 05:14:54 -0400
From:	Richard Guy Briggs <rgb@...hat.com>
To:	linux-audit@...hat.com, linux-kernel@...r.kernel.org
Cc:	Richard Guy Briggs <rgb@...hat.com>, sgrubb@...hat.com,
	pmoore@...hat.com, eparis@...hat.com, v.rathor@...il.com,
	ctcard@...mail.com
Subject: [PATCH V1] audit: try harder to send to auditd upon netlink failure

There are several reports of the kernel losing contact with auditd when it is,
in fact, still running.  When this happens, kernel syslogs show:
	"audit: *NO* daemon at audit_pid=<pid>"
although auditd is still running, and is apparently happy, listening on the
netlink socket. The pid in the "*NO* daemon" message matches the pid of the
running auditd process.  Restarting auditd solves this.

The problem appears to happen randomly, and doesn't seem to be strongly
correlated to the rate of audit events being logged.  It happens fairly
regularly (every few days), but has not yet been reproduced on demand.

On production kernels, BUG_ON() is a no-op, so any netlink_unicast() error,
not only -ECONNREFUSED, will trigger this message.

Commit 34eab0a7 eliminates one possible cause.  This isn't the case here, since
the PID in the error message and the PID of the running auditd match.

The primary expected error here is -ECONNREFUSED, returned when the audit daemon
goes away and netlink_getsockbyportid() can't find the auditd portid entry in
the netlink audit table (or there is no receive function).  If -EPERM is
returned, that situation isn't likely to be resolved in a timely fashion
without administrator intervention.  In both cases, reset the audit_pid.  This
does not rule out a race condition.  SELinux is expected to return zero since
this isn't an INET or INET6 socket.  Other LSMs may have other return codes.
Log the error code for better diagnosis in the future.

In the case of -ENOMEM, the situation could be temporary, based on local or
general availability of buffers.  -EAGAIN should never happen since the netlink
audit (kernel) socket is set to MAX_SCHEDULE_TIMEOUT.  -ERESTARTSYS and -EINTR
are not expected since this kernel thread is not expected to receive signals.
In these cases (or any other unexpected ones for now), report the error and
re-schedule the thread, retrying up to 5 times.
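
The policy described in the two paragraphs above can be sketched in
user-space C as below.  This is only an illustration of the decision logic,
not the kernel code: classify_send_error(), sends_before_reset() and the
send_action enum are hypothetical names made up for this sketch, and
-ERESTARTSYS is left out because it is not defined in user-space <errno.h>.

```c
#include <assert.h>
#include <errno.h>

#define AUDITD_RETRIES 5

/* Hypothetical names, used only for this illustration. */
enum send_action { SEND_RESET, SEND_RETRY };

/*
 * Decide what to do after a failed send.  -ECONNREFUSED and -EPERM are
 * treated as fatal and reset immediately; anything else is retried
 * until the attempt counter reaches AUDITD_RETRIES.
 */
static enum send_action classify_send_error(int err, int *attempts)
{
	assert(err < 0);	/* only called on the failure path */

	if (err == -ECONNREFUSED || err == -EPERM)
		return SEND_RESET;
	if (++(*attempts) >= AUDITD_RETRIES)
		return SEND_RESET;
	return SEND_RETRY;
}

/*
 * Count how many sends are made before giving up, assuming every send
 * fails with the same error code.
 */
static int sends_before_reset(int err)
{
	int attempts = 0;
	int sends = 1;		/* the initial send */

	while (classify_send_error(err, &attempts) == SEND_RETRY)
		sends++;
	return sends;
}
```

Under these assumptions a persistent -ENOMEM results in five sends in total
(the initial one plus four re-scheduled ones) before the reset, while
-ECONNREFUSED and -EPERM reset after the first failure.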

Reported-by: Vipin Rathor <v.rathor@...il.com>
Reported-by: <ctcard@...mail.com>
Signed-off-by: Richard Guy Briggs <rgb@...hat.com>
---
 kernel/audit.c |   43 +++++++++++++++++++++++++++++++++++++++----
 1 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 1c13e42..4ee114a 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -404,19 +404,54 @@ static void audit_printk_skb(struct sk_buff *skb)
 	audit_hold_skb(skb);
 }
 
+static char *audit_strerror(int err)
+{
+	switch (err) {
+	case -ECONNREFUSED:
+		return "ECONNREFUSED";
+	case -EPERM:
+		return "EPERM";
+	case -ENOMEM:
+		return "ENOMEM";
+	case -EAGAIN:
+		return "EAGAIN";
+	case -ERESTARTSYS:
+		return "ERESTARTSYS";
+	case -EINTR:
+		return "EINTR";
+	default:
+		return "(other)";
+	}
+}
+
 static void kauditd_send_skb(struct sk_buff *skb)
 {
 	int err;
+	int attempts = 0;
+#define AUDITD_RETRIES 5
+
+restart:
 	/* take a reference in case we can't send it and we want to hold it */
 	skb_get(skb);
 	err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
 	if (err < 0) {
 		BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
+		pr_err("netlink_unicast sending to audit_pid=%d returned error: %d, %s\n"
+		       , audit_pid, err, audit_strerror(err));
 		if (audit_pid) {
-			pr_err("*NO* daemon at audit_pid=%d\n", audit_pid);
-			audit_log_lost("auditd disappeared");
-			audit_pid = 0;
-			audit_sock = NULL;
+			if (err == -ECONNREFUSED || err == -EPERM
+			    || ++attempts >= AUDITD_RETRIES) {
+				audit_log_lost("audit_pid reset");
+				audit_pid = 0;
+				audit_sock = NULL;
+			} else {
+				pr_warn("re-scheduling(#%d) write to audit_pid=%d\n"
+					, attempts, audit_pid);
+				set_current_state(TASK_INTERRUPTIBLE);
+				schedule();
+				__set_current_state(TASK_RUNNING);
+				goto restart;
+			}
 		}
 		/* we might get lucky and get this in the next auditd */
 		audit_hold_skb(skb);
-- 
1.7.1

