Date:	Wed, 09 Sep 2015 16:41:54 -0400
From:	Paul Moore <pmoore@...hat.com>
To:	Richard Guy Briggs <rgb@...hat.com>
Cc:	linux-audit@...hat.com, linux-kernel@...r.kernel.org,
	sgrubb@...hat.com, eparis@...hat.com, v.rathor@...il.com,
	ctcard@...mail.com
Subject: Re: [PATCH V2] audit: try harder to send to auditd upon netlink failure

On Monday, September 07, 2015 05:10:13 AM Richard Guy Briggs wrote:
> There are several reports of the kernel losing contact with auditd when
> it is, in fact, still running.  When this happens, kernel syslogs show:
> 	"audit: *NO* daemon at audit_pid=<pid>"
> although auditd is still running, and is apparently happy, listening on
> the netlink socket. The pid in the "*NO* daemon" message matches the pid
> of the running auditd process.  Restarting auditd solves this.
> 
> The problem appears to happen randomly, and doesn't seem to be strongly
> correlated to the rate of audit events being logged.  The problem
> happens fairly regularly (every few days), but has not yet been
> reproduced on demand.
> 
> On production kernels, BUG_ON() is a no-op, so any error from
> netlink_unicast(), not just -ECONNREFUSED, will trigger this message.
> 
> Commit 34eab0a7cd45 ("audit: prevent an older auditd shutdown from
> orphaning a newer auditd startup") eliminates one possible cause.  This
> isn't the case here, since the PID in the error message and the PID of
> the running auditd match.
> 
> The primary expected cause of error here is -ECONNREFUSED when the audit
> daemon goes away, when netlink_getsockbyportid() can't find the auditd
> portid entry in the netlink audit table (or there is no receive
> function).  If -EPERM is returned, that situation isn't likely to be
> resolved in a timely fashion without administrator intervention.  In
> both cases, reset the audit_pid.  This does not rule out a race
> condition.  SELinux is expected to return zero since this isn't an INET
> or INET6 socket.  Other LSMs may have other return codes.  Log the error
> code for better diagnosis in the future.
> 
> In the case of -ENOMEM, the situation could be temporary, based on local
> or general availability of buffers.  -EAGAIN should never happen since
> the netlink audit (kernel) socket is set to MAX_SCHEDULE_TIMEOUT.
> -ERESTARTSYS and -EINTR are not expected since this kernel thread is not
> expected to receive signals.  In these cases (or any other unexpected
> ones for now), report the error and re-schedule the thread, retrying up
> to 5 times.
> 
> v2:
> 	Removed BUG_ON().
> 	Moved comma in pr_*() statements.
> 	Removed audit_strerror() text.
> 
> Reported-by: Vipin Rathor <v.rathor@...il.com>
> Reported-by: <ctcard@...mail.com>
> Signed-off-by: Richard Guy Briggs <rgb@...hat.com>
> ---
>  kernel/audit.c |   24 +++++++++++++++++++-----
>  1 files changed, 19 insertions(+), 5 deletions(-)

Queued up for linux-audit#next as soon as 4.3-rc1 is released.

> diff --git a/kernel/audit.c b/kernel/audit.c
> index 1c13e42..18cdfe2 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -407,16 +407,30 @@ static void audit_printk_skb(struct sk_buff *skb)
>  static void kauditd_send_skb(struct sk_buff *skb)
>  {
>  	int err;
> +	int attempts = 0;
> +#define AUDITD_RETRIES 5
> +
> +restart:
>  	/* take a reference in case we can't send it and we want to hold it */
>  	skb_get(skb);
>  	err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
>  	if (err < 0) {
> -		BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
> +		pr_err("netlink_unicast sending to audit_pid=%d returned error: %d\n",
> +		       audit_pid, err);
>  		if (audit_pid) {
> -			pr_err("*NO* daemon at audit_pid=%d\n", audit_pid);
> -			audit_log_lost("auditd disappeared");
> -			audit_pid = 0;
> -			audit_sock = NULL;
> +			if (err == -ECONNREFUSED || err == -EPERM
> +			    || ++attempts >= AUDITD_RETRIES) {
> +				audit_log_lost("audit_pid=%d reset");
> +				audit_pid = 0;
> +				audit_sock = NULL;
> +			} else {
> +				pr_warn("re-scheduling(#%d) write to audit_pid=%d\n",
> +					attempts, audit_pid);
> +				set_current_state(TASK_INTERRUPTIBLE);
> +				schedule();
> +				__set_current_state(TASK_RUNNING);
> +				goto restart;
> +			}
>  		}
>  		/* we might get lucky and get this in the next auditd */
>  		audit_hold_skb(skb);
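
The retry policy in the hunk above can be exercised outside the kernel.
What follows is a minimal user-space sketch in C, not the kernel code:
send_with_retry(), scripted_send() and struct script are hypothetical
names introduced for illustration, and a scripted callback stands in for
netlink_unicast().  It assumes only what the patch shows: -ECONNREFUSED
and -EPERM give up immediately, and any other error is retried until the
attempt counter reaches AUDITD_RETRIES.  The kernel sleeps in schedule()
between attempts; the sketch simply retries at once.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define AUDITD_RETRIES 5

enum send_result { SEND_OK, SEND_RESET };

/* Hypothetical stand-in for netlink_unicast(): returns >= 0 or -errno. */
typedef int (*send_fn)(void *ctx);

/* Hypothetical test double: replays a fixed list of return codes. */
struct script {
	const int *codes;
	int len;
	int pos;	/* number of send calls made so far */
};

int scripted_send(void *ctx)
{
	struct script *s = ctx;
	/* Once the script runs out, keep repeating its last code. */
	int i = s->pos < s->len ? s->pos : s->len - 1;

	s->pos++;
	return s->codes[i];
}

/*
 * Mirrors the control flow of the patched kauditd_send_skb():
 * fatal errors reset at once without touching the counter,
 * other errors are retried until attempts reaches AUDITD_RETRIES.
 */
enum send_result send_with_retry(send_fn send, void *ctx, int *attempts_out)
{
	int attempts = 0;

	assert(send != NULL && attempts_out != NULL);

	for (;;) {
		int err = send(ctx);

		if (err >= 0) {
			*attempts_out = attempts;
			return SEND_OK;
		}
		if (err == -ECONNREFUSED || err == -EPERM ||
		    ++attempts >= AUDITD_RETRIES) {
			*attempts_out = attempts;
			return SEND_RESET;	/* kernel: clear audit_pid/audit_sock */
		}
		/* kernel: schedule() here before the "goto restart" */
	}
}
```

With a script that always returns -ENOMEM the callback is invoked five
times before the sketch gives up; with -ECONNREFUSED it is invoked once.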

-- 
paul moore
security @ redhat
