linux-kernel - Re: data-race in audit_log_start / audit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHC9VhTXNPWBDRoPcz-Jw=f+NNAEhxbh-ySc56CUd-ZbuboW5w@mail.gmail.com>
Date:   Thu, 18 Aug 2022 21:59:02 -0400
From:   Paul Moore <paul@...l-moore.com>
To:     abhishek.shah@...umbia.edu
Cc:     eparis@...hat.com, linux-audit@...hat.com,
        linux-kernel@...r.kernel.org, Gabriel Ryan <gabe@...columbia.edu>
Subject: Re: data-race in audit_log_start / audit_receive

On Thu, Aug 18, 2022 at 6:23 PM Abhishek Shah
<abhishek.shah@...umbia.edu> wrote:
> Hi all,
>
> We found a data race involving the audit_cmd_mutex.owner variable. We think this bug is concerning because audit_ctl_owner_current is used at a location that controls the scheduling of tasks shown here. Please let us know what you think.
>
> Thanks!
>
> -----------------Report----------------------
>
> write to 0xffffffff881d0710 of 8 bytes by task 6541 on cpu 0:
>  audit_ctl_lock kernel/audit.c:237 [inline]

...

> read to 0xffffffff881d0710 of 8 bytes by task 6542 on cpu 1:
>  audit_ctl_owner_current kernel/audit.c:258 [inline]

Yes, technically there is a race condition if/when an auditd instance
is registering itself the exact same time as another task is
attempting to log an audit record via audit_log_start().  The risk
being that a *very* limited number of audit records could be
mis-handled with respect to their queue priority and that is it; no
records would be lost or misplaced.  Correcting this would likely
involve a more complex locking scheme[1] or a rather severe
performance penalty due to an additional lock in the audit_log_start()
code path.  There may be some value in modifying
audit_ctl_owner_current() to use READ_ONCE(), but it isn't clear to me
that this would significantly improve things or have no impact on
performance.

Have you noticed any serious problems on your system due to this?  If
you have a reproducer which shows actual harm on the system could you
please share that?

[1] The obvious choice would be to move to a RCU based scheme, but
even that doesn't totally solve the problem as there would still be a
window where some tasks would have an "old" value.  It might actually
end up extending the race window on large multi-core systems due to
the time needed for all of the critical sections to complete.

-- 
paul-moore.com