linux-kernel - Re: data-race in audit_log_start / audit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHC9VhQnYY_wdUjBbL3NpPYTQcr0M_Nf1g8ObbHiUoUbHPYdYA@mail.gmail.com>
Date:   Fri, 19 Aug 2022 08:06:45 -0400
From:   Paul Moore <paul@...l-moore.com>
To:     abhishek.shah@...umbia.edu
Cc:     eparis@...hat.com, linux-audit@...hat.com,
        linux-kernel@...r.kernel.org, Gabriel Ryan <gabe@...columbia.edu>
Subject: Re: data-race in audit_log_start / audit_receive

On Thu, Aug 18, 2022 at 9:59 PM Paul Moore <paul@...l-moore.com> wrote:
> On Thu, Aug 18, 2022 at 6:23 PM Abhishek Shah
> <abhishek.shah@...umbia.edu> wrote:
> > Hi all,
> >
> > We found a data race involving the audit_cmd_mutex.owner variable. We think this bug is concerning because audit_ctl_owner_current is used at a location that controls the scheduling of tasks shown here. Please let us know what you think.
> >
> > Thanks!
> >
> > -----------------Report----------------------
> >
> > write to 0xffffffff881d0710 of 8 bytes by task 6541 on cpu 0:
> >  audit_ctl_lock kernel/audit.c:237 [inline]
>
> ...
>
> > read to 0xffffffff881d0710 of 8 bytes by task 6542 on cpu 1:
> >  audit_ctl_owner_current kernel/audit.c:258 [inline]
>
> Yes, technically there is a race condition if/when an auditd instance
> is registering itself the exact same time as another task is
> attempting to log an audit record via audit_log_start().

I realized after I sent this and turned off my computer last night
that I typed the wrong thing - the race isn't between auditd and
audit_log_start(), it's between the code which changes the audit
subsystem state (see audit_receive() and the audit watch/tree code)
and audit_log_start().

> The risk
> being that a *very* limited number of audit records could be
> mis-handled with respect to their queue priority and that is it; no
> records would be lost or misplaced.  Correcting this would likely
> involve a more complex locking scheme[1] or a rather severe
> performance penalty due to an additional lock in the audit_log_start()
> code path.  There may be some value in modifying
> audit_ctl_owner_current() to use READ_ONCE(), but it isn't clear to me
> that this would significantly improve things or have no impact on
> performance.

Another thing I thought of last night - I don't believe READ_ONCE()
adds a memory barrier, which would probably be needed; although my
original statement still stands, I'm not sure the performance hit
would justify the marginal impact on the audit queue.

> Have you noticed any serious problems on your system due to this?  If
> you have a reproducer which shows actual harm on the system could you
> please share that?
>
> [1] The obvious choice would be to move to a RCU based scheme, but
> even that doesn't totally solve the problem as there would still be a
> window where some tasks would have an "old" value.  It might actually
> end up extending the race window on large multi-core systems due to
> the time needed for all of the critical sections to complete.

-- 
paul-moore.com