linux-kernel - Re: [PATCH 0/4] audit: refactor and fix for potential deadlock

Message-ID: <CAHC9VhQg6aF_MKgkfY5RxgBgfVpUrU=xOTHenob++nmW5=aWug@mail.gmail.com>
Date:   Wed, 17 May 2023 12:03:41 -0400
From:   Paul Moore <paul@...l-moore.com>
To:     Eiichi Tsukata <eiichi.tsukata@...anix.com>
Cc:     "eparis@...hat.com" <eparis@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "audit@...r.kernel.org" <audit@...r.kernel.org>
Subject: Re: [PATCH 0/4] audit: refactor and fix for potential deadlock

On Mon, May 8, 2023 at 9:49 PM Eiichi Tsukata
<eiichi.tsukata@...anix.com> wrote:
> > On May 8, 2023, at 23:07, Paul Moore <paul@...l-moore.com> wrote:
> > On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata
> > <eiichi.tsukata@...anix.com> wrote:
> >> Commit 7ffb8e317bae ("audit: we don't need to
> >> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
> >> before add_wait_queue_exclusive() which introduced the following race:
> >>
> >>    CPU1                           CPU2
> >>  ========                       ========
> >>  (in audit_log_start())         (in kauditd_thread())
> >>
> >>  queue is full
> >>                                 wake_up(&audit_backlog_wait)
> >>                                 wait_event_freezable()
> >>  add_wait_queue_exclusive()
> >>  ...
> >>  schedule_timeout()
> >>
> >> Once this happens, both audit_log_start() and kauditd_thread() can cause
> >> deadlock for up to backlog_wait_time waiting for each other. To prevent
> >> the race, this patch adds queue full check after
> >> prepare_to_wait_exclusive().
> >
> > Have you seen this occur in practice?
>
> Yes, we hit this issue multiple times, though it’s pretty rare, as you mentioned.
> In our case, sshd got stuck in audit_log_user_message(), which caused SSH connections
> to time out.

Sorry for the delay, I was away at a conference last week.

Regardless of how we tweak the wait, there is always going to be a
chance that rescheduling the task with a timeout is going to impact
the application.  If you have sensitive applications where this is
proving to be a problem, I would suggest tuning your backlog size or
the wait time, or removing the backlog limit entirely.
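
For reference, these three knobs are normally adjusted with auditctl(8).
A minimal sketch, run as root; the numeric values are illustrative
assumptions and not recommendations made in this thread:

```shell
# Show current settings (backlog_limit, backlog_wait_time, backlog, lost).
auditctl -s

# Raise the backlog limit; 8192 is an arbitrary example value.
auditctl -b 8192

# Shorten how long a task may wait when the queue is full; 0 means no wait.
auditctl --backlog_wait_time 0

# Remove the limit entirely. Assumption: the kernel treats a limit of 0
# as "no limit"; verify on your kernel version before relying on it.
auditctl -b 0
```

An unlimited backlog trades the stall for unbounded memory use if
auditd falls behind, so it is worth watching the "backlog" and "lost"
counters in the status output after changing these.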

--
paul-moore.com