lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAHC9VhQaWvgwHGVv==FTdoEZhSqak02ijR7vt4h2pBU5Xs7-2A@mail.gmail.com>
Date:   Wed, 17 May 2023 12:15:37 -0400
From:   Paul Moore <paul@...l-moore.com>
To:     Eiichi Tsukata <eiichi.tsukata@...anix.com>
Cc:     "eparis@...hat.com" <eparis@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "audit@...r.kernel.org" <audit@...r.kernel.org>
Subject: Re: [PATCH 0/4] audit: refactor and fix for potential deadlock

On Wed, May 10, 2023 at 4:09 AM Eiichi Tsukata
<eiichi.tsukata@...anix.com> wrote:
> > On May 9, 2023, at 10:34, Eiichi Tsukata <eiichi.tsukata@...anix.com> wrote:
> >> On May 8, 2023, at 23:07, Paul Moore <paul@...l-moore.com> wrote:
> >> On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata
> >> <eiichi.tsukata@...anix.com> wrote:
> >>> Commit 7ffb8e317bae ("audit: we don't need to
> >>> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
> >>> before add_wait_queue_exclusive() which introduced the following race:
> >>>
> >>>   CPU1                           CPU2
> >>> ========                       ========
> >>> (in audit_log_start())         (in kauditd_thread())
> >>>
> >>> queue is full
> >>>                                wake_up(&audit_backlog_wait)
> >>>                                wait_event_freezable()
> >>> add_wait_queue_exclusive()
> >>> ...
> >>> schedule_timeout()
> >>>
> >>> Once this happens, both audit_log_start() and kauditd_thread() can cause
> >>> deadlock for up to backlog_wait_time waiting for each other. To prevent
> >>> the race, this patch adds queue full check after
> >>> prepare_to_wait_exclusive().
> >>
> >> Have you seen this occur in practice?
> >
> > Yes, we hit this issue multiple times, though it’s pretty rare as you are mentioning.
> > In our case, sshd got stuck in audit_log_user_message(), which caused SSH connection
> > timeout.
> >
>
> I found another case.
>
> kauditd_thread issues wake_up(&audit_backlog_wait) once after wake up.
> As waiter side is using add_wait_queue_exclusive() it only wakes up one process at once.
>
> If kauditd wakes up a process which is sleeping in audit_log_start(), then the process will
> queue skb and issue wake_up_interruptible(&kauditd_wait). No problem.
> But if kauditd wakes up a process which is sleeping in audit_receive(), then the process
> won’t try to wake up kauditd. In this case other processes sleeping in audit_log_start()
> keep sleeping even if kauditd have flushed the queue.
>
> At this point I’m planning to use non-exclusive wait in audit_receive() in v2.
> Let me know if we should use wake_up_all() in kauditd or you have better solution.

The nice part about marking the waiters as WQ_FLAG_EXCLUSIVE is that
we avoid the thundering herd problem.  The waiter was held in the
first place because the system was under high levels of audit
pressure, so it stands to reason that a slow wake-up of the waiters is
a wise choice to avoid re-entering another audit pressure state.

I'll take a look at your v2 patch, but as things stand I still believe
that this is better addressed via runtime configuration (see my
previous email).  However, some of the refactoring you've done might
be nice, I'll reply back on that in your v2 patch.

Thanks.

-- 
paul-moore.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ