[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0BA588DC-1E53-4BCE-B124-77C5C730C267@nutanix.com>
Date: Wed, 10 May 2023 08:09:05 +0000
From: Eiichi Tsukata <eiichi.tsukata@...anix.com>
To: Paul Moore <paul@...l-moore.com>
CC: "eparis@...hat.com" <eparis@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"audit@...r.kernel.org" <audit@...r.kernel.org>
Subject: Re: [PATCH 0/4] audit: refactor and fix for potential deadlock
> On May 9, 2023, at 10:34, Eiichi Tsukata <eiichi.tsukata@...anix.com> wrote:
>
>
>
>> On May 8, 2023, at 23:07, Paul Moore <paul@...l-moore.com> wrote:
>>
>> On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata
>> <eiichi.tsukata@...anix.com> wrote:
>>> Commit 7ffb8e317bae ("audit: we don't need to
>>> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
>>> before add_wait_queue_exclusive() which introduced the following race:
>>>
>>> CPU1 CPU2
>>> ======== ========
>>> (in audit_log_start()) (in kauditd_thread())
>>>
>>> queue is full
>>> wake_up(&audit_backlog_wait)
>>> wait_event_freezable()
>>> add_wait_queue_exclusive()
>>> ...
>>> schedule_timeout()
>>>
>>> Once this happens, both audit_log_start() and kauditd_thread() can cause
>>> deadlock for up to backlog_wait_time waiting for each other. To prevent
>>> the race, this patch adds queue full check after
>>> prepare_to_wait_exclusive().
>>
>> Have you seen this occur in practice?
>
> Yes, we hit this issue multiple times, though it’s pretty rare as you are mentioning.
> In our case, sshd got stuck in audit_log_user_message(), which caused SSH connection
> timeout.
>
I found another case.
kauditd_thread issues wake_up(&audit_backlog_wait) once after wake up.
As waiter side is using add_wait_queue_exclusive() it only wakes up one process at once.
If kauditd wakes up a process which is sleeping in audit_log_start(), then the process will
queue skb and issue wake_up_interruptible(&kauditd_wait). No problem.
But if kauditd wakes up a process which is sleeping in audit_receive(), then the process
won’t try to wake up kauditd. In this case other processes sleeping in audit_log_start()
keep sleeping even if kauditd have flushed the queue.
At this point I’m planning to use non-exclusive wait in audit_receive() in v2.
Let me know if we should use wake_up_all() in kauditd or you have better solution.
Eiichi
Powered by blists - more mailing lists