Message-ID: <aYYtFOW1E2Uv1Wiw@JMW-Ubuntu>
Date: Sat, 7 Feb 2026 03:04:04 +0900
From: Minu Jin <s9430939@...er.com>
To: Dan Carpenter <dan.carpenter@...aro.org>
Cc: parthiban.veerasooran@...rochip.com, christian.gromm@...rochip.com,
gregkh@...uxfoundation.org, linux-staging@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] staging: most: dim2: fix a race condition in
complete_all_mbos()
On Fri, Feb 06, 2026 at 10:20:51AM +0300, Dan Carpenter wrote:
> On Fri, Feb 06, 2026 at 01:02:31AM +0900, Minu Jin wrote:
> > The current implementation of complete_all_mbos() repeatedly acquires
> > and releases the spinlock in a loop. This causes lock contention.
> >
> > This patch refactors the function to use list_replace_init(), moving all
> > entries to a local list. This removes the loop-based locking approach
> > and significantly reduces lock contention.
> >
> > Signed-off-by: Minu Jin <s9430939@...er.com>
>
> This complete_all_mbos() function is called when we do a
> most_stop_channel() and we ->poison_channel().
>
> The list heads are &hdm_ch->started_list and &hdm_ch->pending_list. I
> feel like if we add something to the list while we are also freeing
> items from the list then we are toasted. In service_done_flag(), we
> delete items from the list but deleting items is fine in this context.
>
> We add things to the ->pending_list in enqueue() and
> service_done_flag(). We move things from the ->pending_list to the
> ->started_list in try_start_dim_transfer(). So if any of those three
> functions can be run at the same time as complete_all_mbos() we are in
> trouble.
>
> The hdm_enqueue_thread() function calls enqueue() until
> kthread_should_stop(). The most_stop_channel() function calls
> kthread_stop(c->hdm_enqueue_task) before doing the ->poison_channel()
> so that's fine.
>
> The service_done_flag() and try_start_dim_transfer() functions are
> called from dim2_task_irq(). When do we stop taking interrupts? To be
> honest, I don't know. I thought we had to call disable_irq()?
>
> So that's the question, when do we disable IRQs in this driver? I
> would have assumed it was in most_stop_channel() but I can't see it,
> but I'm also not very familiar with this code.
>
> Let's answer this question and then either add a Fixes tag or say that
> there doesn't appear to be a race condition.
>
> regards,
> dan carpenter
>
>
Hi Dan,

Thank you for taking the time to do such a detailed review and
analysis.

To be honest, my original intention was simply to reduce lock
contention by turning the repeated lock/unlock in the loop from O(n)
lock acquisitions into O(1). I wanted to minimize the overhead of
taking the spinlock over and over.
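
For reference, the current loop has roughly this shape (trimmed and
paraphrased from dim2.c from memory, so please double-check the details
against the actual source):

static void complete_all_mbos(struct list_head *head)
{
        unsigned long flags;
        struct mbo *mbo;

        for (;;) {
                spin_lock_irqsave(&dim_lock, flags);
                if (list_empty(head)) {
                        spin_unlock_irqrestore(&dim_lock, flags);
                        break;
                }
                mbo = list_first_entry(head, struct mbo, list);
                list_del(&mbo->list);
                spin_unlock_irqrestore(&dim_lock, flags);
                /* lock dropped here: dim2_task_irq() may run and modify the list */
                mbo->processed_length = 0;
                mbo->status = MBO_E_CLOSE;
                mbo->complete(mbo);
        }
}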
However, after reading your feedback, I traced the code paths you
pointed out again and confirmed that IRQs are not disabled anywhere
along the call path most_stop_channel() -> poison_channel() ->
complete_all_mbos().

In the original code, the brief window where the lock is released
inside the loop means an interrupt (e.g. dim2_task_irq()) can fire in
the meantime and modify the list, leading to a race condition.
Although it wasn't my original intent, I think this patch could also
close that race: by moving all of the list items onto a local list
under a single lock acquisition, the loop that completes the MBOs no
longer walks a list that the interrupt handler can modify.
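
Concretely, what I have in mind is along these lines; this is a sketch
of the idea rather than the exact v1 diff, with the lock and mbo field
names as I read them in dim2.c and most.h:

static void complete_all_mbos(struct list_head *head)
{
        unsigned long flags;
        struct mbo *mbo, *next;
        LIST_HEAD(completed);

        /* detach everything currently queued in one critical section */
        spin_lock_irqsave(&dim_lock, flags);
        list_replace_init(head, &completed);
        spin_unlock_irqrestore(&dim_lock, flags);

        /* the local list is not reachable from dim2_task_irq() */
        list_for_each_entry_safe(mbo, next, &completed, list) {
                list_del(&mbo->list);
                mbo->processed_length = 0;
                mbo->status = MBO_E_CLOSE;
                mbo->complete(mbo);
        }
}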
Does this reasoning make sense to you, or is there something I am
missing? I would appreciate your opinion before I update the commit
message and send a v2.

Minu Jin