Message-ID: <20251003023116.GB29748@system.software.com>
Date: Fri, 3 Oct 2025 11:31:16 +0900
From: Byungchul Park <byungchul@...com>
To: David Hildenbrand <david@...hat.com>
Cc: akpm@...ux-foundation.org, ziy@...dia.com, matthew.brost@...el.com,
	joshua.hahnjy@...il.com, rakie.kim@...com, gourry@...rry.net,
	ying.huang@...ux.alibaba.com, apopple@...dia.com, clameter@....com,
	kravetz@...ibm.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, max.byungchul.park@...il.com,
	kernel_team@...ynix.com, harry.yoo@...cle.com,
	gwan-gyeong.mun@...el.com, yeoreum.yun@....com,
	syzkaller@...glegroups.com, ysk@...lloc.com,
	Matthew Wilcox <willy@...radead.org>
Subject: Re: [RFC] mm/migrate: make sure folio_unlock() before
 folio_wait_writeback()

On Thu, Oct 02, 2025 at 01:38:59PM +0200, David Hildenbrand wrote:
> > To simplify the scenario:
> > 
> 
> Just curious, where is the __folio_start_writeback() to complete the
> picture?

ext4_end_io_end() was running in a workqueue worker after the I/O
completion.

The DEPT report shows that the following scenario happened, with
__folio_start_writeback() having been called much earlier, at least
before folio_test_writeback() was seen as true. Unfortunately, DEPT
doesn't capture the exact location of the __folio_start_writeback() call.

	Byungchul

> >     context X (wq worker)     context Y (process context)
> > 
> >                               migrate_pages_batch()
> >     ext4_end_io_end()           ...
> >       ...                       migrate_folio_unmap()
> >       ext4_get_inode_loc()        ...
> >         ...                       folio_lock() // hold the folio lock
> >         bdev_getblk()             ...
> >           ...                     folio_wait_writeback() // wait forever
> >           __find_get_block_slow()
> >             ...                           ...
> >             folio_lock() // wait forever
> >             folio_unlock()      migrate_folio_undo_src()
> >                                   ...
> >       ...                         folio_unlock() // never reachable
> >       ext4_finish_bio()
> >       ...
> >       folio_end_writeback() // never reachable
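The circular wait in the diagram can be checked with a small userspace model. This is only a sketch and not kernel code: the struct, the enum and every function name below are hypothetical. It also assumes that both contexts contend on the same folio, which the simplified diagram suggests but the DEPT report does not prove. Each context records what it is waiting for, and a walk along the wait-for chain reports a deadlock when it returns to its starting context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two contexts in the diagram. */
enum ctx { CTX_X_WQ_WORKER, CTX_Y_MIGRATION, NR_CTX };

#define NOBODY (-1)

struct model {
	int folio_lock_owner;          /* context holding the folio lock, or NOBODY */
	int writeback_ender;           /* context that must end writeback, or NOBODY */
	bool wants_folio_lock[NR_CTX]; /* context is blocked in folio_lock() */
	bool waits_writeback[NR_CTX];  /* context is blocked in folio_wait_writeback() */
};

/* Return the context that `c` is blocked on, or NOBODY if it can progress. */
static int blocked_on(const struct model *m, int c)
{
	if (m->wants_folio_lock[c] && m->folio_lock_owner != NOBODY &&
	    m->folio_lock_owner != c)
		return m->folio_lock_owner;
	if (m->waits_writeback[c] && m->writeback_ender != NOBODY &&
	    m->writeback_ender != c)
		return m->writeback_ender;
	return NOBODY;
}

/* Follow the wait-for chain from `start`; coming back to `start` is a cycle. */
bool deadlocked(const struct model *m, int start)
{
	int c = start;

	assert(m != NULL && start >= 0 && start < NR_CTX);
	for (int steps = 0; steps < NR_CTX; steps++) {
		c = blocked_on(m, c);
		if (c == NOBODY)
			return false;
		if (c == start)
			return true;
	}
	return false;
}

/* Diagram as reported: Y holds the folio lock while waiting for writeback. */
struct model reported_scenario(void)
{
	struct model m = {
		.folio_lock_owner = CTX_Y_MIGRATION,
		.writeback_ender = CTX_X_WQ_WORKER,
	};

	m.wants_folio_lock[CTX_X_WQ_WORKER] = true;
	m.waits_writeback[CTX_Y_MIGRATION] = true;
	return m;
}

/* Variant following the RFC subject: Y unlocks the folio before waiting. */
struct model unlock_before_wait(void)
{
	struct model m = reported_scenario();

	m.folio_lock_owner = NOBODY;
	return m;
}
```

In this model, deadlocked() is true for reported_scenario() from either context and false for unlock_before_wait(), because X can then take the folio lock and go on to end writeback.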
> > 
> 
> But aren't you implying that it should from this point on be disallowed
> to call folio_wait_writeback() with the folio lock held? That sounds ...
> a bit wrong.
> 
> Note that it is currently explicitly allowed: folio_wait_writeback()
> documents "If the folio is not locked, writeback may start again after
> writeback has finished.". So there is no way to prevent writeback from
> immediately starting again.
> 
> In particular, wouldn't we have to fixup other callsites to make this
> consistent and then VM_WARN_ON_ONCE() assert that in folio_wait_writeback()?
> 
> Of course, as we've never seen this deadlock before in practice, I do
> wonder if something else prevents it?
> 
> If it's a real issue, I wonder if a trylock on the writeback path could
> be an option.
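As a rough illustration of the trylock idea, here is a userspace analogy using a POSIX mutex in place of the folio lock. It is a sketch under stated assumptions, not a proposal for the actual ext4 or buffer-head code: struct fake_folio and both functions are hypothetical, and what the completion path should do after a failed trylock (requeue, retry, skip) is left open because the thread does not specify it.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a folio; not a kernel structure. */
struct fake_folio {
	pthread_mutex_t lock; /* plays the role of the folio lock */
	bool deferred;        /* set when the locked step had to be postponed */
};

void fake_folio_init(struct fake_folio *f)
{
	assert(f != NULL);
	pthread_mutex_init(&f->lock, NULL); /* default (non-recursive) mutex */
	f->deferred = false;
}

/*
 * Completion-side step that never sleeps on the lock.
 * Returns true if the locked work ran, false if the lock was busy and
 * the work was marked as deferred for the caller to handle later.
 */
bool complete_io_trylock(struct fake_folio *f)
{
	int err = pthread_mutex_trylock(&f->lock);

	if (err == EBUSY) {
		/* Someone else holds the lock: back off instead of blocking. */
		f->deferred = true;
		return false;
	}
	if (err != 0)
		return false; /* unexpected error; nothing was locked */

	/* ... work that needs the lock would go here ... */
	f->deferred = false;
	pthread_mutex_unlock(&f->lock);
	return true;
}
```

The point of the pattern is that the completion side can no longer become one edge of a wait-for cycle on the lock. Whether deferring is acceptable on the real writeback path is a separate question.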
> 
> --
> Cheers
> 
> David / dhildenb
> 
