Message-ID: <9a586b5b-c47f-45eb-83c8-1e86431fc83d@redhat.com>
Date: Thu, 2 Oct 2025 13:38:59 +0200
From: David Hildenbrand <david@...hat.com>
To: Byungchul Park <byungchul@...com>, akpm@...ux-foundation.org
Cc: ziy@...dia.com, matthew.brost@...el.com, joshua.hahnjy@...il.com,
 rakie.kim@...com, gourry@...rry.net, ying.huang@...ux.alibaba.com,
 apopple@...dia.com, clameter@....com, kravetz@...ibm.com,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 max.byungchul.park@...il.com, kernel_team@...ynix.com, harry.yoo@...cle.com,
 gwan-gyeong.mun@...el.com, yeoreum.yun@....com, syzkaller@...glegroups.com,
 ysk@...lloc.com, Matthew Wilcox <willy@...radead.org>
Subject: Re: [RFC] mm/migrate: make sure folio_unlock() before
 folio_wait_writeback()

> To simplify the scenario:
> 

Just curious, where is the __folio_start_writeback() to complete the 
picture?

>     context X (wq worker)	context Y (process context)
> 
> 				migrate_pages_batch()
>     ext4_end_io_end()		  ...
>       ...			  migrate_folio_unmap()
>       ext4_get_inode_loc()	    ...
>         ...			    folio_lock() // hold the folio lock
>         bdev_getblk()		    ...
>           ...			    folio_wait_writeback() // wait forever
>           __find_get_block_slow()
>             ...			    ...
>             folio_lock() // wait forever
>             folio_unlock()	  migrate_folio_undo_src()
> 				    ...
>       ...			    folio_unlock() // never reachable
>       ext4_finish_bio()
> 	...
> 	folio_end_writeback() // never reachable
> 

But aren't you implying that, from this point on, calling
folio_wait_writeback() with the folio lock held should be disallowed?
That sounds ... a bit wrong.

Note that it is currently explicitly allowed: folio_wait_writeback() 
documents "If the folio is not locked, writeback may start again after 
writeback has finished.". So without holding the folio lock, there is no 
way to prevent writeback from immediately starting again.

In particular, wouldn't we have to fix up the other call sites to make 
this consistent, and then assert it with VM_WARN_ON_ONCE() in 
folio_wait_writeback()?
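
To illustrate what such an assertion would amount to, here is a minimal
userspace sketch. It is not kernel code: struct model_folio,
model_warn_on_once() and model_check_wait() are hypothetical stand-ins
for the folio, VM_WARN_ON_ONCE() and the check that would sit at the top
of folio_wait_writeback(). It only models the "warn once if the lock is
held" rule, not the waiting itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a folio: only the two bits we care about. */
struct model_folio {
    bool locked;     /* models the folio lock being held */
    bool writeback;  /* models the folio being under writeback */
};

/* Counts how many warnings were actually printed. */
static int warnings_emitted;

/* Warn the first time cond is true; always return cond. */
static bool model_warn_on_once(bool cond, bool *warned, const char *what)
{
    if (cond && !*warned) {
        *warned = true;
        warnings_emitted++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

/*
 * Models the proposed rule: waiting for writeback with the folio lock
 * held is flagged. Returns true if the rule was violated.
 */
static bool model_check_wait(const struct model_folio *folio)
{
    static bool warned;

    assert(folio != NULL);
    return model_warn_on_once(folio->locked, &warned,
                              "waiting for writeback with folio locked");
}
```

Every existing caller that waits with the lock held would trip this
check, which is why the other call sites would need fixing up first.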

Of course, as we've never seen this deadlock before in practice, I do 
wonder if something else prevents it?

If it's a real issue, I wonder if a trylock on the writeback path could 
be an option.
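
As a rough userspace sketch of the trylock idea (pthreads, not kernel
code): a mutex stands in for the folio lock, and a flag plus condition
variable stands in for the writeback bit. All names below are
hypothetical, and what the writeback side should do when the trylock
fails (here it simply skips the locked lookup) is purely an assumption
for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-ins, NOT kernel APIs. */
static pthread_mutex_t folio_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t state_cond = PTHREAD_COND_INITIALIZER;
static bool under_writeback;      /* models the writeback bit */
static bool migrator_holds_lock;  /* handshake to force the contended case */
static int skipped_lookups;       /* times the trylock failed */

/* Context Y: take the folio lock, then wait for writeback to end. */
static void *migrate_side(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&folio_lock);
    pthread_mutex_lock(&state_lock);
    migrator_holds_lock = true;
    pthread_cond_broadcast(&state_cond);
    while (under_writeback)
        pthread_cond_wait(&state_cond, &state_lock);
    pthread_mutex_unlock(&state_lock);
    pthread_mutex_unlock(&folio_lock);
    return NULL;
}

/* Context X: must finish its work before it can end writeback. */
static void *writeback_side(void *arg)
{
    (void)arg;
    /* Wait until Y holds the folio lock, so the trylock is contended. */
    pthread_mutex_lock(&state_lock);
    while (!migrator_holds_lock)
        pthread_cond_wait(&state_cond, &state_lock);
    pthread_mutex_unlock(&state_lock);

    if (pthread_mutex_trylock(&folio_lock) == 0) {
        /* The locked lookup would go here. */
        pthread_mutex_unlock(&folio_lock);
    } else {
        /* Assumption: back off instead of sleeping on the lock. */
        skipped_lookups++;
    }

    /* End writeback and wake the waiter in context Y. */
    pthread_mutex_lock(&state_lock);
    under_writeback = false;
    pthread_cond_broadcast(&state_cond);
    pthread_mutex_unlock(&state_lock);
    return NULL;
}

/* Runs both contexts to completion; returns the number of failed trylocks. */
int run_scenario(void)
{
    pthread_t x, y;
    int rc;

    under_writeback = true;
    migrator_holds_lock = false;
    skipped_lookups = 0;

    rc = pthread_create(&y, NULL, migrate_side, NULL);
    assert(rc == 0);
    rc = pthread_create(&x, NULL, writeback_side, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(x, NULL);
    pthread_join(y, NULL);
    return skipped_lookups;
}
```

Calling run_scenario() from a main() (built with -pthread) returns
instead of hanging: the trylock fails, writeback is still ended, and the
waiter then drops the lock. With a plain pthread_mutex_lock() in
writeback_side() the two threads would block on each other, as in the
quoted scenario.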

-- 
Cheers

David / dhildenb