Message-ID: <zuzs6ucmgxujim4fb67tw5izp3w2t5k6dzk2ktntqyuwjva73d@tqgwkk6stpgz>
Date: Wed, 22 Oct 2025 08:08:42 +0100
From: Pedro Falcato <pfalcato@...e.de>
To: Kiryl Shutsemau <kirill@...temov.name>
Cc: Andrew Morton <akpm@...ux-foundation.org>, 
	David Hildenbrand <david@...hat.com>, Matthew Wilcox <willy@...radead.org>, 
	Linus Torvalds <torvalds@...ux-foundation.org>, Alexander Viro <viro@...iv.linux.org.uk>, 
	Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>, linux-mm@...ck.org, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, Kiryl Shutsemau <kas@...nel.org>
Subject: Re: [PATCH] mm/filemap: Implement fast short reads

On Fri, Oct 17, 2025 at 03:15:36PM +0100, Kiryl Shutsemau wrote:
> From: Kiryl Shutsemau <kas@...nel.org>
> 
> The protocol for page cache lookup is as follows:
> 
>   1. Locate a folio in XArray.
>   2. Obtain a reference on the folio using folio_try_get().
>   3. If successful, verify that the folio still belongs to
>      the mapping and has not been truncated or reclaimed.
>   4. Perform operations on the folio, such as copying data
>      to userspace.
>   5. Release the reference.
> 
> For short reads, the overhead of atomic operations on reference
> manipulation can be significant, particularly when multiple tasks access
> the same folio, leading to cache line bouncing.
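For reference, the refcounted protocol above looks roughly like this (a simplified
sketch, not the actual filemap code; the function name is made up):

#include <linux/pagemap.h>
#include <linux/highmem.h>

/*
 * Simplified sketch of the refcounted lookup protocol described above;
 * not the actual filemap code, and the helper name is made up.
 */
static size_t read_folio_refcounted(struct address_space *mapping,
				    loff_t pos, char *buffer, size_t size)
{
	XA_STATE(xas, &mapping->i_pages, pos >> PAGE_SHIFT);
	struct folio *folio;
	size_t copied = 0;

	rcu_read_lock();
	folio = xas_load(&xas);				/* 1. locate the folio */
	if (!folio || xa_is_value(folio))
		goto out_unlock;
	if (!folio_try_get(folio))			/* 2. take a reference (atomic) */
		goto out_unlock;
	if (unlikely(folio != xas_reload(&xas))) {	/* 3. recheck vs truncate/reclaim */
		folio_put(folio);
		goto out_unlock;
	}
	rcu_read_unlock();

	if (folio_test_uptodate(folio))			/* 4. copy the data */
		copied = memcpy_from_file_folio(buffer, folio, pos, size);
	folio_put(folio);				/* 5. drop the reference (atomic) */
	return copied;

out_unlock:
	rcu_read_unlock();
	return 0;
}

Steps 2 and 5 are the per-read atomic ops (and the cache line bouncing) the fast
path is trying to get rid of.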
> 
> <snip>
> +static inline unsigned long filemap_read_fast_rcu(struct address_space *mapping,
> +						  loff_t pos, char *buffer,
> +						  size_t size)
> +{
> +	XA_STATE(xas, &mapping->i_pages, pos >> PAGE_SHIFT);
> +	struct folio *folio;
> +	loff_t file_size;
> +	unsigned int seq;
> +
> +	lockdep_assert_in_rcu_read_lock();
> +
> +	/* Give up and go to slow path if raced with page_cache_delete() */
> +	if (!raw_seqcount_try_begin(&mapping->i_pages_delete_seqcnt, seq))
> +		return 0;
> +
> +	folio = xas_load(&xas);
> +	if (xas_retry(&xas, folio))
> +		return 0;
> +
> +	if (!folio || xa_is_value(folio))
> +		return 0;
> +
> +	if (!folio_test_uptodate(folio))
> +		return 0;
> +
> +	/* No fast-case if readahead is supposed to be started */
> +	if (folio_test_readahead(folio))
> +		return 0;
> +	/* .. or mark it accessed */
> +	if (!folio_test_referenced(folio))
> +		return 0;
> +
> +	/* i_size check must be after folio_test_uptodate() */
> +	file_size = i_size_read(mapping->host);
> +	if (unlikely(pos >= file_size))
> +		return 0;
> +	if (size > file_size - pos)
> +		size = file_size - pos;
> +
> +	/* Do the data copy */
> +	size = memcpy_from_file_folio(buffer, folio, pos, size);
> +	if (!size)
> +		return 0;
> +

I think we may still have a problematic (rare, possibly theoretical) race here where:

     T0                                T1                               T2
filemap_read_fast_rcu()
  folio = xas_load(&xas);
  /* ... */                  truncate or reclaim frees the
                             folio, bumps the delete seq
                                                              folio_alloc() from e.g. secretmem
                                                              set_direct_map_invalid_noflush(!!)
memcpy_from_file_folio()

Do we have to use copy_from_kernel_nofault() here? Or is something else stopping this from happening?
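
If we do need it, I'd imagine something roughly like this (hypothetical sketch,
helper name made up; it glosses over the HIGHMEM/page-crossing handling that
memcpy_from_file_folio() does):

#include <linux/highmem.h>
#include <linux/uaccess.h>

/*
 * Hypothetical sketch: copy with faults disabled, so a source page that
 * lost its direct map entry comes back as -EFAULT instead of oopsing.
 * Assumes the copy stays within one page of the folio.
 */
static inline size_t memcpy_from_file_folio_nofault(char *to, struct folio *folio,
						     loff_t pos, size_t len)
{
	size_t offset = offset_in_folio(folio, pos);
	void *from = kmap_local_folio(folio, offset);
	long ret;

	len = min(len, folio_size(folio) - offset);
	ret = copy_from_kernel_nofault(to, from, len);
	kunmap_local(from);

	return ret ? 0 : len;	/* 0 => caller falls back to the slow path */
}

The fast path would then treat a failed copy the same as the other bail-outs
above and just take the slow path.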

-- 
Pedro
