Date: Fri, 31 May 2024 09:12:42 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Byungchul Park <byungchul@...com>, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org
Cc: kernel_team@...ynix.com, akpm@...ux-foundation.org, ying.huang@...el.com,
 vernhao@...cent.com, mgorman@...hsingularity.net, hughd@...gle.com,
 willy@...radead.org, david@...hat.com, peterz@...radead.org,
 luto@...nel.org, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com, rjgolo@...il.com
Subject: Re: [PATCH v11 09/12] mm: implement LUF (Lazy Unmap Flush) deferring
 tlb flush when folios get unmapped

On 5/31/24 02:19, Byungchul Park wrote:
..
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 0283cf366c2a..03683bf66031 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2872,6 +2872,12 @@ static inline void file_end_write(struct file *file)
>  	if (!S_ISREG(file_inode(file)->i_mode))
>  		return;
>  	sb_end_write(file_inode(file)->i_sb);
> +
> +	/*
> +	 * XXX: If needed, this can be optimized by skipping luf_flush()
> +	 * when the file's address space has never been involved with luf.
> +	 */
> +	luf_flush();
>  }
..
> +void luf_flush(void)
> +{
> +	unsigned long flags;
> +	unsigned short int ugen;
> +
> +	/*
> +	 * Obtain the latest ugen number.
> +	 */
> +	spin_lock_irqsave(&luf_lock, flags);
> +	ugen = luf_gen;
> +	spin_unlock_irqrestore(&luf_lock, flags);
> +
> +	check_luf_flush(ugen);
> +}

Am I reading this right?  There's now an unconditional global spinlock
acquired in the sys_write() path?  How can this possibly scale?
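
Even just snapshotting the counter locklessly would keep the global
lock off this path.  Something like this, maybe (untested sketch,
assuming luf_gen only ever moves forward under luf_lock and a
momentarily stale snapshot is harmless, since the caller only flushes
up to that generation):

	void luf_flush(void)
	{
		/*
		 * Snapshot the latest ugen without taking the global
		 * lock.  READ_ONCE() is enough for a single scalar
		 * that is only ever incremented.
		 */
		unsigned short int ugen = READ_ONCE(luf_gen);

		check_luf_flush(ugen);
	}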

So, yeah, I think an optimization is absolutely needed.  But, on a more
fundamental level, I just don't believe these patches are being tested.
Even a simple microbenchmark should show a pretty nasty regression on
any decently large system:

> https://github.com/antonblanchard/will-it-scale/blob/master/tests/write1.c

Second, I was just pointing out sys_write() as an example of how the
page cache could change.  Couldn't a separate, read/write mmap() of the
file do the same thing and *not* go through sb_end_write()?

So:

	fd = open("foo", O_RDWR);
	ptr1 = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	ptr2 = mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

	foo = *ptr1; // populate the page cache
	... page cache page is reclaimed and LUF'd
	*ptr2 = bar; // new page cache page is allocated and written to

	printf("*ptr1: %d\n", *ptr1);

Doesn't the printf() see stale data?
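
For reference, here's that sequence as an actual program.  It's only a
sketch of the aliasing setup: nothing here forces the reclaim/LUF step
in the middle (that would need real memory pressure), and the file name
"foo" is made up:

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		long pagesz = sysconf(_SC_PAGESIZE);
		int fd = open("foo", O_RDWR | O_CREAT, 0644);
		char *ptr1, *ptr2;

		if (fd < 0 || ftruncate(fd, pagesz) < 0) {
			perror("setup");
			return 1;
		}

		/* Two mappings of the same page cache page. */
		ptr1 = mmap(NULL, pagesz, PROT_READ, MAP_SHARED, fd, 0);
		ptr2 = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
			    MAP_SHARED, fd, 0);
		if (ptr1 == MAP_FAILED || ptr2 == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		char c = *ptr1;	/* fault in: populate the page cache */
		(void)c;

		/* ... reclaim + LUF of the page would happen here ... */

		*ptr2 = 42;	/* write through the second mapping */

		/* A stale TLB entry for ptr1 would make this print 0. */
		printf("*ptr1: %d\n", *ptr1);
		return 0;
	}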

I think tglx would call all of this "tinkering".  The approach to this
series is to "fix" narrow, specific cases that reviewers point out, make
it compile, then send it out again, hoping someone will apply it.

So, for me, until the approach to this series changes: NAK, for x86.
Andrew, please don't take this series.  Or, if you do, please drop the
patch enabling it on x86.

I also have the feeling our VFS friends won't take kindly to having
random luf_foo() hooks in their hot paths, optimized or not.  I don't
see any of them on cc.
