linux-kernel - Re: [PATCH 10/10] powerpc: remove address space overrides using set

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200903071144.GA19247@lst.de>
Date:   Thu, 3 Sep 2020 09:11:44 +0200
From:   Christoph Hellwig <hch@....de>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Christophe Leroy <christophe.leroy@...roup.eu>,
        Christoph Hellwig <hch@....de>,
        Al Viro <viro@...iv.linux.org.uk>,
        Michael Ellerman <mpe@...erman.id.au>,
        the arch/x86 maintainers <x86@...nel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        linux-arch <linux-arch@...r.kernel.org>,
        linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
        Kees Cook <keescook@...omium.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 10/10] powerpc: remove address space overrides using
 set_fs()

On Wed, Sep 02, 2020 at 11:02:22AM -0700, Linus Torvalds wrote:
> I don't see why this change would make any difference.

Me neither, but while looking at a different project I did spot places
that actually do an access_ok with len 0, that's why I wanted him to
try.

That being said: Christophe are these number stables?  Do you get
similar numbers with multiple runs?

> And btw, why do the 32-bit and 64-bit checks even differ? It's not
> like the extra (single) instruction should even matter. I think the
> main reason is that the simpler 64-bit case could stay as a macro
> (because it only uses "addr" and "size" once), but honestly, that
> "simplification" doesn't help when you then need to have that #ifdef
> for the 32-bit case and an inline function anyway.

I'll have to leave that to the powerpc folks.  The intent was to not
change the behavior (and I even fucked that up for the the size == 0
case).

> However, I suspect a bigger reason for the actual performance
> degradation would be the patch that makes things use "write_iter()"
> for writing, even when a simpler "write()" exists.

Except that we do not actually have such a patch.  For normal user
writes we only use ->write_iter if ->write is not present.  But what
shows up in the profile is that /dev/zero only has a read_iter op and
not a normal read.  I've added a patch below that implements a normal
read which might help a tad with this workload, but should not be part
of a regression.

Also Christophe:  can you bisect which patch starts this?  Is it really
this last patch in the series?

---
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index abd4ffdc8cdebc..1dc99ab158457a 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -726,6 +726,27 @@ static ssize_t read_iter_zero(struct kiocb *iocb, struct iov_iter *iter)
 	return written;
 }
 
+static ssize_t read_zero(struct file *file, char __user *buf,
+			 size_t count, loff_t *ppos)
+{
+	size_t cleared = 0;
+
+	while (count) {
+		size_t chunk = min_t(size_t, count, PAGE_SIZE);
+
+		if (clear_user(buf + cleared, chunk))
+			return cleared ? cleared : -EFAULT;
+		cleared += chunk;
+		count -= chunk;
+
+		if (signal_pending(current))
+			return cleared ? cleared : -ERESTARTSYS;
+		cond_resched();
+	}
+
+	return cleared;
+}
+
 static int mmap_zero(struct file *file, struct vm_area_struct *vma)
 {
 #ifndef CONFIG_MMU
@@ -921,6 +942,7 @@ static const struct file_operations zero_fops = {
 	.llseek		= zero_lseek,
 	.write		= write_zero,
 	.read_iter	= read_iter_zero,
+	.read		= read_zero,
 	.write_iter	= write_iter_zero,
 	.mmap		= mmap_zero,
 	.get_unmapped_area = get_unmapped_area_zero,