linux-kernel - Re: [vfs] 8bb3c61baf: vm-scalability.median -23.7% regression

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <alpine.LSU.2.11.1909101930140.1518@eggly.anvils>
Date:   Tue, 10 Sep 2019 20:05:38 -0700 (PDT)
From:   Hugh Dickins <hughd@...gle.com>
To:     Al Viro <viro@...iv.linux.org.uk>
cc:     Hugh Dickins <hughd@...gle.com>,
        kernel test robot <rong.a.chen@...el.com>,
        David Howells <dhowells@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-fsdevel@...r.kernel.org, lkp@...org
Subject: Re: [vfs]  8bb3c61baf:  vm-scalability.median -23.7% regression

On Mon, 9 Sep 2019, Hugh Dickins wrote:
> On Mon, 9 Sep 2019, Al Viro wrote:
> > 
> > Anyway, see vfs.git#uncertain.shmem for what I've got with those folded in.
> > Do you see any problems with that one?  That's the last 5 commits in there...
> 
> It's mostly fine, I've no problem with going your way instead of what
> we had in mmotm; but I have seen some problems with it, and had been
> intending to send you a fixup patch tonight (shmem_reconfigure() missing
> unlock on error is the main problem, but there are other fixes needed).
> 
> But I'm growing tired. I've a feeling my "swap" of the mpols, instead
> of immediate mpol_put(), was necessary to protect against a race with
> shmem_get_sbmpol(), but I'm not clear-headed enough to trust myself on
> that now.  And I've a mystery to solve, that shmem_reconfigure() gets
> stuck into showing the wrong error message.

On my "swap" for the mpol_put(): no, the race against shmem_get_sbmpol()
is safe enough without that, and what you have matches what was always
done before. I rather like my "swap", which the previous double-free had
led me to, but it's fine if you prefer the ordinary way. I was probably
coming down from some over-exposure to iput() under spinlock, but there's
no such complications here.

> 
> Tomorrow....
> 
> Oh, and my first attempt to build and boot that series over 5.3-rc5
> wouldn't boot. Luckily there was a tell-tale "i915" in the stacktrace,
> which reminded me of the drivers/gpu/drm/i915/gem/i915_gemfs.c fix
> we discussed earlier in the cycle.  That is of course in linux-next
> by now, but I wonder if your branch ought to contain a duplicate of
> that fix, so that people with i915 doing bisections on 5.4-rc do not
> fall into an unbootable hole between vfs and gpu merges.

Below are the fixups I arrived at last night (I've not rechecked your
tree today, to see if you made any changes since).  But they're not
enough: I now understand why shmem_reconfigure() got stuck showing
the wrong error message, but I'll have to leave it to you to decide
what to do about it, because I don't know whether it's just a mistake,
or different filesystem types have different needs there.

My /etc/fstab has a line in for one of my test mounts:
tmpfs                /tlo                 tmpfs      size=4G               0 0
and that "size=4G" is what causes the problem: because each time
shmem_parse_options(fc, data) is called for a remount, data (that is,
options) points to a string starting with "size=4G,", followed by
what's actually been asked for in the remount options.

So if I try
mount -o remount,size=0 /tlo
that succeeds, setting the filesystem size to 0 meaning unlimited.
So if then as a test I try
mount -o remount,size=1M /tlo
that correctly fails with "Cannot retroactively limit size".
But then when I try
mount -o remount,nr_inodes=0 /tlo
I again get "Cannot retroactively limit size",
when it should have succeeded (again, 0 here meaning unlimited).

That's because the options in shmem_parse_options() are
"size=4G,nr_inodes=0", which indeed looks like an attempt to
retroactively limit size; but the user never asked "size=4G" there.

I think this problem, and some of what's fixed below, predate your
rework, and would equally affect the version in mmotm: I just didn't
discover these issues when I was testing that before.

Hugh

--- aviro/mm/shmem.c	2019-09-09 14:10:34.379832855 -0700
+++ hughd/mm/shmem.c	2019-09-09 23:29:28.467037895 -0700
@@ -3456,7 +3456,7 @@ static int shmem_parse_one(struct fs_con
 		ctx->huge = result.uint_32;
 		if (ctx->huge != SHMEM_HUGE_NEVER &&
 		    !(IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE) &&
-		    has_transparent_hugepage()))
+		      has_transparent_hugepage()))
 			goto unsupported_parameter;
 		ctx->seen |= SHMEM_SEEN_HUGE;
 		break;
@@ -3532,26 +3532,26 @@ static int shmem_reconfigure(struct fs_c
 
 	spin_lock(&sbinfo->stat_lock);
 	inodes = sbinfo->max_inodes - sbinfo->free_inodes;
-	if (ctx->seen & SHMEM_SEEN_BLOCKS) {
+	if ((ctx->seen & SHMEM_SEEN_BLOCKS) && ctx->blocks) {
+		if (!sbinfo->max_blocks) {
+			err = "Cannot retroactively limit size";
+			goto out;
+		}
 		if (percpu_counter_compare(&sbinfo->used_blocks,
 					   ctx->blocks) > 0) {
 			err = "Too small a size for current use";
 			goto out;
 		}
-		if (ctx->blocks && !sbinfo->max_blocks) {
-			err = "Cannot retroactively limit nr_blocks";
+	}
+	if ((ctx->seen & SHMEM_SEEN_INODES) && ctx->inodes) {
+		if (!sbinfo->max_inodes) {
+			err = "Cannot retroactively limit inodes";
 			goto out;
 		}
-	}
-	if (ctx->seen & SHMEM_SEEN_INODES) {
 		if (ctx->inodes < inodes) {
 			err = "Too few inodes for current use";
 			goto out;
 		}
-		if (ctx->inodes && !sbinfo->max_inodes) {
-			err = "Cannot retroactively limit nr_inodes";
-			goto out;
-		}
 	}
 
 	if (ctx->seen & SHMEM_SEEN_HUGE)
@@ -3574,6 +3574,7 @@ static int shmem_reconfigure(struct fs_c
 	spin_unlock(&sbinfo->stat_lock);
 	return 0;
 out:
+	spin_unlock(&sbinfo->stat_lock);
 	return invalf(fc, "tmpfs: %s", err);
 }