Message-ID: <03ae65df-a369-436d-b31c-b3cec6ca3bc1@suse.de>
Date: Mon, 19 Aug 2024 14:48:00 +0200
From: Hannes Reinecke <hare@...e.de>
To: David Howells <dhowells@...hat.com>,
"Pankaj Raghav (Samsung)" <kernel@...kajraghav.com>
Cc: brauner@...nel.org, akpm@...ux-foundation.org, chandan.babu@...cle.com,
linux-fsdevel@...r.kernel.org, djwong@...nel.org, gost.dev@...sung.com,
linux-xfs@...r.kernel.org, hch@....de, david@...morbit.com,
Zi Yan <ziy@...dia.com>, yang@...amperecomputing.com,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, willy@...radead.org,
john.g.garry@...cle.com, cl@...amperecomputing.com, p.raghav@...sung.com,
mcgrof@...nel.org, ryan.roberts@....com
Subject: Re: [PATCH v12 00/10] enable bs > ps in XFS
On 8/19/24 13:46, David Howells wrote:
> Hi Pankaj,
>
> I can reproduce the problem with:
>
> xfs_io -t -f -c "pwrite -S 0x58 0 40" -c "fsync" -c "truncate 4" -c "truncate 4096" /xfstest.test/wubble; od -x /xfstest.test/wubble
>
> borrowed from generic/393. I've distilled it down to the attached C program.
>
> Turning on tracing and adding a bit more, I can see the problem happening.
> Here's an excerpt of the tracing (I've added some non-upstream tracepoints).
> Firstly, you can see the second pwrite at fpos 0, 40 bytes (ie. 0x28):
>
> pankaj-5833: netfs_write_iter: WRITE-ITER i=9e s=0 l=28 f=0
> pankaj-5833: netfs_folio: pfn=116fec i=0009e ix=00000-00001 mod-streamw
>
> Then the first ftruncate() is called to reduce the file size to 4:
>
> pankaj-5833: netfs_truncate: ni=9e isz=2028 rsz=2028 zp=4000 to=4
> pankaj-5833: netfs_inval_folio: pfn=116fec i=0009e ix=00000-00001 o=4 l=1ffc d=78787878
> pankaj-5833: netfs_folio: pfn=116fec i=0009e ix=00000-00001 inval-part
> pankaj-5833: netfs_set_size: ni=9e resize-file isz=4 rsz=4 zp=4
>
> You can see the invalidate_folio call, with the offset at 0x4 and the length as
> 0x1ffc. The data at the beginning of the page is 0x78787878. This looks
> correct.
>
> Then the second ftruncate() is called to increase the file size to 4096
> (i.e. 0x1000):
>
> pankaj-5833: netfs_truncate: ni=9e isz=4 rsz=4 zp=4 to=1000
> pankaj-5833: netfs_inval_folio: pfn=116fec i=0009e ix=00000-00001 o=1000 l=1000 d=78787878
> pankaj-5833: netfs_folio: pfn=116fec i=0009e ix=00000-00001 inval-part
> pankaj-5833: netfs_set_size: ni=9e resize-file isz=1000 rsz=1000 zp=4
>
> And here's the problem: in the invalidate_folio() call, the offset is 0x1000
> and the length is 0x1000 (o= and l=). But that's the wrong half of the folio!
> I'm guessing that the caller thereafter clears the other half of the folio -
> the bit that should be kept.
>
> David
> ---
> /* Distillation of the generic/393 xfstest */
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <fcntl.h>
>
> #define ERR(x, y) do { if ((long)(x) == -1) { perror(y); exit(1); } } while(0)
>
> static const char xxx[40] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
> static const char yyy[40] = "yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy";
> static const char dropfile[] = "/proc/sys/vm/drop_caches";
> static const char droptype[] = "3";
> static const char file[] = "/xfstest.test/wubble";
>
> int main(int argc, char *argv[])
> {
> 	int fd, drop;
>
> 	/* Fill in the second 8K block of the file... */
> 	fd = open(file, O_CREAT|O_TRUNC|O_WRONLY, 0666);
> 	ERR(fd, "open");
> 	ERR(ftruncate(fd, 0), "pre-trunc $file");
> 	ERR(pwrite(fd, yyy, sizeof(yyy), 0x2000), "write-2000");
> 	ERR(close(fd), "close");
>
> 	/* ... and drop the pagecache so that we get a streaming
> 	 * write, attaching some private data to the folio.
> 	 */
> 	drop = open(dropfile, O_WRONLY);
> 	ERR(drop, dropfile);
> 	ERR(write(drop, droptype, sizeof(droptype) - 1), "write-drop");
> 	ERR(close(drop), "close-drop");
>
> 	fd = open(file, O_WRONLY, 0666);
> 	ERR(fd, "reopen");
> 	/* Make a streaming write on the first 8K block (needs O_WRONLY). */
> 	ERR(pwrite(fd, xxx, sizeof(xxx), 0), "write-0");
> 	/* Now use truncate to shrink and reexpand. */
> 	ERR(ftruncate(fd, 4), "trunc-4");
> 	ERR(ftruncate(fd, 4096), "trunc-4096");
> 	ERR(close(fd), "close-2");
> 	exit(0);
> }
>
Wouldn't the second truncate end up with a 4k file, and not an 8k one?
I.e. the resulting file size will be:
After step 1 (the writes): 8k
After step 2 (ftruncate to 4): 4 bytes
After step 3 (ftruncate to 4096): 4k
Hmm?
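
To make the sizes above concrete, here is a minimal sketch that replays the same writes and truncates and records st_size after each step. The helper names replay_sizes() and tail_is_zero() are made up for illustration and are not part of David's program. It also does not drop the pagecache, so it only shows the sizes POSIX semantics give and is not expected to hit the streaming-write path in netfs.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Store the current size of fd in *out; return 0 on success, -1 on error. */
static int size_of(int fd, off_t *out)
{
	struct stat st;

	if (fstat(fd, &st) == -1)
		return -1;
	*out = st.st_size;
	return 0;
}

/*
 * Hypothetical helper: write 40 bytes at 0x2000 and 40 bytes at 0, then
 * ftruncate to 4 and to 4096.  The size after the writes, after the first
 * truncate and after the second truncate goes into sizes[0..2].
 */
int replay_sizes(const char *path, off_t sizes[3])
{
	char buf[40];
	int fd, ret = -1;

	fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0666);
	if (fd == -1)
		return -1;

	memset(buf, 'y', sizeof(buf));
	if (pwrite(fd, buf, sizeof(buf), 0x2000) != (ssize_t)sizeof(buf))
		goto out;
	memset(buf, 'x', sizeof(buf));
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		goto out;
	if (size_of(fd, &sizes[0]) == -1)
		goto out;

	if (ftruncate(fd, 4) == -1 || size_of(fd, &sizes[1]) == -1)
		goto out;
	if (ftruncate(fd, 4096) == -1 || size_of(fd, &sizes[2]) == -1)
		goto out;
	ret = 0;
out:
	close(fd);
	return ret;
}

/*
 * Hypothetical helper: return 1 if bytes [4, 4096) of path all read back
 * as zero, 0 if any of them is non-zero, -1 on error or short read.
 */
int tail_is_zero(const char *path)
{
	unsigned char buf[4096];
	ssize_t n;
	size_t i;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return -1;
	n = pread(fd, buf, sizeof(buf), 0);
	close(fd);
	if (n != (ssize_t)sizeof(buf))
		return -1;
	for (i = 4; i < sizeof(buf); i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}
```

Calling replay_sizes() on a scratch file should give 0x2028 (8k plus the 40 bytes written), then 4, then 4096, and tail_is_zero() should return 1 afterwards, since everything past the first 4 bytes has to read back as zeroes once the file is re-expanded.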
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@...e.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich