[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0094198b6c3382ee2efbd4431e4ad1bfb8cef269.camel@unipv.it>
Date: Tue, 24 Dec 2019 09:51:16 +0100
From: Andrea Vai <andrea.vai@...pv.it>
To: Ming Lei <ming.lei@...hat.com>, "Theodore Y. Ts'o" <tytso@....edu>
Cc: "Schmid, Carsten" <Carsten_Schmid@...tor.com>,
Finn Thain <fthain@...egraphics.com.au>,
Damien Le Moal <Damien.LeMoal@....com>,
Alan Stern <stern@...land.harvard.edu>,
Jens Axboe <axboe@...nel.dk>,
Johannes Thumshirn <jthumshirn@...e.de>,
USB list <linux-usb@...r.kernel.org>,
SCSI development list <linux-scsi@...r.kernel.org>,
Himanshu Madhani <himanshu.madhani@...ium.com>,
Hannes Reinecke <hare@...e.com>,
Omar Sandoval <osandov@...com>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
Greg KH <gregkh@...uxfoundation.org>,
Hans Holmberg <Hans.Holmberg@....com>,
Kernel development list <linux-kernel@...r.kernel.org>,
linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: AW: Slow I/O on USB media after commit
f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6
Il giorno mar, 24/12/2019 alle 09.27 +0800, Ming Lei ha scritto:
> Hi Ted,
>
> On Mon, Dec 23, 2019 at 02:53:01PM -0500, Theodore Y. Ts'o wrote:
> > On Mon, Dec 23, 2019 at 07:45:57PM +0100, Andrea Vai wrote:
> > > basically, it's:
> > >
> > > mount UUID=$uuid /mnt/pendrive
> > > SECONDS=0
> > > cp $testfile /mnt/pendrive
> > > umount /mnt/pendrive
> > > tempo=$SECONDS
> > >
> > > and it copies one file only. Anyway, you can find the whole
> script
> > > attached.
> >
> > OK, so whether we are doing the writeback at the end of cp, or
> when
> > you do the umount, it's probably not going to make any
> difference. We
> > can get rid of the stack trace in question by changing the script
> to
> > be basically:
> >
> > mount UUID=$uuid /mnt/pendrive
> > SECONDS=0
> > rm -f /mnt/pendrive/$testfile
> > cp $testfile /mnt/pendrive
> > umount /mnt/pendrive
> > tempo=$SECONDS
> >
> > I predict if you do that, you'll see that all of the time is spent
> in
> > the umount, when we are trying to write back the file.
> >
> > I really don't think then this is a file system problem at
> all. It's
> > just that USB I/O is slow, for whatever reason. We'll see a stack
> > trace in the writeback code waiting for the I/O to be completed,
> but
> > that doesn't mean that the root cause is in the writeback code or
> in
> > the file system which is triggering the writeback.
>
> Wrt. the slow write on this usb storage, it is caused by two
> writeback
> path, one is the writeback wq, another is from ext4_release_file()
> which
> is triggered from exit_to_usermode_loop().
>
> When the two write path is run concurrently, the sequential write
> order
> is broken, then write performance drops much on this particular usb
> storage.
>
> The ext4_release_file() should be run from read() or write() syscall
> if
> Fedora 30's 'cp' is implemented correctly. IMO, it isn't expected
> behavior
> for ext4_release_file() to be run thousands of times when just
> running 'cp' once, see comment of ext4_release_file():
>
> /*
> * Called when an inode is released. Note that this is
> different
> * from ext4_file_open: open gets called at every open, but
> release
> * gets called only when /all/ the files are closed.
> */
> static int ext4_release_file(struct inode *inode, struct file
> *filp)
>
> >
> > I suspect the next step is use a blktrace, to see what kind of I/O
> is
> > being sent to the USB drive, and how long it takes for the I/O to
> > complete. You might also try to capture the output of "iostat -x
> 1"
> > while the script is running, and see what the difference might be
> > between a kernel version that has the problem and one that
> doesn't,
> > and see if that gives us a clue.
>
> That isn't necessary, given we have concluded that the bad write
> performance is caused by broken write order.
>
> >
> > > > And then send me
> > > btw, please tell me if "me" means only you or I cc: all the
> > > recipients, as usual
> >
> > Well, I don't think we know what the root cause is. Ming is
> focusing
> > on that stack trace, but I think it's a red herring..... And if
> it's
> > not a file system problem, then other people will be best suited
> to
> > debug the issue.
>
> So far, the reason points to the extra writeback path from
> exit_to_usermode_loop().
> If it is not from close() syscall, the issue should be related with
> file reference
> count. If it is from close() syscall, the issue might be in 'cp''s
> implementation.
>
> Andrea, please collect the following log or the strace log requested
> by Ted, then
> we can confirm if the extra writeback is from close() or
> read/write() syscall:
>
> # pass PID of 'cp' to this script
> #!/bin/sh
> PID=$1
> /usr/share/bcc/tools/trace -P $PID -t -C \
> 't:block:block_rq_insert "%s %d %d", args->rwbs, args->sector,
> args->nr_sector' \
> 't:syscalls:sys_exit_close ' \
> 't:syscalls:sys_exit_read ' \
> 't:syscalls:sys_exit_write '
Meanwhile, I tried to run the test and obtained an error (...usage:
trace [-h] [-b BUFFER_PAGES] [-p PID]...), so assumed the "-P" should
be "-p", corrected and obtained the attached log with ext4 and a slow
copy (2482 seconds) by doing:
- start the test
- look at the cp pid
- run the trace
- wait for the test to finish
- stop the trace.
Thanks,
Andrea
Download attachment "20191224_test_ming.zip" of type "application/zip" (20830 bytes)
Powered by blists - more mailing lists