Message-ID: <20160531002152.GQ26977@dastard>
Date: Tue, 31 May 2016 10:21:52 +1000
From: Dave Chinner <david@...morbit.com>
To: Gernot Hillier <gernot.hillier@...mens.com>
Cc: Theodore Ts'o <tytso@....edu>, linux-scsi@...r.kernel.org,
MPT-FusionLinux.pdl@...adcom.com, linux-ext4@...r.kernel.org,
sathya.prakash@...adcom.com, chaitra.basappa@...adcom.com,
suganath-prabu.subramani@...adcom.com
Subject: Re: unexpected sync delays in dpkg for small pre-allocated files on
ext4
On Mon, May 30, 2016 at 10:27:52AM +0200, Gernot Hillier wrote:
> Hi!
>
> On 25.05.2016 01:13, Theodore Ts'o wrote:
> > On Tue, May 24, 2016 at 07:07:41PM +0200, Gernot Hillier wrote:
> >> We experience strange delays with kernel 4.1.18 during dpkg
> >> package installation on an ext4 filesystem after switching from
> >> Ubuntu 14.04 to 16.04. We can reproduce the issue with kernel 4.6.
> >> Installation of the same package takes 2s with ext3 and 31s with
> >> ext4 on the same partition.
> >>
> >> Hardware is an Intel-based server with Supermicro X8DTH board and
> >> Seagate ST973451SS disks connected to an LSI SAS2008 controller (PCI
> >> 0x1000:0x0072, mpt2sas driver).
> [...]
> >> To me, the problem looks comparable to
> >> https://bugzilla.kernel.org/show_bug.cgi?id=56821 (even if we don't see
> >> a full hang and there's no RAID involved for us), so a closer look on
> >> the SCSI layer or driver might be the next step?
> >
> > What I would suggest is to create a small test case which compares the
> > time it takes to allocate 1 megabyte of memory, zero it, and then
> > write one megabyte of zeros using the write(2) system call. Then try
> > writing one megabyte of zeros using the BLKZEROOUT ioctl.
>
> Ok, this is my test code:
>
> const int SIZE = 1*1024*1024;
> char* buffer = malloc(SIZE);
> uint64_t range[2] = { 0, SIZE };
> int fd = open("/dev/sdb2", O_WRONLY);
>
> bzero(buffer, SIZE);
> write(fd, buffer, SIZE);
> sync_file_range(fd, 0, 0, 2);
>
> ioctl (fd, BLKZEROOUT, range);
>
> close(fd);
> free(buffer);
>
> # strace -tt ./test-tytso
> [...]
> 15:46:27.481636 open("/dev/sdb2", O_WRONLY) = 3
> 15:46:27.482004 write(3, "\0\0\0\0\0\0"..., 1048576) = 1048576
> 15:46:27.482438 sync_file_range(3, 0, 0, SYNC_FILE_RANGE_WRITE) = 0
> 15:46:27.482698 ioctl(3, BLKZEROOUT, [0, 100000]) = 0
> 15:46:27.546971 close(3) = 0
>
> So the write() and sync_file_range() in the first case take ~400 us
> each while BLKZEROOUT takes... 60 ms. Wow.

Comparing apples to oranges.

Unlike the name implies, sync_file_range() does not provide any data
integrity semantics whatsoever: SYNC_FILE_RANGE_WRITE only submits
IO to clean dirty pages - that only takes 400us of CPU time. It
does not wait for completion, nor does it flush the drive cache, so
by the time the syscall returns to userspace the IO may not even
have been sent to the device (e.g. it could be queued by the IO
scheduler in the block layer). i.e. you're not timing IO, you're
timing the CPU overhead of IO submission.

For an apples to apples comparison, you need to use fsync() (or
fdatasync()) to physically force the written data to stable storage
and wait for completion. This is what BLKZEROOUT is effectively
doing, so I think you'll find fdatasync() also takes around 60ms...
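
For illustration, a minimal sketch of that comparison (not part of the
original test program). It assumes Linux with glibc. time_flush() and
now_us() are hypothetical helper names, and the path should point at a
scratch file, because the helper overwrites the start of it. It times
only the flush call, so the two variants can be compared directly.

```c
#define _GNU_SOURCE             /* needed for sync_file_range() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Monotonic timestamp in microseconds. */
static double now_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

/*
 * Hypothetical helper: write `len` zero bytes at offset 0 of `path`,
 * then time the flush call alone.
 *
 *   use_fdatasync != 0: fdatasync(), which waits for the data to
 *                       reach stable storage.
 *   use_fdatasync == 0: sync_file_range(SYNC_FILE_RANGE_WRITE), which
 *                       only starts writeback and returns.
 *
 * Returns the elapsed time in microseconds, or -1.0 on any error.
 */
static double time_flush(const char *path, size_t len, int use_fdatasync)
{
    double elapsed = -1.0;
    char *buf;
    int fd;

    assert(len > 0);

    buf = calloc(1, len);       /* zero-filled buffer */
    if (!buf)
        return -1.0;

    fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }

    if (pwrite(fd, buf, len, 0) == (ssize_t)len) {
        double t0 = now_us();
        /* nbytes == 0 means "from offset to end of file" */
        int rc = use_fdatasync
            ? fdatasync(fd)
            : sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);

        if (rc == 0)
            elapsed = now_us() - t0;
    }

    close(fd);
    free(buf);
    return elapsed;
}
```

Calling time_flush(path, 1 << 20, 1) and time_flush(path, 1 << 20, 0)
against the same filesystem gives the two numbers to compare. Only the
first one includes waiting for the IO to complete.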

Cheers,

Dave.
--
Dave Chinner
david@...morbit.com