Message-ID: <1545375580.3424480.1368968403367.JavaMail.ngmail@webmail10.arcor-online.net>
Date: Sun, 19 May 2013 15:00:03 +0200 (CEST)
From: frankcmoeller@...or.de
To: linux-ext4@...r.kernel.org
Subject: Aw: Re: Aw: Re: Ext4: Slow performance on first write after mount
Hi,
> One question regarding fallocate: I create a new file and do a 100 MB
> fallocate with FALLOC_FL_KEEP_SIZE. Then I write only 70 MB to that file
> and close it. Is the 30 MB unused preallocated space still preallocated
> for that file after closing it? Or does a close release the preallocated
> space?
I did some tests and can now answer this myself ;-)
The space stays preallocated after the file is closed. Even umount doesn't
release the space. Interesting!
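A minimal sketch of such a check, assuming Linux with glibc. The helper name
prealloc_then_write and its parameters are hypothetical and only serve as an
illustration; this is not necessarily how the test above was done. It
preallocates with FALLOC_FL_KEEP_SIZE, writes less than that, closes the file
and then reports st_size and st_blocks * 512 from stat():

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Close fd but keep the errno of the call that failed before it. */
static int fail_and_close(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Hypothetical helper: preallocate `prealloc` bytes without changing the
 * file size, write `written` bytes, close the file and stat it again.
 * Returns 0 on success, -1 with errno set on failure. */
int prealloc_then_write(const char *path, off_t prealloc, off_t written,
                        off_t *size_out, off_t *allocated_out)
{
    static char buf[64 * 1024];
    struct stat st;
    off_t done = 0;
    int fd;

    assert(written <= prealloc);
    fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return -1;
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, prealloc) != 0)
        return fail_and_close(fd);

    memset(buf, 'x', sizeof buf);
    while (done < written) {
        size_t chunk = sizeof buf;
        ssize_t n;
        if ((off_t)chunk > written - done)
            chunk = (size_t)(written - done);
        n = write(fd, buf, chunk);
        if (n <= 0)
            return fail_and_close(fd);
        done += n;
    }
    if (close(fd) != 0 || stat(path, &st) != 0)
        return -1;
    *size_out = st.st_size;                     /* logical file size */
    *allocated_out = (off_t)st.st_blocks * 512; /* space actually allocated */
    return 0;
}
```

Called with 100 MB and 70 MB on an ext4 mount, an allocated value that stays
near 100 MB while the size is 70 MB would match the observation above.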
I also tested concurrent fallocates and writes to the same file descriptor. It
seems to work. Whether it is fast enough I cannot say at the moment.
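A rough sketch of the concurrent variant, assuming POSIX threads. The names
prealloc_job, prealloc_worker and write_while_preallocating are hypothetical.
One thread calls fallocate() with FALLOC_FL_KEEP_SIZE while the calling thread
keeps writing to the same file descriptor:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical job description handed to the preallocation thread. */
struct prealloc_job {
    int fd;
    off_t offset;
    off_t len;
    int err; /* 0 on success, otherwise the errno from fallocate() */
};

static void *prealloc_worker(void *arg)
{
    struct prealloc_job *job = arg;

    if (fallocate(job->fd, FALLOC_FL_KEEP_SIZE, job->offset, job->len) == 0)
        job->err = 0;
    else
        job->err = errno;
    return NULL;
}

/* Write `total` bytes in pieces of at most `chunk` bytes (all taken from
 * `buf`) while a second thread preallocates `prealloc_len` bytes on the
 * same descriptor. Returns 0 if all writes succeeded, -1 otherwise; the
 * fallocate() result is reported through *prealloc_err. */
int write_while_preallocating(int fd, off_t prealloc_len, const char *buf,
                              size_t chunk, off_t total, int *prealloc_err)
{
    struct prealloc_job job = { fd, 0, prealloc_len, 0 };
    pthread_t tid;
    off_t done = 0;
    int rc = 0;

    assert(chunk > 0);
    if (pthread_create(&tid, NULL, prealloc_worker, &job) != 0)
        return -1;
    while (done < total) {
        size_t n = chunk;
        ssize_t w;
        if ((off_t)n > total - done)
            n = (size_t)(total - done);
        w = write(fd, buf, n);
        if (w <= 0) {
            rc = -1;
            break;
        }
        done += w;
    }
    pthread_join(tid, NULL);
    *prealloc_err = job.err;
    return rc;
}
```

This only shows that both calls can be issued in parallel. It measures
nothing, so it cannot answer whether the writer stays fast enough. On older
glibc versions the program has to be linked with -pthread.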
Regards,
Frank
----- Original Message ----
From: frankcmoeller@...or.de
To: linux-ext4@...r.kernel.org
Date: 19.05.2013 12:01
Subject: Re: Aw: Re: Ext4: Slow performance on first write after mount
> Hi Andreas,
>
> > Part of the problem is that filesystems are rarely unmounted cleanly, so
> > it means that this information would need to be updated periodically to
> > disk so that it is available after a crash.
> > I wouldn't object to some kind of "lazy" updating of group information on
> > disk that at least gives the newly-mounted filesystem a rough idea of
> > what each group's usage is. It wouldn't have to be totally accurate (it
> > wouldn't replace the bitmaps), but maybe 2 bits per group would be enough
> > as a starting point?
> > For a 32 TB filesystem that would be about 16 4kB blocks of bits that
> > would be updated periodically (e.g. every five minutes or so). Since the
> > allocator will typically work in successive groups that might not cause
> > too much churn.
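
The figure of about 16 blocks can be checked with a short calculation. This
is only a sketch under two assumptions that are not stated in the mail: "32
TB" is read as 32 TiB, and each group covers block_size * 8 blocks (one bitmap
block per group, which gives 128 MiB groups for 4 kB blocks). The function
names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Number of block groups, assuming one bitmap block per group, so that a
 * group spans block_size * 8 blocks of block_size bytes each. */
uint64_t groups_in_fs(uint64_t fs_bytes, uint64_t block_size)
{
    uint64_t group_bytes;

    assert(block_size > 0);
    group_bytes = block_size * 8 * block_size;
    return (fs_bytes + group_bytes - 1) / group_bytes; /* round up */
}

/* Number of blocks needed to store bits_per_group hint bits per group. */
uint64_t hint_blocks(uint64_t fs_bytes, uint64_t block_size,
                     unsigned bits_per_group)
{
    uint64_t bits = groups_in_fs(fs_bytes, block_size) * bits_per_group;
    uint64_t bits_per_block = block_size * 8;

    return (bits + bits_per_block - 1) / bits_per_block; /* round up */
}
```

With these assumptions, 32 TiB and 4096-byte blocks give 262144 groups, and
2 bits per group come to 65536 bytes, i.e. 16 blocks of 4 kB.
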
>
> Yes, you're right. The stored data wouldn't be 100% reliable. And yes, it
> would be really good if the filesystem knew more right after mount, so
> that it could find a good group quicker.
> What do you think of this:
> 1. I have read this in some discussions already: you already store the
>    amount of free space for every group. Why not also store the size of
>    the largest contiguous run of free space in each group? Then you don't
>    have to read the whole group.
> 2. What about a list (in memory and also stored on disk) of all unused
>    groups (1 bit for every group)? If the allocator cannot find a good
>    group within, let's say, half a second, a group from this list is used.
>    The list would also not be 100% reliable (because of the unclean
>    unmounts mentioned above), so you still need to search the list for a
>    good group. If no good group is found in the list, the allocator can
>    continue searching.
>    This doesn't help in all situations (e.g. an almost full disk, or every
>    group containing a small amount of data), but in many cases it should
>    be much faster, as long as the list is not totally outdated.
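
Point 2 could look roughly like the following. This is a user-space sketch,
not ext4 code: the function pick_unused_group and the arrays it takes are
hypothetical. Because the on-disk hint may be stale, every hinted group is
checked against the per-group free block count before it is used:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hint:        1 bit per group, set if the group was unused when the hint
 *              was last written (may be outdated after an unclean unmount)
 * free_blocks: current free block count of each group
 * Returns the first group that is hinted as unused and really is completely
 * free, or -1 so that the caller can continue with the normal search. */
long pick_unused_group(const uint8_t *hint, const uint32_t *free_blocks,
                       size_t ngroups, uint32_t blocks_per_group)
{
    size_t g;

    assert(hint != NULL && free_blocks != NULL);
    for (g = 0; g < ngroups; g++) {
        int hinted = (hint[g / 8] >> (g % 8)) & 1;

        if (hinted && free_blocks[g] == blocks_per_group)
            return (long)g;
    }
    return -1;
}
```

The check against free_blocks is what keeps a stale hint harmless: a wrong
bit costs one extra comparison and never leads to a bad allocation.
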
>
> > It would be possible to fallocate() at some expected size (e.g. average
> > file size) and then either truncate off the unused space, or fallocate()
> > some more in another thread when you are close to running out.
> > If the fallocate() is done in a separate thread the latency can be hidden
> > from the main application?
> Adding a new thread for fallocate shouldn't be a big problem. But fallocate
> might generate heavy disk activity (while searching for a good group). I
> don't know whether parallel writing from the other thread would be quick
> enough.
>
> One question regarding fallocate: I create a new file and do a 100 MB
> fallocate with FALLOC_FL_KEEP_SIZE. Then I write only 70 MB to that file
> and close it. Is the 30 MB unused preallocated space still preallocated
> for that file after closing it? Or does a close release the preallocated
> space?
>
> Regards,
> Frank
>
> >
> > Cheers, Andreas
> >
> > > And you have to take care of alignment, and there are several threads
> > > on the internet which explain why you shouldn't use it (or only in very
> > > special situations, and I don't think that my situation is one of them).
> > > And ext4 group initialization also takes place when using O_DIRECT (as
> > > said before, perhaps I did something wrong).
> > >
> > > Regards,
> > > Frank
> > >
> > > ----- Original Message ----
> > > From: "Sidorov, Andrei" <Andrei.Sidorov@...isi.com>
> > > To: "frankcmoeller@...or.de" <frankcmoeller@...or.de>, ext4
> > > development <linux-ext4@...r.kernel.org>
> > > Date: 17.05.2013 23:18
> > > Subject: Re: Ext4: Slow performance on first write after mount
> > >
> > >> Hi Frank,
> > >>
> > >> Consider using bigalloc feature (requires reformat), preallocate space
> > >> with fallocate and use O_DIRECT for reads/writes. However, 188k writes
> > >> are too small for good throughput with O_DIRECT. You might also want to
> > >> adjust max_sectors_kb to something larger than 512k.
> > >>
> > >> We're doing 6in+6out 20Mbps streams just fine.
> > >>
> > >> Regards,
> > >> Andrei.
> > >>
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > the body of a message to majordomo@...r.kernel.org
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>