Date:	Fri, 19 Nov 2010 14:29:03 -0500
From:	Chris Mason <chris.mason@...cle.com>
To:	Lukas Czerner <lczerner@...hat.com>
Cc:	Christoph Hellwig <hch@...radead.org>,
	Greg Freemyer <greg.freemyer@...il.com>,
	Mark Lord <kernel@...savvy.com>,
	"Martin K. Petersen" <martin.petersen@...cle.com>,
	James Bottomley <james.bottomley@...e.de>,
	Jeff Moyer <jmoyer@...hat.com>,
	Matthew Wilcox <matthew@....cx>,
	Josef Bacik <josef@...hat.com>, tytso <tytso@....edu>,
	linux-ext4 <linux-ext4@...r.kernel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	sandeen <sandeen@...hat.com>
Subject: Re: [PATCH 1/2] fs: Do not dispatch FITRIM through separate super_operation

Excerpts from Lukas Czerner's message of 2010-11-19 13:06:16 -0500:
> On Fri, 19 Nov 2010, Christoph Hellwig wrote:
> 
> > On Fri, Nov 19, 2010 at 08:20:58AM -0800, Greg Freemyer wrote:
> > > The kernel team has been coding around some Utopian SSD TRIM
> > > implementation for at least 2 years with the basic assumption that
> > > SSDs can handle thousands of trims per second.  Just mix em in with
> > > the rest of the i/o.  No problem.  Intel swore to us it's the right
> > > thing to do.
> > 
> > Thanks Greg, good that you told us what we've been doing.  I would have
> > forgotten myself if you hadn't reminded me.
> > 
> > > I'm still waiting to see the first benchmark report from anywhere
> > > (SSD, Thin Provisioned SCSI) that the online approach used by mount -o
> > > discard is a win performance wise.  Linux has a history of designing
> > > for reality, but for some reason when it comes to SSDs reality seems
> > > not to be a big concern.
> > 
> > Both Lukas and I have done extensive benchmarks on various SSDs and
> > thinly provisioned RAIDs.  Unfortunately most of the hardware is only
> > available under NDA so we can't publish it.
> > 
> > For the XFS side, which I've looked at, I can summarize that we do have
> > arrays that do the online discard without measurable performance
> > penalty on various workloads, and we have devices (both SSDs and arrays)
> > where the overhead is incredibly huge.  I can also say that doing the
> > walk of the freespace btrees similar to the offline discard, but every
> > 30 seconds or at a similarly high interval is a sure way to completely
> > kill performance.
> > 
> > Or in short we haven't found the holy grail yet.
> > 
> 
> Indeed we have not. But speaking of benchmarks, I have just finished a
> quick run (well, not so quick :)) of my discard-kit on a btrfs filesystem
> and here are the results. Note that the tool used for this benchmark is
> postmark, so it might not be the most realistic use case, but it provides
> a nice comparison between the ext4 (below) and btrfs online discard
> implementations (FITRIM is NOT involved).

Thanks a lot for posting these, I know it takes forever to run them.

I hesitate to trust postmark too much for comparing the ext4 trim with
the btrfs trim because we might have dramatically different lifetimes on
the blocks.  So if I manage to just do fewer allocations than ext4, I'll
also do fewer trims.

I'd also be curious to see how many trims each of us did, maybe running
w/blktrace could show that?
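
One rough way to get those counts is to run blkparse over the trace and
count the requests issued to the driver (action D) whose RWBS field carries
the discard flag (also D). The sketch below assumes blkparse's default text
layout (device, CPU, sequence, timestamp, PID, action, RWBS, sector, "+",
sector count). The function name count_discards and the sample lines are
made up for illustration, and a custom blkparse format string would need
different field indexes.

```python
def count_discards(lines):
    """Count discard requests issued to the driver and the sectors they cover.

    Assumes the default blkparse text layout:
    dev cpu seq time pid action rwbs sector + nsectors [process]
    """
    requests = 0
    sectors = 0
    for line in lines:
        fields = line.split()
        # Skip summary lines and events that carry no "sector + count" payload.
        if len(fields) < 10 or fields[8] != "+":
            continue
        action, rwbs = fields[5], fields[6]
        # Action "D" means issued to the driver; "D" in RWBS marks a discard.
        if action == "D" and "D" in rwbs:
            requests += 1
            sectors += int(fields[9])
    return requests, sectors


# Hypothetical sample lines, not taken from a real trace.
sample = [
    "8,16 1 1 0.000000000 4242 Q D 1000 + 2048 [postmark]",
    "8,16 1 2 0.000010000 4242 D D 1000 + 2048 [postmark]",
    "8,16 1 3 0.000020000 4242 D W 5000 + 8 [postmark]",
    "8,16 1 4 0.000030000 4242 D D 9000 + 8 [postmark]",
    "8,16 1 5 0.000040000 4242 C D 1000 + 2048 [0]",
]
print(count_discards(sample))  # prints (2, 2056)
```

For a real run the same function can be fed an open file holding the
blkparse output instead of the sample list.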

The btrfs online discard will trim all the metadata blocks as they are
freed, and in a COW filesystem this makes for a very noisy trim.  We
could reduce our trim load considerably by only trimming data blocks,
and only trimming metadata when we make a big free extent.

The default btrfs options duplicate metadata, so we actually end up
doing 2 trims for every metadata block we free.
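
To make the arithmetic concrete, here is a toy model of the two policies
described above: trim everything as it is freed, or trim data always and
metadata only for large free extents. It is not btrfs code. The function
trims_issued, the (kind, blocks) tuples and the 128-block threshold are all
hypothetical, and it ignores merging of adjacent free extents.

```python
def trims_issued(freed, metadata_copies=2, big_extent_blocks=None):
    """Count trims sent to the device for a list of freed extents.

    freed: iterable of (kind, blocks), kind being "data" or "metadata".
    metadata_copies: 2 models duplicated metadata, 1 a single copy.
    big_extent_blocks: if set, metadata is trimmed only when the freed
    extent is at least this many blocks; None trims every metadata extent.
    """
    count = 0
    for kind, blocks in freed:
        if kind == "data":
            count += 1
        elif big_extent_blocks is None or blocks >= big_extent_blocks:
            # Each copy of the metadata extent needs its own trim.
            count += metadata_copies
    return count


# Hypothetical workload: three single-block metadata frees, one data
# extent, and one large metadata extent.
freed = [("metadata", 1)] * 3 + [("data", 8), ("metadata", 256)]

print(trims_issued(freed))                         # prints 9
print(trims_issued(freed, big_extent_blocks=128))  # prints 3
```

In this made-up workload the small metadata frees account for six of the
nine trims, which is the noise the selective policy would avoid.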

At any rate, I definitely think both the online trim and the FITRIM have
their uses.  One thing that has burnt us in the past is coding too much
for the performance of the current crop of SSDs when the next crop ends
up making our optimizations useless.

This is the main reason I think the online trim is going to be better
and better.  The FS has a ton of low hanging fruit in there and the
devices are going to improve.  At some point the biggest perf problem
will just be the non-queueable trim command.

One thing I haven't seen benchmarked is how trim changes the performance
of the SSD as the poor little log structured squirrels inside run out
of places to store things.  Does it get rid of the cliffs in performance as
the drive ages, and how do we measure that in general?

-chris

> 
> 
> (Sadly the table is too wide so you have to...well, you guys can manage
> it somehow, right?).
> 
> BTRFS
> -----
> 
>                                    |          BUFFERING ENABLED         |       BUFFERING DISABLED           |
> --------------------------------------------------------------------------------------------------------------
> Type                               |NODISCARD    DISCARD      DIFF      |NODISCARD    DISCARD      DIFF      |
> ==============================================================================================================
> Total_duration                     |230.90       336.20       45.60%    |232.00       335.00       44.40%    |
> Duration_of_transactions           |159.60       266.10       66.73%    |158.90       264.60       66.52%    |
> Transactions/s                     |313.32       188.01       -39.99%   |314.70       189.07       -39.92%   |
> Files_created/s                    |323.84       222.48       -31.30%   |322.28       223.28       -30.72%   |
> Creation_alone/s                   |778.08       796.37       2.35%     |756.66       787.68       4.10%     |
> Creation_mixed_with_transaction/s  |155.16       93.11        -39.99%   |155.84       93.63        -39.92%   |
> Read/s                             |156.50       93.91        -39.99%   |157.18       94.44        -39.92%   |
> Append/s                           |156.82       94.10        -39.99%   |157.50       94.63        -39.92%   |
> Deleted/s                          |323.84       222.48       -31.30%   |322.28       223.28       -30.72%   |
> Deletion_alone/s                   |770.64       788.75       2.35%     |749.42       780.15       4.10%     |
> Deletion_mixed_with_transaction/s  |158.16       94.90        -40.00%   |158.85       95.44        -39.92%   |
> Read_B/s                           |11925050.90  8192800.35   -31.30%   |11867797.20  8221997.40   -30.72%   |
> Write_B/s                          |37318466.00  25638695.00  -31.30%   |37139294.00  25730064.60  -30.72%   |
> ==============================================================================================================
> 
> EXT4
> ----
>                                    |          BUFFERING ENABLED         |       BUFFERING DISABLED           |
> --------------------------------------------------------------------------------------------------------------
> Type                               |NODISCARD    DISCARD      DIFF      |NODISCARD    DISCARD      DIFF      |
> ==============================================================================================================
> Total_duration                     |306.10       512.70       67.49%    |301.60       516.10       71.12%    |
> Duration_of_transactions           |243.50       449.80       84.72%    |239.00       453.90       89.92%    |
> Transactions/s                     |205.43       111.19       -45.87%   |209.32       110.17       -47.37%   |
> Files_created/s                    |244.30       145.85       -40.30%   |247.97       144.87       -41.58%   |
> Creation_alone/s                   |834.88       830.60       -0.51%    |830.60       833.42       0.34%     |
> Creation_mixed_with_transaction/s  |101.73       55.06        -45.88%   |103.66       54.55        -47.38%   |
> Read/s                             |102.61       55.54        -45.87%   |104.55       55.03        -47.36%   |
> Append/s                           |102.82       55.65        -45.88%   |104.76       55.14        -47.37%   |
> Deleted/s                          |244.30       145.85       -40.30%   |247.97       144.87       -41.58%   |
> Deletion_alone/s                   |826.90       822.66       -0.51%    |822.66       825.46       0.34%     |
> Deletion_mixed_with_transaction/s  |103.70       56.13        -45.87%   |105.66       55.61        -47.37%   |
> Read_B/s                           |8996110.60   5370694.40   -40.30%   |9131349.20   5334560.40   -41.58%   |
> Write_B/s                          |28152588.40  16807146.60  -40.30%   |28575806.40  16694068.00  -41.58%   |
> ==============================================================================================================
> 
> 
> (Buffering means that C library functions such as fopen, fread and fwrite
> are used instead of open, read and write. I use the word buffering in the
> same way as the postmark test does.)
> 
> So, you can see that Btrfs handles online discard considerably better than
> ext4 (about 20% difference), but it is still a pretty massive performance
> loss on a not-so-good-but-I-have-seen-worse SSD. So, I would say that you
> guys (Josef?) should at least consider the possibility of using FITRIM as
> well.
> 
> Thanks!
> 
> -Lukas
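
For readers checking the quoted tables: the DIFF columns appear to be the
relative change from NODISCARD to DISCARD, in percent. The sketch below
recomputes the buffered Total_duration rows under that assumption. The
helper name diff_percent is made up for illustration.

```python
def diff_percent(nodiscard, discard):
    """Relative change from the NODISCARD value to the DISCARD value, in %."""
    return round((discard - nodiscard) / nodiscard * 100, 2)


# Total_duration, buffering enabled, values copied from the quoted tables.
btrfs = diff_percent(230.90, 336.20)
ext4 = diff_percent(306.10, 512.70)

print(btrfs)                   # prints 45.6
print(ext4)                    # prints 67.49
print(round(ext4 - btrfs, 2))  # prints 21.89
```

The gap of about 22 percentage points is consistent with the "about 20%"
difference between btrfs and ext4 mentioned in the quoted message.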