linux-kernel - Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)

Message-ID: <55529B7F.4080401@phunq.net>
Date:	Tue, 12 May 2015 17:31:59 -0700
From:	Daniel Phillips <daniel@...nq.net>
To:	David Lang <david@...g.hm>
CC:	Theodore Ts'o <tytso@....edu>, Howard Chu <hyc@...as.com>,
	Dave Chinner <david@...morbit.com>,
	linux-kernel@...r.kernel.org,
	Mike Galbraith <umgwanakikbuti@...il.com>,
	Pavel Machek <pavel@....cz>, tux3@...3.org,
	linux-fsdevel@...r.kernel.org,
	OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>
Subject: Re: xfs: does mkfs.xfs require fancy switches to get decent performance?
 (was Tux3 Report: How fast can we fsync?)

On 05/12/2015 02:30 PM, David Lang wrote:
> On Tue, 12 May 2015, Daniel Phillips wrote:
>> Phoronix published a headline that identifies Dave Chinner as
>> someone who takes shots at other projects. Seems pretty much on
>> the money to me, and it ought to be obvious why he does it.
> 
> Phoronix turns any correction or criticism into an attack.

Phoronix gets attacked in an unseemly way by a number of people
in the developer community who should behave better. You are
doing it yourself, seemingly oblivious to the valuable role that
the publication plays in our community. Google for filesystem
benchmarks. Where do you find them? Right. Not to mention the
Xorg coverage, community issues, etc etc. The last thing we need
is a monoculture in Linux news, and we are dangerously close to
that now.

So, how is "EXT4 is not as stable or as well tested as most
people think" not a cheap shot? By my first hand experience, that
claim is absurd. Add to that the first hand experience of roughly
two billion other people. Seems to be a bit self serving too, or
was that just an accident.

> You need to get out of the mindset that Ted and Dave are Enemies that you need to overcome, they are
> friendly competitors, not Enemies.

You are wrong about Dave. These are not the words of any friend:

   "I don't think I'm alone in my suspicion that there was something
   stinky about your numbers." -- Dave Chinner

Basically allegations of cheating. And wrong. Maybe Dave just
lives in his own dreamworld where everybody is out to get him, so
he has to attack people he views as competitors first.

Ted has more taste and his FUD attack was more artful, but it
still amounted to nothing more than piling on. He just picked up
Dave's straw man uncritically and proceeded to knock it down
some more. Nice way of distracting attention from the fact that
we actually did what we claimed, and instead of getting the
appropriate recognition for it, we were called cheaters. More or
less in so many words by Dave, and more subtly by Ted, but the
intent is clear and unmistakable. Apologies from both are still
in order, but it will be a rainy day in that hot place before we
ever see either of them do the right thing.

That said, Ted is no enemy, he is brilliant and usually conducts
himself admirably. Except sometimes. I wish I could say the same
about Dave, but what I see there is a guy who has invested his
entire identity in his XFS career and is insecure that something
might conspire against him to disrupt it. I mean, come on, if you
convince Red Hat management to elevate your life's work to the
status of something that most of the paid for servers in the
world are going to run, do you continue attacking your peers or
do you chill a bit?

> They assume that you are working in good faith (but are
> inexperienced compared to them), and you need to assume that they are working in good faith. If they
> ever do resort to underhanded means to sabotage you, Linus and the other kernel developers will take
> action. But pointing out limits in your current implementation, problems in your benchmarks based on
> how they are run, and concepts that are going to be difficult to merge is not underhanded, it's
> exactly the type of assistance that you should be grateful for in friendly competition.
> 
> You were the one who started crowing about how badly XFS performed.

Not at all, somebody else posted the terrible XFS benchmark result,
then Dave put up a big smokescreen to try to deflect attention from
it. There is a term for that kind of logical fallacy:

   http://en.wikipedia.org/wiki/Proof_by_intimidation

Seems to have worked well on you. But after all those words, XFS
does not run any faster, and it clearly needs to.

> Dave gave a long and detailed explanation about the reasons for the differences, and showing
> benchmarks on other hardware that showed that XFS works very well there. That's not an attack on
> EXT4 (or Tux3), it's an explanation.

Long, detailed, and bogus. Summary: "oh, XFS doesn't work well on
that hardware? Get new hardware." Excuse me, but other filesystems
do work well on that hardware, the problem is not with the hardware.

> I have my own concerns about how things are going to work (I've voiced some of them), but no, I
> haven't tried running Tux3 because you say it's not ready yet.

I did not say that. I said it is not ready for users. It is more
than ready for anybody who wants to develop it, or benchmark it,
or put test data on it, and has been for a long time. Except for
enospc, and that was apparently not an issue for Btrfs, was it.

>> You know what to do about checking for faulty benchmarks.
> 
> That requires that the code be readily available, which last I heard, Tux3 wasn't. Has this been fixed?

You heard wrong. The code is readily available and you can clone it
from here:

    https://github.com/OGAWAHirofumi/linux-tux3.git

The hirofumi-user branch has the user tools including mkfs and basic
fsck, and the hirofumi branch is a 3.19 Linus kernel that includes Tux3.
(So is the hirofumi-user branch, but Hirofumi likes people to build from
the other one, which is pure kernel.)

We do of course have patches not pushed to the public repository yet,
which include enospc, so the public code is easily crashable. If I
were you, I would wait for enospc to land, but that is by no means
necessary if your objective is just to verify that we tell the truth.

> They pointed out problems with using ramdisk to simulate a SSD and huge differences between spinning
> rust and an SSD (or disk array). Those aren't FUD.

Not FUD perhaps, but wrong all the same. I have plenty of evidence
at hand to be sure of that, so I don't need to theorize about it.
Ramdisk is surprisingly predictive of performance on other media,
and is arguably closer to what the new generation of NVRAM behaves
like than flash is.

>>> As Dave says above, it's not the other filesystem people you have to convince, it's the core VFS and
>>> Memory Management folks you have to convince. You may need a little benchmarking to show that there
>>> is a real advantage to be gained, but the real discussion is going to be on the impact that page
>>> forking is going to have on everything else (both in complexity and in performance impact to other
>>> things)
>>
>> Yet he clearly wrote "we" as if he believes he is part of it.
> 
> He is part of the group of people who use and work with this stuff, so he is part of it.

He is not part of a committee that decides what to merge, yet he
spoke as if he was. Just a slip maybe? Let's call it that. Slip or
not, it is a divisive and offensive attitude.

> BTRFS is a perfect example of how not to introduce a new filesystem. Lots of hype, the presumption
> that it is going to replace all the existing filesystems because it's so much better (especially
> according to benchmarks). But then progress stalled before it was really ready, and it's still
> something most people avoid.

Disagree. Merging Btrfs was the only way to save it. Not everyone
avoids it. Btrfs has its share of ardent supporters, ready or not.
One day Btrfs will be ready and the rough spots will be a fading
memory. That is healthy. What Dave is trying to do to Tux3 is kind
of sick.

Even though I do not like the Btrfs design, I hope it succeeds and
fills that void where a big, fat, full featured filesystem that does
everything including sending email should be.

>> Proving the data consistency claims would be a little harder, you
>> need tools for that, and some of those aren't built yet. Or, if you
>> have technical ability, you can read the code and the copious design
>> material that has been posted and convince yourself that, yes, there
>> is something cool here, why didn't anybody do it that way before?
>> But of course that starts to sound like work. Debating nontechnical
>> issues and playing politics seems so much more like fun.
> 
> Why are you picking a fight? There was no attack in my statement.

Sorry, did I pick a fight? You *are* debating nontechnical issues
and politics, and it *does* sound like work to go do your own
benchmarks. And if it is not fun for you, then why are you doing it?
Please do not take that the wrong way, you obviously enjoy it and
there is nothing wrong with that.

>>> If they didn't
>>> believe this, why would they be working on the filesystem instead of just using an existing
>>> filesystem.
>>
>> Right, and it is my job to convince you that what I believe for
>> perfectly valid, demonstrable technical reasons, is really true. I do
>> not see why you feel it is your job to convince me that the obviously
>> broken Linux community process is not in fact broken, and that a
>> certain person who obviously has an agenda, is not actually obstructing.
> 
> You will need to have a fully working, usable system before you can convince people that you are
> right.

Logical fallacy alert. You say there is only one way to convince
somebody of something, when in fact more ways may exist. And "fully
working" translates as "I get to decide what fully working means".
Ask yourself this: in order to convince you that you will die if you
jump off the Empire State Building, do I actually need to jump off
it, or may I explain to you the principles of gravitation instead?

Anyway, I will offer "has enospc" as a reasonable definition of "fully
working". Tux3 has actually been doing the things (out of space
handling excepted) a normal filesystem does for years. Just not
always as fast or reliably as it now does.

> A partial system may look good, but how much is fixing the corner cases that you haven't
> gotten to yet going to hurt it?

Straw man. To which corner cases do you refer, and why should we fix
them now instead of attending to the issues that we feel are important?

> That there are going to be such cases is pretty much a given, and
> that changing things to add code to work around the pathological conditions is going to hurt the
> common case is pretty close to a given (it's one of those things that isn't mathematically
> guaranteed, but happens on 99.99999+% of projects)

Another straw man. To which pathological condition do you refer, and
why is it so important that we need to drop everything and attend to
it now?

>>> The ugly reality is that everyone's early versions of their new filesystem looks really good. The
>>> problem is when they extend it to cover the corner cases and when it gets stressed by real-world (as
>>> opposed to benchmark) workloads. This isn't saying that you are wrong in your belief, just that you
>>> may not be right, and nobody will know until you are to a usable state and other people can start
>>> beating on it.
>>
>> With ENOSPC we are at that state. Tux3 would get more testing and advance
>> faster if it was merged. Things like ifdefs, grandiose new schemes for
>> writeback infrastructure, dumb little hooks in the mkwrite path, those
>> are all just manufactured red herrings. Somebody wanted those to be
>> issues, so now they are issues. Fake ones.
> 
> Ok, so you are happy with your allocation strategy? You didn't seem to be a few e-mails ago.

I am not happy with our allocation strategy, it can be improved
immensely. It is also not the most important thing in the world,
because nobody intends to put their mission critical files on it.

I do see people trying to raise that issue as a merge blocker, which
would be an excellent example of how broken our community process is
if it did actually turn out to block our merge. If it concerns you
then store some files on it yourself and see if it really is a killer
problem. Alternatively, it might be exactly the sort of thing that
an interested contributor could take on, and if that is true, then
delaying merge so it can bottleneck on me instead would not make
sense.

If you actually go look at the code, you will see there is some rather
nice infrastructure in there for supporting allocation policy, and
there actually is a preliminary allocation policy, it just does not
meet our standards for production work.

> but if you think it's ready for users, then start working to submit it in the next merge window.

Red Herring. It is not supposed to be ready for users. It is supposed
to be ready for developers. Development kernel, right? Experimental
status and all that. Users are cordially invited to stay away until
further notice.

> Dave said that except for one part, there was no reason not to merge it. That's pretty good. So you
> need to be discussing that one part with the folks that Dave pointed you at.

Oops, I missed that, are you sure? Perhaps you mean the writeback
interface. Already started on that, already talking. But do keep in
mind that his demand was always a makework project, and frankly, a
nonsensical way to go about things. It's an >internal< api, see.
Internal apis are declared to be flexible, by Linus himself. We
already have a nice, simple patch that implements a simple api that
works fine, we use it all the time. Dave was the one who suggested
we do it exactly like that, so we did. Then Dave moved the goalposts
by insisting that we should throw that one away and tackle a much
bigger project in core that is essentially a R&D project. Not
willing to play that game for a possibly endless number of iterations,
I turned instead to things that actually matter.

Anyway, the writeback project involves us, and VFS developers, you
know who they are. I would prefer that Dave not be involved. For
the record, Jan Kara is great to work with, did you see that patch
set he produced for us? Sadly, I was not able to get into it to
the extent it deserved at the time.

> As I said above, Btrfs is a perfect example of how not to do things.

Unfair. It worked. The alternative is most probably, no Btrfs, ever.
Which do you choose?

The fact that Hirofumi and I kept on with Tux3 and got it to where it
is today after all the nasty things that went on and are still going
on is nothing short of a miracle. Thank Hirofumi. If it were not for
him I would have quit years ago and that would have been the end of
it. There are a lot more fun things to do in life than put up with
incessant FUD attacks from the ilk of Dave Chinner. You should tattoo
that on your arm so you can contemplate it when thinking about whether
the Linux community is dysfunctional or not.

> The other thing you need to realize is that getting something in the kernel isn't a one-time effort,
> the code needs to be maintained over time (especially for a filesystem), and it's very possible for
> a developer/team/company to be so toxic and hostile to others that the Linux folks don't want to
> deal with the hassle of dealing with them. You are starting out on a path to put yourself into that
> category. Calm down and stop taking offense at everything. Your succeeding doesn't require that
> other people lose, so stop talking as if it's a zero sum game and you have to beat down the enemy
> to get your code accepted.

That argument is "blame the victim", with a bit of intimidation thrown
in. If we are to work together in an atmosphere of harmony and mutual
respect then let's see some effort from more than one side please.

Regards,

Daniel