Message-ID: <4F392B8B.4030204@sandia.gov>
Date: Mon, 13 Feb 2012 08:26:03 -0700
From: "Jim Schutt" <jaschut@...dia.gov>
To: "sridhar basam" <sri@...am.org>
cc: ceph-devel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [RFC PATCH 0/6] Understanding delays due to throttling under very heavy write load
On 02/10/2012 05:05 PM, sridhar basam wrote:
>> > But the server never ACKed that packet. Too busy?
>> >
>> > I was collecting vmstat data during the run; here are the important bits:
>> >
>> > Fri Feb 10 11:56:51 MST 2012
>> > vmstat -w 8 16
>> > procs -----------------memory---------------- ---swap-- -----io----- ---system--- ------cpu------
>> >   r  b  swpd    free  buff    cache  si  so  bi      bo      in      cs  us sy id wa st
>> >  13 10     0  250272   944 37859080   0   0   7    5346    1098     444   2  5 92  1  0
>> >  88  8     0  260472   944 36728776   0   0   0 1329838  257602   68861  19 73  5  4  0
>> > 100 10     0  241952   944 36066536   0   0   0 1635891  340724   85570  22 68  6  4  0
>> > 105  9     0  250288   944 34750820   0   0   0 1584816  433223  111462  21 73  4  3  0
>> > 126  3     0  259908   944 33841696   0   0   0  749648  225707   86716   9 83  4  3  0
>> > 157  2     0  245032   944 31572536   0   0   0  736841  252406   99083   9 81  5  5  0
>> >  45 17     0  246720   944 28877640   0   0   1  755085  282177  116551   8 77  9  5  0
> Holy crap! That might explain why you aren't seeing anything. You are
> writing out over 1.6 million blocks/sec, and that's averaged over an
> 8-second interval. I bet the missed ACKs happen while this is going on.
> What sort of I/O load is going through this system during those times?
> What sort of filesystem and Linux system are these OSDs on?
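For anyone following along, the 1.6 million figure converts to bytes straightforwardly: vmstat's bi/bo columns count 1024-byte blocks per second. A minimal sketch of the arithmetic (not part of the original thread; the function name is mine):

```python
# vmstat reports bi/bo in units of 1024-byte blocks per second.
BLOCK_SIZE = 1024  # bytes per vmstat block

def bo_to_mb_per_s(bo_blocks_per_s):
    """Convert a vmstat 'bo' sample (blocks/s) to MB/s (10^6 bytes/s)."""
    return bo_blocks_per_s * BLOCK_SIZE / 1e6

# The highest 'bo' sample in the vmstat output above:
peak_bo = 1635891
print(f"peak write-out ~ {bo_to_mb_per_s(peak_bo):.0f} MB/s")  # ~1675 MB/s
```

So the peak sample is roughly 1.6-1.7 GB/s of block write-out, which lines up with the ">2 GB/s sustained" figure mentioned below.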
Dual-socket Nehalem-EP @ 3 GHz, 24 7200 RPM SAS drives (64 MB cache each),
3 LSI SAS HBAs w/ 8 drives per HBA, btrfs, 3.2.0 kernel. Each OSD
has its ceph journal and ceph data store on a single drive.
I'm running 24 OSDs on such a box; all that write load is the result
of dd from 166 Linux ceph clients.
FWIW, I've seen these boxes sustain > 2 GB/s for 60 sec or so under
this load, when I have TSO/GSO/GRO turned on, and am writing to
a freshly created ceph filesystem.
That lasts until my OSDs get stalled reading from a socket, as
documented by those packet traces I posted.
If you compare the timestamps on the retransmits to the times
that vmstat is dumping reports, at least some of the retransmits
hit the system when it is ~80% idle.
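The comparison described above can be sketched as code. This is just an illustration of the idea, assuming you know the wall-clock time of the first vmstat report and its 8-second interval; it is not the tooling actually used in this thread:

```python
# Map a packet-trace timestamp (e.g. a retransmit seen in tcpdump) to the
# vmstat sample whose 8-second window contains it, so the two can be
# compared. Start time taken from the vmstat log quoted above.
from datetime import datetime, timedelta

VMSTAT_START = datetime(2012, 2, 10, 11, 56, 51)  # "Fri Feb 10 11:56:51 MST 2012"
INTERVAL = timedelta(seconds=8)                   # from "vmstat -w 8 16"

def vmstat_window(ts):
    """Index of the vmstat sample whose window contains timestamp ts."""
    return int((ts - VMSTAT_START) / INTERVAL)

# A retransmit 20 s into the run falls in the third window (index 2):
retrans = VMSTAT_START + timedelta(seconds=20)
print(vmstat_window(retrans))  # prints 2
```

With retransmit timestamps bucketed this way, you can see directly which retransmits landed in mostly-idle windows.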
-- Jim
>
> Sridhar