Date: Thu, 22 Sep 2016 20:10:48 -0600
From: Jens Axboe <axboe@...nel.dk>
To: Glauber Costa <glauber@...lladb.com>, linux-block@...r.kernel.org
Cc: linux-kernel@...r.kernel.org
Subject: Re: [PATCH] cfq: fix starvation of asynchronous writes

On 09/22/2016 06:59 PM, Glauber Costa wrote:
> While debugging timeouts happening in my application workload (ScyllaDB), I
> have observed calls to open() taking a long time, ranging everywhere from 2
> seconds - the first ones already enough to time out my application - to more
> than 30 seconds.
>
> The problem seems to happen because XFS may block on pending metadata updates
> under certain circumstances, and that's confirmed with the following backtrace
> taken by the offcputime tool (iovisor/bcc):
>
> ffffffffb90c57b1 finish_task_switch
> ffffffffb97dffb5 schedule
> ffffffffb97e310c schedule_timeout
> ffffffffb97e1f12 __down
> ffffffffb90ea821 down
> ffffffffc046a9dc xfs_buf_lock
> ffffffffc046abfb _xfs_buf_find
> ffffffffc046ae4a xfs_buf_get_map
> ffffffffc046babd xfs_buf_read_map
> ffffffffc0499931 xfs_trans_read_buf_map
> ffffffffc044a561 xfs_da_read_buf
> ffffffffc0451390 xfs_dir3_leaf_read.constprop.16
> ffffffffc0452b90 xfs_dir2_leaf_lookup_int
> ffffffffc0452e0f xfs_dir2_leaf_lookup
> ffffffffc044d9d3 xfs_dir_lookup
> ffffffffc047d1d9 xfs_lookup
> ffffffffc0479e53 xfs_vn_lookup
> ffffffffb925347a path_openat
> ffffffffb9254a71 do_filp_open
> ffffffffb9242a94 do_sys_open
> ffffffffb9242b9e sys_open
> ffffffffb97e42b2 entry_SYSCALL_64_fastpath
> 00007fb0698162ed [unknown]
>
> Inspecting my run with blktrace, I can see that the xfsaild kthread exhibits
> very high "Dispatch wait" times, in the dozens-of-seconds range and consistent
> with the open() times I saw in that run.
>
> Still from the blktrace output, after searching a bit we can identify the
> request that wasn't dispatched:
>
> 8,0   11      152    81.092472813   804  A  WM 141698288 + 8 <- (8,1) 141696240
> 8,0   11      153    81.092472889   804  Q  WM 141698288 + 8 [xfsaild/sda1]
> 8,0   11      154    81.092473207   804  G  WM 141698288 + 8 [xfsaild/sda1]
> 8,0   11      206    81.092496118   804  I  WM 141698288 + 8 (      22911) [xfsaild/sda1]
> <==== 'I' means Inserted (into the IO scheduler) ===================================>
> 8,0    0   289372    96.718761435     0  D  WM 141698288 + 8 (15626265317) [swapper/0]
> <==== Only 15s later the CFQ scheduler dispatches the request ======================>
>
> As we can see above, in this particular example CFQ took 15 seconds to
> dispatch this request. Going back to the full trace, we can see that the
> xfsaild queue had plenty of opportunity to run, and it was selected as the
> active queue many times. It would just always be preempted by something else
> (example):
>
> 8,0    1        0    81.117912979     0  m   N cfq1618SN / insert_request
> 8,0    1        0    81.117913419     0  m   N cfq1618SN / add_to_rr
> 8,0    1        0    81.117914044     0  m   N cfq1618SN / preempt
> 8,0    1        0    81.117914398     0  m   N cfq767A / slice expired t=1
> 8,0    1        0    81.117914755     0  m   N cfq767A / resid=40
> 8,0    1        0    81.117915340     0  m   N / served: vt=1948520448 min_vt=1948520448
> 8,0    1        0    81.117915858     0  m   N cfq767A / sl_used=1 disp=0 charge=0 iops=1 sect=0
>
> where cfq767 is the xfsaild queue and cfq1618 corresponds to one of the
> ScyllaDB IO dispatchers.
>
> The requests preempting the xfsaild queue are synchronous requests. That's a
> characteristic of ScyllaDB workloads, as we only ever issue O_DIRECT requests.
> While it can be argued that preempting ASYNC requests in favor of SYNC is part
> of the CFQ logic, I don't believe that doing so for 15+ seconds is anyone's
> goal.
>
> Moreover, unless I am misunderstanding something, that breaks the expectation
> set by the "fifo_expire_async" tunable, which in my system is set to the
> default.
>
> Looking at the code, it seems to me that the issue is that after we make an
> async queue active, there is no guarantee that it will execute any request.
>
> When the queue itself tests cfq_may_dispatch(), it can bail if it sees SYNC
> requests in flight. An incoming request from another queue can also preempt
> it in such a situation before we have the chance to execute anything (as seen
> in the trace above).
>
> This patch sets the must_dispatch flag if we notice that we have requests
> that are already fifo_expired. This flag is always cleared after
> cfq_dispatch_request() returns from cfq_dispatch_requests(), so it won't pin
> the queue for subsequent requests (unless they are themselves expired).
>
> Care is taken during preempt to still allow rt requests to preempt us
> regardless.
>
> Testing my workload with this patch applied produces much better results.
> From the application side I see no timeouts, and the open() latency histogram
> generated by systemtap looks much better, with the worst outlier at 131ms:

Good analysis and the fix looks nice and clean. I'll take a closer look
tomorrow. Thanks!

-- 
Jens Axboe
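
To make the preemption pattern in the trace above concrete: in CFQ, a
synchronous request arriving for one queue can expire the slice of an active
asynchronous queue outright. Below is a minimal, self-contained userspace C
sketch of that rule. The types and names (toy_queue, toy_request,
should_preempt) are invented for illustration; only the decision itself
reflects what the email describes about CFQ's sync-over-async preemption.

/*
 * Toy model of CFQ's sync-over-async preemption rule, before the fix.
 * Illustrative userspace C, not kernel code.
 */
#include <stdbool.h>
#include <stdio.h>

struct toy_queue {
	const char *name;
	bool sync;		/* queue serves synchronous requests */
};

struct toy_request {
	bool sync;		/* O_DIRECT I/O counts as sync in CFQ */
};

/* Simplified form of the check the email complains about. */
static bool should_preempt(const struct toy_queue *active,
			   const struct toy_request *incoming)
{
	/*
	 * If the new request is sync but the currently active queue is
	 * not, the sync request wins -- every time, with no bound on how
	 * long the async queue has already been waiting.
	 */
	return incoming->sync && !active->sync;
}

int main(void)
{
	struct toy_queue xfsaild = { "cfq767A (xfsaild, async)", false };
	struct toy_request scylla_io = { .sync = true };	/* O_DIRECT */

	/* Every arriving sync request expires the async queue's slice. */
	for (int i = 0; i < 3; i++)
		printf("sync request %d preempts %s: %s\n", i, xfsaild.name,
		       should_preempt(&xfsaild, &scylla_io) ? "yes" : "no");
	return 0;
}

Because ScyllaDB issues O_DIRECT (sync) requests continuously, this check
fires every time xfsaild's async queue becomes active, which is exactly the
repeated "preempt" / "slice expired" pattern visible in the blktrace output.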
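
And a companion sketch of the fix as the patch description characterizes it:
a queue holding a fifo-expired request is marked must_dispatch, the mark
suppresses preemption by everything except RT requests, and it is cleared once
a request is dispatched. This is an illustrative model under those stated
assumptions, not the actual kernel diff (which is not included in this
excerpt); all names here are again invented for the example.

/* Toy model of the must_dispatch fix. Illustrative userspace C. */
#include <stdbool.h>
#include <stdio.h>

enum ioclass { CLASS_BE, CLASS_RT };

struct toy_queue {
	const char *name;
	bool sync;
	bool must_dispatch;	/* set when a queued request is fifo-expired */
	long oldest_fifo_time;	/* deadline of the oldest queued request */
};

struct toy_request {
	bool sync;
	enum ioclass prio_class;
};

/* Mark the queue once a request has sat past its fifo deadline
 * (the expectation set by fifo_expire_async). */
static void check_fifo(struct toy_queue *q, long now)
{
	if (now >= q->oldest_fifo_time)
		q->must_dispatch = true;
}

static bool should_preempt(const struct toy_queue *active,
			   const struct toy_request *incoming)
{
	/* RT requests may still preempt, regardless of must_dispatch. */
	if (incoming->prio_class == CLASS_RT)
		return true;
	/* A queue with an expired request is pinned until it dispatches. */
	if (active->must_dispatch)
		return false;
	return incoming->sync && !active->sync;
}

static void dispatch_one(struct toy_queue *q)
{
	/* Cleared after every dispatch, so the pin lasts one request only
	 * (unless the next request is itself already expired). */
	q->must_dispatch = false;
	printf("%s dispatched one request\n", q->name);
}

int main(void)
{
	struct toy_queue xfsaild = { "xfsaild (async)", false, false, 100 };
	struct toy_request sync_be = { true, CLASS_BE };
	struct toy_request sync_rt = { true, CLASS_RT };

	check_fifo(&xfsaild, 150);	/* deadline passed: pin the queue */
	printf("BE sync preempts: %s\n",
	       should_preempt(&xfsaild, &sync_be) ? "yes" : "no");	/* no  */
	printf("RT sync preempts: %s\n",
	       should_preempt(&xfsaild, &sync_rt) ? "yes" : "no");	/* yes */
	dispatch_one(&xfsaild);
	printf("BE sync preempts after dispatch: %s\n",
	       should_preempt(&xfsaild, &sync_be) ? "yes" : "no");	/* yes */
	return 0;
}

The effect is a bounded pin: once its fifo deadline passes, the starved async
queue is guaranteed to dispatch one request, restoring the expectation set by
fifo_expire_async without giving async queues any standing priority over sync
ones.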