Date:	Tue, 06 May 2014 11:23:04 +0200
From:	Mason <mpeg.blue@...e.fr>
To:	linux-kernel@...r.kernel.org
CC:	linux-embedded@...r.kernel.org, linux-ide@...r.kernel.org
Subject: Re: Using ftrace to identify source of excessive latency of USB write

Mason wrote:

> I'm using Linux on an embedded system similar in spec to a desktop PC
> from 15 years ago (256 MB RAM, 800-MHz CPU, USB). The system's primary
> use is recording high-definition digital television programs.
> 
> Typically, the storage sub-system consists of a recent hard-disk drive
> connected over USB (Hi-Speed, effective throughput ~20-30 MB/s), using
> a single ext4 partition (journal disabled), mounted noexec+noatime
> (trying to minimize metadata interference).
> 
> Typical bit-rate for this HD content is ~1-3 MB/s.
> 
> Data is accumulated in two 800-kB buffers; when one buffer is full,
> it is written to a file (using write(2)) that was opened with O_SYNC.
> (Note to self: try O_DSYNC instead of O_SYNC)
> 
> If I plot the latency of the write(2) operation, 99% of them complete
> in under 80 ms. However, in rare cases, there is a huge latency spike
> (up to 800 ms). If several of these rare outliers occur in a row,
> the recording is messed up.
> 
> I am trying to figure out the source of these latency spikes.
> 
> It could be the OS, the USB controller, the HDD controller... I was
> hoping I could use ftrace to determine whether the problem came from
> the OS itself. Is that the best tool for the job?
> 
> Any recommendations on how to proceed?
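
For context, the recording path described above is roughly the following.
This is a minimal sketch, not the actual recorder code: the buffer size,
the O_SYNC open and the double-buffering come from the description above,
while fill_from_demux() and the (omitted) error handling are made up.

#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define BUF_SIZE (800 * 1024)            /* one of the two accumulation buffers */

extern void fill_from_demux(char *buf, size_t len);  /* hypothetical producer */

static char buf[2][BUF_SIZE];

static void record_loop(const char *path)
{
    /* O_SYNC makes each write(2) wait until data and metadata reach the
     * disk; O_DSYNC (the "note to self" above) would relax the metadata
     * part. */
    int fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0644);
    int cur = 0;

    if (fd < 0)
        return;

    for (;;) {
        fill_from_demux(buf[cur], BUF_SIZE);             /* accumulate ~800 kB */
        if (write(fd, buf[cur], BUF_SIZE) != BUF_SIZE)   /* the call whose latency spikes */
            break;
        cur ^= 1;                                        /* switch to the other buffer */
    }
    close(fd);
}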

A few people suggested other tools, which I'll try; but first I wanted
to share my current measurements, in case someone sees something
obvious that I'm missing.

First of all, I'm using kernel 3.1.10 with "vendor patches"
(which may or may not alter performance and/or behavior).

I wrote a simple test app (attached) which times and ftraces a
single synchronized write of 786432 bytes.
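
Roughly, it does the following (the attached bench2.c is the authoritative
version; this is a simplified sketch, and the output path and tracefs
location are assumptions):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define WRITE_SIZE 786432                /* size of the single synchronized write */

/* Poke an ftrace control file (assumes debugfs at /sys/kernel/debug). */
static void tracing_write(const char *file, const char *val)
{
    char path[128];
    int fd;

    snprintf(path, sizeof(path), "/sys/kernel/debug/tracing/%s", file);
    fd = open(path, O_WRONLY);
    if (fd >= 0) {
        write(fd, val, strlen(val));
        close(fd);
    }
}

int main(void)
{
    static char data[WRITE_SIZE];        /* zero-filled payload */
    struct timespec t0, t1;
    long ms;
    int fd = open("/mnt/rec/bench.out", O_WRONLY | O_CREAT | O_SYNC, 0644);

    if (fd < 0)
        return 1;

    tracing_write("current_tracer", "function_graph");
    tracing_write("tracing_on", "1");

    clock_gettime(CLOCK_MONOTONIC, &t0);
    write(fd, data, sizeof(data));       /* the timed, O_SYNC write */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    tracing_write("tracing_on", "0");

    ms = (t1.tv_sec - t0.tv_sec) * 1000 + (t1.tv_nsec - t0.tv_nsec) / 1000000;
    printf("write(2) latency: %ld ms\n", ms);

    close(fd);
    return 0;
}

I start it with something like "chrt -f 50 ./bench2" (the priority value
here is only an example) and pull /sys/kernel/debug/tracing/trace afterwards.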

The program reported a latency of 275 ms for the operation
(run as SCHED_FIFO -50 on an otherwise idle system).

I used a USB analyzer to "spy" on the bus during the transaction.

According to my analyzer, the data write was split into 7 transactions
(6*122880 + 49152 = 786432 bytes).

- The first write command  was issued at 18.307930
- The first write transaction started at 18.310230
- The first write transaction   ended at 18.313567
(3337 µs to transfer 122880 bytes, i.e. 36.8 MB/s)
- The  last write transaction   ended at 18.368911
(58681 µs to transfer 786432 bytes, i.e. 13.4 MB/s; not great, but OK)
- Then one more write (4096 bytes, probably metadata) ending at 18.374458

=> so, from the perspective of the bus, the entire transaction
took 66.5 ms, yet from the perspective of the OS, it took 275 ms.

I've attached the xz-compressed ftrace output (trace.one.xz).

If I'm reading the trace right (which is unlikely), the system spent
64 ms in __filemap_fdatawrite_range (which is close to the "actual"
latency of the bus transactions) and 212 ms (??) in schedule:

 0)               |  schedule() {
 0)               |    __schedule() {
 0)               |      add_preempt_count() {
 0)   0.000 us    |      } /* add_preempt_count */
 0)               |      rcu_note_context_switch() {
 0)   0.000 us    |      } /* rcu_note_context_switch */
 0)               |      _raw_spin_lock_irq() {
 0)               |        add_preempt_count() {
 0)   0.000 us    |        } /* add_preempt_count */
 0)   0.000 us    |      } /* _raw_spin_lock_irq */
 0)               |      put_prev_task_idle() {
 0)   0.000 us    |      } /* put_prev_task_idle */
 0)               |      pick_next_task_fair() {
 0)   0.000 us    |      } /* pick_next_task_fair */
 0)               |      pick_next_task_stop() {
 0)   0.000 us    |      } /* pick_next_task_stop */
 0)               |      pick_next_task_rt() {
 0)   0.000 us    |      } /* pick_next_task_rt */
 0)               |      pick_next_task_fair() {
 0)   0.000 us    |      } /* pick_next_task_fair */
 0)               |      pick_next_task_idle() {
 0)   0.000 us    |      } /* pick_next_task_idle */
 0)               |      _raw_spin_unlock_irq() {
 0)               |        sub_preempt_count() {
 0)   0.000 us    |        } /* sub_preempt_count */
 0)   0.000 us    |      } /* _raw_spin_unlock_irq */
 0)               |      sub_preempt_count() {
 0)   0.000 us    |      } /* sub_preempt_count */
 0)   0.000 us    |    } /* __schedule */
 0) ! 212000.0 us |  } /* schedule */

Am I reading this right? 212 ms doing nothing?

212 + 64 = 276, which is close to the 275 ms reported by my application.

How can I "drill down" to understand what is happening
inside schedule?
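
One thing I might try next (not sure it's the right approach): instead of
the full function graph, enable only the scheduler and block tracepoints
around the write, so the trace shows what the task is sleeping on and when
the block requests actually complete. A rough fragment, reusing the
tracing_write() helper from the sketch above; the particular event list is
just a guess at what will be informative:

/* Trace context switches, wakeups and block requests instead of the
 * whole function graph. */
tracing_write("current_tracer", "nop");
tracing_write("set_event",
              "sched:sched_switch sched:sched_wakeup "
              "block:block_rq_issue block:block_rq_complete");
tracing_write("tracing_on", "1");

/* ... the timed write(2) goes here ... */

tracing_write("tracing_on", "0");
tracing_write("set_event", "\n");        /* clear the event list again */

With that, the sched_switch lines should show which state the writer blocks
in, and the delta between block_rq_issue and block_rq_complete should say
whether the time is spent below the USB/storage stack or above it.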

[I would be grateful if you could CC me in your replies. Thanks!]

-- 
Regards.


Attachment: "bench2.c" (text/plain, 1976 bytes)
Attachment: "trace.one.xz" (application/octet-stream, 21464 bytes)
