Date:	Fri, 19 Jun 2015 02:34:37 -0400
From:	Len Brown <lenb@...nel.org>
To:	Dave Chinner <david@...morbit.com>
Cc:	NeilBrown <neilb@...e.de>,
	One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>,
	"Rafael J. Wysocki" <rafael@...nel.org>,
	Ming Lei <tom.leiming@...il.com>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	Linux PM List <linux-pm@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Len Brown <len.brown@...el.com>
Subject: Re: [PATCH 1/1] suspend: delete sys_sync()

> Can you repeat this test on your system, so that we can determine if
> the 5ms "sync time" is actually just the overhead of inode cache
> traversal? If that is the case, the speed of sync on a clean
> filesystem is already a solved problem - the patchset should be
> merged in the 4.2 cycle....

Yes, drop_caches does seem to help repeated sync on this system
(data below).  Exactly which patch series does this?  I'm running ext4
(the default, not btrfs).

[lenb@...5xbx ~]$ for i in `seq 0 1 10`; do time sleep 0 ; done

real    0m0.002s
user    0m0.000s
sys     0m0.001s

real    0m0.001s
user    0m0.000s
sys     0m0.001s

real    0m0.001s
user    0m0.001s
sys     0m0.001s

real    0m0.001s
user    0m0.000s
sys     0m0.001s

real    0m0.001s
user    0m0.001s
sys     0m0.000s

real    0m0.001s
user    0m0.001s
sys     0m0.000s

real    0m0.001s
user    0m0.001s
sys     0m0.000s

real    0m0.001s
user    0m0.000s
sys     0m0.001s

real    0m0.001s
user    0m0.001s
sys     0m0.000s

real    0m0.001s
user    0m0.001s
sys     0m0.000s

real    0m0.001s
user    0m0.000s
sys     0m0.001s
[lenb@...5xbx ~]$ for i in `seq 0 1 10`; do time sync ; done

real    0m0.004s
user    0m0.000s
sys     0m0.003s

real    0m0.002s
user    0m0.000s
sys     0m0.002s

real    0m0.002s
user    0m0.000s
sys     0m0.002s

real    0m0.002s
user    0m0.000s
sys     0m0.002s

real    0m0.002s
user    0m0.000s
sys     0m0.002s

real    0m0.002s
user    0m0.000s
sys     0m0.002s

real    0m0.002s
user    0m0.000s
sys     0m0.003s

real    0m0.002s
user    0m0.000s
sys     0m0.002s

real    0m0.002s
user    0m0.000s
sys     0m0.002s

real    0m0.002s
user    0m0.000s
sys     0m0.002s

real    0m0.002s
user    0m0.000s
sys     0m0.002s
[lenb@...5xbx ~]$ sudo grep ext4_inode /proc/slabinfo
ext4_inode_cache    3536   3536   1008   16    4 : tunables    0    0    0 : slabdata    221    221      0
[lenb@...5xbx ~]$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches "
[lenb@...5xbx ~]$ sudo grep ext4_inode /proc/slabinfo
ext4_inode_cache     553   1680   1008   16    4 : tunables    0    0    0 : slabdata    105    105      0
[lenb@...5xbx ~]$ for i in `seq 0 1 10`; do time sync ; done

real    0m0.002s
user    0m0.000s
sys     0m0.001s

real    0m0.002s
user    0m0.000s
sys     0m0.002s

real    0m0.001s
user    0m0.000s
sys     0m0.001s

real    0m0.001s
user    0m0.000s
sys     0m0.001s

real    0m0.001s
user    0m0.000s
sys     0m0.001s

real    0m0.001s
user    0m0.000s
sys     0m0.001s

real    0m0.001s
user    0m0.000s
sys     0m0.001s

real    0m0.001s
user    0m0.000s
sys     0m0.001s

real    0m0.001s
user    0m0.000s
sys     0m0.001s

real    0m0.001s
user    0m0.000s
sys     0m0.001s

real    0m0.001s
user    0m0.000s
sys     0m0.002s
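
A condensed version of that experiment, in case anyone wants to repeat it
(a sketch of the commands above, not a verbatim transcript):

  sudo grep ext4_inode /proc/slabinfo             # 1st number after the name = active objects
  for i in `seq 0 1 10`; do time sync ; done      # ~2ms per sync with ~3500 cached ext4 inodes
  sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"  # drop pagecache plus dentries/inodes
  sudo grep ext4_inode /proc/slabinfo             # active ext4 inodes fell from 3536 to 553 here
  for i in `seq 0 1 10`; do time sync ; done      # ~1ms per sync afterwards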


>> While they were all too slow, none of them were O(500ms), so yes, there
>> does seem to be some state change that causes the 2nd sync after a
>> resume to be especially slow.
>>
>> Unfortunately, I've not got an ftrace on the 500ms flavor yet.
>
> This is the problem we really need to reproduce and track down.

Putting a function trace on sys_sync and executing sync manually,
I was able to see it take 100ms, though the function tracer itself
could be contributing to that...
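
For reference, the trace was captured with the stock function_graph tracer,
roughly like this (a sketch from memory, run as root; on some systems tracefs
is mounted at /sys/kernel/tracing instead of /sys/kernel/debug/tracing):

  cd /sys/kernel/debug/tracing
  echo function_graph > current_tracer
  echo sys_sync > set_graph_function     # graph only sys_sync and its callees
  echo 1 > tracing_on
  sync
  echo 0 > tracing_on
  cp trace /tmp/trace.$(date +%Y%m%d_%H%M%S)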

[lenb@...5xbx ~]$ grep \# trace.20150619_013702
# tracer: function_graph
#
#     TIME        CPU  DURATION                  FUNCTION CALLS
#      |          |     |   |                     |   |   |   |
  374.665063 |   0) # 2229.612 us |                } /* __schedule */
  374.665064 |   0) # 2230.571 us |              } /* schedule */
  374.665064 |   0) # 2231.494 us |            } /* schedule_timeout */
  374.665065 |   0) # 2235.937 us |          } /* wait_for_completion */
  374.745616 |   0) # 80518.73 us |                          } /* __schedule */
  374.745616 |   0) # 80519.47 us |                        } /* schedule */
  374.745617 |   0) # 80520.28 us |                      } /* schedule_timeout */
  374.745621 |   0) # 80526.38 us |                    } /* io_schedule_timeout */
  374.745621 |   0) # 80527.23 us |                  } /* bit_wait_io */
  374.745622 |   0) # 80531.04 us |                } /* __wait_on_bit */
  374.745623 |   0) # 80531.95 us |              } /* wait_on_page_bit */
  374.745644 |   0) # 80555.58 us |            } /* filemap_fdatawait_range */
  374.745644 |   0) # 80556.36 us |          } /* filemap_fdatawait */
  374.748029 |   0) # 1300.848 us |                          } /* __schedule */
  374.748029 |   0) # 1301.376 us |                        } /* schedule */
  374.748029 |   0) # 1301.923 us |                      } /* schedule_timeout */
  374.748032 |   0) # 1306.133 us |                    } /* io_schedule_timeout */
  374.748032 |   0) # 1306.651 us |                  } /* bit_wait_io */
  374.748033 |   0) # 1309.298 us |                } /* __wait_on_bit */
  374.748033 |   0) # 1309.838 us |              } /* wait_on_page_bit */
  374.750502 |   0) # 1099.379 us |                          } /* __schedule */
  374.750503 |   0) # 1100.102 us |                        } /* schedule */
  374.750503 |   0) # 1100.882 us |                      } /* schedule_timeout */
  374.750509 |   0) # 1108.399 us |                    } /* io_schedule_timeout */
  374.750510 |   0) # 1109.160 us |                  } /* bit_wait_io */
  374.750511 |   0) # 1112.541 us |                } /* __wait_on_bit */
  374.750512 |   0) # 1113.310 us |              } /* wait_on_page_bit */
  374.752063 |   0) # 5827.910 us |            } /* filemap_fdatawait_range */
  374.752063 |   0) # 5828.517 us |          } /* filemap_fdatawait */
  374.764753 |   0) # 101948.3 us |        } /* sync_inodes_sb */
  374.764754 |   0) # 101949.1 us |      } /* sync_inodes_one_sb */
  374.764903 |   0) # 102198.2 us |    } /* iterate_supers */
  374.767693 |   0) # 1094.872 us |                  } /* blk_flush_plug_list */
  374.767693 |   0) # 1095.405 us |                } /* blk_finish_plug */
  374.767694 |   0) # 1780.430 us |              } /* generic_writepages */
  374.767694 |   0) # 1781.172 us |            } /* do_writepages */
  374.767694 |   0) # 1784.820 us |          } /* __filemap_fdatawrite_range */
  374.767695 |   0) # 1785.551 us |        } /* filemap_fdatawrite */
  374.767695 |   0) # 1786.357 us |      } /* fdatawrite_one_bdev */
  374.767698 |   0) # 1857.427 us |    } /* iterate_bdevs */
  374.767818 |   0) # 105179.2 us |  } /* sys_sync */

Running analyze_suspend.py after the slab tweak above didn't change much:
in one run sync was 20ms (out of a total suspend time of 60ms).

Curiously, in another run, sync ran at 15ms, but sd suspend exploded to 300ms.
I've seen that in some other results.  Sometimes sync is fast, but sd
then more than makes up for it by being slow :-(

FYI,
I ran analyze_suspend.py -x2
from the current directory /tmp, which is mounted on tmpfs,
but still found the 2nd sync was very slow -- 200ms
vs. 6-20ms for the sync preceding the 1st suspend.
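
If it's useful, a quicker way to see the before-vs-after-resume difference
without analyze_suspend.py is something like this (just a sketch, assuming
rtcwake works on the box):

  time sync                    # pre-suspend baseline
  sudo rtcwake -m mem -s 15    # suspend to RAM, wake automatically after 15s
  time sync                    # the post-resume sync that is sometimes 200-500ms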

thanks
Len Brown, Intel Open Source Technology Center