Message-Id: <1476947872-14485-1-git-send-email-joelaf@google.com>
Date:   Thu, 20 Oct 2016 00:17:45 -0700
From:   Joel Fernandes <joelaf@...gle.com>
To:     linux-kernel@...r.kernel.org
Cc:     Kees Cook <keescook@...omium.org>,
        Joel Fernandes <joelaf@...gle.com>
Subject: [PATCH 0/7] pstore: Improve performance of ftrace backend with ramoops

Currently ramoops uses a single zone to store function traces. To make this
work, it has to use locking to synchronize accesses to the buffers. Recently
the synchronization was completely moved from a cmpxchg mechanism to raw
spinlocks, due to difficulties in using cmpxchg on uncached memory and also on
RAMs behind PCIe. [1] This change further reduced the performance of the
ramoops pstore backend by more than half in my tests.
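
To make the contention concrete, here is a minimal userspace sketch (not the
ramoops code) of a single shared zone in which every writer serializes on one
spinlock, modeled with a C11 atomic_flag. The names struct zone, zone_write and
ZONE_SIZE are hypothetical, and real zones live in persistent RAM rather than a
plain array.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define ZONE_SIZE 4096 /* hypothetical zone capacity in bytes */

/* Hypothetical model of one zone shared by all CPUs. */
struct zone {
    atomic_flag lock;            /* single spinlock every writer contends on */
    size_t start;                /* next write offset, wraps around */
    size_t size;                 /* bytes of valid data, capped at ZONE_SIZE */
    unsigned char buf[ZONE_SIZE];
};

static void zone_init(struct zone *z)
{
    atomic_flag_clear(&z->lock);
    z->start = 0;
    z->size = 0;
}

static void spin_lock(atomic_flag *l)
{
    /* Busy-wait until this caller is the one that sets the flag. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;
}

static void spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Append len bytes ring-buffer style; all callers serialize on z->lock. */
static void zone_write(struct zone *z, const void *data, size_t len)
{
    const unsigned char *p = data;

    assert(len <= ZONE_SIZE);
    spin_lock(&z->lock);
    for (size_t i = 0; i < len; i++) {
        z->buf[z->start] = p[i];
        z->start = (z->start + 1) % ZONE_SIZE;
    }
    z->size = (z->size + len > ZONE_SIZE) ? ZONE_SIZE : z->size + len;
    spin_unlock(&z->lock);
}
```

With several CPUs tracing at once, each one spins on the same flag, which is
the cost the per-CPU layout below is meant to remove.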

This patch series improves ftrace throughput dramatically, by around 280% over
the current code. It does so by creating a ramoops persistent zone for each CPU,
which avoids locking altogether for ftrace. At init time, the per-CPU persistent
zones are then merged together.
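
A minimal sketch of the per-CPU layout, again as a hypothetical userspace model
rather than the actual patch: each CPU appends only to its own zone, so the
write path takes no lock. NR_CPUS, percpu_zones and percpu_write are
illustrative names, and the sketch assumes the caller already has exclusive
access to its CPU's zone (it does not model preemption or interrupts).

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4       /* hypothetical: matches the quad-core test guest */
#define ZONE_SIZE 1024  /* hypothetical per-CPU zone capacity in bytes */

struct zone {
    size_t start;                /* next write offset, wraps around */
    size_t size;                 /* bytes of valid data, capped at ZONE_SIZE */
    unsigned char buf[ZONE_SIZE];
};

/* One zone per CPU; a writer only ever touches its own CPU's zone. */
static struct zone percpu_zones[NR_CPUS];

static void percpu_write(unsigned int cpu, const void *data, size_t len)
{
    const unsigned char *p = data;
    struct zone *z;

    assert(cpu < NR_CPUS && len <= ZONE_SIZE);
    z = &percpu_zones[cpu];
    /* No lock here: assumes nothing else writes this CPU's zone meanwhile. */
    for (size_t i = 0; i < len; i++) {
        z->buf[z->start] = p[i];
        z->start = (z->start + 1) % ZONE_SIZE;
    }
    z->size = (z->size + len > ZONE_SIZE) ? ZONE_SIZE : z->size + len;
}
```

The trade-off is that each CPU gets only a fraction of the total buffer space
and the zones must be combined again before the trace can be read as one log.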

Here are some tests showing the improvements. I tested on a quad-core x86_64
QEMU instance, using -mem-path to persist the guest RAM to a file, and measured
the average throughput of dd over 30 seconds:

dd if=/dev/zero | pv | dd of=/dev/null

Without this patch series: 24 MB/s
With per-CPU buffers and trace_clock: 51.9 MB/s
With per-CPU buffers and counter increment: 91.5 MB/s (~281% improvement)
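
The quoted percentages are relative increases over the 24 MB/s baseline:
91.5 / 24 is about 3.81, i.e. roughly +281%, and 51.9 / 24 is about 2.16, i.e.
roughly +116%. A small helper (hypothetical name pct_improvement) reproduces
the arithmetic:

```c
#include <assert.h>

/* Percentage increase of `after` relative to `baseline`. */
static double pct_improvement(double baseline, double after)
{
    assert(baseline > 0.0);
    return (after / baseline - 1.0) * 100.0;
}
```

For example, pct_improvement(24.0, 91.5) gives 281.25 and
pct_improvement(24.0, 51.9) gives about 116.25.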

Changes since RFC [2]:
- Improve commit message clarity for optional locking of zone buffers.
- Use a macro to make the locking requirements clearer in the code.
- Use kcalloc instead of kmalloc to allocate the prz array.
- Print a warning if pmsg calls write_buf instead of write_buf_user.
- Free zones properly in the ftrace per-CPU use case.
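
One way to express "don't lock unless the caller asks to" is a per-zone flag
that the lock helpers check. The sketch below is a hypothetical illustration,
not the patch: ZONE_FLAG_NO_LOCK, zone_lock, zone_unlock and zone_account are
invented names, and the spinlock is again modeled with a C11 atomic_flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flag: the zone's owner guarantees exclusive access. */
#define ZONE_FLAG_NO_LOCK (1u << 0)

struct zone {
    unsigned int flags;
    atomic_flag lock;
    size_t used;   /* stand-in for the zone's write state */
};

static void zone_init(struct zone *z, unsigned int flags)
{
    assert((flags & ~ZONE_FLAG_NO_LOCK) == 0); /* only known flags */
    z->flags = flags;
    atomic_flag_clear(&z->lock);
    z->used = 0;
}

/* Spin only when the zone was not created with ZONE_FLAG_NO_LOCK. */
static void zone_lock(struct zone *z)
{
    if (z->flags & ZONE_FLAG_NO_LOCK)
        return;
    while (atomic_flag_test_and_set_explicit(&z->lock, memory_order_acquire))
        ;
}

static void zone_unlock(struct zone *z)
{
    if (!(z->flags & ZONE_FLAG_NO_LOCK))
        atomic_flag_clear_explicit(&z->lock, memory_order_release);
}

static void zone_account(struct zone *z, size_t len)
{
    zone_lock(z);
    z->used += len;
    zone_unlock(z);
}
```

Shared zones keep the lock, while per-CPU ftrace zones could be created with
the flag set so their write path never touches the lock.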

[1] https://lkml.org/lkml/2016/9/8/375
[2] https://lkml.org/lkml/2016/10/8/12

Joel Fernandes (7):
  pstore: Make spinlock per zone instead of global
  pstore: locking: dont lock unless caller asks to
  pstore: Warn for the case of PSTORE_TYPE_PMSG write using deprecated
    function
  pstore: Make ramoops_init_przs generic for other prz arrays
  ramoops: Split ftrace buffer space into per-CPU zones
  pstore: Add support to store timestamp counter in ftrace records
  pstore: Merge per-CPU ftrace zones into one zone for output
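
Because each CPU's zone is only ordered internally, producing a single trace
needs a timestamp in every record and a merge step. A minimal sketch, assuming
each per-CPU array is already sorted by a hypothetical ts field; struct rec,
MAX_ZONES and merge_by_ts are illustrative and are not the record format or
merge code used by these patches:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ZONES 16 /* hypothetical upper bound on CPUs for this sketch */

/* Hypothetical record: a timestamp/counter plus the traced address. */
struct rec {
    uint64_t ts;
    unsigned long ip;
};

/*
 * Merge ncpu per-CPU arrays, each sorted by ts, into out[] ordered by ts.
 * On equal ts the lower-numbered CPU wins. Returns records written.
 */
static size_t merge_by_ts(const struct rec *zones[], const size_t lens[],
                          size_t ncpu, struct rec *out, size_t cap)
{
    size_t pos[MAX_ZONES] = {0};
    size_t n = 0;

    assert(ncpu <= MAX_ZONES);
    while (n < cap) {
        size_t best = ncpu; /* ncpu means "every zone is exhausted" */
        for (size_t c = 0; c < ncpu; c++) {
            if (pos[c] < lens[c] &&
                (best == ncpu || zones[c][pos[c]].ts < zones[best][pos[best]].ts))
                best = c;
        }
        if (best == ncpu)
            break;
        out[n++] = zones[best][pos[best]++];
    }
    return n;
}
```

Picking the smallest head on each step costs O(total records x CPUs), which is
cheap for a handful of CPUs; a heap would scale better for many zones.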

 fs/pstore/ftrace.c         |   3 +
 fs/pstore/inode.c          |   7 +-
 fs/pstore/internal.h       |  34 -------
 fs/pstore/ram.c            | 236 +++++++++++++++++++++++++++++++++++----------
 fs/pstore/ram_core.c       |  30 +++---
 include/linux/pstore.h     |  69 +++++++++++++
 include/linux/pstore_ram.h |  14 ++-
 7 files changed, 291 insertions(+), 102 deletions(-)

-- 
2.7.4
