lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120215224221.22050.80605.stgit@zurg>
Date:	Thu, 16 Feb 2012 02:57:04 +0400
From:	Konstantin Khlebnikov <khlebnikov@...nvz.org>
To:	linux-mm@...ck.org
Cc:	linux-kernel@...r.kernel.org
Subject: [PATCH RFC 00/15] mm: memory book keeping and lru_lock splitting

There should be no logic changes in this patchset, this is only tossing bits around.
[ This patchset is on top some memcg cleanup/rework patches,
  which I sent to linux-mm@ today/yesterday ]

Most of things in this patchset are self-descriptive, so here brief plan:

* Transmute struct lruvec into struct book. Like real book this struct will
  store set of pages for one zone. It will be working unit for reclaimer code.
[ If memcg is disabled in config there will only one book embedded into struct zone ]

* move page-lru counters to struct book
[ this adds extra overhead in add_page_to_lru_list()/del_page_from_lru_list() for
  non-memcg case, but I believe it will be invisible, only one non-atomic add/sub
  in the same cacheline with lru list ]

* unify inactive_list_is_low_global() and cleanup reclaimer code
* replace struct mem_cgroup_zone with single pointer to struct book
* optimize page to book translations, move it upper in the call stack,
  replace some struct zone arguments with struct book pointer.

page -> book dereference become main operation, page (even free) will always
points to one book in its zone. so page->flags bits may contains direct reference to book.
Maybe we can replace page_zone(page) with book_zone(page_book(page)), without mem cgroups
book -> zone dereference will be simple container_of().

Finally, there appears some new locking primitives for decorating lru_lock splitting logic.
Final patch actually splits zone->lru_lock into small per-book pieces.
All this code currently *completely untested*, but seems like it already can work.


After that, there two options how manage struct book on mem-cgroup create/destroy:
a) [ currently implemented ] allocate and release by rcu.
   Thus lock_page_book() will be protected with rcu_read_lock().
b) allocate and never release struct book, reuse them after rcu grace period.
   It allows to avoid some rcu_read_lock()/rcu_read_unlock() calls on hot paths.


Motivation:
I wrote the similar memory controller for our rhel6-based openvz/virtuozzo kernel,
including splitted lru-locks and some other [patented LOL] cool stuff.
[ common descrioption without techical details: http://wiki.openvz.org/VSwap ]
That kernel already in production and rather stable for a long time.

---

Konstantin Khlebnikov (15):
      mm: rename struct lruvec into struct book
      mm: memory bookkeeping core
      mm: add book->pages_count
      mm: unify inactive_list_is_low()
      mm: add book->reclaim_stat
      mm: kill struct mem_cgroup_zone
      mm: move page-to-book translation upper
      mm: introduce book locking primitives
      mm: handle book relocks on lumpy reclaim
      mm: handle book relocks in compaction
      mm: handle book relock in memory controller
      mm: optimize books in update_page_reclaim_stat()
      mm: optimize books in pagevec_lru_move_fn()
      mm: optimize putback for 0-order reclaim
      mm: split zone->lru_lock


 include/linux/memcontrol.h |   52 -------
 include/linux/mm_inline.h  |  222 ++++++++++++++++++++++++++++-
 include/linux/mmzone.h     |   26 ++-
 include/linux/swap.h       |    2 
 init/Kconfig               |    4 +
 mm/compaction.c            |   35 +++--
 mm/huge_memory.c           |   10 +
 mm/memcontrol.c            |  238 ++++++++++---------------------
 mm/page_alloc.c            |   20 ++-
 mm/swap.c                  |  128 ++++++-----------
 mm/vmscan.c                |  334 +++++++++++++++++++-------------------------
 11 files changed, 554 insertions(+), 517 deletions(-)

-- 
Signature
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ