Message-ID: <CANN689HfCT8uHKBeMHF-2Xa_eW9y9=UY7WdD1gW2EetqpcVSMw@mail.gmail.com>
Date:	Tue, 25 Feb 2014 18:04:51 -0800
From:	Michel Lespinasse <walken@...gle.com>
To:	Davidlohr Bueso <davidlohr@...com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...nel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
	KOSAKI Motohiro <kosaki.motohiro@...il.com>, aswin@...com,
	scott.norton@...com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm: per-thread vma caching

On Tue, Feb 25, 2014 at 10:16 AM, Davidlohr Bueso <davidlohr@...com> wrote:
> This patch is a continuation of efforts trying to optimize find_vma(),
> avoiding potentially expensive rbtree walks to locate a vma upon faults.
> The original approach (https://lkml.org/lkml/2013/11/1/410), where the
> largest vma was also cached, ended up being too specific and random, so
> further comparisons with other approaches were needed. There are two things
> to consider when dealing with this: the cache hit rate and the latency of
> find_vma(). Improving the hit rate does not necessarily translate into finding
> the vma any faster, as the overhead of any fancy caching scheme can be too
> high to consider.

Actually there is also the cost of keeping the cache up to date. I'm
not saying that it's an issue in your proposal - I like the proposal,
especially now that you are replacing the per-mm cache rather than
adding something on top - but it is a factor to consider.


> +static inline void __vmacache_invalidate(struct mm_struct *mm)
> +{
> +#ifdef CONFIG_MMU
> +       vmacache_invalidate(mm);
> +#else
> +       mm->vmacache = NULL;
> +#endif
> +}

Is there any reason why we can't use your proposal for !CONFIG_MMU as well?
(I'm assuming that we could reduce preprocessor checks by doing so)
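
Something like this, perhaps (untested sketch, assuming the nommu side
would keep the same per-thread array, so the wrapper simply collapses):

static inline void __vmacache_invalidate(struct mm_struct *mm)
{
	/* Same path for MMU and !MMU if both use the per-thread cache. */
	vmacache_invalidate(mm);
}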


> +void vmacache_invalidate_all(void)
> +{
> +       struct task_struct *g, *p;
> +
> +       rcu_read_lock();
> +       for_each_process_thread(g, p) {
> +               /*
> +                * Only flush the vmacache pointers as the
> +                * mm seqnum is already set and curr's will
> +                * be set upon invalidation when the next
> +                * lookup is done.
> +                */
> +               memset(p->vmacache, 0, sizeof(p->vmacache));
> +       }
> +       rcu_read_unlock();
> +}

Two things:

- I believe we only need to invalidate vma caches for threads that
share a given mm? We should probably pass in that mm in order to
avoid over-invalidation (see the sketch below).

- My understanding is that the operation is safe because the caller
has the mm's mmap_sem held for write, and other threads accessing the
vma cache will have mmap_sem held at least for read, so we don't need
extra locking to maintain the vma cache. Please 1- confirm this is the
intention, 2- document this, and 3- only invalidate vma caches for
threads that match the caller's mm so that mmap_sem locking can
actually apply.
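
Concretely, something along these lines (just a sketch; the mm argument
and the p->mm check are the suggested change, and the comment states my
understanding of the locking rather than anything documented yet):

/*
 * Caller is assumed to hold mm->mmap_sem for write; lookups in the
 * per-thread cache are done with mmap_sem held at least for read, so
 * no extra locking is needed for the cache itself.
 */
void vmacache_invalidate_all(struct mm_struct *mm)
{
	struct task_struct *g, *p;

	rcu_read_lock();
	for_each_process_thread(g, p) {
		/* Only threads sharing this mm can hold stale entries. */
		if (p->mm != mm)
			continue;
		memset(p->vmacache, 0, sizeof(p->vmacache));
	}
	rcu_read_unlock();
}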


> +struct vm_area_struct *vmacache_find(struct mm_struct *mm,
> +                                    unsigned long addr)
> +
> +{
> +       int i;
> +
> +       if (!vmacache_valid(mm))
> +               return NULL;
> +
> +       for (i = 0; i < VMACACHE_SIZE; i++) {
> +               struct vm_area_struct *vma = current->vmacache[i];
> +
> +               if (vma && vma->vm_start <= addr && vma->vm_end > addr)
> +                       return vma;
> +       }
> +
> +       return NULL;
> +}
> +
> +void vmacache_update(struct mm_struct *mm, unsigned long addr,
> +                    struct vm_area_struct *newvma)
> +{
> +       /*
> +        * Hash based on the page number. Provides a good
> +        * hit rate for workloads with good locality and
> +        * those with random accesses as well.
> +        */
> +       int idx = (addr >> PAGE_SHIFT) & 3;
> +       current->vmacache[idx] = newvma;
> +}

I did read the previous discussion about how to compute idx here. I
did not at the time realize that you are searching all 4 vmacache
entries on lookup - that is, we are only talking about eviction policy
here, not a lookup hash policy.

My understanding is that the reason both your current and your
previous idx computations work is that a random eviction policy would
work too. Basically, what you do is pick some address bits that are
'random enough' to use as an eviction policy.

This is more of a question for Linus, but I am very surprised that I
can't find an existing LRU eviction policy scheme in Linux. What I
have in mind is to keep track of the order in which the cache entries
were last used. With 4 entries, there are 4! = 24 possible orders, which
can be represented with an integer between 0 and 23. When
vmacache_find succeeds, that integer is updated using a table lookup
(the table takes 24*4 = 96 bytes). In vmacache_update, the lru value
modulo 4 indicates which cache way to evict (i.e. it's the one that's
been least recently used).
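
To make that concrete, here is a rough userspace sketch (all names made
up, nothing kernel-ready): the 24 orderings are numbered so that
state % 4 is always the LRU way, and a 24*4 byte table gives the new
ordering on every hit.

#include <stdio.h>
#include <string.h>

#define WAYS	4
#define PERMS	24			/* 4! orderings of the ways */

static unsigned char perm[PERMS][WAYS];	/* MRU..LRU order per state */
static unsigned char touch[PERMS][WAYS];	/* state transition on a hit */

/* Number the 24 orderings so that state % 4 == LRU way. */
static void build_tables(void)
{
	int idx[WAYS] = { 0, 0, 0, 0 };
	int a, b, c, d, s, w;

	for (a = 0; a < WAYS; a++)
	for (b = 0; b < WAYS; b++)
	for (c = 0; c < WAYS; c++)
	for (d = 0; d < WAYS; d++) {
		if ((1 << a | 1 << b | 1 << c | 1 << d) != 0xf)
			continue;		/* not a permutation */
		s = 4 * idx[d]++ + d;		/* d is the LRU way */
		perm[s][0] = a; perm[s][1] = b;
		perm[s][2] = c; perm[s][3] = d;
	}

	/* touch[s][w]: move way w to the MRU slot of ordering s. */
	for (s = 0; s < PERMS; s++) {
		for (w = 0; w < WAYS; w++) {
			unsigned char next[WAYS];
			int i, j = 1, t;

			next[0] = w;
			for (i = 0; i < WAYS; i++)
				if (perm[s][i] != w)
					next[j++] = perm[s][i];
			for (t = 0; t < PERMS; t++)
				if (!memcmp(perm[t], next, WAYS))
					touch[s][w] = t;
		}
	}
}

int main(void)
{
	int hits[] = { 2, 0, 3, 2 };		/* simulated cache hits */
	int state = 0, i;

	build_tables();
	for (i = 0; i < 4; i++) {
		state = touch[state][hits[i]];	/* vmacache_find() hit */
		printf("hit way %d -> evict way %d next\n",
		       hits[i], state % 4);	/* vmacache_update() victim */
	}
	return 0;
}

In the kernel the two tables would of course be precomputed constants
(96 bytes for the touch table, as noted above) rather than built at
init time.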

I don't think it makes sense to introduce such an LRU facility for
this cache only, but it's so generically useful that I'm very surprised
it's not somewhere already, and instead we see people inventing
new eviction schemes over and over again...

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
