Date: Sun, 23 Aug 2015 08:04:43 +0200
From: Ingo Molnar <mingo@...nel.org>
To: George Spelvin <linux@...izon.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, torvalds@...ux-foundation.org,
	Dave Hansen <dave@...1.net>, Peter Zijlstra <peterz@...radead.org>,
	David Rientjes <rientjes@...gle.com>, Rik van Riel <riel@...hat.com>,
	Rasmus Villemoes <linux@...musvillemoes.dk>
Subject: Re: [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics

* George Spelvin <linux@...izon.com> wrote:

> Linus wrote:
> > I don't think any of this can be called "correct", in that the
> > unlocked accesses to the cached state are clearly racy, but I think
> > it's very much "acceptable".
>
> I'd think you could easily fix that with a seqlock-like system.
>
> What makes it so simple is that you can always fall back to
> calc_vmalloc_info if there's any problem, rather than looping or blocking.
>
> The basic idea is that you have a seqlock counter, but if either of
> the two lsbits is set, the cached information is stale.
>
> Basically, you need a seqlock and a spinlock.  The seqlock does
> most of the work, and the spinlock ensures that there's only one
> updater of the cache.
>
> vmap_unlock() does set_bit(0, &seq->sequence).  This marks the
> information as stale.
>
> get_vmalloc_info reads the seqlock.  There are two cases:
>
> - If the two lsbits are 10, the cached information is valid.
>   Copy it out, re-check the seqlock, and loop if the sequence
>   number changes.
>
> - In any other case, the cached information is not valid:
>   - Try to obtain the spinlock.  If it is unavailable, do not block.
>   - If the lock is acquired:
>     - Set the sequence to (sequence | 3) + 1 (we're the only writer).
>     - This bumps the sequence number and leaves the lsbits at 00 (invalid).
>   - Memory barrier TBD.  Do the RCU ops in calc_vmalloc_info do it for us?
>   - Call calc_vmalloc_info.
>   - If we obtained the spinlock earlier:
>     - Copy our vmi to cached_info.
>     - smp_wmb()
>     - set_bit(1, &seq->sequence).  This marks the information as valid,
>       as long as bit 0 is still clear.
>     - Release the spinlock.
>
> Basically, bit 0 says "vmalloc info has changed", and bit 1 says
> "vmalloc cache has been updated".  Bit 0 is cleared before the update
> starts, so that a change during calc_vmalloc_info will force a new
> refresh.
>
> So the four states are basically:
>
>   00 - calc_vmalloc_info() in progress
>   01 - vmap_unlock() during calc_vmalloc_info()
>   10 - cached_info is valid
>   11 - vmap_unlock() has invalidated cached_info, awaiting refresh
>
> Logically, the sequence number should be initialized to ...01,
> but the code above handles 00 okay.

I think this is too complex.

How about something simple like the patch below (on top of the third patch)?

It makes the vmalloc info transactional: /proc/meminfo will always print a
consistent set of numbers.  (Not that we really care about races there, but
it looks really simple to solve, so why not.)

( I also moved the function-static cache next to the flag and seqlock - this
  should further compress the cache footprint. )

Have I missed anything?

Very lightly tested: booted in a VM.
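[For concreteness, the two-lsbit scheme quoted above might look roughly like the
sketch below.  This is not code from the thread: the names vmap_info_seq,
vmap_info_update_lock and cached_info are placeholders; calc_vmalloc_info(),
struct vmalloc_info, vmap_area_lock and the vmap_unlock() wrapper are assumed
from the patch series under discussion; a bare unsigned long stands in for a
seqlock_t, since the bit manipulation does not go through the seqlock API; and
the barriers are one guess at the "memory barrier TBD" question.  The read side
falls back to a fresh calculation rather than looping, per the remark that
falling back to calc_vmalloc_info is always safe.]

/* Illustrative sketch only, not from this thread. */
static unsigned long vmap_info_seq;            /* bit 0: stale, bit 1: updated */
static DEFINE_SPINLOCK(vmap_info_update_lock); /* at most one cache updater */
static struct vmalloc_info cached_info;

static inline void vmap_unlock(void)
{
        spin_unlock(&vmap_area_lock);
        set_bit(0, &vmap_info_seq);             /* mark cached_info stale */
}

void get_vmalloc_info(struct vmalloc_info *vmi)
{
        unsigned long seq = READ_ONCE(vmap_info_seq);
        bool updater = false;

        if ((seq & 3) == 2) {                   /* '10': cached_info is valid */
                smp_rmb();
                *vmi = cached_info;
                smp_rmb();
                if (READ_ONCE(vmap_info_seq) == seq)
                        return;
                /* Sequence changed under us: fall back to a fresh calculation. */
        }

        if (spin_trylock(&vmap_info_update_lock)) {
                updater = true;
                /* Bump the count; the lsbits become 00 ("update in progress"). */
                WRITE_ONCE(vmap_info_seq, (READ_ONCE(vmap_info_seq) | 3) + 1);
                smp_wmb();
        }

        calc_vmalloc_info(vmi);

        if (updater) {
                cached_info = *vmi;
                smp_wmb();
                /* Valid again, unless vmap_unlock() set bit 0 meanwhile. */
                set_bit(1, &vmap_info_seq);
                spin_unlock(&vmap_info_update_lock);
        }
}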
Thanks,

	Ingo

=========================>

 mm/vmalloc.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ef48e557df5a..66726f41e726 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -278,7 +278,15 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
 
 static __cacheline_aligned_in_smp DEFINE_SPINLOCK(vmap_area_lock);
 
+/*
+ * Seqlock and flag for the vmalloc info cache printed in /proc/meminfo.
+ *
+ * The assumption of the optimization is that it's read frequently, but
+ * modified infrequently.
+ */
+static DEFINE_SEQLOCK(vmap_info_lock);
 static int vmap_info_changed;
+static struct vmalloc_info vmap_info_cache;
 
 static inline void vmap_lock(void)
 {
@@ -2752,10 +2760,14 @@ static void calc_vmalloc_info(struct vmalloc_info *vmi)
 
 void get_vmalloc_info(struct vmalloc_info *vmi)
 {
-	static struct vmalloc_info cached_info;
+	if (!READ_ONCE(vmap_info_changed)) {
+		unsigned int seq;
+
+		do {
+			seq = read_seqbegin(&vmap_info_lock);
+			*vmi = vmap_info_cache;
+		} while (read_seqretry(&vmap_info_lock, seq));
 
-	if (!vmap_info_changed) {
-		*vmi = cached_info;
 		return;
 	}
 
@@ -2764,8 +2776,9 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
 
 	calc_vmalloc_info(vmi);
 
-	barrier();
-	cached_info = *vmi;
+	write_seqlock(&vmap_info_lock);
+	vmap_info_cache = *vmi;
+	write_sequnlock(&vmap_info_lock);
 }
 
 #endif
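[For context, the consumer of this cache is the /proc/meminfo code.  A rough
sketch of that side follows; the helper name meminfo_show_vmalloc() is made up,
and the struct vmalloc_info fields (used, largest_chunk) and VMALLOC_TOTAL are
assumed from the kernel of that era.  With the patch above, these values come
from vmap_info_cache under the seqlock whenever nothing has set
vmap_info_changed since the last refresh.]

#include <linux/seq_file.h>
#include <linux/vmalloc.h>

/* Hypothetical stand-in for the VmallocUsed/VmallocChunk part of
 * meminfo_proc_show(), showing where the cached statistics end up.
 */
static int meminfo_show_vmalloc(struct seq_file *m)
{
        struct vmalloc_info vmi;

        get_vmalloc_info(&vmi);         /* common case: seqlock-protected copy of the cache */

        seq_printf(m, "VmallocTotal:   %8lu kB\n",
                   (unsigned long)VMALLOC_TOTAL >> 10);
        seq_printf(m, "VmallocUsed:    %8lu kB\n", vmi.used >> 10);
        seq_printf(m, "VmallocChunk:   %8lu kB\n", vmi.largest_chunk >> 10);

        return 0;
}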