linux-kernel - Re: [PATCH v1 8/8] mm: kasan: Initial memory quarantine implementation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20160223072326.GA4148@js1304-P5Q-DELUXE>
Date:	Tue, 23 Feb 2016 16:23:27 +0900
From:	Joonsoo Kim <iamjoonsoo.kim@....com>
To:	Dmitry Vyukov <dvyukov@...gle.com>
Cc:	Alexander Potapenko <glider@...gle.com>,
	Andrey Konovalov <adech.fo@...il.com>,
	Christoph Lameter <cl@...ux.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Andrey Ryabinin <ryabinin.a.a@...il.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	kasan-dev <kasan-dev@...glegroups.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Linux Memory Management List <linux-mm@...ck.org>
Subject: Re: [PATCH v1 8/8] mm: kasan: Initial memory quarantine
 implementation

On Fri, Feb 19, 2016 at 10:19:48AM +0100, Dmitry Vyukov wrote:
> On Fri, Feb 19, 2016 at 3:11 AM, Joonsoo Kim <js1304@...il.com> wrote:
> > 2016-02-18 23:06 GMT+09:00 Alexander Potapenko <glider@...gle.com>:
> >> On Mon, Feb 1, 2016 at 3:47 AM, Joonsoo Kim <iamjoonsoo.kim@....com> wrote:
> >>> On Wed, Jan 27, 2016 at 07:25:13PM +0100, Alexander Potapenko wrote:
> >>>> Quarantine isolates freed objects in a separate queue. The objects are
> >>>> returned to the allocator later, which helps to detect use-after-free
> >>>> errors.
> >>>>
> >>>> Freed objects are first added to per-cpu quarantine queues.
> >>>> When a cache is destroyed or memory shrinking is requested, the objects
> >>>> are moved into the global quarantine queue. Whenever a kmalloc call
> >>>> allows memory reclaiming, the oldest objects are popped out of the
> >>>> global queue until the total size of objects in quarantine is less than
> >>>> 3/4 of the maximum quarantine size (which is a fraction of installed
> >>>> physical memory).
> >>>
> >>> Just wondering why not using time based approach rather than size
> >>> based one. In heavy load condition, how much time do the object stay in
> >>> quarantine?
> >>>
> >>>>
> >>>> Right now quarantine support is only enabled in SLAB allocator.
> >>>> Unification of KASAN features in SLAB and SLUB will be done later.
> >>>>
> >>>> This patch is based on the "mm: kasan: quarantine" patch originally
> >>>> prepared by Dmitry Chernenkov.
> >>>>
> >>>> Signed-off-by: Alexander Potapenko <glider@...gle.com>
> >>>> ---
> >>>>  include/linux/kasan.h |  30 ++++--
> >>>>  lib/test_kasan.c      |  29 ++++++
> >>>>  mm/kasan/Makefile     |   2 +-
> >>>>  mm/kasan/kasan.c      |  68 +++++++++++-
> >>>>  mm/kasan/kasan.h      |  11 +-
> >>>>  mm/kasan/quarantine.c | 284 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>  mm/kasan/report.c     |   3 +-
> >>>>  mm/mempool.c          |   7 +-
> >>>>  mm/page_alloc.c       |   2 +-
> >>>>  mm/slab.c             |  12 ++-
> >>>>  mm/slab.h             |   4 +
> >>>>  mm/slab_common.c      |   2 +
> >>>>  mm/slub.c             |   4 +-
> >>>>  13 files changed, 435 insertions(+), 23 deletions(-)
> >>>>
> >>>
> >>> ...
> >>>
> >>>> +bool kasan_slab_free(struct kmem_cache *cache, void *object)
> >>>> +{
> >>>> +#ifdef CONFIG_SLAB
> >>>> +     /* RCU slabs could be legally used after free within the RCU period */
> >>>> +     if (unlikely(cache->flags & SLAB_DESTROY_BY_RCU))
> >>>> +             return false;
> >>>> +
> >>>> +     if (likely(cache->flags & SLAB_KASAN)) {
> >>>> +             struct kasan_alloc_meta *alloc_info =
> >>>> +                     get_alloc_info(cache, object);
> >>>> +             struct kasan_free_meta *free_info =
> >>>> +                     get_free_info(cache, object);
> >>>> +
> >>>> +             switch (alloc_info->state) {
> >>>> +             case KASAN_STATE_ALLOC:
> >>>> +                     alloc_info->state = KASAN_STATE_QUARANTINE;
> >>>> +                     quarantine_put(free_info, cache);
> >>>
> >>> quarantine_put() can be called regardless of SLAB_DESTROY_BY_RCU,
> >>> although it's not much meaningful without poisoning. But, I have an
> >>> idea to poison object on SLAB_DESTROY_BY_RCU cache.
> >>>
> >>> quarantine_put() moves per cpu list to global queue when
> >>> list size reaches QUARANTINE_PERCPU_SIZE. If we call synchronize_rcu()
> >>> at that time, after then, we can poison objects. With appropriate size
> >>> setup, it would not be intrusive.
> >>>
> >> Won't this slow the quarantine down unpredictably (e.g. in the case
> >> there're no RCU slabs in quarantine we'll still be waiting for
> >> synchronize_rcu())?
> >
> > It could be handled by introducing one cpu variable.
> >
> >> Yet this is something worth looking into. Do you want RCU to be
> >> handled in this patch set?
> >
> > No. It would be future work.
> >
> >>>> +                     set_track(&free_info->track, GFP_NOWAIT);
> >>>
> >>> set_track() can be called regardless of SLAB_DESTROY_BY_RCU.
> >> Agreed, I can fix that if we decide to handle RCU in this patch
> >> (otherwise it will lead to confusion).
> >>
> >>>
> >>>> +                     kasan_poison_slab_free(cache, object);
> >>>> +                     return true;
> >>>> +             case KASAN_STATE_QUARANTINE:
> >>>> +             case KASAN_STATE_FREE:
> >>>> +                     pr_err("Double free");
> >>>> +                     dump_stack();
> >>>> +                     break;
> >>>> +             default:
> >>>> +                     break;
> >>>> +             }
> >>>> +     }
> >>>> +     return false;
> >>>> +#else
> >>>> +     kasan_poison_slab_free(cache, object);
> >>>> +     return false;
> >>>> +#endif
> >>>> +}
> >>>> +
> >>>
> >>> ...
> >>>
> >>>> +void quarantine_reduce(void)
> >>>> +{
> >>>> +     size_t new_quarantine_size;
> >>>> +     unsigned long flags;
> >>>> +     struct qlist to_free = QLIST_INIT;
> >>>> +     size_t size_to_free = 0;
> >>>> +     void **last;
> >>>> +
> >>>> +     if (likely(ACCESS_ONCE(global_quarantine.bytes) <=
> >>>> +                smp_load_acquire(&quarantine_size)))
> >>>> +             return;
> >>>> +
> >>>> +     spin_lock_irqsave(&quarantine_lock, flags);
> >>>> +
> >>>> +     /* Update quarantine size in case of hotplug. Allocate a fraction of
> >>>> +      * the installed memory to quarantine minus per-cpu queue limits.
> >>>> +      */
> >>>> +     new_quarantine_size = (ACCESS_ONCE(totalram_pages) << PAGE_SHIFT) /
> >>>> +             QUARANTINE_FRACTION;
> >>>> +     new_quarantine_size -= QUARANTINE_PERCPU_SIZE * num_online_cpus();
> >>>> +     smp_store_release(&quarantine_size, new_quarantine_size);
> >>>> +
> >>>> +     last = global_quarantine.head;
> >>>> +     while (last) {
> >>>> +             struct kmem_cache *cache = qlink_to_cache(last);
> >>>> +
> >>>> +             size_to_free += cache->size;
> >>>> +             if (!*last || size_to_free >
> >>>> +                 global_quarantine.bytes - QUARANTINE_LOW_SIZE)
> >>>> +                     break;
> >>>> +             last = (void **) *last;
> >>>> +     }
> >>>> +     qlist_move(&global_quarantine, last, &to_free, size_to_free);
> >>>> +
> >>>> +     spin_unlock_irqrestore(&quarantine_lock, flags);
> >>>> +
> >>>> +     qlist_free_all(&to_free, NULL);
> >>>> +}
> >>>
> >>> Isn't it better to call quarantine_reduce() in shrink_slab()?
> >>> It will help to maximize quarantine time.
> >> This is true, however if we don't call quarantine_reduce() from
> >> kmalloc()/kfree() the size of the quarantine will be unpredictable.
> >> There's a tradeoff between efficiency and space here, and at least in
> >> some cases we may want to trade efficiency for space.
> >
> > size of the quarantine doesn't matter unless there is memory pressure.
> > If memory pressure, shrink_slab() would be called and we can reduce
> > size of quarantine. However, I don't think this is show stopper. We can
> > do it when needed.
> 
> 
> No, this does not work. We've tried.
> The problem is fragmentation. When all memory is occupied by slab,
> it's already too late to reclaim memory. Free objects are randomly
> scattered over memory, so if you have just 1% of live objects, the
> chances are that you won't be able to reclaim any single page.

Okay. Now, I got it.

Thanks.