Message-ID: <84144f020807170101x25c9be11qd6e1996460bb24fc@mail.gmail.com>
Date:	Thu, 17 Jul 2008 11:01:21 +0300
From:	"Pekka Enberg" <penberg@...helsinki.fi>
To:	"Eduard - Gabriel Munteanu" <eduard.munteanu@...ux360.ro>
Cc:	cl@...ux-foundation.org, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	"Randy Dunlap" <rdunlap@...otime.net>,
	"Matt Mackall" <mpm@...enic.com>
Subject: Re: [RFC PATCH 1/4] kmemtrace: Core implementation.

Hi,

[Adding Randy to cc for the Documentation/ parts and Matt for the core.]

On Thu, Jul 17, 2008 at 3:46 AM, Eduard - Gabriel Munteanu
<eduard.munteanu@...ux360.ro> wrote:
> kmemtrace provides tracing for slab allocator functions, such as kmalloc,
> kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed
> to the userspace application in order to analyse allocation hotspots,
> internal fragmentation and so on, making it possible to see how well an
> allocator performs, as well as debug and profile kernel code.
>
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@...ux360.ro>
> ---
>  Documentation/kernel-parameters.txt |    6 +
>  Documentation/vm/kmemtrace.txt      |   96 ++++++++++++++++
>  MAINTAINERS                         |    6 +
>  include/linux/kmemtrace.h           |  110 ++++++++++++++++++
>  init/main.c                         |    2 +
>  lib/Kconfig.debug                   |    4 +
>  mm/Makefile                         |    2 +-
>  mm/kmemtrace.c                      |  208 +++++++++++++++++++++++++++++++++++
>  8 files changed, 433 insertions(+), 1 deletions(-)
>  create mode 100644 Documentation/vm/kmemtrace.txt
>  create mode 100644 include/linux/kmemtrace.h
>  create mode 100644 mm/kmemtrace.c
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index b52f47d..b230aff 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -49,6 +49,7 @@ parameter is applicable:
>        ISAPNP  ISA PnP code is enabled.
>        ISDN    Appropriate ISDN support is enabled.
>        JOY     Appropriate joystick support is enabled.
> +       KMEMTRACE kmemtrace is enabled.
>        LIBATA  Libata driver is enabled
>        LP      Printer support is enabled.
>        LOOP    Loopback device support is enabled.
> @@ -941,6 +942,11 @@ and is between 256 and 4096 characters. It is defined in the file
>                        use the HighMem zone if it exists, and the Normal
>                        zone if it does not.
>
> +       kmemtrace.subbufs=n     [KNL,KMEMTRACE] Overrides the number of
> +                       subbufs kmemtrace's relay channel has. Set this
> +                       higher than default (KMEMTRACE_N_SUBBUFS in code) if
> +                       you experience buffer overruns.
> +
>        movablecore=nn[KMG]     [KNL,X86-32,IA-64,PPC,X86-64] This parameter
>                        is similar to kernelcore except it specifies the
>                        amount of memory used for migratable allocations.
> diff --git a/Documentation/vm/kmemtrace.txt b/Documentation/vm/kmemtrace.txt
> new file mode 100644
> index 0000000..1147ecb
> --- /dev/null
> +++ b/Documentation/vm/kmemtrace.txt
> @@ -0,0 +1,96 @@
> +                       kmemtrace - Kernel Memory Tracer
> +
> +                         by Eduard - Gabriel Munteanu
> +                            <eduard.munteanu@...ux360.ro>
> +

A short chapter explaining what kmemtrace is would probably be helpful here.

> +
> +I. Design and goals
> +===================
> +
> +kmemtrace was designed to handle rather large amounts of data. Thus, it uses
> +the relay interface to export whatever is logged to userspace, which then
> +stores it. Analysis and reporting is done asynchronously, that is, after the
> +data is collected and stored. By design, it allows one to log and analyse
> +on different machines and different arches.
> +
> +As this is a debugging feature, kmemtrace's ABI is not designed to be very
> +stable, although this may happen in the future if it's deemed mature and
> +sufficient. So the userspace tool does not contain a copy of the kernel
> +header. Instead, the ABI allows checking if the logged data matches the
> +userspace tool. Well, what I said about ABI stability isn't totally true:
> +while I've tried hard to cover all possible (and useful) use cases, I don't
> +want it frozen in the current state. I anticipate the ABI will be _quite_
> +stable, even across multiple stable kernel versions, but I don't make any
> +guarantees regarding this matter.
> +
> +Summary of design goals:
> +       - allow logging and analysis to be done across different machines
> +       - be fast and anticipate usage in high-load environments (*)
> +       - be reasonably extensible
> +       - have a _reasonably_ (not completely) stable ABI
> +
> +(*) - one of the reasons Pekka Enberg's original userspace data analysis
> +    tool's code was rewritten from Perl to C (although this is more than a
> +    simple conversion)
> +
> +
> +II. Quick usage guide
> +=====================
> +
> +1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable
> +CONFIG_KMEMTRACE).
> +
> +2) Get the userspace tool and build it:
> +$ git-clone git://repo.or.cz/kmemtrace-user.git                # current repository
> +$ cd kmemtrace-user/
> +$ autoreconf
> +$ ./configure          # Supply KERNEL_SOURCES=/path/to/sources/ if you're
> +                       # _not_ running this on a kmemtrace-enabled kernel.
> +$ make

As I mentioned in private, I would prefer we drop autoconf from the
userspace tool, but maybe that's just my personal preference.

> +
> +3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the
> +'single' runlevel (so that relay buffers don't fill up easily), and run
> +kmemtrace:
> +# '$' does not mean user, but root here.
> +$ mount -t debugfs none /debug
> +$ mount -t proc none /proc
> +$ cd path/to/kmemtrace-user/
> +$ ./kmemtraced
> +Wait a bit, then stop it with CTRL+C.
> +$ cat /debug/kmemtrace/total_overruns  # Check if we didn't overrun, should
> +                                       # be zero.
> +$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to
> +               check its correctness]
> +$ ./kmemtrace-report
> +
> +Now you should have a nice and short summary of how the allocator performs.
> +
> +III. FAQ and known issues
> +=========================
> +Q: 'cat /debug/kmemtrace/total_overruns' is non-zero, how do I fix this?
> +Should I worry?
> +A: If it's non-zero, this affects kmemtrace's accuracy, depending on how
> +large the number is. You can fix it by supplying a higher
> +'kmemtrace.subbufs=N' kernel parameter.
> +---
> +
> +Q: kmemtrace_check reports errors, how do I fix this? Should I worry?
> +A: This is a bug and should be reported. It can occur for a variety of
> +reasons:
> +       - possible bugs in relay code
> +       - possible misuse of relay by kmemtrace
> +       - timestamps being collected unorderly
> +Or you may fix it yourself and send us a patch.
> +---
> +
> +Q: kmemtrace_report shows many errors, how do I fix this? Should I worry?
> +A: This is a known issue and I'm working on it. These might be true errors
> +in kernel code, which may have inconsistent behavior (e.g. allocating memory
> +with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed
> +out this behavior may work with SLAB, but may fail with other allocators.
> +
> +It may also be due to lack of tracing in some unusual allocator functions.
> +
> +We don't want bug reports regarding this issue yet.
> +---

I think you're supposed to document the actual filesystem in
Documentation/ABI as well.
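
For reference, an entry there could look roughly like the sketch below. It
follows the What/Date/Contact/Description layout described in
Documentation/ABI/README. The file name (say,
Documentation/ABI/testing/debugfs-kmemtrace), the "<debugfs>" placeholder
and the wording of the descriptions are assumptions for illustration, pieced
together from what this patch creates, not an agreed-upon text.

```text
What:		<debugfs>/kmemtrace/total_overruns
Date:		July 2008
Contact:	Eduard - Gabriel Munteanu <eduard.munteanu@...ux360.ro>
Description:
		Read-only counter (u32) of the times a new relay sub-buffer
		was needed while the buffer was full. A non-zero value means
		events were lost; see kmemtrace.subbufs= in
		Documentation/kernel-parameters.txt.

What:		<debugfs>/kmemtrace/cpu<N>
Date:		July 2008
Contact:	Eduard - Gabriel Munteanu <eduard.munteanu@...ux360.ro>
Description:
		Per-CPU relay files carrying a stream of
		struct kmemtrace_event records as defined in
		include/linux/kmemtrace.h.
```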

> +
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 56a2f67..e967bc2 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2425,6 +2425,12 @@ M:       jason.wessel@...driver.com
>  L:     kgdb-bugreport@...ts.sourceforge.net
>  S:     Maintained
>
> +KMEMTRACE
> +P:     Eduard - Gabriel Munteanu
> +M:     eduard.munteanu@...ux360.ro
> +L:     linux-kernel@...r.kernel.org
> +S:     Maintained
> +
>  KPROBES
>  P:     Ananth N Mavinakayanahalli
>  M:     ananth@...ibm.com
> diff --git a/include/linux/kmemtrace.h b/include/linux/kmemtrace.h
> new file mode 100644
> index 0000000..da69d22
> --- /dev/null
> +++ b/include/linux/kmemtrace.h
> @@ -0,0 +1,110 @@
> +/*
> + * Copyright (C) 2008 Eduard - Gabriel Munteanu
> + *
> + * This file is released under GPL version 2.
> + */
> +
> +#ifndef _LINUX_KMEMTRACE_H
> +#define _LINUX_KMEMTRACE_H
> +
> +#include <linux/types.h>
> +
> +/* ABI definition starts here. */
> +
> +#define KMEMTRACE_ABI_VERSION          1
> +
> +enum kmemtrace_event_id {
> +       KMEMTRACE_EVENT_NULL = 0,       /* Erroneous event. */

I don't think this is used anywhere so why not drop it?

> +       KMEMTRACE_EVENT_ALLOC,
> +       KMEMTRACE_EVENT_FREE,
> +};
> +
> +enum kmemtrace_type_id {
> +       KMEMTRACE_TYPE_KERNEL = 0,      /* kmalloc() / kfree(). */
> +       KMEMTRACE_TYPE_CACHE,           /* kmem_cache_*(). */
> +       KMEMTRACE_TYPE_PAGES,           /* __get_free_pages() and friends. */

I still think kernel vs. cache is confusing because both allocations
*are* for the kernel. So perhaps kmalloc vs. cache?
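
Something along these lines is what I have in mind, as a rough sketch only.
KMEMTRACE_TYPE_KMALLOC is just a suggested name, and kmemtrace_type_name()
is a hypothetical helper for a report tool that is not part of this patch.
The numeric values stay the same, so the logged data would not change.

```c
#include <stddef.h>

/* Suggested renaming: KMEMTRACE_TYPE_KERNEL becomes KMEMTRACE_TYPE_KMALLOC. */
enum kmemtrace_type_id {
	KMEMTRACE_TYPE_KMALLOC = 0,	/* kmalloc() / kfree(). */
	KMEMTRACE_TYPE_CACHE,		/* kmem_cache_*(). */
	KMEMTRACE_TYPE_PAGES,		/* __get_free_pages() and friends. */
};

/* Hypothetical helper: map a type id to a label for reports. */
const char *kmemtrace_type_name(enum kmemtrace_type_id id)
{
	switch (id) {
	case KMEMTRACE_TYPE_KMALLOC:
		return "kmalloc";
	case KMEMTRACE_TYPE_CACHE:
		return "cache";
	case KMEMTRACE_TYPE_PAGES:
		return "pages";
	}
	return "unknown";
}
```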

> +};
> +
> +struct kmemtrace_event {

So why don't we have the ABI version embedded here like blktrace has
so that user-space can check if the format matches its expectations?
That should be future-proof as well: as long as you keep the existing
fields where they're at now, you can always add new fields at the end
of the struct.
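
To make that concrete, here is a rough sketch of the userspace side, not a
proposal for the final layout. The abi_version and event_size fields and
kmemtrace_parse_event() are made up for illustration; the patch does not
define them and this is not how blktrace lays out its records. The sketch
also assumes the reader has the same byte order as the traced machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KMEMTRACE_ABI_VERSION	1

/* Hypothetical layout: version and record size lead every event. */
struct kmemtrace_event {
	uint32_t	abi_version;	/* Assumed field: writer's ABI version. */
	uint32_t	event_size;	/* Assumed field: bytes in this record. */
	uint16_t	event_id;
	uint16_t	type_id;
	int32_t		node;
	uint64_t	call_site;
	uint64_t	ptr;
	uint64_t	bytes_req;
	uint64_t	bytes_alloc;
	uint64_t	gfp_flags;
	int64_t		timestamp;
	/* New fields would only ever be appended below this point. */
} __attribute__ ((__packed__));

/*
 * Copy the fields this reader knows about from buf into *ev.
 * Returns the number of bytes the record occupies, or -1 if the
 * record is truncated, too short, or from a different ABI version.
 */
long kmemtrace_parse_event(const unsigned char *buf, size_t len,
			   struct kmemtrace_event *ev)
{
	uint32_t version, size;

	if (len < sizeof(version) + sizeof(size))
		return -1;
	memcpy(&version, buf, sizeof(version));
	memcpy(&size, buf + sizeof(version), sizeof(size));

	if (version != KMEMTRACE_ABI_VERSION)
		return -1;
	if (size < sizeof(*ev) || size > len)
		return -1;

	/* Trailing bytes beyond sizeof(*ev) are unknown fields: skip them. */
	memcpy(ev, buf, sizeof(*ev));
	return (long)size;
}
```

A reader would call this in a loop and advance by the returned length, so
records that have grown new trailing fields are stepped over without being
understood. Whether appending a field should also bump the version is a
policy choice this sketch leaves open.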

> +       __u16           event_id;       /* Allocate or free? */
> +       __u16           type_id;        /* Kind of allocation/free. */
> +       __s32           node;           /* Target CPU. */
> +       __u64           call_site;      /* Caller address. */
> +       __u64           ptr;            /* Pointer to allocation. */
> +       __u64           bytes_req;      /* Number of bytes requested. */
> +       __u64           bytes_alloc;    /* Number of bytes allocated. */
> +       __u64           gfp_flags;      /* Requested flags. */
> +       __s64           timestamp;      /* When the operation occured in ns. */
> +} __attribute__ ((__packed__));
> +
> +/* End of ABI definition. */
> +
> +#ifdef __KERNEL__
> +
> +#include <linux/marker.h>
> +
> +#ifdef CONFIG_KMEMTRACE
> +
> +extern void kmemtrace_init(void);
> +
> +static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
> +                                            unsigned long call_site,
> +                                            const void *ptr,
> +                                            size_t bytes_req,
> +                                            size_t bytes_alloc,
> +                                            unsigned long gfp_flags,
> +                                            int node)
> +{
> +       trace_mark(kmemtrace_alloc, "type_id %d call_site %lu ptr %lu "
> +                  "bytes_req %lu bytes_alloc %lu gfp_flags %lu node %d",
> +                  type_id, call_site, (unsigned long) ptr,
> +                  bytes_req, bytes_alloc, gfp_flags, node);
> +}
> +
> +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
> +                                      unsigned long call_site,
> +                                      const void *ptr)
> +{
> +       trace_mark(kmemtrace_free, "type_id %d call_site %lu ptr %lu",
> +                  type_id, call_site, (unsigned long) ptr);
> +}
> +
> +#else /* CONFIG_KMEMTRACE */
> +
> +static inline void kmemtrace_init(void)
> +{
> +}
> +
> +static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
> +                                            unsigned long call_site,
> +                                            const void *ptr,
> +                                            size_t bytes_req,
> +                                            size_t bytes_alloc,
> +                                            unsigned long gfp_flags,
> +                                            int node)
> +{
> +}
> +
> +static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
> +                                      unsigned long call_site,
> +                                      const void *ptr)
> +{
> +}
> +
> +#endif /* CONFIG_KMEMTRACE */
> +
> +static inline void kmemtrace_mark_alloc(enum kmemtrace_type_id type_id,
> +                                       unsigned long call_site,
> +                                       const void *ptr,
> +                                       size_t bytes_req,
> +                                       size_t bytes_alloc,
> +                                       unsigned long gfp_flags)
> +{
> +       kmemtrace_mark_alloc_node(type_id, call_site, ptr,
> +                                 bytes_req, bytes_alloc, gfp_flags, -1);
> +}
> +
> +#endif /* __KERNEL__ */
> +
> +#endif /* _LINUX_KMEMTRACE_H */
> +
> diff --git a/init/main.c b/init/main.c
> index 057f364..c00659c 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -66,6 +66,7 @@
>  #include <asm/setup.h>
>  #include <asm/sections.h>
>  #include <asm/cacheflush.h>
> +#include <linux/kmemtrace.h>
>
>  #ifdef CONFIG_X86_LOCAL_APIC
>  #include <asm/smp.h>
> @@ -641,6 +642,7 @@ asmlinkage void __init start_kernel(void)
>        enable_debug_pagealloc();
>        cpu_hotplug_init();
>        kmem_cache_init();
> +       kmemtrace_init();
>        debug_objects_mem_init();
>        idr_init_cache();
>        setup_per_cpu_pageset();
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index d2099f4..6bacab5 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -674,6 +674,10 @@ config FIREWIRE_OHCI_REMOTE_DMA
>
>          If unsure, say N.
>
> +config KMEMTRACE
> +       bool "Kernel memory tracer"
> +       depends on RELAY && DEBUG_FS && MARKERS
> +
>  source "samples/Kconfig"
>
>  source "lib/Kconfig.kgdb"
> diff --git a/mm/Makefile b/mm/Makefile
> index 18c143b..d88a3bc 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -33,4 +33,4 @@ obj-$(CONFIG_MIGRATION) += migrate.o
>  obj-$(CONFIG_SMP) += allocpercpu.o
>  obj-$(CONFIG_QUICKLIST) += quicklist.o
>  obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o
> -
> +obj-$(CONFIG_KMEMTRACE) += kmemtrace.o
> diff --git a/mm/kmemtrace.c b/mm/kmemtrace.c
> new file mode 100644
> index 0000000..9258010
> --- /dev/null
> +++ b/mm/kmemtrace.c
> @@ -0,0 +1,208 @@
> +/*
> + * Copyright (C) 2008 Pekka Enberg, Eduard - Gabriel Munteanu
> + *
> + * This file is released under GPL version 2.
> + */
> +
> +#include <linux/string.h>
> +#include <linux/debugfs.h>
> +#include <linux/relay.h>
> +#include <linux/module.h>
> +#include <linux/marker.h>
> +#include <linux/gfp.h>
> +#include <linux/kmemtrace.h>
> +
> +#define KMEMTRACE_SUBBUF_SIZE  (8192 * sizeof(struct kmemtrace_event))
> +#define KMEMTRACE_N_SUBBUFS    20
> +
> +static struct rchan *kmemtrace_chan;
> +static u32 kmemtrace_buf_overruns;
> +static unsigned int kmemtrace_n_subbufs;
> +
> +static inline void kmemtrace_log_event(struct kmemtrace_event *event)
> +{
> +       relay_write(kmemtrace_chan, event, sizeof(struct kmemtrace_event));
> +}
> +
> +static void kmemtrace_probe_alloc(void *probe_data, void *call_data,
> +                                 const char *format, va_list *args)
> +{
> +       unsigned long flags;
> +       struct kmemtrace_event ev;
> +
> +       /*
> +        * Don't convert this to use structure initializers,
> +        * C99 does not guarantee the rvalues evaluation order.
> +        */
> +       ev.event_id = KMEMTRACE_EVENT_ALLOC;
> +       ev.type_id = va_arg(*args, int);
> +       ev.call_site = va_arg(*args, unsigned long);
> +       ev.ptr = va_arg(*args, unsigned long);
> +       /* Don't trace ignored allocations. */
> +       if (!ev.ptr)
> +               return;
> +       ev.bytes_req = va_arg(*args, unsigned long);
> +       ev.bytes_alloc = va_arg(*args, unsigned long);
> +       /* ev.timestamp set below, to preserve event ordering. */
> +       ev.gfp_flags = va_arg(*args, unsigned long);
> +       ev.node = va_arg(*args, int);
> +
> +       local_irq_save(flags);

Why do we disable local irqs here? (Perhaps a comment is in order.)

> +       ev.timestamp = ktime_to_ns(ktime_get());
> +       kmemtrace_log_event(&ev);
> +       local_irq_restore(flags);
> +}
> +
> +static void kmemtrace_probe_free(void *probe_data, void *call_data,
> +                                const char *format, va_list *args)
> +{
> +       unsigned long flags;
> +       struct kmemtrace_event ev;
> +
> +       /*
> +        * Don't convert this to use structure initializers,
> +        * C99 does not guarantee the rvalues evaluation order.
> +        */
> +       ev.event_id = KMEMTRACE_EVENT_FREE;
> +       ev.type_id = va_arg(*args, int);
> +       ev.call_site = va_arg(*args, unsigned long);
> +       ev.ptr = va_arg(*args, unsigned long);
> +       /* Don't trace ignored allocations. */
> +       if (!ev.ptr)
> +               return;
> +       /* ev.timestamp set below, to preserve event ordering. */
> +
> +       local_irq_save(flags);

(same here)

> +       ev.timestamp = ktime_to_ns(ktime_get());
> +       kmemtrace_log_event(&ev);
> +       local_irq_restore(flags);
> +}
> +
> +static struct dentry *
> +kmemtrace_create_buf_file(const char *filename, struct dentry *parent,
> +                         int mode, struct rchan_buf *buf, int *is_global)
> +{
> +       return debugfs_create_file(filename, mode, parent, buf,
> +                                  &relay_file_operations);
> +}
> +
> +static int kmemtrace_remove_buf_file(struct dentry *dentry)
> +{
> +       debugfs_remove(dentry);
> +
> +       return 0;
> +}
> +
> +static int kmemtrace_count_overruns(struct rchan_buf *buf,
> +                                   void *subbuf, void *prev_subbuf,
> +                                   size_t prev_padding)
> +{
> +       if (relay_buf_full(buf)) {
> +               kmemtrace_buf_overruns++;
> +               return 0;
> +       }
> +
> +       return 1;
> +}
> +
> +static struct rchan_callbacks relay_callbacks = {
> +       .create_buf_file = kmemtrace_create_buf_file,
> +       .remove_buf_file = kmemtrace_remove_buf_file,
> +       .subbuf_start = kmemtrace_count_overruns,
> +};
> +
> +static struct dentry *kmemtrace_dir;
> +static struct dentry *kmemtrace_overruns_dentry;
> +
> +static void kmemtrace_cleanup(void)
> +{
> +       relay_close(kmemtrace_chan);
> +       marker_probe_unregister("kmemtrace_alloc",
> +                               kmemtrace_probe_alloc, NULL);
> +       marker_probe_unregister("kmemtrace_free",
> +                               kmemtrace_probe_free, NULL);
> +       if (kmemtrace_overruns_dentry)
> +               debugfs_remove(kmemtrace_overruns_dentry);
> +}
> +
> +static int __init kmemtrace_setup_late(void)
> +{
> +       if (!kmemtrace_chan)
> +               goto failed;
> +
> +       kmemtrace_dir = debugfs_create_dir("kmemtrace", NULL);
> +       if (!kmemtrace_dir)
> +               goto cleanup;
> +
> +       kmemtrace_overruns_dentry =
> +               debugfs_create_u32("total_overruns", S_IRUSR,
> +                                  kmemtrace_dir, &kmemtrace_buf_overruns);
> +       if (!kmemtrace_overruns_dentry)
> +               goto dir_cleanup;
> +
> +       if (relay_late_setup_files(kmemtrace_chan, "cpu", kmemtrace_dir))
> +               goto overrun_cleanup;
> +
> +       printk(KERN_INFO "kmemtrace: fully up.\n");
> +
> +       return 0;
> +
> +overrun_cleanup:
> +       debugfs_remove(kmemtrace_overruns_dentry);
> +       kmemtrace_overruns_dentry = NULL;
> +dir_cleanup:
> +       debugfs_remove(kmemtrace_dir);
> +cleanup:
> +       kmemtrace_cleanup();
> +failed:
> +       return 1;
> +}
> +late_initcall(kmemtrace_setup_late);
> +
> +static int __init kmemtrace_set_subbuf_size(char *str)
> +{
> +       get_option(&str, &kmemtrace_n_subbufs);
> +       return 0;
> +}
> +early_param("kmemtrace.subbufs", kmemtrace_set_subbuf_size);
> +
> +void kmemtrace_init(void)
> +{
> +       int err;
> +
> +       if (!kmemtrace_n_subbufs)
> +               kmemtrace_n_subbufs = KMEMTRACE_N_SUBBUFS;
> +
> +       kmemtrace_chan = relay_open(NULL, NULL, KMEMTRACE_SUBBUF_SIZE,
> +                                   kmemtrace_n_subbufs, &relay_callbacks,
> +                                   NULL);
> +       if (!kmemtrace_chan) {
> +               printk(KERN_INFO "kmemtrace: could not open relay channel\n");
> +               return;
> +       }
> +
> +       err = marker_probe_register("kmemtrace_alloc", "type_id %d "
> +                                   "call_site %lu ptr %lu "
> +                                   "bytes_req %lu bytes_alloc %lu "
> +                                   "gfp_flags %lu node %d",
> +                                   kmemtrace_probe_alloc, NULL);
> +       if (err)
> +               goto probe_alloc_fail;
> +       err = marker_probe_register("kmemtrace_free", "type_id %d "
> +                                   "call_site %lu ptr %lu",
> +                                   kmemtrace_probe_free, NULL);
> +       if (err)
> +               goto probe_free_fail;
> +
> +       printk(KERN_INFO "kmemtrace: early init successful.\n");
> +       return;
> +
> +probe_free_fail:
> +       err = marker_probe_unregister("kmemtrace_alloc",
> +                                     kmemtrace_probe_alloc, NULL);
> +       printk(KERN_INFO "kmemtrace: could not register marker probes!\n");
> +probe_alloc_fail:
> +       relay_close(kmemtrace_chan);
> +       kmemtrace_chan = NULL;
> +}
> +
> --
> 1.5.6.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>