linux-kernel - Re: [RFC] persistent store

Message-ID: <1290470763.3008.252.camel@localhost>
Date:	Mon, 22 Nov 2010 16:06:03 -0800
From:	Jim Keniston <jkenisto@...ux.vnet.ibm.com>
To:	linux-kernel@...r.kernel.org
Subject: Re: [RFC] persistent store

> Here's a patch based on some discussions I had with Thomas
> Gleixner at plumbers conference that implements a generic
> layer for persistent storage usable to pass tens or hundreds
> of kilobytes of data from the dying breath of a crashing
> kernel to its successor.
> 
> The usage model I'm envisioning is that a platform driver
> will register with this code to provide the actual storage.
> I've tried to make this interface general, but I'm working
> from a sample of one (the ACPI ERST code), so if anyone else
> has some persistent store that can't be handled by this code,
> speak up and we can put in the necessary tweaks.

I recently posted a patch set for powerpc to capture the most recent
oops or panic message in NVRAM:
http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-November/087032.html
It covers a lot of the same ground, and could be adapted to use your
framework.  See below for concerns and suggestions.  I'd also be
interested in feedback about the design decisions I mention below.

On powerpc, the amount of NVRAM available for this may be as little
as 1-2 Kbytes.  The minimal oops report (with essentially no backtrace)
is about 1800 bytes.  See below for implications.

We currently read our NVRAM contents via /dev/nvram and the nvram
command.  NVRAM is divided up into several "partitions" -- only
one of which is used for the oops/panic report -- so the user-space
code needs to know about how the partitions are laid out.  It also
needs to know how much text we actually wrote to the partition, and
whether or not it's compressed.  Since the kernel already knows how
to determine all this, it would probably be more convenient to get
at the oops/panic partition through your /sys interface.

> ...
> 2) "Why do you read in all the data from the device when it
> registers and save it in memory? Couldn't you just get the
> list of records and pick up the data from the device when
> the user reads the file?"
> I don't think this is going to be very much data, just a few hundred
> kilobytes (i.e. less than $0.01 worth of memory, even expensive server
> memory). The memory is freed when the record is erased ... which is
> likely to be soon after boot.

Since the amount of text we capture is so tiny, this is unlikely to be
an issue in my case.

> ...
> 6) "Is this widely useful? How many systems have persistent storage?"
> Although ERST was only added to the ACPI spec earlier this year, it
> merely documents existing functionality required for WHEA (Windows
> Hardware Error Architecture). So most modern server systems should
> have it (my test system has it, and it has a BIOS written in mid 2008).
> Sorry desktops & laptops - no love for you here.
> 

PowerPC pSeries does, obviously, and we're looking to exploit it in
just this way.

> ...
> +static void
> +pstore_dump(struct kmsg_dumper *dumper, enum kmsg_dump_reason reason,
> +	    const char *s1, unsigned long l1,
> +	    const char *s2, unsigned long l2)
> +{
> +	unsigned long s1_start, s2_start;
> +	unsigned long l1_cpy, l2_cpy;
> +	char *dst = pstore_buf + psinfo->header_size;
> +
> +	/* Don't dump oopses to persistent store */

Why not?  In our case, we capture every oops and panic report, but keep
only the most recent.  Seems like catching the last oops could be useful
if your system hangs thereafter and can't be made to panic.  I suggest
you pass along the reason (KMSG_DUMP_OOPS or whatever) and let the
callback decide.
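
A rough userspace sketch of that suggestion: the generic layer forwards
the reason and each backend applies its own policy.  The names
dump_reason, store_writer, panic_only_writer, keep_latest_writer and
dump are hypothetical stand-ins, not taken from either patch, and the
enum only mimics the kernel's kmsg_dump_reason values.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel's enum kmsg_dump_reason values (assumption). */
enum dump_reason { DUMP_OOPS, DUMP_PANIC };

/*
 * Hypothetical backend callback: receives the reason and returns the
 * number of bytes it chose to store (0 means "ignored").
 */
typedef size_t (*store_writer)(enum dump_reason reason,
			       const char *buf, size_t len);

/* A backend that only wants panic reports. */
static size_t panic_only_writer(enum dump_reason reason,
				const char *buf, size_t len)
{
	(void)buf;
	return reason == DUMP_PANIC ? len : 0;
}

/* A backend like the NVRAM case: keep the most recent oops or panic. */
static size_t keep_latest_writer(enum dump_reason reason,
				 const char *buf, size_t len)
{
	(void)reason;
	(void)buf;
	return len;
}

/* Generic layer: no filtering here, just forward the reason. */
static size_t dump(store_writer writer, enum dump_reason reason,
		   const char *buf, size_t len)
{
	assert(writer != NULL);
	return writer(reason, buf, len);
}
```

With this shape, dump(panic_only_writer, DUMP_OOPS, ...) stores nothing,
while dump(keep_latest_writer, DUMP_OOPS, ...) keeps the report.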

You'd have to serialize the oops handling, I guess, in case multiple
CPUs oops simultaneously.  (Gotta fix that in my code.)
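
One possible way to serialize, sketched in userspace C11 with an
atomic flag standing in for whatever kernel lock would actually be
used.  The non-blocking "skip if busy" policy and the names
capture_busy, capture_begin and capture_end are assumptions made for
illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Guards the single capture buffer; stand-in for a kernel lock. */
static atomic_flag capture_busy = ATOMIC_FLAG_INIT;

/* Returns true if the caller now owns the buffer; never blocks. */
static bool capture_begin(void)
{
	return !atomic_flag_test_and_set(&capture_busy);
}

/* Releases the buffer so a later oops can be captured. */
static void capture_end(void)
{
	atomic_flag_clear(&capture_busy);
}
```

A CPU that loses the race sees capture_begin() return false and skips
its own report rather than waiting, which may or may not be the right
trade-off for a dying kernel.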

> +	if (reason == KMSG_DUMP_OOPS)
> +		return;
> +
> +	l2_cpy = min(l2, psinfo->data_size);
> +	l1_cpy = min(l1, psinfo->data_size - l2_cpy);
> +
> +	s2_start = l2 - l2_cpy;
> +	s1_start = l1 - l1_cpy;
> +
> +	memcpy(dst, s1 + s1_start, l1_cpy);
> +	memcpy(dst + l1_cpy, s2 + s2_start, l2_cpy);
> +
> +	psinfo->writer(PSTORE_DMESG, pstore_buf, l1_cpy + l2_cpy);

This assumes that you always want to capture the last psinfo->data_size
bytes of the printk buffer.  Given the small capacity of our NVRAM
partition, I handle the case where the whole oops report doesn't fit.
In that case, I sacrifice the end of the oops report to capture the
beginning.  Patch #3 in my set is about this.
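
The difference between the two policies can be sketched in plain
userspace C.  capture_report and its keep_head flag are hypothetical
and simplified to a single contiguous buffer; the quoted pstore_dump()
works on two segments, and the actual NVRAM patch is not reproduced
here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most `cap` bytes of a `len`-byte report into `dst`.
 * keep_head != 0: keep the beginning and drop the end, as described
 *                 above for a small NVRAM partition.
 * keep_head == 0: keep the last `cap` bytes, as the quoted
 *                 pstore_dump() does with the printk buffer.
 * Returns the number of bytes copied.
 */
static size_t capture_report(char *dst, size_t cap,
			     const char *report, size_t len, int keep_head)
{
	size_t n = len < cap ? len : cap;
	size_t start = keep_head ? 0 : len - n;

	assert(dst != NULL && report != NULL);
	memcpy(dst, report + start, n);
	return n;
}
```

With a 1024-byte partition and an 1800-byte minimal oops report, the
tail policy would lose the first 776 bytes of the report, while the
head policy loses the last 776.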

> ...
> +static int
> +pstore_create_sysfs_entry(struct pstore_entry *new_pstore)
> +{
> ...
> +	new_pstore->attr.attr.mode = 0444;

/var/log/messages is typically not readable by everybody.  Mode 0444
makes the same kernel messages world-readable through /sys, which
appears to circumvent that.

> ...

Thanks.

Jim Keniston
IBM Linux Technology Center
Beaverton, OR

