[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140618001816.GK4841@wotan.suse.de>
Date: Wed, 18 Jun 2014 02:18:16 +0200
From: "Luis R. Rodriguez" <mcgrof@...e.com>
To: Petr Mládek <pmladek@...e.cz>
Cc: "Luis R. Rodriguez" <mcgrof@...not-panic.com>, hpa@...ux.intel.com,
akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
Michal Hocko <mhocko@...e.cz>, Joe Perches <joe@...ches.com>,
Arun KS <arunks.linux@...il.com>,
Kees Cook <keescook@...omium.org>,
Davidlohr Bueso <davidlohr@...com>,
Chris Metcalf <cmetcalf@...era.com>
Subject: Re: [RFT v5h printk: allow increasing the ring buffer depending on
the number of CPUs
On Tue, Jun 17, 2014 at 04:52:01PM +0200, Petr Mládek wrote:
> On Mon 2014-06-16 17:37:44, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <mcgrof@...e.com>
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index ea2d5f6..54632a0c 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -842,20 +905,56 @@ static int __init log_buf_len_setup(char *str)
> > }
> > early_param("log_buf_len", log_buf_len_setup);
> >
> > +static unsigned __init compute_cpu_contrib(void)
>
> The function name is slightly misleading. It does not compute the
> extra space but the whole length of the ring buffer. What about using
> default_len_by_cpu_num() or so?
Sure.
> > +{
> > + int cpu_extra;
> > + unsigned extra_cpu_log_size;
> > +
> > + /*
> > + * archs should set up cpu_possible_bits properly with
> > + * set_cpu_possible() after setup_arch() but just in
> > + * case lets ensure this is valid.
> > + */
> > + if (num_possible_cpus() <= 1)
> > + return 0;
> > +
> > + cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
> > + extra_cpu_log_size = roundup_pow_of_two(cpu_extra +
> > __LOG_BUF_LEN);
>
> Great catch. Well, I am not sure if this is really
> needed. memblock_virt_alloc() is called on another locations with "any" size.
Ok, on the other hand using roundup_pow_of_two() would guarantee both general
alignment and upkeeping the tradition of having a ring buffer as a power of
two. I think the implicit trick here was that alignment is ensured by
becauase __LOG_BUF_LEN would be aligned to the architecture already as its
an architecture specific value, furthermore no value smaller than this would
be allowed and since we're doing power of two's any further extra multiplication
by two should be aligned to the architecture. That should also mean that we
wouldn't have to care for the size of log_buf_add_cpu in consideration for
the archicture alignment as its already done for us.
cpu_extra &= ~(LOG_ALIGN - 1); seems fair to me as well if we want to save
some bytes, it probably does no harm if we bumped to the next power of two
and and we could shared aligning code to make this explicit.
I'll let you make the call as you'd know more than me of the requirements
here I'm sure.
> It might be enough to make sure that the size is aligned to
> LOG_ALIGN. This is how the messages are aligned. I would do:
>
> cpu_extra %= LOG_ALIGN;
cpu_extra &= ~(LOG_ALIGN - 1) also works as I noticed in your follow up.
> Another possibility would be to set the minimal size of
> LOG_CPU_MIN_BUF_SHIFT to "6" or so. I hope that the alignment of the
> "struct printk_log" newer would be bigger than 64 bytes. Well, we
> could add a compile check if we want to be sure.
commit 6ebb017de seems to to have added __alignof__(struct log) but more
than anything to ensure it lets the compiler figure it out. I think
using roundup_pow_of_two() and ensuring its > __LOG_BUF_LEN would
suffice the concern too. In fact we could / should probably repurpose
the implicit alignement check with log_buf_len_setup() to share
code.
> Anyway, I do not have any strong opinion here. I might be too careful
> and the roundup_pow_of_two() is perfectly fine.
I think sharing the alignment strategy can help document how this is done.
> > +
> > + if (cpu_extra <= __LOG_BUF_LEN / 2)
> > + return 0;
> > +
> > + pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
> > + pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
> > +
> > + return extra_cpu_log_size;
> > +}
> > +
> >
> > void __init setup_log_buf(int early)
> > {
> > unsigned long flags;
> > char *new_log_buf;
> > int free;
> > -
> > - if (!new_log_buf_len)
> > - return;
> > + enum klog_setup_state new_klog_state;
> >
> > if (early) {
> > + if (!new_log_buf_len)
> > + return;
> > new_log_buf =
> > memblock_virt_alloc(new_log_buf_len, PAGE_SIZE);
> > + new_klog_state = KLOG_PARAM;
> > } else {
> > - new_log_buf = memblock_virt_alloc_nopanic(new_log_buf_len, 0);
> > + if (klog_state == KLOG_PARAM)
> > + return;
> > + if (new_log_buf_len)
> > + new_klog_state = KLOG_PARAM;
> > + else {
> > + new_log_buf_len = compute_cpu_contrib();
> > + new_klog_state = KLOG_CPU_EXTRA;
> > + }
> > + if (!new_log_buf_len)
> > + return;
> > + new_log_buf = memblock_virt_alloc(new_log_buf_len,
> > PAGE_SIZE);
>
> We should call memblock_virt_allocc_nopanic() in this else part.
Oops, sorry yes.
> Well, I am not sure if the new klog states make the code really better
> readable. I wonder where we lost the simplicity from v3 of this patch ;-)
The issue came up after I realized that an architecture that uses the early
call will end up on this path twice, the trick that the last developer did for
this situation was to use the kernel parameter as an indicator of whether or
not the parameter was treated already or not. Towards the end of the call it
uses:
new_log_buf_len = 0;
Upon entry on the second call when the kernel parameter was set we bail
as its now 0. The issue I had not addressed was three fold:
* if an architecture had used the early call new_log_buf_len would
be 0 now, and so the so check in place on v3 would not have
worked well, instead you'd have to check for log_buf_len as
Davidlohr had suspected but only for the non-early case, but only
if new_log_buf_len was 0 as an early call could have been issued.
* we want to ensure that for archs that don't use the early call
but did pass the kernel parameter that its respected and the extra
cpu stuff is ignored
* num_possible_cpus() setup_arch() requirement
After realizing the double entry issue, as well as the num_possible_cpus()
requirement to be done later, the above was the cleanest solution I could come
up with without having to repurpose new_log_buf_len and setting it to some
magic value, or adding just a bool. I decided a simpler hacky solution would
would work but considered the lack of documentation on all this nasty enough to
merit an enum and some clarification.
> What about replacing the above changes in kernel/printk/printk.c with
> the following ones:
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ea2d5f6962ed..e00a9600f5fa 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -266,6 +266,7 @@ static u32 clear_idx;
> #define LOG_ALIGN __alignof__(struct printk_log)
> #endif
> #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
> +#define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)
> static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
> static char *log_buf = __log_buf;
> static u32 log_buf_len = __LOG_BUF_LEN;
> @@ -842,12 +843,52 @@ static int __init log_buf_len_setup(char *str)
> }
> early_param("log_buf_len", log_buf_len_setup);
>
> +static unsigned __init default_len_by_cpu_num(void)
> +{
> + int cpu_extra;
> + unsigned extra_cpu_log_size;
> +
> + /*
> + * archs should set up cpu_possible_bits properly with
> + * set_cpu_possible() after setup_arch() but just in
> + * case lets ensure this is valid.
> + */
> + if (num_possible_cpus() <= 1)
> + return 0;
> +
> + cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
> + /* make sure that the buffer is aligned */
> + cpu_extra %= LOG_ALIGN;
> + extra_cpu_log_size = roundup_pow_of_two(cpu_extra + __LOG_BUF_LEN);
>From your other e-mail:
cpu_extra &= ~(LOG_ALIGN - 1);
extra_cpu_log_size = cpu_extra + __LOG_BUF_LEN;
I think ensuring the extra_cpu_log_size is > __LOG_BUF_LEN and using
roundup_pow_of_two() with the assumption archs have made __LOG_BUF_LEN
properly arch aligned should suffice to not have to do the alignment
check above, both should work, its a judgement call thing. I'm fine either
way. Let me know.
> +
> + if (cpu_extra <= __LOG_BUF_LEN / 2)
> + return 0;
We could do this earlier to save ourself the not needed roundup_pow_of_two()
computation on the false case or the addition for extra_cpu_log_size.
> +
> + pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
> + pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
> +
> + return extra_cpu_log_size;
> +}
> +
> void __init setup_log_buf(int early)
> {
> unsigned long flags;
> char *new_log_buf;
> int free;
>
> + /* nope when already allocated earlier */
> + if (log_buf != __log_buf)
> + return;
That would do it too, all this just makes all the requirements and use
cases only legible to the reader of the callers / use cases. I do think
its cleaner.
> +
> + /*
> + * The default size need to be increased on systems with many CPUs.
> + * It is done only when an exact size is not forced by log_buf_len=n
> + * kernel parameter.
> + */
> + if (!new_log_buf_len)
> + new_log_buf_len = default_len_by_cpu_num();
Lets let my documentation exist on archive :), I'll take this and
modify it slightly with some basic comments and sharing aligment
and documenting a bit more.
> +
> + /* nope when nobody wants to increase the size after all */
> if (!new_log_buf_len)
> return;
>
> --
> 1.8.4
>
>
> I think that it is better readable than the two level if-magic with
> the three new flags. The long description of the three flags looked
> scary in itself ;-)
Yeah, true, do we want to share the alignment code with log_buf_len_setup() ?
How about this then?
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ea2d5f6..da57f6f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -266,6 +266,7 @@ static u32 clear_idx;
#define LOG_ALIGN __alignof__(struct printk_log)
#endif
#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
+#define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)
static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
static char *log_buf = __log_buf;
static u32 log_buf_len = __LOG_BUF_LEN;
@@ -828,29 +829,68 @@ void log_buf_kexec_setup(void)
/* requested log_buf_len from kernel cmdline */
static unsigned long __initdata new_log_buf_len;
-/* save requested log_buf_len since it's too early to process it */
-static int __init log_buf_len_setup(char *str)
+/*
+ * CONFIG_LOG_BUF_SHIFT would be architecture aligned, anything > than it and
+ * a multiple of two of it upkeeps the alignment.
+ */
+static void __init log_buf_len_align(unsigned size)
{
- unsigned size = memparse(str, &str);
-
if (size)
size = roundup_pow_of_two(size);
if (size > log_buf_len)
new_log_buf_len = size;
+}
+
+/* save requested log_buf_len since it's too early to process it */
+static int __init log_buf_len_setup(char *str)
+{
+ unsigned size = memparse(str, &str);
+
+ log_buf_len_align(size);
return 0;
}
early_param("log_buf_len", log_buf_len_setup);
+static void __init log_buf_add_cpu(void)
+{
+ int cpu_extra;
+
+ /*
+ * archs should set up cpu_possible_bits properly with
+ * set_cpu_possible() after setup_arch() but just in
+ * case lets ensure this is valid. During an early
+ * call before setup_arch()() this will be 1.
+ */
+ if (num_possible_cpus() <= 1)
+ return;
+
+ cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
+
+ /* by default this will only continue through for large > 64 CPUs */
+ if (cpu_extra <= __LOG_BUF_LEN / 2)
+ return;
+
+ pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
+ pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
+
+ log_buf_len_align(cpu_extra + __LOG_BUF_LEN);
+}
+
void __init setup_log_buf(int early)
{
unsigned long flags;
char *new_log_buf;
int free;
- if (!new_log_buf_len)
+ if (log_buf != __log_buf)
return;
+ if (!early && !new_log_buf_len)
+ log_buf_add_cpu();
+
+ if (!new_log_buf_len)
+ return;
if (early) {
new_log_buf =
memblock_virt_alloc(new_log_buf_len, PAGE_SIZE);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists