lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z07SnoKSwHQ_y2j2@pathway.suse.cz>
Date: Tue, 3 Dec 2024 10:42:54 +0100
From: Petr Mladek <pmladek@...e.com>
To: Geert Uytterhoeven <geert@...ux-m68k.org>
Cc: "Isaac J. Manjarres" <isaacmanjarres@...gle.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	John Ogness <john.ogness@...utronix.de>,
	Sergey Senozhatsky <senozhatsky@...omium.org>, surenb@...gle.com,
	kernel-team@...roid.com, linux-kernel@...r.kernel.org,
	Linux Embedded <linux-embedded@...r.kernel.org>
Subject: Re: [PATCH v2] printk: Improve memory usage logging during boot

On Sun 2024-12-01 12:40:13, Geert Uytterhoeven wrote:
> Hi Petr et al,
> 
> CC linux-embedded
> 
> On Tue, Oct 1, 2024 at 5:55 PM Petr Mladek <pmladek@...e.com> wrote:
> > On Mon 2024-09-30 11:48:24, Isaac J. Manjarres wrote:
> > > When the initial printk ring buffer size is updated, setup_log_buf()
> > > allocates a new ring buffer, as well as a set of meta-data structures
> > > for the new ring buffer. The function also emits the new size of the
> > > ring buffer, but not the size of the meta-data structures.
> > >
> > > This makes it difficult to assess how changing the log buffer size
> > > impacts memory usage during boot.
> > >
> > > For instance, increasing the ring buffer size from 512 KB to 1 MB
> > > through the command line yields an increase of 2304 KB in reserved
> > > memory at boot, while the only obvious change is the 512 KB
> > > difference in the ring buffer sizes:
> > >
> > > log_buf_len=512K:
> > >
> > > printk: log_buf_len: 524288 bytes
> > > Memory: ... (... 733252K reserved ...)
> > >
> > > log_buf_len=1M:
> > >
> > > printk: log_buf_len: 1048576 bytes
> > > Memory: ... (... 735556K reserved ...)
> > >
> > > This is because of how the size of the meta-data structures scale with
> > > the size of the ring buffer.
> > >
> > > Even when there aren't changes to the printk ring buffer size (i.e. the
> > > initial size ==  1 << CONFIG_LOG_BUF_SHIFT), it is impossible to tell
> > > how much memory is consumed by the printk ring buffer during boot.
> > >
> > > Therefore, unconditionally log the sizes of the printk ring buffer
> > > and its meta-data structures, so that it's easier to understand
> > > how changing the log buffer size (either through the command line or
> > > by changing CONFIG_LOG_BUF_SHIFT) affects boot time memory usage.
> > >
> > > With the new logs, it is much easier to see exactly why the memory
> > > increased by 2304 KB:
> > >
> > > log_buf_len=512K:
> >
> > printk: log buffer data + meta data: 524288 + 1835008 = 2359296 bytes
> >
> > > log_buf_len=1M:
> >
> > printk: log buffer data + meta data: 1048576 + 3670016 = 4718592 bytes
> 
> Thanks, this is very useful information!
> 
> Isn't this kernel log message bookkeeping becoming a bit too excessive?
> E.g. on a small system:
> 
>     printk: log buffer data + meta data: 65536 + 204800 = 270336 bytes
> 
> Why is the meta data that big (> 3*log buffer)?
> 
>     #define PRB_AVGBITS 5    /* 32 character average length */
> 
>     unsigned int descs_count = log_buf_len >> PRB_AVGBITS;
>     meta_data_size = descs_count * (sizeof(struct prb_desc) +
> sizeof(struct printk_info));
> 
>     struct prb_data_blk_lpos {
>             unsigned long   begin;
>             unsigned long   next;
>     };
> 
>     struct prb_desc {
>             atomic_long_t                   state_var;
>             struct prb_data_blk_lpos        text_blk_lpos;
>     };

I am afraid that we could not do much about the size of this part.
All the variables are important parts of the lockless machinery.

> i.e. 12 bytes on 32-bit, 24 bytes on 64-bit.
> 
>     #define PRINTK_INFO_SUBSYSTEM_LEN       16
>     #define PRINTK_INFO_DEVICE_LEN          48
> 
>     struct dev_printk_info {
>             char subsystem[PRINTK_INFO_SUBSYSTEM_LEN];
>             char device[PRINTK_INFO_DEVICE_LEN];
>     };

This is probably the lowest hanging fruit. It should be possible
to write these dev_printk-specific metadata into the data buffer
a more efficient way and only for records created by dev_printk().

It would require adding "dict_len" into "struct printk_info"
and reserving space for both "text_len" and "dict_len".

We bundled it into the metadata because these are metadata.
We wanted to keep the design as clean as possible. We focused
mainly on the stability and maintainability of the code.
And it was really challenging to get it all working.

>     struct printk_info {
>             u64     seq;            /* sequence number */

I do not recal the details. But I think that we need to
explicitely store the 64-bit "seq" number in the metadata
because of the lockless algoritm. It helps to solve
problems with wrapping of the counter in
"atomic_long_t state_var".

It was not stored before. The following global values were
enough when the log buffer was synchronized by "logbuf_lock"
spin lock:

	static u64 log_first_seq;
	static u64 log_next_seq;

>             u64     ts_nsec;        /* timestamp in nanoseconds */
>             u16     text_len;       /* length of text message */
>             u8      facility;       /* syslog facility */
>             u8      flags:5;        /* internal record flags */
>             u8      level:3;        /* syslog level */
>             u32     caller_id;      /* thread id or processor id */

These metadata used to be stored in the "data" buffer next to the
message. Here is the declaration from v6.9:

	struct printk_log {
		u64 ts_nsec;		/* timestamp in nanoseconds */
		u16 len;		/* length of entire record */
		u16 text_len;		/* length of text buffer */
		u16 dict_len;		/* length of dictionary buffer */
		u8 facility;		/* syslog facility */
		u8 flags:5;		/* internal record flags */
		u8 level:3;		/* syslog level */
	#ifdef CONFIG_PRINTK_CALLER
		u32 caller_id;            /* thread id or processor id */
	#endif
	}

>             struct dev_printk_info  dev_info;

As I wrote above. It should be possible to store these metadata more
effectively in the data buffer.

But note that it is only about the internal kernel code. These change
would require also updating external tools, for example "crash" tool.

>     };
>
> for a whopping 88 bytes.  So that totals to 100 bytes per entry
> on 32-bit, and 112 on 64-bit, for an average of 32 characters per
> printed message...

It would be interesting to know how much if these are wasted because
either struct dev_printk_info is empty or the entries are shorter.

Best Regards,
Petr

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ