lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Y0KqhKYvf+aKLY1+@minyard.net>
Date:   Sun, 9 Oct 2022 06:03:32 -0500
From:   Corey Minyard <minyard@....org>
To:     Zhang Yuchen <zhangyuchen.lcr@...edance.com>
Cc:     openipmi-developer@...ts.sourceforge.net,
        linux-kernel@...r.kernel.org, qi.zheng@...ux.dev
Subject: Re: [PATCH v2 1/3] ipmi: fix msg stack when IPMI is disconnected

On Sun, Oct 09, 2022 at 05:18:09PM +0800, Zhang Yuchen wrote:
> If you continue to access and send messages at a high frequency (once
> every 55s) when the IPMI is disconnected, messages will accumulate in
> intf->[hp_]xmit_msg. If it lasts long enough, it takes up a lot of
> memory.

This is queued for 6.2.  Thanks.  I already had the other two patches
queued.

-corey

> 
> The reason is that if IPMI is disconnected, each message will be set to
> IDLE after it returns to HOSED through IDLE->ERROR0->HOSED. The next
> message goes through the same process when it comes in. This process
> needs to wait for IBF_TIMEOUT * (MAX_ERROR_RETRIES + 1) = 55s.
> 
> Each message takes 55S to destroy. This results in a continuous increase
> in memory.
> 
> I find that if I wait 5 seconds after the first message fails, the
> status changes to ERROR0 in smi_timeout(). The next message will return
> the error code IPMI_NOT_IN_MY_STATE_ERR directly without wait.
> 
> This is more in line with our needs.
> 
> So instead of setting each message state to IDLE after it reaches the
> state HOSED, set state to ERROR0.
> 
> After testing, the problem has been solved, no matter how many
> consecutive sends, will not cause continuous memory growth. It also
> returns to normal immediately after the IPMI is restored.
> 
> In addition, the HOSED state should also count as invalid. So the HOSED
> is removed from the invalid judgment in start_kcs_transaction().
> 
> The verification operations are as follows:
> 
> 1. Use BPF to record the ipmi_alloc/free_smi_msg().
> 
>   $ bpftrace -e 'kretprobe:ipmi_alloc_recv_msg {printf("alloc
>       %p\n",retval);} kprobe:free_recv_msg {printf("free  %p\n",arg0)}'
> 
> 2. Exec `date; time for x in $(seq 1 2); do ipmitool mc info; done`.
> 3. Record the output of `time` and when free all msgs.
> 
> Before:
> 
> `time` takes 120s, This is because `ipmitool mc info` send 4 msgs and
> waits only 15 seconds for each message. Last msg is free after 440s.
> 
>   $ bpftrace -e 'kretprobe:ipmi_alloc_recv_msg {printf("alloc
>       %p\n",retval);} kprobe:free_recv_msg {printf("free  %p\n",arg0)}'
>   Oct 05 11:40:55 Attaching 2 probes...
>   Oct 05 11:41:12 alloc 0xffff9558a05f0c00
>   Oct 05 11:41:27 alloc 0xffff9558a05f1a00
>   Oct 05 11:41:42 alloc 0xffff9558a05f0000
>   Oct 05 11:41:57 alloc 0xffff9558a05f1400
>   Oct 05 11:42:07 free  0xffff9558a05f0c00
>   Oct 05 11:42:07 alloc 0xffff9558a05f7000
>   Oct 05 11:42:22 alloc 0xffff9558a05f2a00
>   Oct 05 11:42:37 alloc 0xffff9558a05f5a00
>   Oct 05 11:42:52 alloc 0xffff9558a05f3a00
>   Oct 05 11:43:02 free  0xffff9558a05f1a00
>   Oct 05 11:43:57 free  0xffff9558a05f0000
>   Oct 05 11:44:52 free  0xffff9558a05f1400
>   Oct 05 11:45:47 free  0xffff9558a05f7000
>   Oct 05 11:46:42 free  0xffff9558a05f2a00
>   Oct 05 11:47:37 free  0xffff9558a05f5a00
>   Oct 05 11:48:32 free  0xffff9558a05f3a00
> 
>   $ root@...0-pb003-t106-n078:~# date;time for x in $(seq 1 2); do
>   ipmitool mc info; done
> 
>   Wed Oct  5 11:41:12 CST 2022
>   No data available
>   Get Device ID command failed
>   No data available
>   No data available
>   No valid response received
>   Get Device ID command failed: Unspecified error
>   No data available
>   Get Device ID command failed
>   No data available
>   No data available
>   No valid response received
>   No data available
>   Get Device ID command failed
> 
>   real        1m55.052s
>   user        0m0.001s
>   sys        0m0.001s
> 
> After:
> 
> `time` takes 55s, all msgs is returned and free after 55s.
> 
>   $ bpftrace -e 'kretprobe:ipmi_alloc_recv_msg {printf("alloc
>       %p\n",retval);} kprobe:free_recv_msg {printf("free  %p\n",arg0)}'
> 
>   Oct 07 16:30:35 Attaching 2 probes...
>   Oct 07 16:30:45 alloc 0xffff955943aa9800
>   Oct 07 16:31:00 alloc 0xffff955943aacc00
>   Oct 07 16:31:15 alloc 0xffff955943aa8c00
>   Oct 07 16:31:30 alloc 0xffff955943aaf600
>   Oct 07 16:31:40 free  0xffff955943aa9800
>   Oct 07 16:31:40 free  0xffff955943aacc00
>   Oct 07 16:31:40 free  0xffff955943aa8c00
>   Oct 07 16:31:40 free  0xffff955943aaf600
>   Oct 07 16:31:40 alloc 0xffff9558ec8f7e00
>   Oct 07 16:31:40 free  0xffff9558ec8f7e00
>   Oct 07 16:31:40 alloc 0xffff9558ec8f7800
>   Oct 07 16:31:40 free  0xffff9558ec8f7800
>   Oct 07 16:31:40 alloc 0xffff9558ec8f7e00
>   Oct 07 16:31:40 free  0xffff9558ec8f7e00
>   Oct 07 16:31:40 alloc 0xffff9558ec8f7800
>   Oct 07 16:31:40 free  0xffff9558ec8f7800
> 
>   root@...0-pb003-t106-n078:~# date;time for x in $(seq 1 2); do
>   ipmitool mc info; done
>   Fri Oct  7 16:30:45 CST 2022
>   No data available
>   Get Device ID command failed
>   No data available
>   No data available
>   No valid response received
>   Get Device ID command failed: Unspecified error
>   Get Device ID command failed: 0xd5 Command not supported in present state
>   Get Device ID command failed: Command not supported in present state
> 
>   real        0m55.038s
>   user        0m0.001s
>   sys        0m0.001s
> 
> Signed-off-by: Zhang Yuchen <zhangyuchen.lcr@...edance.com>
> ---
>  drivers/char/ipmi/ipmi_kcs_sm.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/char/ipmi/ipmi_kcs_sm.c b/drivers/char/ipmi/ipmi_kcs_sm.c
> index efda90dcf5b3..ecfcb50302f6 100644
> --- a/drivers/char/ipmi/ipmi_kcs_sm.c
> +++ b/drivers/char/ipmi/ipmi_kcs_sm.c
> @@ -122,10 +122,10 @@ struct si_sm_data {
>  	unsigned long  error0_timeout;
>  };
>  
> -static unsigned int init_kcs_data(struct si_sm_data *kcs,
> -				  struct si_sm_io *io)
> +static unsigned int init_kcs_data_with_state(struct si_sm_data *kcs,
> +				  struct si_sm_io *io, enum kcs_states state)
>  {
> -	kcs->state = KCS_IDLE;
> +	kcs->state = state;
>  	kcs->io = io;
>  	kcs->write_pos = 0;
>  	kcs->write_count = 0;
> @@ -140,6 +140,12 @@ static unsigned int init_kcs_data(struct si_sm_data *kcs,
>  	return 2;
>  }
>  
> +static unsigned int init_kcs_data(struct si_sm_data *kcs,
> +				  struct si_sm_io *io)
> +{
> +	return init_kcs_data_with_state(kcs, io, KCS_IDLE);
> +}
> +
>  static inline unsigned char read_status(struct si_sm_data *kcs)
>  {
>  	return kcs->io->inputb(kcs->io, 1);
> @@ -270,7 +276,7 @@ static int start_kcs_transaction(struct si_sm_data *kcs, unsigned char *data,
>  	if (size > MAX_KCS_WRITE_SIZE)
>  		return IPMI_REQ_LEN_EXCEEDED_ERR;
>  
> -	if ((kcs->state != KCS_IDLE) && (kcs->state != KCS_HOSED)) {
> +	if (kcs->state != KCS_IDLE) {
>  		dev_warn(kcs->io->dev, "KCS in invalid state %d\n", kcs->state);
>  		return IPMI_NOT_IN_MY_STATE_ERR;
>  	}
> @@ -495,7 +501,7 @@ static enum si_sm_result kcs_event(struct si_sm_data *kcs, long time)
>  	}
>  
>  	if (kcs->state == KCS_HOSED) {
> -		init_kcs_data(kcs, kcs->io);
> +		init_kcs_data_with_state(kcs, kcs->io, KCS_ERROR0);
>  		return SI_SM_HOSED;
>  	}
>  
> -- 
> 2.30.2
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ