Message-ID: <6d7f37c2-caa5-92fd-4630-c40e2ff13a37@themaw.net>
Date:   Fri, 3 Nov 2017 20:45:02 +0800
From:   Ian Kent <raven@...maw.net>
To:     NeilBrown <neilb@...e.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     lkml <linux-kernel@...r.kernel.org>, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] autofs: don't fail mount for transient error

On 03/11/17 09:40, NeilBrown wrote:
> 

Hi Neil, and thanks for taking the time to post the patch.

> Currently if the autofs kernel module gets an error when
> writing to the pipe which links to the daemon, then it
> marks the whole mountpoint as catatonic, and it will stop working.
> 
> It is possible that the error is transient.  This can happen
> if the daemon is slow and more than 16 requests queue up.
> If a subsequent process tries to queue a request, and is then signalled,
> the write to the pipe will return -ERESTARTSYS and autofs
> will take that as total failure.

Indeed it does.

And given the problems with a half dozen (or so) user space
applications consuming large amounts of CPU under heavy mount
and umount activity, this could happen more easily than we
expect.

> 
> So change the code to assess -ERESTARTSYS and -ENOMEM as transient
> failures which only abort the current request, not the whole
> mountpoint.

This looks good to me.
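
For anyone following along, here is a rough user-space sketch of the
classification described above. The enum and function names are
hypothetical and are not part of the patch. ERESTARTSYS is
kernel-internal and not exposed by the user-space <errno.h>, so the
sketch defines it locally as 512, the value in the kernel's
include/linux/errno.h.

```c
#include <errno.h>

/* ERESTARTSYS is kernel-internal; define it here only so this
 * user-space sketch compiles. 512 matches include/linux/errno.h. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Hypothetical names, for illustration only. */
enum notify_action {
	NOTIFY_OK,           /* packet reached the daemon */
	NOTIFY_FAIL_REQUEST, /* transient: fail only this wait request */
	NOTIFY_CATATONIC     /* anything else: give up on the mountpoint */
};

/* Map the result of the pipe write to the action the patch takes. */
static enum notify_action classify_write_result(int ret)
{
	switch (ret) {
	case 0:
		return NOTIFY_OK;
	case -ENOMEM:
	case -ERESTARTSYS:
		return NOTIFY_FAIL_REQUEST;
	default:
		return NOTIFY_CATATONIC;
	}
}
```

So classify_write_result(-ERESTARTSYS) yields NOTIFY_FAIL_REQUEST,
while an error such as -EPIPE still falls through to the catatonic
case.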

> 
> Signed-off-by: NeilBrown <neilb@...e.com>
> ---
> 
> Do people think this should go to -stable?
> It isn't a crash or a data corruption, but having autofs mountpoints
> suddenly stop working is rather inconvenient.

Perhaps that's a good idea, given that the CPU usage problem I refer
to above has been around for a while now.

> 
> Thanks,
> NeilBrown
> 
> 
>  fs/autofs4/waitq.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
> index 4ac49d038bf3..8fc41705c7cd 100644
> --- a/fs/autofs4/waitq.c
> +++ b/fs/autofs4/waitq.c
> @@ -81,7 +81,8 @@ static int autofs4_write(struct autofs_sb_info *sbi,
>  		spin_unlock_irqrestore(&current->sighand->siglock, flags);
>  	}
>  
> -	return (bytes > 0);
> +	/* if 'wr' returned 0 (impossible) we assume -EIO (safe) */
> +	return bytes == 0 ? 0 : wr < 0 ? wr : -EIO;
>  }
>  
>  static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
> @@ -95,6 +96,7 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
>  	} pkt;
>  	struct file *pipe = NULL;
>  	size_t pktsz;
> +	int ret;
>  
>  	pr_debug("wait id = 0x%08lx, name = %.*s, type=%d\n",
>  		 (unsigned long) wq->wait_queue_token,
> @@ -169,7 +171,18 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
>  	mutex_unlock(&sbi->wq_mutex);
>  
>  	if (autofs4_write(sbi, pipe, &pkt, pktsz))
> +	switch (ret = autofs4_write(sbi, pipe, &pkt, pktsz)) {
> +	case 0:
> +		break;
> +	case -ENOMEM:
> +	case -ERESTARTSYS:
> +		/* Just fail this one */
> +		autofs4_wait_release(sbi, wq->wait_queue_token, ret);
> +		break;
> +	default:
>  		autofs4_catatonic_mode(sbi);
> +		break;
> +	}
>  	fput(pipe);
>  }
>  
> 
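
The new return convention of autofs4_write() can also be sketched in
isolation. This is a user-space illustration only: the helper name and
parameter types are assumptions. "remaining" stands for the bytes still
unwritten when the write loop stops, and "last_wr" for the result of
the last write call.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper mirroring the patch's return expression:
 *   bytes == 0 ? 0 : wr < 0 ? wr : -EIO
 */
static int write_result(size_t remaining, long last_wr)
{
	if (remaining == 0)
		return 0;            /* whole packet was written */
	if (last_wr < 0)
		return (int)last_wr; /* pass the real error to the caller */
	return -EIO;                 /* short write with no error code */
}
```

The point is that the caller now sees the actual error code rather
than a boolean, which is what lets it tell a transient failure from a
persistent one.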
