linux-kernel - Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it

Message-ID: <1491570740.2745.12.camel@redhat.com>
Date:   Fri, 07 Apr 2017 09:12:20 -0400
From:   Jeff Layton <jlayton@...hat.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     NeilBrown <neilb@...e.com>, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-ext4@...r.kernel.org,
        akpm@...ux-foundation.org, tytso@....edu, jack@...e.cz
Subject: Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking
 infrastructure and convert ext4 to use it

On Thu, 2017-04-06 at 13:05 -0700, Matthew Wilcox wrote:
> On Thu, Apr 06, 2017 at 03:14:52PM -0400, Jeff Layton wrote:
> > @@ -868,6 +869,7 @@ struct file {
> >  	struct list_head	f_tfile_llink;
> >  #endif /* #ifdef CONFIG_EPOLL */
> >  	struct address_space	*f_mapping;
> > +	u32			f_wb_err;
> >  } __attribute__((aligned(4)));	/* lest something weird decides that 2 is OK */
> >  
> 
> I think we can squeeze that in next to f_flags?
> 

Sure, will do. I meant to look at pahole output and see if there are
existing holes.

> > +/**
> > + * filemap_set_wb_error - set the wb error in the mapping for later reporting
> > + * @mapping: mapping in which the error should be set
> > + * @err: error to set. Must be a negative value no less than -MAX_ERRNO
> 
> Do we want to have users call filemap_set_wb_error(mapping, EIO)
> or filemap_set_wb_error(mapping, -EIO)?  Either way, we can assert
> that it's in the correct range (oh look, we have at least one user of
> mapping_set_error calling it with a positive errno ...)
> 

Yeah, I sent a patch for that a while back, but I don't think anyone
picked it up. Luckily, that caller is harmless since EIO just ends up in
the default case and gets turned into -EIO.

> I've been playing with positive or negative errnos for the xarray, and
> positive looks better to me, although there's a definite advantage to
> being able to just call filemap_set_wb_error(mapping, result).
> 

That's my main rationale. We generally use negative error codes in the
kernel, so let's do what's easiest for most call sites. I vote for
negative error codes here.


> #define XAS_ERROR(errno)        ((struct xa_node *)(((errno) << 1) | 1))
> 
> static inline int xas_error(const struct xa_state *xas)
> {
>         unsigned long v = (unsigned long)xas->xa_node;
>         return (v & 1) ? -(v >> 1) : 0;
> }
> 
> static inline void xas_set_err(struct xa_state *xas, unsigned long err)
> {
>         XA_BUG_ON(err > MAX_ERRNO);
>         xas->xa_node = XAS_ERROR(err);
> }
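For illustration, the same low-bit tagging can be tried in userspace. This is a sketch only: `err_tag()`, `err_untag()` and `SKETCH_MAX_ERRNO` are hypothetical names, not xarray API, and it assumes real pointers are at least 2-byte aligned so their low bit is always clear.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095UL	/* same value as the kernel's MAX_ERRNO */

/* Encode a positive errno as an odd, pointer-sized value. */
static void *err_tag(unsigned long err)
{
	assert(err <= SKETCH_MAX_ERRNO);	/* stands in for XA_BUG_ON() */
	return (void *)(uintptr_t)((err << 1) | 1);
}

/* Decode: an odd value carries an errno (returned negated); else 0. */
static int err_untag(const void *p)
{
	uintptr_t v = (uintptr_t)p;

	return (v & 1) ? -(int)(v >> 1) : 0;
}
```

Note the positive-in, negative-out convention: callers store a positive errno, but the accessor hands back the usual negative kernel-style value.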
> 
> > +	/*
> > +	 * Ensure the error code actually fits where we want it to go. If it
> > +	 * doesn't then just throw a warning and don't record anything.
> > +	 */
> > +	if (unlikely(err > 0 || err < -MAX_ERRNO)) {
> > +		WARN(1, "err=%d\n", err);
> > +		return;
> > +	}
> 
> Cute trick to make this more succinct:
> 
> 	if (WARN(err > 0 || err < -MAX_ERRNO, "err = %d\n", err))
> 		return;
> or even ...
> 
> 	if (WARN((unsigned int)-err > MAX_ERRNO, "err = %d\n", err))
> 		return;
> 

Nice. I always forget that WARN() returns the condition it tests. Will fix.
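The two forms of the range check can be compared over a window of values in userspace. This is a sketch only: `errno_ok_two_cmp()` and `errno_ok_one_cmp()` are hypothetical helpers, and the single-comparison version negates after the unsigned cast, because negating `INT_MIN` as a signed `int` is undefined in standard C.

```c
#include <stdbool.h>

#define MAX_ERRNO 4095	/* same value the kernel uses; kept signed on purpose */

/* Two-comparison form from the patch: valid means -MAX_ERRNO <= err <= 0. */
static bool errno_ok_two_cmp(int err)
{
	return !(err > 0 || err < -MAX_ERRNO);
}

/*
 * Single-comparison form: unsigned negation maps [-MAX_ERRNO, 0] onto
 * [0, MAX_ERRNO], while any positive err wraps to a huge value.
 */
static bool errno_ok_one_cmp(int err)
{
	return -(unsigned int)err <= MAX_ERRNO;
}
```

Both accept `err == 0`, matching the patch's check, so rejecting a zero error would need a separate test.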

> > +		/* Clear out error bits and set new error */
> > +		new = (old & ~MAX_ERRNO) | -err;
> > +
> > +		/* Only increment if someone has looked at it */
> > +		if (old & WB_ERR_SEEN) {
> > +			new += WB_ERR_CTR_INC;
> > +			new &= ~WB_ERR_SEEN;
> > +		}
> 
> Although we always want to clear out the SEEN bit if we're updating ... so
> 
> 		new = (old & ~(MAX_ERRNO | WB_ERR_SEEN)) | -err;
> 
> 		/* Only increment if someone has looked at it */
> 		if (old & WB_ERR_SEEN)
> 			new += WB_ERR_CTR_INC;
> 

Sure, that is more succinct.
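A userspace sketch of the set side with the SEEN bit cleared together with the errno bits. It uses C11 atomics in place of cmpxchg(), and the bit layout (errno in the low 12 bits, WB_ERR_SEEN at bit 12, counter increments at bit 13) is an assumption for illustration, as is the name `sketch_set_wb_error()`; the patch's actual constants are not quoted in this thread.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout, hypothetical for this sketch. */
#define MAX_ERRNO	4095u
#define WB_ERR_SEEN	(1u << 12)
#define WB_ERR_CTR_INC	(1u << 13)

/* Record negative errno @err, bumping the counter only if the old value was seen. */
static void sketch_set_wb_error(_Atomic uint32_t *wb_err, int err)
{
	uint32_t old = atomic_load(wb_err);
	uint32_t new;

	if (-(uint32_t)err > MAX_ERRNO)	/* rejects err > 0 or err < -MAX_ERRNO */
		return;

	do {
		/* clear errno and SEEN in one mask, then store the new errno */
		new = (old & ~(MAX_ERRNO | WB_ERR_SEEN)) | -(uint32_t)err;

		/* only increment if someone has looked at the old value */
		if (old & WB_ERR_SEEN)
			new += WB_ERR_CTR_INC;

		/* on failure, 'old' is reloaded with the current value */
	} while (!atomic_compare_exchange_weak(wb_err, &old, new));
}
```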

> ... and then there's no need to update if it's the same errno and nobody's
> seen it:
> 
> 		if (old == new)
> 			break;
> 

No, we can't do this. The value could have just been updated by a task
that is setting the "seen" bit; if we skip the cmpxchg, the counter never
gets bumped and the new error is lost. We always have to do the cmpxchg
on the set_wb_error side, I think.
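The race described above can be replayed deterministically in userspace by performing the "concurrent" fsync's store by hand between the setter's snapshot and its decision. This is a hypothetical sketch: `compute_new()`, `race()` and the bit layout (errno in the low 12 bits, SEEN at bit 12, counter from bit 13) are assumptions, not the patch's code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO	4095u
#define WB_ERR_SEEN	(1u << 12)
#define WB_ERR_CTR_INC	(1u << 13)

/* Value the setter wants to store, computed from a snapshot 'old'. */
static uint32_t compute_new(uint32_t old, int err)
{
	uint32_t new = (old & ~(MAX_ERRNO | WB_ERR_SEEN)) | -(uint32_t)err;

	if (old & WB_ERR_SEEN)
		new += WB_ERR_CTR_INC;
	return new;
}

/* Replays: setter snapshots, fsync marks SEEN, setter decides. */
static uint32_t race(bool skip_if_unchanged)
{
	_Atomic uint32_t wb_err = 5;		/* EIO recorded, not yet seen */
	uint32_t old = atomic_load(&wb_err);	/* setter's snapshot */
	uint32_t new = compute_new(old, -5);	/* same errno, unseen: new == old */

	/* a concurrent fsync marks the error seen before the setter continues */
	atomic_fetch_or(&wb_err, WB_ERR_SEEN);

	if (skip_if_unchanged && old == new)
		return atomic_load(&wb_err);	/* shortcut taken: second EIO dropped */

	while (!atomic_compare_exchange_strong(&wb_err, &old, new))
		new = compute_new(old, -5);	/* retry from the fresh value */
	return atomic_load(&wb_err);
}
```

An fsync caller that sampled `5 | WB_ERR_SEEN` sees no change after the shortcut, so the second EIO is never reported to it; with the unconditional cmpxchg the counter is bumped and its next check reports the error.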

> [...]
> 
> > +		/*
> > +		 * We always store values with the "seen" bit set, so if this
> > +		 * matches what we already have, then we can call it done.
> > +		 * There is nothing to update so just return 0.
> > +		 */
> > +		if (old == file->f_wb_err)
> > +			break;
> > +
> > +		/* set flag and try to swap it into place */
> > +		new = old | WB_ERR_SEEN;
> 
> Again, I think we should avoid the cmpxchg with:
> 
> 		if (old == new)
> 			break;
> 

Yeah, we may be able to do this one. I had myself convinced otherwise
yesterday, but I think you may be right.
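A userspace sketch of the check side with the `old == new` shortcut, so no cmpxchg is issued when the SEEN bit is already set. It keeps the retry loop from the patch; `sketch_check_wb_error()`, the `f_wb_err` out-parameter standing in for `file->f_wb_err`, and the bit layout are assumptions for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MAX_ERRNO	4095u
#define WB_ERR_SEEN	(1u << 12)

/* Report a new error to this file (as a negative errno), or 0 if none. */
static int sketch_check_wb_error(_Atomic uint32_t *wb_err, uint32_t *f_wb_err)
{
	uint32_t old = atomic_load(wb_err);
	uint32_t new;

	for (;;) {
		/* the file's sample always has SEEN set, so equality means "nothing new" */
		if (old == *f_wb_err)
			return 0;

		new = old | WB_ERR_SEEN;

		/* already marked seen by someone else: skip the cmpxchg */
		if (old == new)
			break;

		/* on failure, 'old' is reloaded and the loop retries */
		if (atomic_compare_exchange_strong(wb_err, &old, new))
			break;
	}
	*f_wb_err = new;
	return -(int)(new & MAX_ERRNO);
}
```

A second file that sampled before the error takes the shortcut path: the mapping already has SEEN set, so it gets the errno without any atomic write.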

> > +		cur = cmpxchg(&mapping->wb_err, old, new);
> > +
> > +		/*
> > +		 * We can quit now if we successfully swapped in the new value
> > +		 * or someone else beat us to it with the same value that we
> > +		 * were planning to store.
> > +		 */
> > +		if (likely(cur == old || cur == new)) {
> > +			file->f_wb_err = new;
> > +			err = -(new & MAX_ERRNO);
> > +			break;
> > +		}
> > +
> > +		/* Raced with an update, try again */
> > +		old = cur;
> 
> Well ... should we?  We're returning an error which is new to this fd anyway.
> Do we want to return the most recent error by a nanosecond, or should we
> return the previous one and then see this one next time we call fsync()?
> 
> I'd lean towards not looping here; not even looking at 'cur'.
> 

Yeah, that might be fine here. Let me think about it a bit more.
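A sketch of the single-attempt variant suggested above, under the same assumed bit layout, as a hypothetical `sketch_check_wb_error_once()`. If the cmpxchg loses to a concurrent setter, it still reports the errno it sampled; the mapping then holds a value that differs from the file's sample, so the newer error is reported on the next call.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MAX_ERRNO	4095u
#define WB_ERR_SEEN	(1u << 12)

/* Like the looping version, but makes at most one cmpxchg attempt. */
static int sketch_check_wb_error_once(_Atomic uint32_t *wb_err, uint32_t *f_wb_err)
{
	uint32_t old = atomic_load(wb_err);
	uint32_t new;

	if (old == *f_wb_err)
		return 0;

	new = old | WB_ERR_SEEN;

	/* single attempt; the result (and any reloaded 'old') is ignored */
	if (old != new)
		(void)atomic_compare_exchange_strong(wb_err, &old, new);

	*f_wb_err = new;
	return -(int)(new & MAX_ERRNO);
}
```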

-- 
Jeff Layton <jlayton@...hat.com>