linux-kernel - RE: [PATCH v17 04/10] fs/ntfs3: Add file operations and implementation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cf76ecec5ec1419eacf4c170df65a57d@paragon-software.com>
Date:   Mon, 18 Jan 2021 10:00:53 +0000
From:   Konstantin Komarov <almaz.alexandrovich@...agon-software.com>
To:     Kari Argillander <kari.argillander@...il.com>
CC:     "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "pali@...nel.org" <pali@...nel.org>,
        "dsterba@...e.cz" <dsterba@...e.cz>,
        "aaptel@...e.com" <aaptel@...e.com>,
        "willy@...radead.org" <willy@...radead.org>,
        "rdunlap@...radead.org" <rdunlap@...radead.org>,
        "joe@...ches.com" <joe@...ches.com>,
        "mark@...mstone.com" <mark@...mstone.com>,
        "nborisov@...e.com" <nborisov@...e.com>,
        "linux-ntfs-dev@...ts.sourceforge.net" 
        <linux-ntfs-dev@...ts.sourceforge.net>,
        "anton@...era.com" <anton@...era.com>,
        "dan.carpenter@...cle.com" <dan.carpenter@...cle.com>,
        "hch@....de" <hch@....de>,
        "ebiggers@...nel.org" <ebiggers@...nel.org>,
        "andy.lavr@...il.com" <andy.lavr@...il.com>
Subject: RE: [PATCH v17 04/10] fs/ntfs3: Add file operations and
 implementation

From: Kari Argillander <kari.argillander@...il.com>
Sent: Monday, January 4, 2021 12:58 AM
> To: Konstantin Komarov <almaz.alexandrovich@...agon-software.com>
> Cc: linux-fsdevel@...r.kernel.org; viro@...iv.linux.org.uk; linux-kernel@...r.kernel.org; pali@...nel.org; dsterba@...e.cz;
> aaptel@...e.com; willy@...radead.org; rdunlap@...radead.org; joe@...ches.com; mark@...mstone.com; nborisov@...e.com;
> linux-ntfs-dev@...ts.sourceforge.net; anton@...era.com; dan.carpenter@...cle.com; hch@....de; ebiggers@...nel.org;
> andy.lavr@...il.com
> Subject: Re: [PATCH v17 04/10] fs/ntfs3: Add file operations and implementation
> 
> On Thu, Dec 31, 2020 at 06:23:55PM +0300, Konstantin Komarov wrote:
> > This adds file operations and implementation
> >
> > Signed-off-by: Konstantin Komarov <almaz.alexandrovich@...agon-software.com>
> > ---
> >  fs/ntfs3/dir.c     |  570 ++++++++
> >  fs/ntfs3/file.c    | 1140 ++++++++++++++++
> >  fs/ntfs3/frecord.c | 3088 ++++++++++++++++++++++++++++++++++++++++++++
> >  fs/ntfs3/namei.c   |  590 +++++++++
> >  fs/ntfs3/record.c  |  614 +++++++++
> >  fs/ntfs3/run.c     | 1254 ++++++++++++++++++
> >  6 files changed, 7256 insertions(+)
> >  create mode 100644 fs/ntfs3/dir.c
> >  create mode 100644 fs/ntfs3/file.c
> >  create mode 100644 fs/ntfs3/frecord.c
> >  create mode 100644 fs/ntfs3/namei.c
> >  create mode 100644 fs/ntfs3/record.c
> >  create mode 100644 fs/ntfs3/run.c
> >
> > diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
> 
> > +int ntfs_file_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
> > +{
> > +	return generic_file_fsync(filp, start, end, datasync);
> > +}
> 
> Do we have a reson why we implement this if we just use generic. Isn't
> it more clear if we use generic fsync straight away?
> 
Hi! Indeed. Migration to the generic fsync will be introduced in v18.

> > +static long ntfs_fallocate(struct file *file, int mode, loff_t vbo, loff_t len)
> > +{
> 
> > +	/* Return error if mode is not supported */
> > +	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
> > +		     FALLOC_FL_COLLAPSE_RANGE))
> > +		return -EOPNOTSUPP;
> 
> > +
> > +	if (mode & FALLOC_FL_PUNCH_HOLE) {
> 
> > +	} else if (mode & FALLOC_FL_COLLAPSE_RANGE) {
> 
> > +	} else {
> > +		/*
> > +		 * normal file: allocate clusters, do not change 'valid' size
> > +		 */
> > +		err = ntfs_set_size(inode, max(end, i_size));
> > +		if (err)
> > +			goto out;
> > +
> > +		if (is_sparsed(ni) || is_compressed(ni)) {
> > +			CLST vcn_v = ni->i_valid >> sbi->cluster_bits;
> > +			CLST vcn = vbo >> sbi->cluster_bits;
> > +			CLST cend = bytes_to_cluster(sbi, end);
> > +			CLST lcn, clen;
> > +			bool new;
> > +
> > +			/*
> > +			 * allocate but not zero new clusters (see below comments)
> > +			 * this breaks security (one can read unused on-disk areas)
> > +			 * zeroing these clusters may be too long
> > +			 * may be we should check here for root rights?
> > +			 */
> > +			for (; vcn < cend; vcn += clen) {
> > +				err = attr_data_get_block(ni, vcn, cend - vcn,
> > +							  &lcn, &clen, &new);
> > +				if (err)
> > +					goto out;
> > +				if (!new || vcn >= vcn_v)
> > +					continue;
> > +
> > +				/*
> > +				 * Unwritten area
> > +				 * NTFS is not able to store several unwritten areas
> > +				 * Activate 'ntfs_sparse_cluster' to zero new allocated clusters
> > +				 *
> > +				 * Dangerous in case:
> > +				 * 1G of sparsed clusters + 1 cluster of data =>
> > +				 * valid_size == 1G + 1 cluster
> > +				 * fallocate(1G) will zero 1G and this can be very long
> > +				 * xfstest 016/086 will fail whithout 'ntfs_sparse_cluster'
> > +				 */
> > +				/*ntfs_sparse_cluster(inode, NULL, vcn,
> > +				 *		    min(vcn_v - vcn, clen));
> > +				 */
> > +			}
> > +		}
> > +
> > +		if (mode & FALLOC_FL_KEEP_SIZE) {
> 
> Isn't this hole else already (mode & FALLOC_FL_KEEP_SIZE?

Sorry, can you please clarify your question? Not sure, understood it.
> 
> > +			ni_lock(ni);
> > +			/*true - keep preallocated*/
> > +			err = attr_set_size(ni, ATTR_DATA, NULL, 0,
> > +					    &ni->file.run, i_size, &ni->i_valid,
> > +					    true, NULL);
> > +			ni_unlock(ni);
> > +		}
> > +	}
> > +
> > +	if (!err) {
> > +		inode->i_ctime = inode->i_mtime = current_time(inode);
> > +		mark_inode_dirty(inode);
> > +	}
> > +out:
> > +	if (err == -EFBIG)
> > +		err = -ENOSPC;
> > +
> > +	inode_unlock(inode);
> > +	return err;
> > +}
> 
> > diff --git a/fs/ntfs3/namei.c b/fs/ntfs3/namei.c
> 
> > +int mi_get(struct ntfs_sb_info *sbi, CLST rno, struct mft_inode **mi)
> > +{
> > +	int err;
> > +	struct mft_inode *m = ntfs_alloc(sizeof(struct mft_inode), 1);
> > +
> > +	if (!m)
> > +		return -ENOMEM;
> > +
> > +	err = mi_init(m, sbi, rno);
> 
> If error happend should we just free end exit. Now we call mi_put() to
> clean up.
> 

Done, will be fixed in v18.

> > +	if (!err)
> > +		err = mi_read(m, false);
> > +
> > +	if (err) {
> > +		mi_put(m);
> > +		return err;
> > +	}
> > +
> > +	*mi = m;
> > +	return 0;
> > +}
> 
> > +struct ATTRIB *mi_enum_attr(struct mft_inode *mi, struct ATTRIB *attr)
> > +{
> > +	const struct MFT_REC *rec = mi->mrec;
> > +	u32 used = le32_to_cpu(rec->used);
> > +	u32 t32, off, asize;
> > +	u16 t16;
> > +
> > +	if (!attr) {
> > +		u32 total = le32_to_cpu(rec->total);
> > +
> > +		off = le16_to_cpu(rec->attr_off);
> > +
> > +		if (used > total)
> > +			goto out;
> > +
> > +		if (off >= used || off < MFTRECORD_FIXUP_OFFSET_1 ||
> > +		    !IsDwordAligned(off)) {
> > +			goto out;
> > +		}
> > +
> > +		/* Skip non-resident records */
> > +		if (!is_rec_inuse(rec))
> > +			goto out;
> > +
> > +		attr = Add2Ptr(rec, off);
> > +	} else {
> > +		/* Check if input attr inside record */
> > +		off = PtrOffset(rec, attr);
> > +		if (off >= used)
> > +			goto out;
> > +
> > +		asize = le32_to_cpu(attr->size);
> > +		if (asize < SIZEOF_RESIDENT)
> > +			goto out;
> > +
> > +		attr = Add2Ptr(attr, asize);
> > +		off += asize;
> > +	}
> > +
> > +	asize = le32_to_cpu(attr->size);
> > +
> > +	/* Can we use the first field (attr->type) */
> > +	if (off + 8 > used) {
> > +		static_assert(QuadAlign(sizeof(enum ATTR_TYPE)) == 8);
> > +		goto out;
> > +	}
> > +
> > +	if (attr->type == ATTR_END) {
> > +		if (used != off + 8)
> > +			goto out;
> 
> This if is not needed if there is return NULL after. But return
> NULL might also be bug.
> 

Thanks, will also be fixed in v18.

> > +		return NULL;
> > +	}
> > +
> > +	t32 = le32_to_cpu(attr->type);
> > +	if ((t32 & 0xf) || (t32 > 0x100))
> > +		goto out;
> > +
> > +	/* Check boundary */
> > +	if (off + asize > used)
> > +		goto out;
> > +
> > +	/* Check size of attribute */
> > +	if (!attr->non_res) {
> > +		if (asize < SIZEOF_RESIDENT)
> > +			goto out;
> > +
> > +		t16 = le16_to_cpu(attr->res.data_off);
> > +
> > +		if (t16 > asize)
> > +			goto out;
> > +
> > +		t32 = le32_to_cpu(attr->res.data_size);
> > +		if (t16 + t32 > asize)
> > +			goto out;
> > +
> > +		return attr;
> > +	}
> > +
> > +	/* Check some nonresident fields */
> > +	if (attr->name_len &&
> > +	    le16_to_cpu(attr->name_off) + sizeof(short) * attr->name_len >
> > +		    le16_to_cpu(attr->nres.run_off)) {
> > +		goto out;
> > +	}
> > +
> > +	if (attr->nres.svcn || !is_attr_ext(attr)) {
> > +		if (asize + 8 < SIZEOF_NONRESIDENT)
> > +			goto out;
> > +
> > +		if (attr->nres.c_unit)
> > +			goto out;
> > +	} else if (asize + 8 < SIZEOF_NONRESIDENT_EX)
> > +		goto out;
> > +
> > +	return attr;
> > +
> > +out:
> > +	return NULL;
> > +}
> 
> > diff --git a/fs/ntfs3/run.c b/fs/ntfs3/run.c
> 
> > +static inline int run_packed_size(const s64 *n)
> > +{
> > +#ifdef __BIG_ENDIAN
> 
> These are whole functions with ifdef. It would be maybe more clear
> that there really is whole functions to both endiand.
> 

This is controversial question, but fixed as well (in v18).

> > +	const u8 *p = (const u8 *)n + sizeof(*n) - 1;
> > +
> > +	if (*n >= 0) {
> > +		if (p[-7] || p[-6] || p[-5] || p[-4])
> > +			p -= 4;
> > +		if (p[-3] || p[-2])
> > +			p -= 2;
> > +		if (p[-1])
> > +			p -= 1;
> > +		if (p[0] & 0x80)
> > +			p -= 1;
> > +	} else {
> > +		if (p[-7] != 0xff || p[-6] != 0xff || p[-5] != 0xff ||
> > +		    p[-4] != 0xff)
> > +			p -= 4;
> > +		if (p[-3] != 0xff || p[-2] != 0xff)
> > +			p -= 2;
> > +		if (p[-1] != 0xff)
> > +			p -= 1;
> > +		if (!(p[0] & 0x80))
> > +			p -= 1;
> > +	}
> > +	return (const u8 *)n + sizeof(*n) - p;
> 
> }
> #else
> static inline int run_packed_size(const s64 *n)
> {
> 
> Something like this.
> 
> > +	const u8 *p = (const u8 *)n;
> > +
> > +	if (*n >= 0) {
> > +		if (p[7] || p[6] || p[5] || p[4])
> > +			p += 4;
> > +		if (p[3] || p[2])
> > +			p += 2;
> > +		if (p[1])
> > +			p += 1;
> > +		if (p[0] & 0x80)
> > +			p += 1;
> > +	} else {
> > +		if (p[7] != 0xff || p[6] != 0xff || p[5] != 0xff ||
> > +		    p[4] != 0xff)
> > +			p += 4;
> > +		if (p[3] != 0xff || p[2] != 0xff)
> > +			p += 2;
> > +		if (p[1] != 0xff)
> > +			p += 1;
> > +		if (!(p[0] & 0x80))
> > +			p += 1;
> > +	}
> > +
> > +	return 1 + p - (const u8 *)n;
> > +#endif
> > +}
> > +
> > +/*
> > + * run_pack
> > + *
> > + * packs runs into buffer
> > + * packed_vcns - how much runs we have packed
> > + * packed_size - how much bytes we have used run_buf
> > + */
> > +int run_pack(const struct runs_tree *run, CLST svcn, CLST len, u8 *run_buf,
> > +	     u32 run_buf_size, CLST *packed_vcns)
> > +{
> > +	CLST next_vcn, vcn, lcn;
> > +	CLST prev_lcn = 0;
> > +	CLST evcn1 = svcn + len;
> > +	int packed_size = 0;
> > +	size_t i;
> > +	bool ok;
> > +	s64 dlcn, len64;
> > +	int offset_size, size_size, t;
> > +	const u8 *p;
> > +
> > +	next_vcn = vcn = svcn;
> > +
> > +	*packed_vcns = 0;
> > +
> > +	if (!len)
> > +		goto out;
> > +
> > +	ok = run_lookup_entry(run, vcn, &lcn, &len, &i);
> > +
> > +	if (!ok)
> > +		goto error;
> > +
> > +	if (next_vcn != vcn)
> > +		goto error;
> > +
> > +	for (;;) {
> > +		/* offset of current fragment relatively to previous fragment */
> > +		dlcn = 0;
> 
> This dlcn
> 
> > +		next_vcn = vcn + len;
> > +
> > +		if (next_vcn > evcn1)
> > +			len = evcn1 - vcn;
> > +
> > +		/*
> > +		 * mirror of len, but signed, because run_packed_size()
> > +		 * works with signed int only
> > +		 */
> > +		len64 = len;
> > +
> > +		/* how much bytes is packed len64 */
> > +		size_size = run_packed_size(&len64);
> 
> Does (s64 *)&len work just fine?
> 

In addition to Mattew's response:
NTFS by it's spec supports 64bit clusters, however ntfs.sys driver
uses 32bit clusters only. This is a default for ntfs3 as well, which may
be configured to 64bit as well. I.e. len may be both.

> > +
> > +		/* offset_size - how much bytes is packed dlcn */
> > +		if (lcn == SPARSE_LCN) {
> > +			offset_size = 0;
> 
> dlcn might be better to live here?
> 

You are right, done.

> > +		} else {
> > +			/* NOTE: lcn can be less than prev_lcn! */
> > +			dlcn = (s64)lcn - prev_lcn;
> > +			offset_size = run_packed_size(&dlcn);
> > +			prev_lcn = lcn;
> > +		}
> > +	
> > +		t = run_buf_size - packed_size - 2 - offset_size;
> > +		if (t <= 0)
> > +			goto out;
> > +
> > +		/* can we store this entire run */
> > +		if (t < size_size)
> > +			goto out;
> > +
> > +		if (run_buf) {
> > +			p = (u8 *)&len64;
> > +
> > +			/* pack run header */
> > +			run_buf[0] = ((u8)(size_size | (offset_size << 4)));
> > +			run_buf += 1;
> > +
> > +			/* Pack the length of run */
> > +			switch (size_size) {
> > +#ifdef __BIG_ENDIAN
> > +			case 8:
> > +				run_buf[7] = p[0];
> > +				fallthrough;
> > +			case 7:
> > +				run_buf[6] = p[1];
> > +				fallthrough;
> > +			case 6:
> > +				run_buf[5] = p[2];
> > +				fallthrough;
> > +			case 5:
> > +				run_buf[4] = p[3];
> > +				fallthrough;
> > +			case 4:
> > +				run_buf[3] = p[4];
> > +				fallthrough;
> > +			case 3:
> > +				run_buf[2] = p[5];
> > +				fallthrough;
> > +			case 2:
> > +				run_buf[1] = p[6];
> > +				fallthrough;
> > +			case 1:
> > +				run_buf[0] = p[7];
> > +#else
> > +			case 8:
> > +				run_buf[7] = p[7];
> > +				fallthrough;
> > +			case 7:
> > +				run_buf[6] = p[6];
> > +				fallthrough;
> > +			case 6:
> > +				run_buf[5] = p[5];
> > +				fallthrough;
> > +			case 5:
> > +				run_buf[4] = p[4];
> > +				fallthrough;
> > +			case 4:
> > +				run_buf[3] = p[3];
> > +				fallthrough;
> > +			case 3:
> > +				run_buf[2] = p[2];
> > +				fallthrough;
> > +			case 2:
> > +				run_buf[1] = p[1];
> > +				fallthrough;
> > +			case 1:
> > +				run_buf[0] = p[0];
> > +#endif
> > +			}
> 
> Why is this not own function? We use this like 5 places. Also isn't
> little endian just memcopy()
> 

You are right, for little endiand this is just memcopy, but if it isn't
inlined, we will have overhead calling the function.

> > +
> > +			run_buf += size_size;
> > +			p = (u8 *)&dlcn;
> 
> I think that when we have function for that switch tmp p is not needed
> anymore.
> 

Yes, but this doesn't give more readability or simplification I think.
More of a personal preference.

> > +
> > +			/* Pack the offset from previous lcn */
> > +			switch (offset_size) {
> > +#ifdef __BIG_ENDIAN
> > +			case 8:
> > +				run_buf[7] = p[0];
> > +				fallthrough;
> > +			case 7:
> > +				run_buf[6] = p[1];
> > +				fallthrough;
> > +			case 6:
> > +				run_buf[5] = p[2];
> > +				fallthrough;
> > +			case 5:
> > +				run_buf[4] = p[3];
> > +				fallthrough;
> > +			case 4:
> > +				run_buf[3] = p[4];
> > +				fallthrough;
> > +			case 3:
> > +				run_buf[2] = p[5];
> > +				fallthrough;
> > +			case 2:
> > +				run_buf[1] = p[6];
> > +				fallthrough;
> > +			case 1:
> > +				run_buf[0] = p[7];
> > +#else
> > +			case 8:
> > +				run_buf[7] = p[7];
> > +				fallthrough;
> > +			case 7:
> > +				run_buf[6] = p[6];
> > +				fallthrough;
> > +			case 6:
> > +				run_buf[5] = p[5];
> > +				fallthrough;
> > +			case 5:
> > +				run_buf[4] = p[4];
> > +				fallthrough;
> > +			case 4:
> > +				run_buf[3] = p[3];
> > +				fallthrough;
> > +			case 3:
> > +				run_buf[2] = p[2];
> > +				fallthrough;
> > +			case 2:
> > +				run_buf[1] = p[1];
> > +				fallthrough;
> > +			case 1:
> > +				run_buf[0] = p[0];
> > +#endif
> > +			}
> 
> > +int run_get_highest_vcn(CLST vcn, const u8 *run_buf, u64 *highest_vcn)
> > +{
> 
> > +		/* skip size_size */
> > +		run += size_size;
> > +
> > +		if (!len)
> > +			goto error;
> > +
> > +		run += offset_size;
> 
> Can this be straight
> run += size_size + offset_size;
> 

Done.

> > +
> > +#ifdef NTFS3_64BIT_CLUSTER
> > +		if ((vcn >> 32) || (len >> 32))
> > +			goto error;
> > +#endif
> > +		vcn64 += len;
> > +	}
> > +
> > +	*highest_vcn = vcn64 - 1;
> > +	return 0;
> > +}