Message-ID: <c360943e-c922-9688-5956-e3e5a35c06d8@gmx.com>
Date: Tue, 2 Aug 2022 09:11:09 +0800
From: Qu Wenruo <quwenruo.btrfs@....com>
To: dsterba@...e.cz, torvalds@...ux-foundation.org,
linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [GIT PULL] Btrfs updates for 5.20
On 2022/8/2 00:40, David Sterba wrote:
> Hi,
>
> this update brings some long awaited changes, the send protocol bump,
> otherwise lots of small improvements and fixes. The main core part is
> reworking bio handling, cleaning up the submission and endio and
> improving error handling.
>
> There are some non-btrfs patches adding helpers or updating API,
> listed at the end of the changelog.
>
> Please pull, thanks.
>
> Features:
>
> - sysfs:
> - export chunk size, in debug mode add tunable for setting its size
> - show zoned among features (was only in debug mode)
> - show commit stats (number, last/max/total duration)
>
> - send protocol updated to 2
> - new commands:
> - ability to write data chunks larger than 64K
> - send raw compressed extents (uses the encoded data ioctls), i.e. no
> decompression on send side, no compression needed on receive side
> if supported
> - send 'otime' (inode creation time) among other timestamps
> - send file attributes (a.k.a. file flags and xflags)
> - this is the first version bump, backward compatibility on send and
> receive side is provided
> - there are still some known and wanted commands that will be
> implemented in the near future, another version bump will be needed,
> however we want to minimize that to avoid causing usability issues
>
> - print checksum type and implementation at mount time
>
> - don't print some messages at mount (mentioned as people asked about
> it), we want to print messages namely for new features so let's make
> some space for that
> - big metadata - this has been supported for a long time and is not a
> feature that's worth mentioning
> - skinny metadata - same reason, set by default by mkfs
>
> Performance improvements:
>
> - reduced amount of reserved metadata for delayed items
> - when inserted items can be batched into one leaf
> - when deleting batched directory index items
> - when deleting delayed items used for deletion
> - overall improved count of files/sec, decreased subvolume lock
> contention
>
> - metadata item access bounds checker micro-optimized, with a few
> percent of improved runtime for metadata-heavy operations
>
> - increase direct io limit for read to 256 sectors, improved throughput
> by 3x on sample workload
>
> Notable fixes:
>
> - raid56
> - reduce parity writes, skip sectors of stripe when there are no data
> updates
> - restore reading from stripe cache instead of triggering new read
Small typo, I guess.
It's the opposite: restore reading from on-disk data instead of using the
stripe cache.
These two modifications fix most of btrfs/125 and other recovery-related
scenarios (they greatly reduce the chance of destructive RMW).
In fact, these two may be way more important than write-intent/full-journal.
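
For readers unfamiliar with the term, here is a minimal sketch of what a destructive RMW (read-modify-write) looks like. It is not the btrfs raid56 code: it assumes a toy RAID5 layout with two data stripes and XOR parity, and every name in it (xor, d1_good, d1_on_disk, and so on) is made up for illustration.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR, the parity operation of this toy RAID5."""
    return bytes(x ^ y for x, y in zip(a, b))

# Healthy full stripe: parity = d1 ^ d2 (hypothetical 4-byte sectors).
d1_good = b"AAAA"
d2_old = b"BBBB"
parity = xor(d1_good, d2_old)

# d1 is now bad on disk, e.g. stale data from a device that was re-added.
d1_on_disk = b"A?AA"

# As long as the old parity is intact, d1 can still be rebuilt.
recoverable_before = xor(parity, d2_old) == d1_good

# A sub-stripe write to d2 triggers RMW: the untouched d1 is read back
# and trusted as-is, and parity is recomputed from it.
d2_new = b"CCCC"
parity_after_rmw = xor(d1_on_disk, d2_new)

# The new parity now "protects" the bad d1, so a rebuild only
# reproduces the bad data. The good copy is gone.
rebuilt_d1 = xor(parity_after_rmw, d2_new)
recoverable_after = rebuilt_d1 == d1_good

print(recoverable_before, recoverable_after, rebuilt_d1)
```

The point of the sketch is only that parity computed from an untrustworthy sector makes the damage permanent, which is why not trusting cached sectors during recovery matters.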
Thanks,
Qu
>
> - refuse to replay log with unknown incompat read-only feature bit set
>
> - zoned
> - fix page locking when COW fails in the middle of allocation
> - improved tracking of active zones, ZNS drives may limit the number
> and there are ENOSPC errors due to that limit and not actual lack of
> space
> - adjust maximum extent size for zone append so it does not cause late
> ENOSPC due to underreservation
>
> - mirror reading error messages show the mirror number
>
> - don't fallback to buffered IO for NOWAIT direct IO writes, we don't
> have the NOWAIT semantics for buffered io yet
>
> - send, fix sending link commands for existing file paths when there are
> deleted and created hardlinks for same files
>
> - repair all mirrors for profiles with more than 1 copy (raid1c34)
>
> - fix repair of compressed extents, unify where error detection and
> repair happen
>
> Core changes:
>
> - bio completion cleanups
> - don't double defer compression bios
> - simplify endio workqueues
> - add more data to btrfs_bio to avoid allocation for read requests
> - rework bio error handling so it's the same as what the block layer does, the
> submission works and errors are consumed in endio
> - when asynchronous bio offload fails fall back to synchronous
> checksum calculation to avoid errors under writeback or memory
> pressure
>
> - new trace points
> - raid56 events
> - ordered extent operations
>
> - super block log_root_transid deprecated (never used)
>
> - mixed_backref and big_metadata sysfs feature files removed, they've
> been default for sufficiently long time, there are no known users and
> mixed_backref could be confused with mixed_groups
>
> Non-btrfs changes, API updates:
>
> - minor highmem API update to cover const arguments
>
> - switch all kmap/kmap_atomic to kmap_local
>
> - remove redundant flush_dcache_page()
>
> - address_space_operations::writepage callback removed
>
> - add bdev_max_segments() helper
>
> ----------------------------------------------------------------
> The following changes since commit e0dccc3b76fb35bb257b4118367a883073d7390e:
>
> Linux 5.19-rc8 (2022-07-24 13:26:27 -0700)
>
> are available in the Git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.20-tag
>
> for you to fetch changes up to 0b078d9db8793b1bd911e97be854e3c964235c78:
>
> btrfs: don't call btrfs_page_set_checked in finish_compressed_bio_read (2022-07-25 19:56:16 +0200)
>
> ----------------------------------------------------------------
> BingJing Chang (2):
> btrfs: send: introduce recorded_ref_alloc and recorded_ref_free
> btrfs: send: fix sending link commands for existing file paths
>
> Christoph Hellwig (37):
> btrfs: factor out a helper to end a single sector buffer I/O
> btrfs: refactor end_bio_extent_readpage code flow
> btrfs: factor out a btrfs_csum_ptr helper
> btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
> btrfs: move more work into btrfs_end_bioc
> btrfs: simplify code flow in btrfs_submit_dio_bio
> btrfs: split btrfs_submit_data_bio to read and write parts
> btrfs: defer I/O completion based on the btrfs_raid_bio
> btrfs: don't double-defer bio completions for compressed reads
> btrfs: don't use btrfs_bio_wq_end_io for compressed writes
> btrfs: centralize setting REQ_META
> btrfs: remove btrfs_end_io_wq
> btrfs: factor stripe submission logic out of btrfs_map_bio
> btrfs: do not allocate a btrfs_bio for low-level bios
> btrfs: don't use bio->bi_private to pass the inode to submit_one_bio
> btrfs: merge end_write_bio and flush_write_bio
> btrfs: pass the btrfs_bio_ctrl to submit_one_bio
> btrfs: stop looking at btrfs_bio->iter in index_one_bio
> btrfs: split discard handling out of btrfs_map_block
> btrfs: remove the finish_func argument to btrfs_mark_ordered_io_finished
> btrfs: increase direct io read size limit to 256 sectors
> btrfs: remove extent writepage address space operation
> btrfs: raid56: use fixed stripe length everywhere
> btrfs: do not return errors from btrfs_map_bio
> btrfs: do not return errors from raid56_parity_write
> btrfs: do not return errors from raid56_parity_recover
> btrfs: raid56: transfer the bio counter reference to the raid submission helpers
> btrfs: simplify sync/async submission in btrfs_submit_data_write_bio
> btrfs: handle allocation failure in btrfs_wq_submit_bio gracefully
> btrfs: do not return errors from btrfs_submit_dio_bio
> btrfs: merge btrfs_dev_stat_print_on_error with its only caller
> btrfs: repair all known bad mirrors
> btrfs: simplify the pending I/O counting in struct compressed_bio
> btrfs: pass a btrfs_bio to btrfs_repair_one_sector
> btrfs: remove the start argument to check_data_csum and export
> btrfs: fix repair of compressed extents
> btrfs: don't call btrfs_page_set_checked in finish_compressed_bio_read
>
> David Sterba (30):
> btrfs: fix typos in comments
> btrfs: remove redundant calls to flush_dcache_page
> btrfs: remove redundant check in up check_setget_bounds
> btrfs: sysfs: advertise zoned support among features
> btrfs: open code rbtree search in split_state
> btrfs: open code rbtree search in insert_state
> btrfs: lift start and end parameters to callers of insert_state
> btrfs: pass bits by value not by pointer for extent_state helpers
> btrfs: add fast path for extent_state insertion
> btrfs: remove node and parent parameters from insert_state
> btrfs: open code inexact rbtree search in tree_search
> btrfs: make tree search for insert more generic and use it for tree_search
> btrfs: unify tree search helper returning prev and next nodes
> btrfs: call inode_to_path directly and drop indirection
> btrfs: simplify parameters of backref iterators
> btrfs: sink iterator parameter to btrfs_ioctl_logical_to_ino
> btrfs: remove unused typedefs get_extent_t and btrfs_work_func_t
> btrfs: send: drop __KERNEL__ ifdef from send.h
> btrfs: send: simplify includes
> btrfs: send: remove old TODO regarding ERESTARTSYS
> btrfs: send: use boolean types for current inode status
> btrfs: send: add OTIME as utimes attribute for proto 2+ by default
> btrfs: send: add new command FILEATTR for file attributes
> btrfs: print checksum type and implementation at mount time
> btrfs: use mask for all RAID1* profiles in btrfs_calc_avail_data_space
> btrfs: merge calculations for simple striped profiles in btrfs_rmap_block
> btrfs: clean up chained assignments
> btrfs: switch btrfs_block_rsv::full to bool
> btrfs: switch btrfs_block_rsv::failfast to bool
> btrfs: use enum for btrfs_block_rsv::type
>
> Fabio M. De Francesco (7):
> btrfs: replace kmap() with kmap_local_page() in inode.c
> btrfs: replace kmap() with kmap_local_page() in lzo.c
> highmem: Make __kunmap_{local,atomic}() take const void pointer
> btrfs: zstd: replace kmap() with kmap_local_page()
> btrfs: zlib: replace kmap() with kmap_local_page() in zlib_compress_pages()
> btrfs: zlib: replace kmap() with kmap_local_page() in zlib_decompress_bio()
> btrfs: replace kmap_atomic() with kmap_local_page()
>
> Fanjun Kong (1):
> btrfs: use PAGE_ALIGNED instead of IS_ALIGNED
>
> Filipe Manana (18):
> btrfs: balance btree dirty pages and delayed items after a rename
> btrfs: free the path earlier when creating a new inode
> btrfs: balance btree dirty pages and delayed items after clone and dedupe
> btrfs: add assertions when deleting batches of delayed items
> btrfs: deal with deletion errors when deleting delayed items
> btrfs: refactor the delayed item deletion entry point
> btrfs: improve batch deletion of delayed dir index items
> btrfs: assert that delayed item is a dir index item when adding it
> btrfs: improve batch insertion of delayed dir index items
> btrfs: do not BUG_ON() on failure to reserve metadata for delayed item
> btrfs: set delayed item type when initializing it
> btrfs: reduce amount of reserved metadata for delayed item insertion
> btrfs: remove the inode cache check at btrfs_is_free_space_inode()
> btrfs: don't fallback to buffered IO for NOWAIT direct IO writes
> btrfs: set the objectid of the btree inode's location key
> btrfs: add optimized btrfs_ino() version for 64 bits systems
> btrfs: send: always use the rbtree based inode ref management infrastructure
> btrfs: join running log transaction when logging new name
>
> Ioannis Angelakopoulos (2):
> btrfs: collect commit stats, count, duration
> btrfs: sysfs: export commit stats
>
> Johannes Thumshirn (1):
> btrfs: add tracepoints for ordered extents
>
> Josef Bacik (3):
> btrfs: do not batch insert non-consecutive dir indexes during log replay
> btrfs: tree-log: make the return value for log syncing consistent
> btrfs: reset block group chunk force if we have to wait
>
> Naohiro Aota (17):
> btrfs: ensure pages are unlocked on cow_file_range() failure
> btrfs: extend btrfs_cleanup_ordered_extents for NULL locked_page
> btrfs: fix error handling of fallback uncompress write
> btrfs: replace unnecessary goto with direct return at cow_file_range()
> block: add bdev_max_segments() helper
> btrfs: zoned: revive max_zone_append_bytes
> btrfs: replace BTRFS_MAX_EXTENT_SIZE with fs_info->max_extent_size
> btrfs: convert count_max_extents() to use fs_info->max_extent_size
> btrfs: use fs_info->max_extent_size in get_extent_max_capacity()
> btrfs: let can_allocate_chunk return error
> btrfs: zoned: finish least available block group on data bg allocation
> btrfs: zoned: introduce space_info->active_total_bytes
> btrfs: zoned: disable metadata overcommit for zoned
> btrfs: zoned: activate metadata block group on flush_space
> btrfs: zoned: activate necessary block group
> btrfs: zoned: write out partially allocated region
> btrfs: zoned: wait until zone is finished when allocation didn't progress
>
> Nikolay Borisov (9):
> btrfs: introduce btrfs_try_lock_balance
> btrfs: use btrfs_try_lock_balance in btrfs_ioctl_balance
> btrfs: batch up release of reserved metadata for delayed items used for deletion
> btrfs: properly flag filesystem with BTRFS_FEATURE_INCOMPAT_BIG_METADATA
> btrfs: don't print 'flagging with big metadata' anymore on mount
> btrfs: don't print 'has skinny extents' anymore on mount
> btrfs: sysfs: remove MIXED_BACKREF feature file
> btrfs: sysfs: remove BIG_METADATA feature files
> btrfs: simplify error handling in btrfs_lookup_dentry
>
> Omar Sandoval (7):
> btrfs: send: remove unused send_ctx::{total,cmd}_send_size
> btrfs: send: explicitly number commands and attributes
> btrfs: send: add stream v2 definitions
> btrfs: send: write larger chunks when using stream v2
> btrfs: send: get send buffer pages for protocol v2
> btrfs: send: send compressed extents with encoded writes
> btrfs: send: enable support for stream v2 and compressed writes
>
> Pankaj Raghav (1):
> btrfs: zoned: fix comment description for sb_write_pointer logic
>
> Qu Wenruo (25):
> btrfs: quit early if the fs has no RAID56 support for raid56 related checks
> btrfs: introduce a data checksum checking helper
> btrfs: remove duplicated parameters from submit_data_read_repair()
> btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks
> btrfs: use integrated bitmaps for btrfs_raid_bio::dbitmap and finish_pbitmap
> btrfs: use integrated bitmaps for scrub_parity::dbitmap and ebitmap
> btrfs: only write the sectors in the vertical stripe which has data stripes
> btrfs: update stripe_sectors::uptodate in steal_rbio
> btrfs: add trace event for submitted RAID56 bio
> btrfs: make btrfs_super_block::log_root_transid deprecated
> btrfs: reject log replay if there is unsupported RO compat flag
> btrfs: raid56: avoid double for loop inside finish_rmw()
> btrfs: raid56: avoid double for loop inside __raid56_parity_recover()
> btrfs: raid56: avoid double for loop inside alloc_rbio_essential_pages()
> btrfs: raid56: avoid double for loop inside raid56_rmw_stripe()
> btrfs: raid56: avoid double for loop inside raid56_parity_scrub_stripe()
> btrfs: remove parameter dev_extent_len from scrub_stripe()
> btrfs: use btrfs_chunk_max_errors() to replace tolerance calculation
> btrfs: use btrfs_raid_array to calculate number of parity stripes
> btrfs: use ncopies from btrfs_raid_array in btrfs_num_copies()
> btrfs: use named constant for reserved device space
> btrfs: warn about dev extents that are inside the reserved range
> btrfs: raid56: don't trust any cached sector in __raid56_parity_recover()
> btrfs: output mirror number for bad metadata
> btrfs: return proper mapped length for RAID56 profiles in __btrfs_map_block()
>
> Stefan Roesch (3):
> btrfs: store chunk size in space-info struct
> btrfs: sysfs: export chunk size in space infos
> btrfs: sysfs: add force_chunk_alloc trigger to force allocation
>
> arch/parisc/include/asm/cacheflush.h | 6 +-
> arch/parisc/kernel/cache.c | 2 +-
> fs/btrfs/async-thread.h | 1 -
> fs/btrfs/backref.c | 88 ++--
> fs/btrfs/backref.h | 3 +-
> fs/btrfs/block-group.c | 34 +-
> fs/btrfs/block-rsv.c | 21 +-
> fs/btrfs/block-rsv.h | 15 +-
> fs/btrfs/btrfs_inode.h | 25 +-
> fs/btrfs/compression.c | 359 ++++----------
> fs/btrfs/compression.h | 18 +-
> fs/btrfs/ctree.h | 105 ++++-
> fs/btrfs/delalloc-space.c | 6 +-
> fs/btrfs/delayed-inode.c | 395 +++++++++++-----
> fs/btrfs/delayed-inode.h | 11 +
> fs/btrfs/delayed-ref.c | 4 +-
> fs/btrfs/dev-replace.c | 3 +-
> fs/btrfs/disk-io.c | 268 ++++-------
> fs/btrfs/disk-io.h | 17 +-
> fs/btrfs/extent-tree.c | 149 +++---
> fs/btrfs/extent_io.c | 873 ++++++++++++++++-------------------
> fs/btrfs/extent_io.h | 15 +-
> fs/btrfs/file.c | 29 +-
> fs/btrfs/free-space-cache.c | 3 +-
> fs/btrfs/inode.c | 764 +++++++++++++++---------------
> fs/btrfs/ioctl.c | 150 +++---
> fs/btrfs/lzo.c | 28 +-
> fs/btrfs/ordered-data.c | 40 +-
> fs/btrfs/ordered-data.h | 5 +-
> fs/btrfs/raid56.c | 792 +++++++++++++++----------------
> fs/btrfs/raid56.h | 168 ++++++-
> fs/btrfs/reflink.c | 19 +-
> fs/btrfs/scrub.c | 71 ++-
> fs/btrfs/send.c | 781 +++++++++++++++++++++----------
> fs/btrfs/send.h | 169 ++++---
> fs/btrfs/space-info.c | 110 ++++-
> fs/btrfs/space-info.h | 8 +-
> fs/btrfs/struct-funcs.c | 11 +-
> fs/btrfs/subpage.c | 4 +-
> fs/btrfs/super.c | 36 +-
> fs/btrfs/sysfs.c | 186 +++++++-
> fs/btrfs/tests/btrfs-tests.c | 1 +
> fs/btrfs/tests/extent-buffer-tests.c | 3 +-
> fs/btrfs/transaction.c | 26 +-
> fs/btrfs/tree-log.c | 29 +-
> fs/btrfs/tree-log.h | 3 +
> fs/btrfs/volumes.c | 362 +++++++--------
> fs/btrfs/volumes.h | 46 +-
> fs/btrfs/zlib.c | 42 +-
> fs/btrfs/zoned.c | 131 +++++-
> fs/btrfs/zoned.h | 18 +
> fs/btrfs/zstd.c | 33 +-
> include/linux/blkdev.h | 5 +
> include/linux/highmem-internal.h | 10 +-
> include/trace/events/btrfs.h | 158 +++++++
> include/uapi/linux/btrfs.h | 10 +-
> mm/highmem.c | 2 +-
> 57 files changed, 3842 insertions(+), 2829 deletions(-)