Message-ID: <c360943e-c922-9688-5956-e3e5a35c06d8@gmx.com>
Date:   Tue, 2 Aug 2022 09:11:09 +0800
From:   Qu Wenruo <quwenruo.btrfs@....com>
To:     dsterba@...e.cz, torvalds@...ux-foundation.org,
        linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [GIT PULL] Btrfs updates for 5.20



On 2022/8/2 00:40, David Sterba wrote:
> Hi,
>
> this update brings some long awaited changes, the send protocol bump,
> otherwise lots of small improvements and fixes. The main core part is
> reworking bio handling, cleaning up the submission and endio and
> improving error handling.
>
> There are some non-btrfs patches adding helpers or updating API,
> listed at the end of the changelog.
>
> Please pull, thanks.
>
> Features:
>
> - sysfs:
>    - export chunk size, in debug mode add tunable for setting its size
>    - show zoned among features (was only in debug mode)
>    - show commit stats (number, last/max/total duration)
>
> - send protocol updated to 2
>    - new commands:
>      - ability to write larger data chunks than 64K
>      - send raw compressed extents (uses the encoded data ioctls), i.e. no
>        decompression on send side, no compression needed on receive side
>        if supported
>      - send 'otime' (inode creation time) among other timestamps
>      - send file attributes (a.k.a. file flags and xflags)
>    - this is the first version bump, backward compatibility on send and
>      receive side is provided
>    - there are still some known and wanted commands that will be
>      implemented in the near future, another version bump will be needed,
>      however we want to minimize that to avoid causing usability issues
>
> - print checksum type and implementation at mount time
>
> - don't print some messages at mount (mentioned as people asked about
>    it), we want to print messages namely for new features so let's make
>    some space for that
>    - big metadata - this has been supported for a long time and is not a
>                     feature that's worth mentioning
>    - skinny metadata - same reason, set by default by mkfs
>
> Performance improvements:
>
> - reduced amount of reserved metadata for delayed items
>    - when inserted items can be batched into one leaf
>    - when deleting batched directory index items
>    - when deleting delayed items used for deletion
>    - overall improved count of files/sec, decreased subvolume lock
>      contention
>
> - metadata item access bounds checker micro-optimized, with a few
>    percent of improved runtime for metadata-heavy operations
>
> - increase direct io limit for read to 256 sectors, improved throughput
>    by 3x on sample workload
>
> Notable fixes:
>
> - raid56
>    - reduce parity writes, skip sectors of stripe when there are no data
>      updates
>    - restore reading from stripe cache instead of triggering new read

Small typo, I guess.
It's the opposite: restore reading from on-disk data instead of using
the stripe cache.

In fact, these two modifications mostly help btrfs/125 and other
recovery-related scenarios (they greatly reduce the chance of
destructive RMW).

These two may even be far more important than write-intent/full-journal.
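
To illustrate the destructive RMW point, here is a toy Python model. It is
only a sketch, not the btrfs code: the two-data-stripe layout, the helper
names (xor_parity, rmw, rebuild, demo) and the sector values are all
assumptions made up for this example. It shows why rewriting parity for
sectors that had no data update can bake an existing on-disk corruption
into the parity, while skipping those sectors leaves the old parity usable
for recovery.

```python
from functools import reduce


def xor_parity(stripes, idx):
    """XOR of sector `idx` across all data stripes (RAID5-style parity)."""
    return reduce(lambda a, b: a ^ b, (s[idx] for s in stripes))


def rmw(disk_data, parity, updates, skip_untouched):
    """Apply {(stripe, sector): value} updates, then rewrite parity.

    With skip_untouched=False every parity sector is recomputed from
    whatever is currently on disk (including silently corrupted data).
    With skip_untouched=True only sectors that received an update are
    recomputed; the others keep their old parity.
    """
    for (stripe, sector), value in updates.items():
        disk_data[stripe][sector] = value
    touched = {sector for (_, sector) in updates}
    for idx in range(len(parity)):
        if skip_untouched and idx not in touched:
            continue
        parity[idx] = xor_parity(disk_data, idx)


def rebuild(disk_data, parity, lost_stripe, idx):
    """Rebuild one sector of `lost_stripe` from parity and the other stripes."""
    others = (s[idx] for i, s in enumerate(disk_data) if i != lost_stripe)
    return reduce(lambda a, b: a ^ b, others, parity[idx])


def demo(skip_untouched):
    """Return True if the corrupted sector is still recoverable after RMW."""
    data = [[0x11, 0x22], [0x33, 0x44]]        # two data stripes, two sectors
    parity = [xor_parity(data, i) for i in range(2)]
    good = data[1][0]
    data[1][0] = 0xFF                           # silent corruption on disk
    rmw(data, parity, {(0, 1): 0x55}, skip_untouched)  # unrelated small write
    return rebuild(data, parity, 1, 0) == good


if __name__ == "__main__":
    print("skip untouched sectors:", demo(True))
    print("rewrite all parity:    ", demo(False))
```

In this model the small write only touches sector 1, so with skipping
enabled the parity of sector 0 is never regenerated from the corrupted
data and the original value can still be rebuilt.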

Thanks,
Qu
>
> - refuse to replay log with unknown incompat read-only feature bit set
>
> - zoned
>    - fix page locking when COW fails in the middle of allocation
>    - improved tracking of active zones, ZNS drives may limit the number
>      and there are ENOSPC errors due to that limit and not actual lack of
>      space
>    - adjust maximum extent size for zone append so it does not cause late
>      ENOSPC due to underreservation
>
> - mirror reading error messages show the mirror number
>
> - don't fallback to buffered IO for NOWAIT direct IO writes, we don't
>    have the NOWAIT semantics for buffered io yet
>
> - send, fix sending link commands for existing file paths when there are
>    deleted and created hardlinks for same files
>
> - repair all mirrors for profiles with more than 1 copy (raid1c34)
>
> - fix repair of compressed extents, unify where error detection and
>    repair happen
>
> Core changes:
>
> - bio completion cleanups
>    - don't double defer compression bios
>    - simplify endio workqueues
>    - add more data to btrfs_bio to avoid allocation for read requests
>    - rework bio error handling so it's the same as what the block layer
>      does, the submission works and errors are consumed in endio
>    - when asynchronous bio offload fails fall back to synchronous
>      checksum calculation to avoid errors under writeback or memory
>      pressure
>
> - new trace points
>    - raid56 events
>    - ordered extent operations
>
> - super block log_root_transid deprecated (never used)
>
> - mixed_backref and big_metadata sysfs feature files removed, they've
>    been default for sufficiently long time, there are no known users and
>    mixed_backref could be confused with mixed_groups
>
> Non-btrfs changes, API updates:
>
> - minor highmem API update to cover const arguments
>
> - switch all kmap/kmap_atomic to kmap_local
>
> - remove redundant flush_dcache_page()
>
> - address_space_operations::writepage callback removed
>
> - add bdev_max_segments() helper
>
> ----------------------------------------------------------------
> The following changes since commit e0dccc3b76fb35bb257b4118367a883073d7390e:
>
>    Linux 5.19-rc8 (2022-07-24 13:26:27 -0700)
>
> are available in the Git repository at:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.20-tag
>
> for you to fetch changes up to 0b078d9db8793b1bd911e97be854e3c964235c78:
>
>    btrfs: don't call btrfs_page_set_checked in finish_compressed_bio_read (2022-07-25 19:56:16 +0200)
>
> ----------------------------------------------------------------
> BingJing Chang (2):
>        btrfs: send: introduce recorded_ref_alloc and recorded_ref_free
>        btrfs: send: fix sending link commands for existing file paths
>
> Christoph Hellwig (37):
>        btrfs: factor out a helper to end a single sector buffer I/O
>        btrfs: refactor end_bio_extent_readpage code flow
>        btrfs: factor out a btrfs_csum_ptr helper
>        btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
>        btrfs: move more work into btrfs_end_bioc
>        btrfs: simplify code flow in btrfs_submit_dio_bio
>        btrfs: split btrfs_submit_data_bio to read and write parts
>        btrfs: defer I/O completion based on the btrfs_raid_bio
>        btrfs: don't double-defer bio completions for compressed reads
>        btrfs: don't use btrfs_bio_wq_end_io for compressed writes
>        btrfs: centralize setting REQ_META
>        btrfs: remove btrfs_end_io_wq
>        btrfs: factor stripe submission logic out of btrfs_map_bio
>        btrfs: do not allocate a btrfs_bio for low-level bios
>        btrfs: don't use bio->bi_private to pass the inode to submit_one_bio
>        btrfs: merge end_write_bio and flush_write_bio
>        btrfs: pass the btrfs_bio_ctrl to submit_one_bio
>        btrfs: stop looking at btrfs_bio->iter in index_one_bio
>        btrfs: split discard handling out of btrfs_map_block
>        btrfs: remove the finish_func argument to btrfs_mark_ordered_io_finished
>        btrfs: increase direct io read size limit to 256 sectors
>        btrfs: remove extent writepage address space operation
>        btrfs: raid56: use fixed stripe length everywhere
>        btrfs: do not return errors from btrfs_map_bio
>        btrfs: do not return errors from raid56_parity_write
>        btrfs: do not return errors from raid56_parity_recover
>        btrfs: raid56: transfer the bio counter reference to the raid submission helpers
>        btrfs: simplify sync/async submission in btrfs_submit_data_write_bio
>        btrfs: handle allocation failure in btrfs_wq_submit_bio gracefully
>        btrfs: do not return errors from btrfs_submit_dio_bio
>        btrfs: merge btrfs_dev_stat_print_on_error with its only caller
>        btrfs: repair all known bad mirrors
>        btrfs: simplify the pending I/O counting in struct compressed_bio
>        btrfs: pass a btrfs_bio to btrfs_repair_one_sector
>        btrfs: remove the start argument to check_data_csum and export
>        btrfs: fix repair of compressed extents
>        btrfs: don't call btrfs_page_set_checked in finish_compressed_bio_read
>
> David Sterba (30):
>        btrfs: fix typos in comments
>        btrfs: remove redundant calls to flush_dcache_page
>        btrfs: remove redundant check in up check_setget_bounds
>        btrfs: sysfs: advertise zoned support among features
>        btrfs: open code rbtree search in split_state
>        btrfs: open code rbtree search in insert_state
>        btrfs: lift start and end parameters to callers of insert_state
>        btrfs: pass bits by value not by pointer for extent_state helpers
>        btrfs: add fast path for extent_state insertion
>        btrfs: remove node and parent parameters from insert_state
>        btrfs: open code inexact rbtree search in tree_search
>        btrfs: make tree search for insert more generic and use it for tree_search
>        btrfs: unify tree search helper returning prev and next nodes
>        btrfs: call inode_to_path directly and drop indirection
>        btrfs: simplify parameters of backref iterators
>        btrfs: sink iterator parameter to btrfs_ioctl_logical_to_ino
>        btrfs: remove unused typedefs get_extent_t and btrfs_work_func_t
>        btrfs: send: drop __KERNEL__ ifdef from send.h
>        btrfs: send: simplify includes
>        btrfs: send: remove old TODO regarding ERESTARTSYS
>        btrfs: send: use boolean types for current inode status
>        btrfs: send: add OTIME as utimes attribute for proto 2+ by default
>        btrfs: send: add new command FILEATTR for file attributes
>        btrfs: print checksum type and implementation at mount time
>        btrfs: use mask for all RAID1* profiles in btrfs_calc_avail_data_space
>        btrfs: merge calculations for simple striped profiles in btrfs_rmap_block
>        btrfs: clean up chained assignments
>        btrfs: switch btrfs_block_rsv::full to bool
>        btrfs: switch btrfs_block_rsv::failfast to bool
>        btrfs: use enum for btrfs_block_rsv::type
>
> Fabio M. De Francesco (7):
>        btrfs: replace kmap() with kmap_local_page() in inode.c
>        btrfs: replace kmap() with kmap_local_page() in lzo.c
>        highmem: Make __kunmap_{local,atomic}() take const void pointer
>        btrfs: zstd: replace kmap() with kmap_local_page()
>        btrfs: zlib: replace kmap() with kmap_local_page() in zlib_compress_pages()
>        btrfs: zlib: replace kmap() with kmap_local_page() in zlib_decompress_bio()
>        btrfs: replace kmap_atomic() with kmap_local_page()
>
> Fanjun Kong (1):
>        btrfs: use PAGE_ALIGNED instead of IS_ALIGNED
>
> Filipe Manana (18):
>        btrfs: balance btree dirty pages and delayed items after a rename
>        btrfs: free the path earlier when creating a new inode
>        btrfs: balance btree dirty pages and delayed items after clone and dedupe
>        btrfs: add assertions when deleting batches of delayed items
>        btrfs: deal with deletion errors when deleting delayed items
>        btrfs: refactor the delayed item deletion entry point
>        btrfs: improve batch deletion of delayed dir index items
>        btrfs: assert that delayed item is a dir index item when adding it
>        btrfs: improve batch insertion of delayed dir index items
>        btrfs: do not BUG_ON() on failure to reserve metadata for delayed item
>        btrfs: set delayed item type when initializing it
>        btrfs: reduce amount of reserved metadata for delayed item insertion
>        btrfs: remove the inode cache check at btrfs_is_free_space_inode()
>        btrfs: don't fallback to buffered IO for NOWAIT direct IO writes
>        btrfs: set the objectid of the btree inode's location key
>        btrfs: add optimized btrfs_ino() version for 64 bits systems
>        btrfs: send: always use the rbtree based inode ref management infrastructure
>        btrfs: join running log transaction when logging new name
>
> Ioannis Angelakopoulos (2):
>        btrfs: collect commit stats, count, duration
>        btrfs: sysfs: export commit stats
>
> Johannes Thumshirn (1):
>        btrfs: add tracepoints for ordered extents
>
> Josef Bacik (3):
>        btrfs: do not batch insert non-consecutive dir indexes during log replay
>        btrfs: tree-log: make the return value for log syncing consistent
>        btrfs: reset block group chunk force if we have to wait
>
> Naohiro Aota (17):
>        btrfs: ensure pages are unlocked on cow_file_range() failure
>        btrfs: extend btrfs_cleanup_ordered_extents for NULL locked_page
>        btrfs: fix error handling of fallback uncompress write
>        btrfs: replace unnecessary goto with direct return at cow_file_range()
>        block: add bdev_max_segments() helper
>        btrfs: zoned: revive max_zone_append_bytes
>        btrfs: replace BTRFS_MAX_EXTENT_SIZE with fs_info->max_extent_size
>        btrfs: convert count_max_extents() to use fs_info->max_extent_size
>        btrfs: use fs_info->max_extent_size in get_extent_max_capacity()
>        btrfs: let can_allocate_chunk return error
>        btrfs: zoned: finish least available block group on data bg allocation
>        btrfs: zoned: introduce space_info->active_total_bytes
>        btrfs: zoned: disable metadata overcommit for zoned
>        btrfs: zoned: activate metadata block group on flush_space
>        btrfs: zoned: activate necessary block group
>        btrfs: zoned: write out partially allocated region
>        btrfs: zoned: wait until zone is finished when allocation didn't progress
>
> Nikolay Borisov (9):
>        btrfs: introduce btrfs_try_lock_balance
>        btrfs: use btrfs_try_lock_balance in btrfs_ioctl_balance
>        btrfs: batch up release of reserved metadata for delayed items used for deletion
>        btrfs: properly flag filesystem with BTRFS_FEATURE_INCOMPAT_BIG_METADATA
>        btrfs: don't print 'flagging with big metadata' anymore on mount
>        btrfs: don't print 'has skinny extents' anymore on mount
>        btrfs: sysfs: remove MIXED_BACKREF feature file
>        btrfs: sysfs: remove BIG_METADATA feature files
>        btrfs: simplify error handling in btrfs_lookup_dentry
>
> Omar Sandoval (7):
>        btrfs: send: remove unused send_ctx::{total,cmd}_send_size
>        btrfs: send: explicitly number commands and attributes
>        btrfs: send: add stream v2 definitions
>        btrfs: send: write larger chunks when using stream v2
>        btrfs: send: get send buffer pages for protocol v2
>        btrfs: send: send compressed extents with encoded writes
>        btrfs: send: enable support for stream v2 and compressed writes
>
> Pankaj Raghav (1):
>        btrfs: zoned: fix comment description for sb_write_pointer logic
>
> Qu Wenruo (25):
>        btrfs: quit early if the fs has no RAID56 support for raid56 related checks
>        btrfs: introduce a data checksum checking helper
>        btrfs: remove duplicated parameters from submit_data_read_repair()
>        btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks
>        btrfs: use integrated bitmaps for btrfs_raid_bio::dbitmap and finish_pbitmap
>        btrfs: use integrated bitmaps for scrub_parity::dbitmap and ebitmap
>        btrfs: only write the sectors in the vertical stripe which has data stripes
>        btrfs: update stripe_sectors::uptodate in steal_rbio
>        btrfs: add trace event for submitted RAID56 bio
>        btrfs: make btrfs_super_block::log_root_transid deprecated
>        btrfs: reject log replay if there is unsupported RO compat flag
>        btrfs: raid56: avoid double for loop inside finish_rmw()
>        btrfs: raid56: avoid double for loop inside __raid56_parity_recover()
>        btrfs: raid56: avoid double for loop inside alloc_rbio_essential_pages()
>        btrfs: raid56: avoid double for loop inside raid56_rmw_stripe()
>        btrfs: raid56: avoid double for loop inside raid56_parity_scrub_stripe()
>        btrfs: remove parameter dev_extent_len from scrub_stripe()
>        btrfs: use btrfs_chunk_max_errors() to replace tolerance calculation
>        btrfs: use btrfs_raid_array to calculate number of parity stripes
>        btrfs: use ncopies from btrfs_raid_array in btrfs_num_copies()
>        btrfs: use named constant for reserved device space
>        btrfs: warn about dev extents that are inside the reserved range
>        btrfs: raid56: don't trust any cached sector in __raid56_parity_recover()
>        btrfs: output mirror number for bad metadata
>        btrfs: return proper mapped length for RAID56 profiles in __btrfs_map_block()
>
> Stefan Roesch (3):
>        btrfs: store chunk size in space-info struct
>        btrfs: sysfs: export chunk size in space infos
>        btrfs: sysfs: add force_chunk_alloc trigger to force allocation
>
>   arch/parisc/include/asm/cacheflush.h |   6 +-
>   arch/parisc/kernel/cache.c           |   2 +-
>   fs/btrfs/async-thread.h              |   1 -
>   fs/btrfs/backref.c                   |  88 ++--
>   fs/btrfs/backref.h                   |   3 +-
>   fs/btrfs/block-group.c               |  34 +-
>   fs/btrfs/block-rsv.c                 |  21 +-
>   fs/btrfs/block-rsv.h                 |  15 +-
>   fs/btrfs/btrfs_inode.h               |  25 +-
>   fs/btrfs/compression.c               | 359 ++++----------
>   fs/btrfs/compression.h               |  18 +-
>   fs/btrfs/ctree.h                     | 105 ++++-
>   fs/btrfs/delalloc-space.c            |   6 +-
>   fs/btrfs/delayed-inode.c             | 395 +++++++++++-----
>   fs/btrfs/delayed-inode.h             |  11 +
>   fs/btrfs/delayed-ref.c               |   4 +-
>   fs/btrfs/dev-replace.c               |   3 +-
>   fs/btrfs/disk-io.c                   | 268 ++++-------
>   fs/btrfs/disk-io.h                   |  17 +-
>   fs/btrfs/extent-tree.c               | 149 +++---
>   fs/btrfs/extent_io.c                 | 873 ++++++++++++++++-------------------
>   fs/btrfs/extent_io.h                 |  15 +-
>   fs/btrfs/file.c                      |  29 +-
>   fs/btrfs/free-space-cache.c          |   3 +-
>   fs/btrfs/inode.c                     | 764 +++++++++++++++---------------
>   fs/btrfs/ioctl.c                     | 150 +++---
>   fs/btrfs/lzo.c                       |  28 +-
>   fs/btrfs/ordered-data.c              |  40 +-
>   fs/btrfs/ordered-data.h              |   5 +-
>   fs/btrfs/raid56.c                    | 792 +++++++++++++++----------------
>   fs/btrfs/raid56.h                    | 168 ++++++-
>   fs/btrfs/reflink.c                   |  19 +-
>   fs/btrfs/scrub.c                     |  71 ++-
>   fs/btrfs/send.c                      | 781 +++++++++++++++++++++----------
>   fs/btrfs/send.h                      | 169 ++++---
>   fs/btrfs/space-info.c                | 110 ++++-
>   fs/btrfs/space-info.h                |   8 +-
>   fs/btrfs/struct-funcs.c              |  11 +-
>   fs/btrfs/subpage.c                   |   4 +-
>   fs/btrfs/super.c                     |  36 +-
>   fs/btrfs/sysfs.c                     | 186 +++++++-
>   fs/btrfs/tests/btrfs-tests.c         |   1 +
>   fs/btrfs/tests/extent-buffer-tests.c |   3 +-
>   fs/btrfs/transaction.c               |  26 +-
>   fs/btrfs/tree-log.c                  |  29 +-
>   fs/btrfs/tree-log.h                  |   3 +
>   fs/btrfs/volumes.c                   | 362 +++++++--------
>   fs/btrfs/volumes.h                   |  46 +-
>   fs/btrfs/zlib.c                      |  42 +-
>   fs/btrfs/zoned.c                     | 131 +++++-
>   fs/btrfs/zoned.h                     |  18 +
>   fs/btrfs/zstd.c                      |  33 +-
>   include/linux/blkdev.h               |   5 +
>   include/linux/highmem-internal.h     |  10 +-
>   include/trace/events/btrfs.h         | 158 +++++++
>   include/uapi/linux/btrfs.h           |  10 +-
>   mm/highmem.c                         |   2 +-
>   57 files changed, 3842 insertions(+), 2829 deletions(-)
