CI: Deduplicate and template jobs
doc: search for dot as it's needed to build doxygen documentation
Update NEWS for 0.6.0
arm64: mc: NEON implementation of w_mask for 16 bpc
CI: run a selection of jobs on a node with avx2
x86: Fix crash in AVX2 cdef_filter with <32-byte stack alignment
arm64: mc: NEON implementation of blend for 16bpc
arm: mc: Optimize blend_v
arm64: mc: Treat the stride as a full 64 bit (potentially signed) value in blend_8bpc_neon
arm64: mc: Fix indentation
arm64: mc: Use more intuitive lane specifications for loads/stores
Update NEWS for 0.6.0
CI/armv7: use `linux32 meson ...` to allow running on aarch64
arm64: loopfilter: NEON implementation of loopfilter for 16 bpc
arm: loopfilter: Prepare for 16 bpc
arm: loopfilter: Fix a comment
fuzzing: link the fuzzing binaries as C++
fuzzing: split the fuzzing targets to their own meson.build file
x86: Add mc w_mask 4:4:4 AVX-512 (Ice Lake) asm
x86: Add mc w_mask 4:2:2 AVX-512 (Ice Lake) asm
x86: Add mc w_mask 4:2:0 AVX-512 (Ice Lake) asm
x86: Add mc avg/w_avg/mask AVX-512 (Ice Lake) asm
x86: optimize cdef_filter_{4x{4,8},8x8}_avx2
x86: add a separate fully edged case to cdef_filter_avx2
checkasm: Improve the cdef input randomization algorithm
cli: Replace malloc + memset(0) with calloc in input.c
cli: remove init_[de]muxers() functions
Replace malloc+memset(0) with calloc
CI: update aarch64 docker image to buster with meson 0.49
arm: cdef: Do an 8 bit implementation for cases with all edges present
arm32: cdef: Fix a typo for consistency
cli: Implement line buffering in print_stats()
arm: cdef: Remove leftover unused labels and macro parameters
arm64: looprestoration: NEON implementation of SGR for 10 bpc
arm64: looprestoration: Prepare for 16 bpc by splitting code to separate files
arm: looprestoration: Add 8bpc to existing function names, add HIGHBD_*_SUFFIX
looprestoration: Add a bpc parameter to the init func
arm: looprestoration: Improve scheduling in box3/5_h slightly
arm: Use int16_t for the tmp intermediate buffer
arm: looprestoration: Fix a comment
NEWS: Official naming is AVX2, not AVX-2
arm64: mc: Reduce the width of a register copy
arm64: mc: Use two regs for alternating output rows for w4/8 in avg/w_avg/mask
arm64: mc: Simplify avg/w_avg/mask by always using the w16 macro
Update NEWS for 0.6.0
arm64: mc: NEON implementation of warp for 16 bpc
arm64: cdef: Add NEON implementations of CDEF for 16 bpc
arm: cdef: Prepare for 16bpc
x86: Add cdef_filter_4x4 AVX-512 (Ice Lake) asm
Reorder the Dav1dFrameHeader struct to fix alignment issues
arm64: looprestoration: NEON implementation of wiener filter for 16 bpc
arm: looprestoration: Fix the wiener C wrapper function for high bitdepths
arm: looprestoration: Prepare for 16bpc wiener filter by adding _8bpc to function names
arm: looprestoration: Clarify a comment
arm64: mc: NEON implementation of put/prep 8tap/bilin for 16 bpc
Update README
x86/msac: add an avx2 version for msac_decode_symbol_adapt16
msac: make symbol_adapt16 a function pointer on x86_64
arm64: mc: NEON implementation of avg/mask/w_avg for 16 bpc
arm: mc: Prepare the init file for higher bitdepths
arm: Make the existing 8bpc assembly only built if 8bpc is enabled
x86: Avoid cmov instructions that depend on multiple flags
x86: Add miscellaneous minor scalar optimizations
x86: Use unsigned pointer comparisons
Rework the CDEF top edge handling
checkasm: Fix missing shift in high bit-depth cdef_filter test
Avoid masking the lsb in high bit-depth stride calculations
checkasm: Increase buffer alignment to 64-byte on x86-64
arm: cdef: Add special cased versions for pri_strength/sec_strength being zero
arm: cdef: Fix some comment typos
Fix crash in dav1d_apply_grain() with negative picture strides
Optimize the cdef_filter C implementation
checkasm: Improve cdef_filter test
Avoid redundant calls to CDEF DSP functions
x86: Bump nasm version requirement to 2.14
CI: Use a newer image to build snap packages
x86: add prep_8tap AVX512 asm
x86: replace "mov hb, Xb" by "movzx hd, Xb" in MC
x86inc: save xmm_regs_used in spill_xmm for non-win64
arm64: itx: Fix overflow/clipping in negation in idct16
x86: Fix overflows in SSSE3 idct
x86: Fix missing saturations in inverse identity asm
SSSE3 implementations of film grain
Reduce scope of NO_SANITIZE usage
Add a workaround for -fsanitize=cfi + dlsym() issue
x86: add prep_bilin AVX512 asm
x86: add avx512icl cpu flag to x86inc.asm
checkasm: x86: ensure all SIMD lanes are turned on at all times
Add misc. inverse transform C optimizations
Skip clipping in the inverse wht transform C implementation
x86: Fix SSSE3 inverse identity transform overflow/clipping
x86: Fix AVX2 inverse identity transform overflow/clipping
Fix building as a meson subproject
Fix missing include for limits.h
arm64: msac: Avoid 32 bit intermediates in symbol_adapt
arm64: itx: Use sqrdmulh in the preexisting identity transform functions
arm64: itx: Specialcase transforms with identity in the first pass with downshift
arm64: itx: Adjust .irp in the 4x16/16x4/8x16/16x8 functions
Don't interleave the skip mode index finding loops