x86: Add minor ipred_z AVX2 optimizations
Shrink some stack buffers in the C versions of ipred_z
Don't back up pixel if next block not "CDEFed"
x86inc: fix LOAD_MM_PERMUTATION for AVX512
x86: adapt SSSE3 cdef_filter_{4x4,4x8,8x8} to SSE2
tools: fix SSE2 cpu masking
ci: Try switching two GCC based arm/aarch64 build configurations to debugoptimized
arm64: ipred: Make sure all symbols are aligned
Update NEWS for 0.5.0: z2-avx2, ipred-neon and wiener-vsx
arm: util: Split movrel into movrel and movrel_local
Check loopfilter levels prior to calling lf_mask
arm64: ipred: NEON implementation of the cfl_ac functions
arm64: ipred: NEON implementation of the cfl_pred functions
arm64: ipred: NEON implementation of the filter function
arm64: ipred: NEON implementation of palette prediction
arm64: ipred: NEON implementation of smooth prediction
arm64: ipred: NEON implementation of paeth prediction
x86: Add ipred_z2 AVX2 asm
Simplify ipred_z C code
checkasm: Improve ipred_z tests
x86: fix generate_grain_uv checkasm crashes on Windows x64
Update NEWS for 0.5.0
Add VSX wiener filter implementation
Move snap to package/ subfolder
arm: mc: Port the ARM64 warp filter to arm32
arm64: mc: Use addp instead of addv+trn1 in warp
arm: cdef: Port the ARM64 CDEF NEON assembly to 32 bit arm
arm: Support PIC loading of non-global symbols in the movrel macro on apple platforms
Remove branch when changing bit in LR edges mask
arm64: cdef: Improve find_dir
arm64: cdef: Calculate two initial parameters in the same vector
arm64: cdef: Use loads with postincrement in more places in the padding function
arm64: cdef: Rewrite an expression slightly
Don't back up pixels if next restoration unit is NONE
Add AVX2 version of generate_grain_uv (4:2:0)
arm64: mc: Schedule instructions better in the warp8x8 functions
Check for RESTORATION_NONE once per frame
arm64: mc: Use sbfx instead of ubfx+sxth in the warp function
x86: Increase precision of SSSE3 IDCT intermediates
x86: Increase precision of AVX2 IDCT intermediates
checkasm: Add a function listing feature
Simplify README build instructions
arm64: ipred: NEON implementation of dc/h/v prediction modes
x86: add warp_affine SSE4 and SSSE3 asm
arm64: itx: Fix overflows in idct
arm64: itx: Consistently use the factor 2896 in adst
arm64: itx: Use smull+smlal instead of addl+mul
dav1dplay: initial support for --zerocopy
dav1dplay: add --untimed for benchmarking purposes
dav1dplay: add --highquality to toggle render quality
x86: add 32-bit support to SSSE3 deblock lpf
x86: add deblocking loopfilters SSSE3 asm (64-bit)
AVX2 for chroma 4:2:0 film grain reconstruction
Remove luma width check in fguv_32x32xn
Y grain AVX2 implementations
Add film grain checkasm tests
Split out film grain block functions into a DSPContext
obu: fix deriving render_width and render_height from reference frames
Silence some clang-cl warnings
x86: Fix buffer overread in mc put
x86: Increase precision of the final inverse ADST transform stages
arm64: itx: Do the final calculation of adst4/adst8/adst16 in 32 bit to avoid too narrow clipping
Prefer __builtin_unreachable() over __assume() on clang-cl
Fix clang-cl assertion warning
arm: Fix assembling with older binutils
TileContext: reorder scratch buffer to avoid conflicts
CI: use "needs:" to break the static build, test stage dependency
Apply high-bitdepth adjustment of pixel index after delta calculation
Use linear interpolation for high bit-depth pixel values
Fix bugs in film grain generation
arm: mc: Make code style consistent
arm: mc: Push fewer registers in w_mask
arm: mc: Remove an unused instruction in w_mask
Check absolute tile positions in sb-to-tile_idx table generation
Use 64-bit integers for warp_affine mvx/mvy calculations
x86: Fix inverse ADST transform overflows
Optimize coef ctx calculations
Consolidate horizontal scan tables
Change scan tables from int16_t to uint16_t
Utilize the constraints in assertions to improve code generation
arm64: mc: NEON implementation of w_mask_444/422/420 function
arm64: mc: NEON implementation of blend, blend_h and blend_v function
Prefer `do {} while (0);` over `while (0);`
Cosmetics: CDF tables
x86: Add an msac function for coefficient hi_tok decoding
Add msac optimizations
Remove unused CDFs
dav1dplay: abort if no input filename is provided
meson: move dav1dplay to a new examples section
decode_coefs reuse lossless variable
Unroll hi_token loop in decode_coeff
Quick out if seg_id == 0 in get_prev_frame_segid
Avoid CDF overreads in gather_top_partition_prob()
Set thread names on macOS
Set thread names on Windows 10
arm: mc: Speed up due to memory alignment in ldr/str instructions
checkasm: Catch SIGBUS in addition to the other signals
Export frame ITU-T T.35 Metadata
Improve wiener filter C implementation using loop interchange