Remove branch when changing bit in LR edges mask
arm64: cdef: Improve find_dir
arm64: cdef: Calculate two initial parameters in the same vector
arm64: cdef: Use loads with postincrement in more places in the padding function
arm64: cdef: Rewrite an expression slightly
Don't backup pixels if next restoration unit is NONE
Add AVX2 version of generate_grain_uv (4:2:0)
arm64: mc: Schedule instructions better in the warp8x8 functions
Check for RESTORATION_NONE once per frame
arm64: mc: Use sbfx instead of ubfx+sxth in the warp function
x86: Increase precision of SSSE3 IDCT intermediates
x86: Increase precision of AVX2 IDCT intermediates
checkasm: Add a function listing feature
Simplify README build instructions
arm64: ipred: NEON implementation of dc/h/v prediction modes
x86: add warp_affine SSE4 and SSSE3 asm
arm64: itx: Fix overflows in idct
arm64: itx: Consistently use the factor 2896 in adst
arm64: itx: Use smull+smlal instead of addl+mul
dav1dplay: initial support for --zerocopy
dav1dplay: add --untimed for benchmarking purposes
dav1dplay: add --highquality to toggle render quality
x86: add 32-bit support to SSSE3 deblock lpf
x86: add deblocking loopfilters SSSE3 asm (64-bit)
AVX2 for chroma 4:2:0 film grain reconstruction
Remove luma width check in fguv_32x32xn
Y grain AVX2 implementations
Add film grain checkasm tests
Split out film grain block functions into a DSPContext
obu: fix deriving render_width and render_height from reference frames
Silence some clang-cl warnings
x86: Fix buffer overread in mc put
x86: Increase precision of the final inverse ADST transform stages
arm64: itx: Do the final calculation of adst4/adst8/adst16 in 32 bit to avoid too narrow clipping
Prefer __builtin_unreachable() over __assume() on clang-cl
Fix clang-cl assertion warning
arm: Fix assembling with older binutils
TileContext: reorder scratch buffer to avoid conflicts
CI: use "needs:" to break the dependency between the static build and test stages
Apply high-bitdepth adjustment of pixel index after delta calculation
Use linear interpolation for high bit-depth pixel values
Fix bugs in film grain generation
arm: mc: Make code style consistent
arm: mc: Push fewer registers in w_mask
arm: mc: Remove an unused instruction in w_mask
Check absolute tile positions in sb-to-tile_idx table generation
Use 64-bit integers for warp_affine mvx/mvy calculations
x86: Fix inverse ADST transform overflows
Optimize coef ctx calculations
Consolidate horizontal scan tables
Change scan tables from int16_t to uint16_t
Utilize the constraints in assertions to improve code generation
arm64: mc: NEON implementation of w_mask_444/422/420 functions
arm64: mc: NEON implementation of blend, blend_h and blend_v functions
Prefer `do {} while (0);` over `while (0);`
Cosmetics: CDF tables
x86: Add an msac function for coefficient hi_tok decoding
Add msac optimizations
Remove unused CDFs
dav1dplay: abort if no input filename is provided
meson: move dav1dplay to a new examples section
decode_coefs: Reuse lossless variable
Unroll hi_token loop in decode_coeff
Quick out if seg_id == 0 in get_prev_frame_segid
Avoid CDF overreads in gather_top_partition_prob()
Set thread names on macOS
Set thread names on Windows 10
arm: mc: Speed up ldr/str instructions by using memory alignment
checkasm: Catch SIGBUS in addition to the other signals
Export frame ITU-T T.35 Metadata
Improve wiener filter C implementation using loop interchange
tools: player: Add missing string.h header
Set thread names on BSDs
vsx: Add shorter types and unpack helpers
vsx: Set the correct alignment constraints
Update NEWS for 0.4.0
Change SDL include in dav1dplay
arm: mc: neon: Merge load and other related operations in blend/blend_h/blend_v functions
arm: mc: neon: Reduce usage of general purpose registers in blend/blend_v functions
arm: mc: neon: Use vld with ! post-increment instead of a register in blend/blend_h/blend_v functions
tools: add a simple player example
Fix handling of some memory allocation failures
Set thread names on Linux
arm: mc: NEON implementation of w_mask_444/422/420 functions
dav1d_fuzzer: use Dav1dSettings.frame_size_limit instead of a custom picture allocator
Fix memory leak in dav1d_submit_frame()
obu: also check frame_size_limit with Frame Header OBUs
Improve robustness of handling malloc failures
Correctly return an error on malloc failure
Fix potential memory leak
arm: mc: neon: Improve the blend_v function
Reduce the size of frame threading buffers
Consolidate scratch buffers
build: fix meson deprecation warning
checkasm: msac: Add verbose printouts on failures
checkasm: cdef: Add verbose prints for output data (and relevant input)
checkasm: looprestoration: Use checkasm_check*