Merge remote-tracking branch 'upstream/master'
arm32: loopfilter: NEON implementation of loopfilter for 16 bpc
arm64: loopfilter16: Fix conditions for skipping parts of the filtering
arm32: loopfilter: Fix a misindented/aligned operand
arm: loopfilter: Compare L != 0 before doing a splat
x86: Rewrite wiener SSE2/SSSE3/AVX2 asm
x86: Rename looprestoration_ssse3.asm to looprestoration_sse.asm
Add miscellaneous minor wiener optimizations
Use smaller data types for wiener filter coefficients
Simplify msac subexp decoding
fuzzer: Test calling dav1d_picture_unref() after dav1d_close()
Fix use of references to buffers after calling dav1d_close()
arm32: looprestoration: NEON implementation of SGR for 10 bpc
arm32: looprestoration: Prepare for 16 bpc by splitting code to separate files
arm: looprestoration16: Fix comments referring to pixels as bytes
arm64: looprestoration: Add a missed parameter in a comment
arm32: looprestoration: Remove an unnecessary stack arg load in SGR
arm32: looprestoration: Specify alignment in loads/stores in SGR where possible
arm64: looprestoration16: Don't keep precalculated squares in box3/5_h
meson: Support running checkasm benchmarks through meson
meson: Place checkasm and header tests in named suites
Update NEWS for 0.8.0
Add more buffer pools
arm32: mc: NEON implementation of warp8x8 for 16 bpc
arm32: cdef: Add NEON implementations of CDEF for 16 bpc
arm32: cdef: Simplify some cases in the padding function
arm64: cdef: Fix a comment typo
Check frame IDs for reference and "show_existing" frames
Combine boxsum and boxsumsqr in SGR C code
Add a picture buffer pool
meson: Handle the b_lto option as a string option for newer meson versions
use less memory in SGR C code
amd64 builtins: disable profiling
Abort frame decoding properly on reference error
Avoid using %ld for debugging in obu.c
Add debug code for HDR metadata
CI/test-debian-asan: run address sanitizer tests both with and without asm
fuzzer: parse '--cpumask X' command line argument
arm32: looprestoration: NEON implementation of wiener filter for 16 bpc
arm64: looprestoration16: Reorder instructions to avoid close data dependencies
arm64: looprestoration16: Use narrower operations where possible when filtering one pixel
arm32: looprestoration: Optimize the 4-pixel wide horizontal wiener filter
arm32: looprestoration: Remove a macro that is used only once
arm32: looprestoration: Specify alignment for more loads/stores
arm32: looprestoration: Fix missed vertical alignment
pthread_create: make sure attr isn't nil
tests: avoid using sed in header test
meson: Set msvc like warning options for clang-cl
meson: Use gas-preprocessor as generator, for targets that need it
build: increase minimal meson to 0.49
x86: Add misc mc asm tweaks
Ban op->idc that may drop all layer-specific OBUs
stb image resize is useless, remove it
simplify player's logic and fix timing
tools/dav1d: fix the build
threads: fix pthread_cond_wait
threads: set stack size to whatever is in attributes
set frame/tile threads to NPROC too
set maxframes to NPROC
add stb image resize here for now
trying out yuv->rgb and resampling on frames in parallel
provide amd64-specific ctz, clz and clzll builtins
cli: Add support for Unicode and long paths on Windows 10
change a few numbers here and there
put yuv2rgb code into av19
av19: a base for future video player
fix atomics and few other things to make decoding work
Merge remote-tracking branch 'upstream' into master
remove useless traces
build both 8 and 16 bpc logic
arm32: mc: NEON implementation of put/prep 8tap/bilin for 16 bpc
arm64: mc: Apply tuning from w4/w8 case to w2 case in 16 bpc 8tap_hv
arm: mc: Avoid an unnecessary mov in 8tap_hv w2
arm32: mc: Load 8tap filter coefficients with alignment where possible
arm32: mc: Use narrower vext.8 in 8tap_w4_h
arm64: mc: Use more descriptive element specifiers for loads/stores in 16 bpc put_neon
cli: Use proper integer math in Y4M PAR calculations
Output render size to Y4M
arm32: mc: NEON implementation of avg/mask/w_avg for 16 bpc
cli: Print the decoding fps even if the file lacks a nominal framerate
tests: test stand alone API header compilation
dav1d/headers.h: add missing stdint.h include
contributing: document the allowed internal use of anonymous structs/unions
bump soname for API changes
API: move reserved space in Dav1dSettings to the end
API: remove anonymous struct and union from Dav1dWarpedMotionParams
CI: compare x86inc.asm with upstream
x86inc.asm: remove private_prefix define and config.asm include
x86inc.asm: use standalone x86inc.asm as upstream
x86inc.asm: Properly sort instructions in alphabetical order
checkasm: Add ifdefs around the readtime check
checkasm: Enforce declare_func to be outside of check_func
obu: remove a few unnecessary calls to memset()