This commit is contained in:
2026-06-14 19:09:18 +01:00
parent 14bd1a9271
commit 13fa90a0e9
3958 changed files with 999286 additions and 4 deletions
@@ -0,0 +1,315 @@
# Building ASTC Encoder
This page provides instructions for building `astcenc` from the sources in
this repository.
Builds must use CMake 3.15 or higher as the build system generator. The
examples on this page show how to use it to generate build systems for NMake
(Windows) and Make (Linux and macOS), but CMake supports other build system
backends.
## Windows
Builds for Windows are tested with CMake 3.17, and Visual Studio 2019 or newer.
### Configuring the build
To use CMake you must first configure the build. Create a build directory in
the root of the `astcenc` checkout, and then run `cmake` inside that directory
to generate the build system.
```shell
# Create a build directory
mkdir build
cd build
# Configure your build of choice, for example:
# x86-64 using a Visual Studio solution
cmake -G "Visual Studio 16 2019" -T ClangCL -DCMAKE_INSTALL_PREFIX=..\ ^
-DASTCENC_ISA_AVX2=ON -DASTCENC_ISA_SSE41=ON -DASTCENC_ISA_SSE2=ON ..
# x86-64 using NMake
cmake -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=..\ ^
-DASTCENC_ISA_AVX2=ON -DASTCENC_ISA_SSE41=ON -DASTCENC_ISA_SSE2=ON ..
```
A single CMake configure can build multiple binaries for a single target CPU
architecture, for example building x64 for both SSE2 and AVX2. Each binary name
will include the build variant as a postfix. It is possible to build any set of
the supported SIMD variants by enabling only the ones you require.
Using the Visual Studio Clang-CL LLVM toolchain (`-T ClangCL`) is optional but
produces significantly faster binaries than the default toolchain. The C++ LLVM
toolchain component must be installed via the Visual Studio installer.
### Building
Once you have configured the build you can use NMake to compile the project
from your build dir, and install to your target install directory.
```shell
# Run a build and install build outputs in `${CMAKE_INSTALL_PREFIX}/bin/`
cd build
nmake install
```
## macOS and Linux using Make
Builds for macOS and Linux are tested with CMake 3.17, and clang++ 9.0 or
newer.
> Compiling using g++ is supported, but clang++ builds are faster by ~15%.
### Configuring the build
To use CMake you must first configure the build. Create a build directory
in the root of the astcenc checkout, and then run `cmake` inside that directory
to generate the build system.
```shell
# Select your compiler (clang++ recommended, but g++ works)
export CXX=clang++
# Create a build directory
mkdir build
cd build
# Configure your build of choice, for example:
# Arm aarch64
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../ \
-DASTCENC_ISA_NEON=ON ..
# x86-64
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../ \
-DASTCENC_ISA_AVX2=ON -DASTCENC_ISA_SSE41=ON -DASTCENC_ISA_SSE2=ON ..
# macOS universal binary build
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../ ..
```
A single CMake configure can build multiple binaries for a single target CPU
architecture, for example building x64 for both SSE2 and AVX2. Each binary name
will include the build variant as a postfix. It is possible to build any set of
the supported SIMD variants by enabling only the ones you require.
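As a sketch of enabling only the variants you need, the helper below expands a list of variant names into the `-DASTCENC_ISA_*=ON` options used in the configure examples above. Note that `isa_flags` is a hypothetical helper written for this page and is not part of the repository; it only prints the options, so it can be run outside a checkout.

```shell
# Hypothetical helper (not part of astcenc): turn variant names into the
# ASTCENC_ISA_* options shown above. Prints the options on one line.
isa_flags() {
    flags=""
    for isa in "$@"; do
        # Upper-case the name, e.g. avx2 -> AVX2
        name=$(printf '%s' "$isa" | tr '[:lower:]' '[:upper:]')
        # Append the option, adding a separating space after the first one
        flags="${flags}${flags:+ }-DASTCENC_ISA_${name}=ON"
    done
    printf '%s\n' "$flags"
}

# Build only the AVX2 and SSE4.1 variants
isa_flags avx2 sse41
```

The printed options can then be passed on the configure line, for example `cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release $(isa_flags avx2 sse41) ..`.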
For macOS, we additionally support the ability to build a universal binary.
This build includes SSE4.1 (`x86_64`), AVX2 (`x86_64h`), and NEON (`arm64`)
build slices in a single output binary. The OS will select the correct variant
to run for the machine being used. This is the default build target for a macOS
build, but single-target binaries can still be built by setting
`-DASTCENC_UNIVERSAL_BINARY=OFF` and then manually selecting the specific ISA
variants that are required.
### Building
Once you have configured the build you can use Make to compile the project from
your build dir, and install to your target install directory.
```shell
# Run a build and install build outputs in `${CMAKE_INSTALL_PREFIX}/bin/`
# for executable binaries and `${CMAKE_INSTALL_PREFIX}/lib/` for libraries
cd build
make install -j16
```
## macOS using Xcode
Builds for macOS are tested with CMake 3.17, and Xcode 14.0 or
newer.
### Configuring the build
To use CMake you must first configure the build. Create a build directory
in the root of the astcenc checkout, and then run `cmake` inside that directory
to generate the build system.
```shell
# Create a build directory
mkdir build
cd build
# Configure a universal build
cmake -G Xcode -DCMAKE_INSTALL_PREFIX=../ ..
```
### Building
Once you have configured the build you can use CMake to compile the project
from your build dir, and install to your target install directory.
```shell
cmake --build . --config Release
# Optionally install the binaries to the installation directory
cmake --install . --config Release
```
## Advanced build options
For codec developers and power users there are a number of useful features in
the build system.
### Build Types
We support and test the following `CMAKE_BUILD_TYPE` options.
| Value | Description |
| ---------------- | -------------------------------------------------------- |
| Release | Optimized release build |
| RelWithDebInfo | Optimized release build with debug info |
| Debug | Unoptimized debug build with debug info |
Note that optimized release builds are compiled with link-time optimization,
which can make profiling more challenging ...
### Shared Libraries
We support building the core library as a shared object by setting the CMake
option `-DASTCENC_SHAREDLIB=ON` at configure time. For macOS build targets the
shared library supports the same universal build configuration as the command
line utility.
Note that the command line tool is always statically linked; the shared objects
are an extra build output that is not currently used by the command line tool.
### Constrained block size builds
All normal builds will support all ASTC block sizes, including the worst case
6x6x6 3D block size (216 texels per block). Compressor memory footprint and
performance can be improved by limiting the block sizes supported in the build
by adding `-DASTCENC_BLOCK_MAX_TEXELS=<texel_count>` to the CMake command line
when configuring. Legal block sizes that are unavailable in a restricted build
will return the error `ASTCENC_ERR_NOT_IMPLEMENTED` during context creation.
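The texel count of a block size is the product of its dimensions, as in the 6x6x6 example above. A minimal sketch of that arithmetic follows; `block_texels` is a hypothetical helper written for this page, not part of the build system.

```shell
# Hypothetical helper: texel count for a 2D (X Y) or 3D (X Y Z) block size.
block_texels() {
    # Z defaults to 1 for 2D block sizes
    echo $(( $1 * $2 * ${3:-1} ))
}

block_texels 6 6      # 2D 6x6 block
block_texels 6 6 6    # 3D 6x6x6 block, the worst case noted above
```

Assuming the option takes the largest texel count you need, a build limited to block sizes of at most 6x6 texels would be configured with `-DASTCENC_BLOCK_MAX_TEXELS=36`.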
### Non-invariant builds
All normal builds are designed to be invariant, so any build from the same git
revision will produce bit-identical results for all compilers and CPU
architectures. To achieve this we sacrifice some performance, so if this is
not required you can specify `-DASTCENC_INVARIANCE=OFF` to enable additional
optimizations. This has most benefit for AVX2 builds where we are able to
enable use of the FMA instruction set extensions.
### No intrinsics builds
All normal builds will use SIMD accelerated code paths using intrinsics, as all
supported target architectures (x86 and arm64) guarantee SIMD availability. For
development purposes it is possible to build an intrinsic-free build which uses
no explicit SIMD acceleration (the compiler may still auto-vectorize).
To enable this binary variant add `-DASTCENC_ISA_NONE=ON` to the CMake command
line when configuring. It is NOT recommended to use this for production; it is
significantly slower than the vectorized SIMD builds.
### No x86 gather instruction builds
On many x86 microarchitectures the native AVX gather instructions are slower
than simply performing manual scalar loads and combining the results. Gathers
are enabled by default, but can be disabled by setting the CMake option
`-DASTCENC_X86_GATHERS=OFF` on the command line when configuring.
Note that we have seen mixed results when compiling the scalar fallback path,
so we would recommend testing which option works best for the compiler and
microarchitecture pairing that you are targeting.
### Test builds
We support building unit tests. These use the `googletest` framework, which is
pulled in through a git submodule. On first use, you must fetch the submodule
dependency:
```shell
git submodule init
git submodule update
```
To build unit tests add `-DASTCENC_UNITTEST=ON` to the CMake command line when
configuring.
To run unit tests use the CMake `ctest` utility from your build directory after
you have built the tests.
```shell
cd build
ctest --verbose
```
### Sanitizer builds
We support building with sanitizers on Linux and macOS when using Clang.
To build binaries with ASAN checking enabled add `-DASTCENC_ASAN=ON` to the
CMake command line when configuring.
To build binaries with UBSAN checking enabled add `-DASTCENC_UBSAN=ON` to the
CMake command line when configuring.
### Android builds
Builds of the command line utility for Android are not officially supported, but can be a useful
development build for testing on e.g. different Arm CPU microarchitectures.
The build script below shows one possible route to building the command line tool for Android. Once
built the application can be pushed to e.g. `/data/local/tmp` and executed from an Android shell
terminal over `adb`.
```shell
ANDROID_ABI=arm64-v8a
ANDROID_NDK=/work/tools/android/ndk/22.1.7171670
BUILD_TYPE=RelWithDebInfo
BUILD_DIR=build
mkdir -p ${BUILD_DIR}
cd ${BUILD_DIR}
cmake \
-DCMAKE_INSTALL_PREFIX=./ \
-DCMAKE_BUILD_TYPE=${BUILD_TYPE} \
-DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
-DANDROID_ABI=${ANDROID_ABI} \
-DANDROID_ARM_NEON=ON \
-DANDROID_PLATFORM=android-21 \
-DCMAKE_ANDROID_NDK_TOOLCHAIN_VERSION=clang \
-DANDROID_TOOLCHAIN=clang \
-DANDROID_STL=c++_static \
-DARCH=aarch64 \
-DASTCENC_ISA_NEON=ON \
..
make -j16
```
## Packaging a release bundle
We support building a release bundle of all enabled binary configurations in
the current CMake configuration using the `package` build target.
Configure CMake with:
* `-DASTCENC_PACKAGE=<arch>` to set the package architecture/variant name used
to name the package archive (not set by default).
```shell
# Run a build and package build outputs in `./astcenc-<ver>-<os>-<arch>.<fmt>`
cd build
make package -j16
```
Windows packages will use the `.zip` format, other packages will use the
`.tar.gz` format.
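The archive naming pattern and format rule above can be sketched as follows. `package_name` is a hypothetical helper for illustration only, and the OS label `windows` and the version string are assumptions; the real labels come from the build system.

```shell
# Hypothetical helper: compose the archive name from the pattern above.
# Arguments: <ver> <os> <arch>. The OS label "windows" is an assumption.
package_name() {
    case "$2" in
        windows) fmt="zip" ;;      # Windows packages use .zip
        *)       fmt="tar.gz" ;;   # all other packages use .tar.gz
    esac
    printf 'astcenc-%s-%s-%s.%s\n' "$1" "$2" "$3" "$fmt"
}

# Placeholder version and architecture values
package_name 1.2.3 linux x64
package_name 1.2.3 windows x64
```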
## Integrating as a library into another project
The core codec of `astcenc` is built as a library, and so can be easily
integrated into other projects using CMake. An example of the CMake integration
and the codec API usage can be found in the `./Utils/Example` directory in the
repository. See the [Example Readme](../Utils/Example/README.md) for more
details.
- - -
_Copyright © 2019-2024, Arm Limited and contributors. All rights reserved._
@@ -0,0 +1,328 @@
# 2.x series change log
This page summarizes the major functional and performance changes in each
release of the 2.x series.
All performance data on this page is measured on an Intel Core i5-9600K
clocked at 4.2 GHz, running astcenc using 6 threads.
<!-- ---------------------------------------------------------------------- -->
## 2.5
**Status:** Released, March 2021
The 2.5 release is the last major release in the 2.x series. After this release
a `2.x` branch will provide stable long-term support, and the `main` branch
will switch to focusing on more radical changes for the 3.x series.
Reminder for users of the library interface - the API is not designed to be
stable across versions, and this release is not compatible with earlier 2.x
releases. Please update and rebuild your client-side code using the updated
`astcenc.h` header.
**General:**
* **Feature:** The `ISA_INVARIANCE` build option is no longer supported, as
there is no longer any performance benefit from the variant paths. All
builds are now using the equivalent of the `ISA_INVARIANCE=ON` setting, and
all builds (except Armv7) are now believed to be invariant across operating
systems, compilers, CPU architectures, and SIMD instruction sets.
* **Feature:** Armv8 32-bit builds with NEON are now supported, with
out-of-the-box support for Arm Linux soft-float and hard-float ABIs. There
are no pre-built binaries for these targets; support is included for
library users targeting older 32-bit Android and iOS devices.
* **Feature:** A compressor mode for encoding HDR textures that have been
encoded into LDR RGBM wrapper format is now supported. Note that this
encoding has some strong recommendations for how the RGBM encoding is
implemented to avoid block artifacts in the compressed image.
**Core API:**
* **API Change:** The core API has been changed to be a pure C API, making it
easier to wrap the codec in a stable shared library ABI. Some entry points
that used to accept references now expect pointers.
* **API Change:** The decompression functionality in the core API has been
changed to allow use of multiple threads. The design pattern matches the
compression functionality, requiring the caller to create the threads,
synchronize them between images, and to call the new
`astcenc_decompress_reset()` function between images.
* **API Feature:** Defines to support exporting public API entry point
symbols from a shared object are provided, but not exposed off-the-shelf by
the CMake provided by the project.
* **API Feature:** New `astcenc_get_block_info()` function added to the core
API to allow users to perform high level analysis of compressed data. This
API is not implemented in decompressor-only builds.
* **API Feature:** Codec configuration structure has been extended to expose
the new RGBM compression mode. See the API header for details.
<!-- ---------------------------------------------------------------------- -->
## 2.4
**Status:** Released, February 2021
The 2.4 release is the fifth release in the 2.x series. It is primarily a bug
fix release for HDR image handling, which impacts all earlier 2.x series
releases.
**General:**
* **Feature:** When using the `-a` option, or the equivalent config option
for the API, any 2D blocks that are entirely zero alpha after the alpha
filter radius is taken into account are replaced by transparent black
constant color blocks. This is an RDO-like technique to improve compression
ratios of any additional application packaging compression that is applied.
**Command Line:**
* **Bug fix:** The command line wrapper now correctly loads HDR images that
have a non-square aspect ratio.
<!-- ---------------------------------------------------------------------- -->
## 2.3
**Status:** Released, January 2021
The 2.3 release is the fourth release in the 2.x series. It includes a number
of performance improvements and new features.
Reminder for users of the library interface - the API is not designed to be
stable across versions, and this release is not compatible with 2.2. Please
recompile your client-side code using the updated `astcenc.h` header.
* **General:**
* **Feature:** Decompressor-only builds of the codec are supported again.
While this is primarily a feature for library users who want to shrink
binary size, a variant command line tool `astcdec` can be built by
specifying `DECOMPRESSOR=ON` on the CMake configure command line.
* **Feature:** Diagnostic builds of the codec can now be built. These builds
generate a JSON file containing a trace of the compressor execution.
Diagnostic builds are only suitable for codec development; they are slower
and JSON generation cannot be disabled. Build by setting `DIAGNOSTICS=ON`
on the CMake configure command line.
* **Feature:** Code compatibility improved with older versions of GCC,
earliest compiler now tested is GCC 7.5 (was GCC 9.3).
* **Feature:** Code compatibility improved with newer versions of LLVM,
latest compiler now tested is Clang 12.0 (was Clang 9.0).
* **Feature:** Code compatibility improved with the Visual Studio 2019 LLVM
toolset (`clang-cl`). Using the LLVM toolset gives 25% performance
improvements and is recommended.
* **Command Line:**
* **Feature:** Quality level now accepts either a preset (`-fast`, etc) or a
float value between 0 and 100, allowing more control over the compression
quality vs performance trade-off. The presets are not evenly spaced in the
float range; they have been spaced to give the best distribution of points
between the fast and thorough presets.
* `-fastest`: 0.0
* `-fast`: 10.0
* `-medium`: 60.0
* `-thorough`: 98.0
* `-exhaustive`: 100.0
* **Core API:**
* **API Change:** Quality level preset enum replaced with a float value
between 0 (`-fastest`) and 100 (`-exhaustive`). See above for more info.
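The preset-to-float mapping listed above can be sketched as a simple lookup. `preset_quality` is a hypothetical helper for illustration; it is not part of the command line tool or the API.

```shell
# Map a quality preset name to the float value listed above.
# Unknown names print nothing and return a non-zero status.
preset_quality() {
    case "$1" in
        -fastest)    echo 0.0 ;;
        -fast)       echo 10.0 ;;
        -medium)     echo 60.0 ;;
        -thorough)   echo 98.0 ;;
        -exhaustive) echo 100.0 ;;
        *)           return 1 ;;
    esac
}

preset_quality -medium
```

The uneven spacing is visible in the table: most of the float range lies between `-fast` and `-thorough`, so a value such as 80 would sit between the `-medium` and `-thorough` presets.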
### Performance
This release includes a number of optimizations to improve performance.
* New compressor algorithm for handling encoding candidates and refinement.
* Vectorized implementation of `compute_error_of_weight_set()`.
* Unrolled implementation of `encode_ise()`.
* Many other small improvements!
The most significant change is the change to the compressor path, which now
uses an adaptive approach to candidate trials and block refinement.
In earlier releases the quality level determined the number of encoding
candidates and the number of iterative refinement passes used for each
major encoding trial. This was a fixed behavior; the compressor always tried
the full N candidates and M refinement iterations specified by the quality
level for each encoding trial.
The new approach implements two optimizations for this:
* Compression will complete when a block candidate hits the specified target
quality, after its M refinement iterations have been applied. Later block
candidates are simply abandoned.
* Block candidates will predict how much refinement can improve them, and
abandon refinement if they are unlikely to improve upon the best known
encoding already in-hand.
This pair of optimizations provides significant performance improvement to the
high quality modes which use the most block candidates and refinement
iterations. A minor loss of image quality is expected, as the blocks we no
longer test or refine may have been better coding choices.
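The two early-outs described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the codec's implementation: each candidate is reduced to a made-up `initial_error:gain_per_iteration` pair, the prediction is assumed to be exact, and `compress_block` is a hypothetical name.

```shell
# Toy model only: each candidate is "initial_error:gain_per_iteration".
# Real candidates are block encodings; the numbers here are made up.
compress_block() {
    target=$1; iters=$2; shift 2
    best=999999          # best error found so far
    refined=0            # how many candidates were actually refined
    for cand in "$@"; do
        err=${cand%%:*}
        gain=${cand##*:}
        # Second optimization: predict the error after full refinement and
        # skip candidates that cannot beat the best encoding in hand.
        predicted=$(( err - gain * iters ))
        if [ "$predicted" -ge "$best" ]; then
            continue
        fi
        # Apply the M refinement iterations (exact prediction in this model)
        best=$predicted
        refined=$(( refined + 1 ))
        # First optimization: stop once the target quality is reached;
        # later candidates are abandoned.
        if [ "$best" -le "$target" ]; then
            break
        fi
    done
    echo "best=$best refined=$refined"
}

# Target error 10, 2 refinement iterations, four candidates
compress_block 10 2 40:5 35:1 20:6 15:4
```

With these inputs the sketch prints `best=8 refined=2`: the second candidate is skipped because refinement cannot beat the best result so far, and the fourth is never tried because the third already meets the target.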
**Absolute performance vs 2.2 release:**
![Absolute scores 2.3 vs 2.2](./ChangeLogImg/absolute-2.2-to-2.3.png)
**Relative performance vs 2.2 release:**
![Relative scores 2.3 vs 2.2](./ChangeLogImg/relative-2.2-to-2.3.png)
<!-- ---------------------------------------------------------------------- -->
## 2.2
**Status:** Released, January 2021
The 2.2 release is the third release in the 2.x series. It includes a number
of performance improvements and new features.
Reminder for users of the library interface - the API is not designed to be
stable across versions, and this release is not compatible with 2.1. Please
recompile your client-side code using the updated `astcenc.h` header.
* **General:**
* **Feature:** New Arm aarch64 NEON accelerated vector library support.
* **Improvement:** New CMake build system for all platforms.
* **Improvement:** SSE4.2 feature profile changed to SSE4.1, which more
accurately reflects the feature set used.
* **Binary releases:**
* **Improvement:** Linux binaries changed to use Clang 9.0, which gives
up to 15% performance improvement.
* **Improvement:** Windows binaries are now code signed.
* **Improvement:** macOS binaries for Apple silicon platforms now provided.
* **Improvement:** macOS binaries are now code signed and notarized.
* **Command Line:**
* **Feature:** New image preprocess `-pp-normalize` option added. This forces
normal vectors to be unit length, which is useful when compressing source
textures that use normal length to encode an NDF, which is incompatible
with ASTC's two channel encoding.
* **Feature:** New image preprocess `-pp-premultiply` option added. This
scales RGB values by the alpha value. This can be useful to minimize
cross-channel color bleed caused by GPU post-multiply filtering/blending.
* **Improvement:** Command line tool cleanly traps and reports errors for
corrupt input images rather than relying on standard library `assert()`
calls in release builds.
* **Core API:**
* **API Change:** Images using region-based metrics no longer need to include
padding; all input images should be tightly packed and `dim_pad` is removed
from the `astcenc_image` structure. This makes it easier to directly use
images loaded from other libraries.
* **API Change:** Image `data` is no longer a 3D array accessed using
`data[z][y][x]` indexing, it's an array of 2D slices. This makes it easier
to directly use images loaded from other libraries.
* **API Change:** New `ASTCENC_FLG_SELF_DECOMPRESS_ONLY` flag added to the
codec config. Using this flag enables additional optimizations that
aggressively exploit implementation- and configuration-specific behavior
to gain performance. When using this flag the codec can only reliably
decompress images that were compressed in the same context session. Images
produced via other means may fail to decompress correctly, even if they are
otherwise valid ASTC files.
### Performance
There is one major set of optimizations in this release, related to the new
`ASTCENC_FLG_SELF_DECOMPRESS_ONLY` mode. These allow the compressor to only
create data tables it knows that it is going to use, based on its current set
of heuristics, rather than needing the full set the format allows.
The first benefit of these changes is a reduced context creation time, which
can be reduced by up to 250ms on our test machine. This is a significant
percentage of the command line utility runtime for a small image when using a
quick search preset. Compressing the whole Kodak test suite using the command
line utility and the `-fastest` preset is ~30% faster with this release, which
is mostly due to faster startup.
The reduction in the data table size in this mode also improves the core codec
speed. Our test sets show an average of 12% improvement in the codec for
`-fastest` mode, and an average of 3% for `-medium` mode.
Key for performance charts:
* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
**Absolute performance vs 2.1 release:**
![Absolute scores 2.2 vs 2.1](./ChangeLogImg/absolute-2.1-to-2.2.png)
**Relative performance vs 2.1 release:**
![Relative scores 2.2 vs 2.1](./ChangeLogImg/relative-2.1-to-2.2.png)
<!-- ---------------------------------------------------------------------- -->
## 2.1
**Status:** Released, November 2020
The 2.1 release is the second release in the 2.x series. It includes a number
of performance optimizations and new features.
Reminder for users of the library interface - the API is not designed to be
stable across versions, and this release is not compatible with 2.0. Please
recompile your client-side code using the updated `astcenc.h` header.
### Features:
* **Command line:**
* **Bug fix:** The meaning of the `-tH\cH\dH` and `-th\ch\dh` compression
modes was inverted. They now match the documentation; use `-*H` for HDR
RGBA, and `-*h` for HDR RGB with LDR alpha.
* **Feature:** A new `-fastest` quality preset is now available. This is
designed for fast "roughing out" of new content, and sacrifices significant
image quality compared to `-fast`. We do not recommend its use for
production builds.
* **Feature:** A new `-candidatelimit` compression tuning option is now
available. This is a power-user control to determine how many candidates
are returned for each block mode encoding trial. This feature is used
automatically by the search presets; see `-help` for details.
* **Improvement:** The compression test modes (`-tl\ts\th\tH`) now emit a
MTex/s performance metric, in addition to coding time.
* **Core API:**
* **Feature:** A new quality preset `ASTCENC_PRE_FASTEST` is available. See
`-fastest` above for details.
* **Feature:** A new tuning option `tune_candidate_limit` is available in
the config structure. See `-candidatelimit` above for details.
* **Feature:** Image input/output can now use `ASTCENC_TYPE_F32` data types.
* **Stability:**
* **Feature:** The SSE2, SSE4.2, and AVX2 variants now produce identical
compressed output when run on the same CPU when compiled with the
preprocessor define `ASTCENC_ISA_INVARIANCE=1`. For Make builds this can
be set on the command line by setting `ISA_INV=1`. ISA invariance is off
by default; it reduces performance by 1-3%.
### Performance
Key for performance charts:
* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
**Absolute performance vs 2.0 release:**
![Absolute scores 2.1 vs 2.0](./ChangeLogImg/absolute-2.0-to-2.1.png)
**Relative performance vs 2.0 release:**
![Relative scores 2.1 vs 2.0](./ChangeLogImg/relative-2.0-to-2.1.png)
<!-- ---------------------------------------------------------------------- -->
## 2.0
**Status:** Released, August 2020
The 2.0 release is the first release in the 2.x series. It includes a number of
major changes over the earlier 1.7 series, and is not command-line compatible.
### Features:
* The core codec can be built as a library, exposed via a new codec API.
* The core codec supports accelerated SIMD paths for SSE2, SSE4.2, and AVX2.
* The command line syntax has a clearer mapping to Khronos feature profiles.
### Performance:
Key for performance charts:
* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
**Absolute performance vs 1.7 release:**
![Absolute scores 2.0 vs 1.7](./ChangeLogImg/absolute-1.7-to-2.0.png)
**Relative performance vs 1.7 release:**
![Relative scores 2.0 vs 1.7](./ChangeLogImg/relative-1.7-to-2.0.png)
- - -
_Copyright © 2020-2022, Arm Limited and contributors. All rights reserved._
@@ -0,0 +1,308 @@
# 3.x series change log
This page summarizes the major functional and performance changes in each
release of the 3.x series.
All performance data on this page is measured on an Intel Core i5-9600K
clocked at 4.2 GHz, running `astcenc` using AVX2 and 6 threads.
<!-- ---------------------------------------------------------------------- -->
## 3.7
**Status:** April 2022
The 3.7 release contains another round of performance optimizations, including
significant improvements to the command line front-end (faster PNG loader) and
the arm64 build of the codec (faster NEON implementation).
* **General:**
* **Feature:** The command line tool PNG loader has been switched to use
the Wuffs library, which is robust and significantly faster than the
current stb_image implementation.
* **Feature:** Support for non-invariant builds returns. Opt-in to slightly
faster, but not bit-exact, builds by setting `-DNO_INVARIANCE=ON` for the
CMake configuration. This improves performance by around 2%.
* **Optimization:** Changed SIMD `select()` so that it matches the default
NEON behavior (bitwise select), rather than the default x86-64 behavior
(lane select on MSB). Specialization `select_msb()` added for the one case
we want to select on a sign-bit, where NEON needs a different
implementation. This provides a significant (>25%) performance uplift on
NEON implementations.
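The difference between the two select behaviors can be modeled on a single 32-bit lane. The sketch below is illustrative only: the function names and the `(a, b, mask)` argument order are assumptions made for this example, and the real implementations use SIMD intrinsics.

```shell
# Illustrative single-lane model; names and argument order are assumptions.
# Bitwise select: take each bit from b where the mask bit is set, else from a.
model_select_bitwise() {
    printf '0x%08X\n' $(( ((($1) & ~($3)) | (($2) & ($3))) & 0xFFFFFFFF ))
}

# MSB select: take the whole lane from b if the mask's top bit is set, else a.
model_select_msb() {
    if [ $(( (($3) >> 31) & 1 )) -eq 1 ]; then
        printf '0x%08X\n' $(( $2 ))
    else
        printf '0x%08X\n' $(( $1 ))
    fi
}

# A mask that is neither all-ones nor all-zeros shows the difference
model_select_bitwise 0x11111111 0x22222222 0xFFFF0000
model_select_msb     0x11111111 0x22222222 0xFFFF0000
```

For masks that are all-ones or all-zeros per lane, which is what vector comparisons produce, both models return the same result; they only diverge for partial masks such as the one above.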
### Performance:
Key for charts:
* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
**Relative performance vs 3.6 release:**
![Relative scores 3.7 vs 3.6](./ChangeLogImg/relative-3.6-to-3.7.png)
<!-- ---------------------------------------------------------------------- -->
## 3.6
**Status:** April 2022
The 3.6 release contains another round of performance optimizations.
There are no interface changes in this release, but in general the API is not
designed to be binary compatible across versions. We always recommend
rebuilding your client-side code using the updated `astcenc.h` header.
* **General:**
* **Feature:** Data tables are now optimized for contexts without the
`SELF_DECOMPRESS_ONLY` flag set. The flag therefore no longer improves
compression performance, but still reduces context creation time and
context data table memory footprint.
* **Feature:** Image quality for 4x4 `-fastest` configuration has been
improved.
* **Optimization:** Decimation modes are reliably excluded from processing
when they are only partially selected in the compressor configuration (e.g.
if used for single plane, but not dual plane modes). This is a significant
performance optimization for all quality levels.
* **Optimization:** Fast-path block load function variant added for 2D LDR
images with no swizzle. This is a moderate performance optimization for the
fast and fastest quality levels.
### Performance:
Key for charts:
* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
**Relative performance vs 3.5 release:**
![Relative scores 3.6 vs 3.5](./ChangeLogImg/relative-3.5-to-3.6.png)
<!-- ---------------------------------------------------------------------- -->
## 3.5
**Status:** March 2022
The 3.5 release contains another round of performance optimizations.
There are no interface changes in this release, but in general the API is not
designed to be binary compatible across versions. We always recommend
rebuilding your client-side code using the updated `astcenc.h` header.
* **General:**
* **Feature:** Compressor configurations using `SELF_DECOMPRESS_ONLY` mode
store compacted partition tables, which significantly improves both
context create time and runtime performance.
* **Feature:** Bilinear infill for decimated weight grids supports a new
variant for half-decimated grids which are only decimated in one axis.
### Performance:
Key for charts:
* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
**Relative performance vs 3.4 release:**
![Relative scores 3.5 vs 3.4](./ChangeLogImg/relative-3.4-to-3.5.png)
<!-- ---------------------------------------------------------------------- -->
## 3.4
**Status:** February 2022
The 3.4 release introduces another round of optimizations, removing a number
of power-user configuration options to simplify the core compressor data path.
Reminder for users of the library interface - the API is not designed to be
binary compatible across versions, and this release is not compatible with
earlier releases. Please update and rebuild your client-side code using the
updated `astcenc.h` header.
* **General:**
* **Feature:** Many memory allocations have been moved off the stack into
dynamically allocated working memory. This significantly reduces the peak
stack usage, allowing the compressor to run in systems with 128KB stack
limits.
* **Feature:** Builds now support `-DBLOCK_MAX_TEXELS=<count>` to allow a
compressor to support a subset of block sizes. This can reduce binary size
and runtime memory footprint, and improve performance.
* **Feature:** The `-v` and `-va` options to set a per-texel error weight
function are no longer supported.
* **Feature:** The `-b` option to set a per-texel error weight boost for
block border texels is no longer supported.
* **Feature:** The `-a` option to set a per-texel error weight based on texel
alpha value is no longer supported as an error weighting tool, but is still
supported for providing sprite-sheet RDO.
* **Feature:** The `-mask` option to set an error metric for mask map
textures is still supported, but is currently a no-op in the compressor.
* **Feature:** The `-perceptual` option to set a perceptual error metric is
still supported, but is currently a no-op in the compressor for mask map
and normal map textures.
* **Bug-fix:** Corrected decompression of error blocks in some cases, which now
return the expected error color (magenta for LDR, NaN for HDR). Note that
astcenc determines the error color to use based on the output image data
type, not the decoder profile.
* **Binary releases:**
* **Improvement:** Windows binaries changed to use ClangCL 12.0, which gives
up to 10% performance improvement.
### Performance:
Key for charts:
* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
**Relative performance vs 3.3 release:**
![Relative scores 3.4 vs 3.3](./ChangeLogImg/relative-3.3-to-3.4.png)
<!-- ---------------------------------------------------------------------- -->
## 3.3
**Status:** November 2021
The 3.3 release improves image quality for normal maps, and two component
textures. Normal maps are expected to compress 25% slower than the 3.2
release, although it should be noted that they are still faster to compress
in 3.3 than when using the 2.5 series. This release also fixes one reported
stability issue.
* **General:**
* **Feature:** Normal map image quality has been improved.
* **Feature:** Two component image quality has been improved, provided
that unused components are correctly zero-weighted using e.g. `-cw` on the
command line.
* **Bug-fix:** Improved stability when trying to compress complex blocks that
could not beat even the starting quality threshold. These will now always
compress into constant color blocks.
<!-- ---------------------------------------------------------------------- -->
## 3.2
**Status:** August 2021
The 3.2 release is a bugfix release; no significant image quality or
performance differences are expected.
* **General:**
* **Bug-fix:** Improved stability when new contexts were created while other
contexts were compressing or decompressing an image.
* **Bug-fix:** Improved stability when decompressing blocks with invalid
block encodings.
<!-- ---------------------------------------------------------------------- -->
## 3.1
**Status:** July 2021
The 3.1 release gives another performance boost, typically between 5 and 20%
faster than the 3.0 release, as well as further incremental improvements to
image quality. A number of build system improvements make astcenc easier and
faster to integrate into other projects as a library, including support for
building universal binaries on macOS. Full change list is shown below.
Reminder for users of the library interface - the API is not designed to be
binary compatible across versions, and this release is not compatible with
earlier releases. Please update and rebuild your client-side code using the
updated `astcenc.h` header.
* **General:**
* **Feature:** RGB color data now supports `-perceptual` operation. The
current implementation is simple, weighting color channel errors by their
contribution to perceived luminance. This mimics the behavior of the human
visual system, which is most sensitive to green, then red, then blue.
* **Feature:** Codec supports a new low weight search mode, which is a
simpler weight assignment for encodings with a low number of weights in the
weight grid. The weight threshold can be overridden using the new
`-lowweightmodelimit` command line option.
* **Feature:** All platform builds now support building a native binary.
Native binaries automatically select the SIMD level based on the default
configuration of the compiler in use. Native binaries built on one machine
may use different SIMD options than native binaries built on another.
* **Feature:** macOS platform builds now support building universal binaries
containing both `x86_64` and `arm64` target support.
* **Feature:** Building the command line can be disabled when using as a
library in another project. Set `-DCLI=OFF` during the CMake configure
step.
* **Feature:** A standalone minimal example of the core codec API usage has
been added in the `./Utils/Example/` directory.
* **Core API:**
* **Feature:** Config flag `ASTCENC_FLG_USE_PERCEPTUAL` works for color data.
* **Feature:** Config option `tune_low_weight_count_limit` added.
* **Feature:** New heuristic added which prunes dual weight plane searches if
they are unlikely to help. This heuristic is not user controllable.
* **Feature:** Image quality has been improved. In general we see significant
improvements (up to 0.2dB) for high bitrate encodings (4x4, 5x4), and a
smaller improvement (up to 0.1dB) for lower bitrate encodings.
* **Bug fix:** Arm "none" SIMD builds could be invariant with other builds.
This fix has also been back-ported to the 2.x LTS branch.
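The `-perceptual` behavior for RGB data described above, weighting channel errors by their contribution to perceived luminance, can be sketched as a weighted squared error. The weights below are the Rec. 709 luma coefficients, used here purely as an assumption; the changelog does not state which weights the codec uses, and the function name is hypothetical.

```python
# Assumed Rec. 709 luma coefficients (R, G, B); the codec's real weights may differ.
LUMA_WEIGHTS = (0.2126, 0.7152, 0.0722)


def weighted_squared_error(original, decoded, weights=LUMA_WEIGHTS):
    """Sum per-channel squared errors, scaled by perceptual channel weights."""
    return sum(w * (o - d) ** 2 for w, o, d in zip(weights, original, decoded))


texel = (120, 120, 120)
err_red = weighted_squared_error(texel, (124, 120, 120))
err_green = weighted_squared_error(texel, (120, 124, 120))
err_blue = weighted_squared_error(texel, (120, 120, 124))

# The same absolute error costs most in green, then red, then blue.
print(err_green > err_red > err_blue)
```

With weights like these, an encoder comparing candidate encodings would prefer one that pushes error into the blue channel over one that pushes the same error into green.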
### Performance:
Key for charts:
* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
**Relative performance vs 3.0 release:**
![Relative scores 3.1 vs 3.0](./ChangeLogImg/relative-3.0-to-3.1.png)
<!-- ---------------------------------------------------------------------- -->
## 3.0
**Status:** June 2021
The 3.0 release is the first in a series of updates to the compressor that are
making more radical changes than we felt we could make with the 2.x series.
The primary goals of the 3.x series are to keep the image quality ~static or
better compared to the 2.5 release, but continue to improve performance.
Reminder for users of the library interface - the API is not designed to be
binary compatible across versions, and this release is not compatible with
earlier releases. Please update and rebuild your client-side code using the
updated `astcenc.h` header.
* **General:**
* **Feature:** The code has been significantly cleaned up, with improved
comments, API documentation, function naming, and variable naming.
* **Core API:**
* **API Change:** The core APIs for `astcenc_compress_image()` and for
`astcenc_decompress_image()` now accept swizzle structures by `const`
pointer, instead of pass-by-value.
* **API Change:** Calling the `astcenc_compress_reset()` and the
`astcenc_decompress_reset()` functions between images is no longer required
if the context was created for use by a single thread.
* **Feature:** New heuristics have been added for controlling when to search
beyond 2 partitions and 1 plane, and when to search beyond 3 partitions and
1 plane. The previous `tune_partition_early_out_limit` config option has
been removed, and replaced with two new options
`tune_2_partition_early_out_limit_factor` and
`tune_3_partition_early_out_limit_factor`. See command line help for more
detailed documentation.
* **Feature:** New heuristics have been added for controlling when to use
dual weight planes. The previous `tune_two_plane_early_out_limit` has been
renamed to `tune_2_plane_early_out_limit_correlation`. See command line help
for more detailed documentation.
* **Feature:** Support for using dual weight planes has been restricted to
single partition blocks; it rarely helps blocks with 2 or more partitions
and takes considerable compression search time.
### Performance:
Key for charts:
* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
**Relative performance vs 2.5 release:**
![Relative scores 3.0 vs 2.5](./ChangeLogImg/relative-2.5-to-3.0.png)
- - -
_Copyright © 2021-2022, Arm Limited and contributors. All rights reserved._
+416
@@ -0,0 +1,416 @@
# 4.x series change log
This page summarizes the major functional and performance changes in each
release of the 4.x series.
All performance data on this page is measured on an Intel Core i5-9600K
clocked at 4.2 GHz, running `astcenc` using AVX2 and 6 threads.
<!-- ---------------------------------------------------------------------- -->
## 4.8.0
**Status:** May 2024
The 4.8.0 release is a minor maintenance release.
* **General:**
* **Bug fix:** Native builds on macOS will now correctly build for arm64 when
run outside of Rosetta on an Apple silicon device.
* **Bug fix:** Multiple small improvements to remove use of undefined
language behavior, to improve support for deployment using Emscripten.
* **Feature:** Builds using Clang can now build with undefined behavior
sanitizer by setting `-DASTCENC_UBSAN=ON` on the CMake configure line.
* **Feature:** Updated to Wuffs library 0.3.4, which ignores tRNS alpha
chunks for type 4 (LA) and 6 (RGBA) PNGs, to improve compatibility with
libpng.
<!-- ---------------------------------------------------------------------- -->
## 4.7.0
**Status:** January 2024
The 4.7.0 release is a major maintenance release, fixing rounding behavior in
the decompressor to match the Khronos specification. This fix includes the
addition of explicit support for optimizing for `decode_unorm8` rounding.
Reminder - the codec library API is not designed to be binary compatible across
versions. We always recommend rebuilding your client-side code using the
updated `astcenc.h` header.
* **General:**
* **Bug fix:** sRGB LDR decompression now uses the correct endpoint expansion
method to create the 16-bit RGB endpoint colors, and removes the previous
correction code from the interpolation function. This bug could result in
LSB bit flips relative to the standard specification.
* **Bug fix:** Decompressing to an 8-bit per component output image now
matches the `decode_unorm8` extension rounding rules. This bug could result
in LSB bit flips relative to the standard specification.
* **Bug fix:** Code now avoids using `alignas()` in the reference C
implementation, as the default `alignas(16)` is narrower than the
native minimum alignment requirement on some CPUs.
* **Feature:** Library configuration supports a new flag,
`ASTCENC_FLG_USE_DECODE_UNORM8`. This flag indicates that the image will be
used with the `decode_unorm8` decode mode. When set during compression
this allows the compressor to use the correct rounding when determining the
best encoding.
* **Feature:** Command line tool supports a new option, `-decode_unorm8`.
This option indicates that the image will be used with the `decode_unorm8`
decode mode. This option will automatically be set for decompression
(`-d*`) and trial (`-t*`) tool operation if the decompressed output image
is stored to an 8-bit per component file format. This option must be set
manually for compression (`-c*`) tool operation, as the desired decode mode
cannot be reliably determined.
* **Feature:** Library configuration supports a new optional progress
reporting callback. This is called during compression to allow interactive
tooling use cases to display incremental progress. The
command line tool uses this feature to show compression progress unless
`-silent` is used.
<!-- ---------------------------------------------------------------------- -->
## 4.6.1
**Status:** November 2023
The 4.6.1 release is a minor maintenance release to fix a scaling bug on
large core count Windows systems.
* **General:**
* **Optimization:** Windows builds of the `astcenc` command line tool can now
use more than 64 cores on large core count systems. This change doubled
command line performance for `-exhaustive` compression when testing on a
96 core/192 thread system.
* **Feature:** Windows Arm64 native builds of the `astcenc` command line tool
are now included in the prebuilt release binaries.
<!-- ---------------------------------------------------------------------- -->
## 4.6.0
**Status:** November 2023
The 4.6.0 release retunes the compressor heuristics to give improvements to
performance for trivial losses to image quality. It also includes some minor
bug fixes and code quality improvements.
Reminder - the codec library API is not designed to be binary compatible across
versions. We always recommend rebuilding your client-side code using the updated
`astcenc.h` header.
* **General:**
* **Bug-fix:** Fixed context allocation for contexts allocated with the
`ASTCENC_FLG_DECOMPRESS_ONLY` flag.
* **Bug-fix:** Reduced use of `reinterpret_cast` in the core codec to
avoid strict aliasing violations.
* **Optimization:** `-medium` search quality no longer tests 4 partition
encodings for block sizes between 25 and 83 texels (inclusive). This
improves performance for a tiny drop in image quality.
* **Optimization:** `-thorough` and higher search qualities no longer test the
mode0 first search for block sizes between 25 and 83 texels (inclusive).
This improves performance for a tiny drop in image quality.
* **Optimization:** `TUNE_MAX_PARTITIONING_CANDIDATES` reduced from 32 to 8
to reduce the size of stack allocated data structures. This causes a tiny
drop in image quality for the `-verythorough` and `-exhaustive` presets.
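To see which block sizes the "between 25 and 83 texels (inclusive)" ranges above cover, the sketch below filters the standard ASTC 2D block footprints by texel count. It assumes the texel count of a 2D block is simply width times height; 3D block sizes are left out, and the function name is hypothetical.

```python
# The 2D block footprints defined by the ASTC format.
ASTC_2D_BLOCK_SIZES = [
    (4, 4), (5, 4), (5, 5), (6, 5), (6, 6), (8, 5), (8, 6),
    (8, 8), (10, 5), (10, 6), (10, 8), (10, 10), (12, 10), (12, 12),
]


def blocks_in_texel_range(low=25, high=83):
    """List 2D block sizes whose texel count lies in [low, high]."""
    return [f"{x}x{y}" for x, y in ASTC_2D_BLOCK_SIZES if low <= x * y <= high]


print(blocks_in_texel_range())
```

Under this assumption the affected 2D block sizes run from 5x5 up to 10x8, leaving 4x4, 5x4, and the three largest footprints on their previous search behavior.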
<!-- ---------------------------------------------------------------------- -->
## 4.5.0
**Status:** June 2023
The 4.5.0 release is a maintenance release with small image quality
improvements, and a number of build system quality of life improvements.
* **General:**
* **Bug-fix:** Improved handling of compiler arguments in CMake, including
consistent use of MSVC-style command line arguments for ClangCL.
* **Bug-fix:** Invariant Clang builds now use `-ffp-model=precise` with
`-ffp-contract=off` which is needed to restore invariance due to recent
changes in compiler defaults.
* **Change:** macOS binary releases are now distributed as a single universal
binary for all platforms.
* **Change:** Windows binary releases are now compiled with VS2022.
* **Change:** Invariant MSVC builds for VS2022 now use `/fp:precise` instead
of `/fp:strict`, which is now possible because precise no longer implies
contraction. This should improve performance for MSVC builds.
* **Change:** Non-invariant Clang builds now use `-ffp-model=precise` with
`-ffp-contract=on`. This should improve performance on older Clang
versions which defaulted to no contraction.
* **Change:** Non-invariant MSVC builds for VS2022 now use `/fp:precise`
with `/fp:contract`. This should improve performance for MSVC builds.
* **Change:** CMake config variables now use an `ASTCENC_` prefix to add a
namespace and group options when the library is used in a larger project.
* **Change:** CMake config `ASTCENC_UNIVERSAL_BUILD` for building macOS
universal binaries has been improved to include the `x86_64h` slice for
AVX2 builds. Universal builds are now on by default for macOS, and always
include NEON (arm64), SSE4.1 (x86_64), and AVX2 (x86_64h) variants.
* **Change:** CMake config `ASTCENC_NO_INVARIANCE` has been inverted to
remove the negated option, and is now `ASTCENC_INVARIANCE` with a default
of `ON`. Disabling this option can substantially improve performance, but
images can differ across platforms and compilers.
* **Optimization:** Color quantization and packing for LDR RGB and RGBA has
been vectorized to improve performance.
* **Change:** Color quantization for LDR RGB and RGBA endpoints will now try
multiple quantization packing methods, and pick the one with the lowest
endpoint encoding error. This gives a minor image quality improvement, for
no significant performance impact when combined with the vectorization
optimizations.
<!-- ---------------------------------------------------------------------- -->
## 4.4.0
**Status:** March 2023
The 4.4.0 release is a minor release with image quality improvements, a small
performance boost, and a few new quality-of-life features.
* **General:**
* **Change:** Core library no longer checks availability of required
instruction set extensions, such as SSE4.1 or AVX2. Checking compatibility
is now the responsibility of the caller. See `astcenccli_entry.cpp` for
an example of code performing this check.
* **Change:** Core library can be built as a shared object by setting the
`-DSHAREDLIB=ON` CMake option, resulting in e.g. `libastcenc-avx2-shared.so`.
Note that the command line tool is always statically linked.
* **Change:** Decompressed 3D images will now write one output file per
slice, if the target format is a 2D image format.
* **Change:** Command line errors print to stderr instead of stdout.
* **Change:** Color encoding uses new quantization tables, that now factor
in floating-point rounding if a distance tie is found when using the
integer quant256 value. This improves image quality for 4x4 and 5x5 block
sizes.
* **Optimization:** Partition selection uses a simplified line calculation
with a faster approximation. This improves performance for all block sizes.
* **Bug-fix:** Fixed missing symbol error in decompressor-only builds.
* **Bug-fix:** Fixed infinity handling in debug trace JSON files.
### Performance:
Key for charts:
* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
**Relative performance vs 4.3 release:**
![Relative scores 4.4 vs 4.3](./ChangeLogImg/relative-4.3-to-4.4.png)
<!-- ---------------------------------------------------------------------- -->
## 4.3.1
**Status:** January 2023
The 4.3.1 release is a minor maintenance release. No performance or image
quality changes are expected.
* **General:**
* **Bug-fix:** Fixed typo in `-2/3/4partitioncandidatelimit` CLI options.
* **Bug-fix:** Fixed handling for `-3/4partitionindexlimit` CLI options.
* **Bug-fix:** Updated to `stb_image.h` v2.28, which includes multiple fixes
and improvements for image loading.
<!-- ---------------------------------------------------------------------- -->
## 4.3.0
**Status:** January 2023
The 4.3.0 release is an optimization release. There are minor performance
and image quality improvements in this release.
Reminder - the codec library API is not designed to be binary compatible across
versions. We always recommend rebuilding your client-side code using the updated
`astcenc.h` header.
* **General:**
* **Bug-fix:** Use lower case `windows.h` include for MinGW compatibility.
* **Change:** The `-mask` command line option, `ASTCENC_FLG_MAP_MASK` in the
library API, has been removed.
* **Optimization:** Always skip blue-contraction for `QUANT_256` encodings.
This gives a small image quality improvement for the 4x4 block size.
* **Optimization:** Always skip RGBO vector calculation for LDR encodings.
* **Optimization:** Defer color packing and scrambling to physical layer.
* **Optimization:** Remove folded `decimation_info` lookup tables. This
significantly reduces compressor memory footprint and improves context
creation time. Impact increases with the active block size.
* **Optimization:** Increased trial and refinement pruning by using stricter
target errors when determining whether to skip iterations.
### Performance:
Key for charts:
* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
**Relative performance vs 4.2 release:**
![Relative scores 4.3 vs 4.2](./ChangeLogImg/relative-4.2-to-4.3.png)
<!-- ---------------------------------------------------------------------- -->
## 4.2.0
**Status:** November 2022
The 4.2.0 release is an optimization release. There are significant performance
improvements, minor image quality improvements, and library interface changes in
this release.
Reminder - the codec library API is not designed to be binary compatible across
versions. We always recommend rebuilding your client-side code using the updated
`astcenc.h` header.
* **General:**
* **Bug-fix:** Compression for RGB and RGBA base+offset encodings no
longer generates endpoints with the incorrect blue-contract behavior.
* **Bug-fix:** Lowest channel correlation calculation now correctly ignores
constant color channels for the purposes of filtering 2 plane encodings.
On average this improves both performance and image quality.
* **Bug-fix:** ISA compatibility now checked in `config_init()` as well as
in `context_alloc()`.
* **Change:** Removed the low-weight count optimization, as more recent
changes had significantly reduced its performance benefit. Option removed
from both command line and configuration structure.
* **Feature:** The `-exhaustive` mode now runs full trials on more
partitioning candidates and block candidates. This improves image quality
by 0.1 to 0.25 dB, but slows down compression by 3x. The `-verythorough`
and `-thorough` modes also test more candidates.
* **Feature:** A new preset, `-verythorough`, has been introduced to provide
a standard performance point between `-thorough` and the re-tuned
`-exhaustive` mode. This new mode is faster and higher quality than the
`-exhaustive` preset in the 4.1 release.
* **Feature:** The compressor can now independently vary the number of
partitionings considered for error estimation for 2/3/4 partitions. This
allows heuristics to put more effort into 2 partitions, and less into
3/4 partitions.
* **Feature:** The compressor can now run trials on a variable number of
candidate partitionings, allowing high quality modes to explore more of the
search space at the expense of slower compression. The number of trials is
independently configurable for 2/3/4 partition cases.
* **Optimization:** Introduce early-out threshold for 2/3/4 partition
searches based on the results after 1 of 2 trials. This significantly
improves performance for `-medium` and `-thorough` searches, for a minor
loss in image quality.
* **Optimization:** Reduce early-out threshold for 3/4 partition searches
based on 2/3 partition results. This significantly improves performance,
especially for `-thorough` searches, for a minor loss in image quality.
* **Optimization:** Use direct vector compare to create a SIMD mask instead
of a scalar compare that is broadcast to a vector mask.
* **Optimization:** Remove obsolete partition validity masks from the
partition selection algorithm.
* **Optimization:** Removed obsolete channel scaling from partition
`avgs_and_dirs()` calculation.
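To put the "0.1 to 0.25 dB" quality figures above into perspective, the sketch below converts a dB gain into the equivalent change in mean squared error. It assumes the dB values are PSNR measurements, which the changelog does not state explicitly; both function names are hypothetical.

```python
import math


def psnr_db(mse, peak=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(peak * peak / mse)


def mse_ratio_for_gain(gain_db):
    """MSE multiplier that corresponds to a PSNR gain of gain_db."""
    return 10.0 ** (-gain_db / 10.0)


for gain in (0.1, 0.25):
    reduction = (1.0 - mse_ratio_for_gain(gain)) * 100.0
    print(f"{gain} dB gain is about {reduction:.1f}% lower MSE")
```

On this reading, a 0.25 dB gain corresponds to a mean squared error that is a little under 6% lower, which helps when weighing it against a 3x compression slowdown.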
### Performance:
Key for charts:
* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
**Relative performance vs 4.0 and 4.1 release:**
![Relative scores 4.2 vs 4.0](./ChangeLogImg/relative-4.0-to-4.2.png)
<!-- ---------------------------------------------------------------------- -->
## 4.1.0
**Status:** August 2022
The 4.1.0 release is a maintenance release. There is no performance or image
quality change in this release.
* **General:**
* **Change:** Command line decompressor no longer uses the legacy
`GL_LUMINANCE` or `GL_LUMINANCE_ALPHA` format enums when writing KTX
output files. Luminance textures now use the `GL_RED` format and
luminance_alpha textures now use the `GL_RG` format.
* **Change:** Command line tool gains a new `-dimage` option to generate
diagnostic images showing aspects of the compression encoding. The output
file name with its extension stripped is used as the stem of the diagnostic
image file names.
* **Bug-fix:** Library decompressor builds for SSE no longer use masked store
`maskmovdqu` instructions, as they can generate faults on masked lanes.
* **Bug-fix:** Command line decompressor now correctly uses sized type enums
for the internal format when writing output KTX files.
* **Bug-fix:** Command line compressor now correctly loads 16 and 32-bit per
component input KTX files.
* **Bug-fix:** Fixed GCC9 compiler warnings on Arm aarch64.
<!-- ---------------------------------------------------------------------- -->
## 4.0.0
**Status:** July 2022
The 4.0.0 release introduces some major performance enhancements, and a number
of larger changes to the heuristics used in the codec to find a more effective
cost:quality trade off.
* **General:**
* **Change:** The `-array` option for specifying the number of image planes
for ASTC 3D volumetric block compression has been renamed to `-zdim`.
* **Change:** The build root package directory is now `bin` instead of
`astcenc`, allowing the CMake install step to write binaries into
`/usr/local/bin` if the user wishes to do so.
* **Feature:** A new `-ssw` option for specifying the shader sampling swizzle
has been added as a convenience alternative to the `-cw` option. This is
needed to correct error weighting during compression if not all components
are read in the shader. For example, to extract and compress two components
from an RGBA input image, weighting the two components equally when
sampling through .ra in the shader, use `-esw ggga -ssw ra`. In this
example `-ssw ra` is equivalent to the alternative `-cw 1 0 0 1` encoding.
* **Feature:** The `-a` alpha weighting option has been re-enabled in the
backend, and now again applies alpha scaling to the RGB error metrics when
encoding. This is based on the maximum alpha in each block, not the
individual texel alpha values used in the earlier implementation.
* **Feature:** The command line tool now has `-repeats <count>` for testing,
which will iterate around compression and decompression `count` times.
Reported performance metrics also now separate compression and
decompression scores.
* **Feature:** The core codec is now warning clean up to /W4 for both MSVC
`cl.exe` and `clangcl.exe` compilers.
* **Feature:** The core codec now supports arm64 for both MSVC `cl.exe` and
`clangcl.exe` compilers.
* **Feature:** `NO_INVARIANCE` builds will enable the `-ffp-contract=fast`
option for all targets when using Clang or GCC. In addition AVX2 targets
will also set the `-mfma` option. This reduces image quality by up to 0.2dB
(normally much less), but improves performance by 5-20%.
* **Optimization:** Angular endpoint min/max weight selection is restricted
to weight `QUANT_11` or lower. Higher quantization levels assume default
0-1 range, which is less accurate but much faster.
* **Optimization:** Maximum weight quantization for later trials is selected
based on the weight quantization of the best encoding from the 1 plane 1
partition trial. This significantly reduces the search space for the later
trials with more planes or partitions.
* **Optimization:** Small data tables now use in-register SIMD permutes
rather than gathers (AVX2) or unrolled scalar lookups (SSE/NEON). This can
be a significant optimization for paths that are load unit limited.
* **Optimization:** Decompressed image block writes in the decompressor now
use a vectorized approach to writing each row of texels in the block,
including the ability to exploit masked stores if the target supports them.
* **Optimization:** Weight scrambling has been moved into the physical layer;
the rest of the codec now uses linear order weights.
* **Optimization:** Weight packing has been moved into the physical layer;
the rest of the codec now uses unpacked weights in the 0-64 range.
* **Optimization:** Consistently vectorize the creation of unquantized weight
grids when they are needed.
* **Optimization:** Remove redundant per-decimation mode copies of endpoint
and weight structures, which were really read-only duplicates.
* **Optimization:** Early-out the same endpoint mode color calculation if it
cannot be applied.
* **Optimization:** Numerous type size reductions applied to arrays to reduce
both context working buffer size usage and stack usage.
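The relationship between `-ssw` and `-cw` given in the list above (`-ssw ra` being equivalent to `-cw 1 0 0 1`) can be sketched as a simple mapping from sampled components to channel weights. This is only an illustration of that equivalence; the function name is hypothetical and the real tool's argument validation may differ.

```python
def ssw_to_cw(swizzle):
    """Map a shader sampling swizzle such as 'ra' to R, G, B, A weights."""
    channels = "rgba"
    for component in swizzle:
        if component not in channels:
            raise ValueError(f"unknown component: {component!r}")
    # A channel gets weight 1 if the shader reads it, otherwise 0.
    return tuple(1 if channel in swizzle else 0 for channel in channels)


print(ssw_to_cw("ra"))
```

Components that the shader never samples end up with zero weight, so errors in them do not influence the encoding choice.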
### Performance:
Key for charts:
* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
**Relative performance vs 3.7 release:**
![Relative scores 4.0 vs 3.7](./ChangeLogImg/relative-3.7-to-4.0.png)
- - -
_Copyright © 2022-2024, Arm Limited and contributors. All rights reserved._
+105
@@ -0,0 +1,105 @@
# 5.x series change log
This page summarizes the major functional and performance changes in each
release of the 5.x series.
All performance data on this page is measured on an Intel Core i5-9600K
clocked at 4.2 GHz, running `astcenc` using AVX2 and 6 threads.
<!-- ---------------------------------------------------------------------- -->
## 5.3.0
**Status:** March 2025
The 5.3.0 release is a minor maintenance release.
* **General:**
* **Feature:** Reference C builds (`ASTCENC_ISA_NONE`) now support compiling
for big-endian CPUs. Compile with `-DASTCENC_BIG_ENDIAN=ON` when compiling
for a big-endian target; it is not auto-detected.
* **Improvement:** Builds using GCC now specify `-flto=auto` to allow
parallel link steps, and remove the log warnings about not setting a CPU
count parameter value.
* **Bug fix:** Builds using MSVC `cl.exe` that do not specify an explicit
ISA using the preprocessor configuration defines will now correctly
default to the SSE2 backend on x86-64 and the NEON backend on Arm64.
Previously they would have defaulted to the reference C implementation,
which is around 3.25 times slower.
<!-- ---------------------------------------------------------------------- -->
## 5.2.0
**Status:** February 2025
The 5.2.0 release is a minor maintenance release.
This release includes changes to the public interface in the `astcenc.h`
header. We always recommend rebuilding your client-side code using the
header from the same release to avoid compatibility issues.
* **General:**
* **Change:** Changed sRGB alpha channel endpoint expansion to match the
revised Khronos Data Format Specification (v1.4.0), which reverts an
unintended specification change. Compared to previous releases, this change
can cause LSB bit differences in the alpha channel of compressed images.
* **Feature:** Arm64 builds for Linux added to the GitHub Actions builds, and
Arm64 binaries for NEON, 128-bit SVE, and 256-bit SVE added to release
builds.
* **Feature:** Added a new codec API, `astcenc_compress_cancel()`, which can
be used to cancel an in-flight compression. This is designed to help make
it easier to integrate the codec into an interactive user interface that
can respond to user events with low latency.
* **Bug fix:** Removed incorrect `static` variable qualifier, which could
result in an incorrect `tune_mse_overshoot` heuristic threshold being used
if a user ran multiple concurrent compressions with different settings.
<!-- ---------------------------------------------------------------------- -->
## 5.1.0
**Status:** November 2024
The 5.1.0 release is an optimization release, giving moderate performance
improvements on all platforms. There are no image quality differences.
* **General:**
* **Feature:** Added a new CMake build option to control use of native
gathers, as they can be slower than scalar loads on some common x86
microarchitectures. Build with `-DASTCENC_X86_GATHERS=OFF` to disable use
of native gathers in AVX2 builds.
* **Optimization:** Added new `gather()` abstraction for gathers using byte
indices, allowing implementations without gather hardware to skip the
byte-to-int index conversion.
* **Optimization:** Optimized `compute_lowest_and_highest_weight()` to
pre-compute min/max outside of the main loop.
* **Optimization:** Added improved intrinsics sequence for SSE and AVX2
integer `hmin()` and `hmax()`.
* **Optimization:** Added improved intrinsics sequence for `vint4(uint8_t*)`
on systems implementing Arm SVE.
<!-- ---------------------------------------------------------------------- -->
## 5.0.0
**Status:** November 2024
The 5.0.0 release is the first stable release in the 5.x series. The main new
feature is support for the Arm Scalable Vector Extensions (SVE) SIMD instruction
set.
* **General:**
* **Bug fix:** Fixed incorrect return type in "None" vector library
reference implementation.
* **Bug fix:** Fixed sincos table index under/overflow.
* **Feature:** Changed `ASTCENC_ISA_NATIVE` builds to use `-march=native` and
`-mcpu=native`.
* **Feature:** Added backend for Arm SVE fixed-width 256-bit builds. These
can only run on hardware implementing 256-bit SVE.
* **Feature:** Added backend for Arm SVE 128-bit builds. These are portable
builds and can run on hardware implementing any SVE vector length, but the
explicit SVE use is augmented NEON and will only use the bottom 128-bits of
each SVE vector.
* **Feature:** Optimized NEON mask `any()` and `all()` functions.
* **Feature:** Migrated build and test to GitHub Actions pipelines.
- - -
_Copyright © 2022-2025, Arm Limited and contributors. All rights reserved._
+235
View File
@@ -0,0 +1,235 @@
# Effective ASTC Encoding
Most texture compression schemes encode a single color format at a single
bitrate, so there are relatively few configuration options available to content
creators beyond selecting which compressed format to use.
ASTC on the other hand is an extremely flexible container format which can
compress multiple color formats at multiple bit rates. Inevitably this
flexibility gives rise to questions about how to best use ASTC to encode a
specific color format, or what the equivalent settings are to get a close
match to another compression format.
This page aims to give some guidelines, but note that they are only guidelines
and are not exhaustive so please deviate from them as needed.
## Traditional format reference
The most commonly used non-ASTC compressed formats, their color format, and
their compressed bitrate are shown in the table below.
| Name | Color Format | Bits/Pixel | Notes |
| -------- | ------------ | ---------- | ---------------- |
| BC1 | RGB+A | 4 | RGB565 + 1-bit A |
| BC3 | RGB+A | 8 | BC1 RGB + BC4 A |
| BC3nm | G+R | 8 | BC1 G + BC4 R |
| BC4 | R | 4 | L8 |
| BC5      | R+G          | 8          | BC4 R + BC4 G    |
| BC6H | RGB (HDR) | 8 | |
| BC7 | RGB / RGBA | 8 | |
| EAC_R11 | R | 4 | R11 |
| EAC_RG11 | RG | 8 | RG11 |
| ETC1 | RGB | 4 | RGB565 |
| ETC2 | RGB+A | 4 | RGB565 + 1-bit A |
| ETC2+EAC | RGB+A | 8 | RGB565 + EAC A |
| PVRTC | RGBA | 2 or 4 | |
**Note:** BC2 (RGB+A) is not included in the table because it's rarely used in
practice due to poor quality alpha encoding; BC3 is nearly always used instead.
**Note:** Color representations shown with a `+` symbol indicate non-correlated
compression groups; e.g. an `RGB + A` format compresses `RGB` and `A`
independently and does not assume the two signals are correlated. This can be
a strength (it improves quality when compressing non-correlated signals), but
also a weakness (it reduces quality when compressing correlated signals).
# ASTC Format Mapping
The main question which arises when mapping another format onto ASTC is how
to handle cases where the input isn't a 4 component RGBA input. ASTC is
a container format which always decompresses into a 4 component RGBA result.
However, the internal compressed representation is very flexible and can store
1-4 components as needed on a per-block basis.
To get the best quality for a given bitrate, or the lowest bitrate for a given
quality, it is important that as few components as possible are stored in the
internal representation to avoid wasting coding space.
Specific optimizations in the ASTC coding scheme exist for:
* Encoding the RGB components as a single luminance component, so only a single
value needs to be stored in the coding instead of three.
* Encoding the A component as a constant 1.0 value, so the coding doesn't
actually need to store a per-pixel alpha value at all.
... so mapping the inputs you give to the compressor to hit these paths is
really important if you want to get the best output quality for your chosen
bitrate.
## Encoding 1-4 component data
The table below shows the recommended component usage for data with different
numbers of color components present in the data.
The coding swizzle should be applied when compressing an image. This can be
handled by the compressor when reading an uncompressed input image by
specifying the swizzle using the `-esw` command line option.
The sampling swizzle is what you should use in your shader programs to read
the data from the compressed texture, assuming no additional API-level
component swizzling is specified by the application.
| Input components | ASTC Endpoint | Coding Swizzle | Sampling Swizzle |
| -------------- | ------------- | -------------- | ------------------ |
| 1 | L + 1 | `rrr1` | `.g` <sup>1</sup> |
| 2 | L + A | `rrrg` | `.ga` <sup>1</sup> |
| 3 | RGB + 1 | `rgb1` | `.rgb` |
| 4 | RGB + A | `rgba` | `.rgba` |
**1:** Sampling from `g` is preferred to sampling from `r` because it allows a
single shader to be compatible with ASTC, BC1, or ETC formats. BC1 and ETC1
store color endpoints as RGB565 data, so the `g` component will have higher
precision. For ASTC it doesn't actually make any difference; the same single
component luminance will be returned for all three of the `.rgb` components.
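As a rough illustration of what a coding swizzle does to each input pixel, the sketch below rearranges an RGBA tuple according to a four character swizzle string. The function name `apply_swizzle` is hypothetical and this is only a conceptual model, not the `astcenc` implementation; it assumes each selector is one of `r`, `g`, `b`, `a`, `0`, or `1`.

```python
def apply_swizzle(pixel, swizzle):
    """Rearrange an (r, g, b, a) pixel using a swizzle such as 'rrrg'.

    Hypothetical helper for illustration; selectors '0' and '1' produce
    constant values instead of reading an input component.
    """
    if len(swizzle) != 4:
        raise ValueError("swizzle must name exactly four components")
    r, g, b, a = pixel
    source = {"r": r, "g": g, "b": b, "a": a, "0": 0.0, "1": 1.0}
    return tuple(source[selector] for selector in swizzle)


# Two component data stored in R and G is moved to the L+A endpoint layout.
two_component = apply_swizzle((0.2, 0.4, 0.0, 1.0), "rrrg")
```

With `rrrg` the first input component is replicated into RGB so that it can be coded as luminance, and the second input component moves to alpha.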
## Equivalence with other formats
Based on these component encoding requirements we can now derive the ASTC
coding equivalents for most of the other texture compression formats in common
use today.
| Format   | ASTC Coding Swizzle | ASTC Sampling Swizzle | Notes            |
| -------- | ------------------- | --------------------- | ---------------- |
| BC1 | `rgba` <sup>1</sup> | `.rgba` | |
| BC3 | `rgba` | `.rgba` | |
| BC3nm | `gggr` | `.ag` | |
| BC4 | `rrr1` | `.r` | |
| BC5 | `rrrg` | `.ra` <sup>2</sup> | |
| BC6H | `rgb1` | `.rgb` <sup>3</sup> | HDR profile only |
| BC7 | `rgba` | `.rgba` | |
| EAC_R11 | `rrr1` | `.r` | |
| EAC_RG11 | `rrrg` | `.ra` <sup>2</sup> | |
| ETC1 | `rgb1` | `.rgb` | |
| ETC2 | `rgba` <sup>1</sup> | `.rgba` | |
| ETC2+EAC | `rgba` | `.rgba` | |
**1:** ASTC has no equivalent of the 1-bit punch-through alpha encoding
supported by BC1 or ETC2; if alpha is present it will be a full alpha
component.
**2:** ASTC relies on using the L+A color endpoint type for coding efficiency
for two component data. It therefore has no direct equivalent of a two-plane
format sampled through the `.rg` components such as BC5 or EAC_RG11. This can
be emulated by setting texture component swizzles in the runtime API - e.g. via
`glTexParameteri()` for OpenGL ES - although it has been noted that API
controlled swizzles are not available in WebGL.
**3:** ASTC can only store unsigned values, and has no equivalent of the BC6H
signed endpoint mode.
# Other Considerations
This section outlines some of the other things to consider when encoding
textures using ASTC.
## Decode mode extensions
ASTC is specified to decompress into a 16-bit per component RGBA output by
default, with the exception of the sRGB format which uses an 8-bit value for the
RGB components.
Decompressing into a 16-bit per component output format often provides more
precision than many use cases require, especially for LDR textures which
originally came from an 8-bit per component source image. Most implementations
of ASTC support the decode mode extensions, which allow an application to opt
in to a lower precision decompressed format (RGBA8 for LDR, RGB9E5 for HDR).
Using these extensions can improve GPU texture cache efficiency, and even
improve texture filtering throughput, for use cases that do not need the
higher precision.
The ASTC format uses different data rounding rules when the decode mode
extensions are used. To ensure that the compressor chooses the best encodings
for the RGBA8 rounding rules, you can specify `-decode_unorm8` when compressing
textures that will be decompressed into the RGBA8 intermediate. This gives a
small image quality boost.
**Note:** This mode is automatically enabled if you use the `astcenc`
decompressor to write an 8-bit per component output image.
## Encoding non-correlated components
Most other texture compression formats have a static component assignment in
terms of the expected data correlation. For example, ETC2+EAC assumes that RGB
are always correlated and that alpha is non-correlated. ASTC can automatically
encode data as either fully correlated across all 4 components, or with any one
component assigned to a separate non-correlated weight plane from the other
three.
The non-correlated component can be changed on a block-by-block basis, so the
compressor can dynamically adjust the coding based on the data present in the
image. This means that there is no need for non-correlated data to be stored
in a specific component in the input image.
It is however worth noting that the alpha component is treated differently to
the RGB color components in some circumstances:
* When coding for sRGB the alpha component will always be stored in linear
space.
* When coding for HDR the alpha component can optionally be kept as LDR data.
## Encoding normal maps
The best way to store normal maps using ASTC is similar to the scheme used by
BC5; store the X and Y components of a unit-length normal. The Z component of
the normal can be reconstructed in shader code based on the knowledge that the
vector is unit length.
To encode this we need to store only two input components in the compressed
data, and therefore use the `rrrg` coding swizzle to align the data with the
ASTC luminance+alpha endpoint. We can sample this in shader code using the
`.ga` sampling swizzle, and reconstruct the Z value with:
    vec3 nml;
    nml.xy = texture(...).ga;                // Load normals (range 0 to 1)
    nml.xy = nml.xy * 2.0 - 1.0;             // Unpack normals (range -1 to +1)
    nml.z = sqrt(1.0 - dot(nml.xy, nml.xy)); // Compute Z, given unit length
The encoding swizzle and appropriate component weighting is enabled by using
the `-normal` command line option. If you wish to use a different pair of
components you can specify a custom swizzle after setting the `-normal`
parameter. For example, to match BC5n component ordering use
`-normal -esw gggr` for compression and `-normal -dsw arz1` for decompression.
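The round trip can also be checked offline. The sketch below packs the X and Y components of a unit-length normal into the 0 to 1 range and reconstructs Z afterwards. The function names are hypothetical, and it assumes a tangent-space normal with non-negative Z. The clamp to zero is an addition that guards against compression error pushing the XY length slightly above 1.

```python
import math


def pack_normal_xy(normal):
    """Map the X and Y of a unit normal from [-1, +1] to [0, 1] for storage."""
    x, y, _ = normal
    return (x * 0.5 + 0.5, y * 0.5 + 0.5)


def unpack_normal(packed_xy):
    """Rebuild a unit normal from stored X and Y, assuming Z is non-negative."""
    x = packed_xy[0] * 2.0 - 1.0
    y = packed_xy[1] * 2.0 - 1.0
    # Clamp so that small coding errors cannot produce sqrt of a negative value.
    z = math.sqrt(max(0.0, 1.0 - (x * x + y * y)))
    return (x, y, z)


restored = unpack_normal(pack_normal_xy((0.6, 0.0, 0.8)))
```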
## Encoding sRGB data
The ASTC LDR profile can compress sRGB encoded color, which is a more
efficient use of bits than storing linear encoded color because the gamma
corrected value distribution more closely matches human perception of
luminance.
For color data it is nearly always a perceptual quality win to use sRGB input
source textures that are then compressed using the ASTC sRGB compression mode
(compress using the `-cs` command line option rather than the `-cl` command
line option). Note that sRGB gamma correction is only applied to the RGB
components during decode; the alpha component is always treated as linear
encoded data.
*Important:* The uncompressed input texture provided on the command line must
be stored in the sRGB color space for `-cs` to function correctly.
## Encoding HDR data
HDR data can be encoded just like LDR data, but with some caveats around
handling the alpha component.
For many use cases the alpha component is an actual alpha opacity component and
is therefore used for storing an LDR value between 0 and 1. For these cases use
the `-ch` compressor option which will treat the RGB components as HDR, but the
A component as LDR.
For other use cases the alpha component is simply a fourth data component which
is also storing an HDR value. For these cases use the `-cH` compressor option
which will treat all components as HDR data.
- - -
_Copyright © 2019-2024, Arm Limited and contributors. All rights reserved._
+71
View File
@@ -0,0 +1,71 @@
# The .astc File Format
The default file format for compressed textures generated by `astcenc`, as well
as from many other ASTC compressors, is the `.astc` format. This is a very
simple format consisting of a small header followed immediately by the binary
payload for a single image surface.
Header
======
The header is a fixed 16 byte structure, defined as storing only bytes to avoid
any endianness issues or padding overhead.
```
struct astc_header
{
uint8_t magic[4];
uint8_t block_x;
uint8_t block_y;
uint8_t block_z;
uint8_t dim_x[3];
uint8_t dim_y[3];
uint8_t dim_z[3];
};
```
Magic number
------------
The 4 byte magic number at the start of the file acts as a format identifier.
```
magic[0] = 0x13;
magic[1] = 0xAB;
magic[2] = 0xA1;
magic[3] = 0x5C;
```
Block size
----------
The `block_*` fields store the ASTC block dimensions in texels. For 2D images
the Z dimension must be set to 1.
Image dimensions
----------------
The `dim_*` fields store the image dimensions in texels. For 2D images the
Z dimension must be set to 1.
Note that the image is not required to be an exact multiple of the compressed
block size; the compressed data may include padding that is discarded during
decompression.
Each dimension is a 24 bit unsigned value that is reconstructed from the stored
byte values as:
```
decoded_dim = dim[0] + (dim[1] << 8) + (dim[2] << 16);
```
Binary payload
==============
The binary payload is a byte stream that immediately follows the header. It
contains 16 bytes per compressed block. The number of compressed blocks is
determined from the header information.
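A minimal sketch of a reader for this header is shown below. The function name and the returned dictionary layout are hypothetical. It assumes the block count is the image size divided by the block size, rounded up per axis, which is consistent with the padding note above.

```python
ASTC_MAGIC = bytes([0x13, 0xAB, 0xA1, 0x5C])


def parse_astc_header(data):
    """Decode the 16 byte .astc header from a bytes object."""
    if len(data) < 16:
        raise ValueError("need at least 16 header bytes")
    if data[0:4] != ASTC_MAGIC:
        raise ValueError("bad magic number")

    block = tuple(data[4:7])
    if 0 in block:
        raise ValueError("block dimensions must be non-zero")

    def dim(offset):
        # Each dimension is a 24-bit little-endian value stored as 3 bytes.
        return data[offset] + (data[offset + 1] << 8) + (data[offset + 2] << 16)

    dims = (dim(7), dim(10), dim(13))

    # Round up per axis so that partially covered edge blocks are counted.
    blocks = 1
    for size, block_size in zip(dims, block):
        blocks *= (size + block_size - 1) // block_size

    return {"block": block, "dim": dims, "payload_bytes": blocks * 16}
```

For a 256x100 2D image using 6x6 blocks this gives 43 x 17 blocks, so the expected payload size can be compared against the remaining file length as a basic sanity check.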
- - -
_Copyright © 2020-2022, Arm Limited and contributors. All rights reserved._
+488
View File
@@ -0,0 +1,488 @@
# ASTC Format Overview
Adaptive Scalable Texture Compression (ASTC) is an advanced lossy texture
compression technology developed by Arm and AMD. It has been adopted as an
official Khronos extension to the OpenGL and OpenGL ES APIs, and as a standard
optional feature for the Vulkan API.
ASTC offers a number of advantages over earlier texture compression formats:
* **Format flexibility:** ASTC supports compressing between 1 and 4 channels of
data, including support for one non-correlated channel such as RGB+A
(correlated RGB, non-correlated alpha).
* **Bit rate flexibility:** ASTC supports compressing images with a fine
grained choice of bit rates between 0.89 and 8 bits per texel (bpt). The bit
  rate choice is independent of the color format choice.
* **Advanced format support:** ASTC supports compressing images in either low
dynamic range (LDR), LDR sRGB, or high dynamic range (HDR) color spaces, as
well as support for compressing 3D volumetric textures.
* **Improved image quality:** Despite the high degree of format flexibility,
ASTC manages to beat nearly all legacy texture compression formats -- such as
  ETC2, PVRTC, and the BC formats -- on image quality at equivalent bit
rates.
This article explores the ASTC format, and how it manages to generate the
flexibility and quality improvements that it achieves.
Why ASTC?
=========
Before the creation of ASTC, the format and bit rate coverage of the available
formats was very sparse:
![Legacy texture compression formats and bit rates](./FormatOverviewImg/coverage-legacy.svg)
In reality the situation is even worse than this diagram shows, as many of
these formats are proprietary or simply not available on some operating
systems, so any single platform will have very limited compression choices.
For developers this situation makes developing content which is portable across
multiple platforms a tricky proposition. It's almost certain that differently
compressed assets will be needed for different platforms. Each asset pack would
likely then need to use different levels of compression, and may even have to
fall back to no compression for some assets on some platforms, which leaves
either some image quality or some memory bandwidth efficiency untapped.
It was clear a better way was needed, so the Khronos group asked members to
submit proposals for a new compression algorithm to be adopted in the same
manner that the earlier ETC algorithm was adopted for OpenGL ES. ASTC was the
result of this, and has been adopted as an official algorithm for OpenGL,
OpenGL ES, and Vulkan.
Format overview
===============
Given the fragmentation issues with the existing compression formats, it should
be no surprise that the high level design objectives for ASTC were to have
something which could be used across the whole range of art assets found in
modern content, and which allows artists to have more control over the quality
to bit rate tradeoff.
There are quite a few technical components which make up the ASTC format, so
before we dive into detail it will be useful to give an overview of how ASTC
works at a higher level.
Block compression
-----------------
Compression formats for real-time graphics need the ability to quickly and
efficiently make random samples into a texture. This places two technical
requirements on any compression format:
* It must be possible to compute the address of data in memory given only a
sample coordinate.
* It must be possible to decompress random samples without decompressing too
much surrounding data.
The standard solution for this used by all contemporary real-time formats,
including ASTC, is to divide the image into fixed-size blocks of texels, each
of which is compressed into a fixed number of output bits. This feature makes
it possible to access texels quickly, in any order, and with a well-bounded
decompression cost.
The 2D block footprints in ASTC range from 4x4 texels up to 12x12 texels, which
all compress into 128-bit output blocks. By dividing 128 bits by the number of
texels in the footprint, we derive the format bit rates which range from 8 bpt
(`128/(4*4)`) down to 0.89 bpt (`128/(12*12)`).
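The bit rate arithmetic is simple enough to express directly. The helper below is a hypothetical illustration; the optional third dimension is included only so that the same calculation can be reused for 3D block footprints.

```python
def bits_per_texel(block_x, block_y, block_z=1):
    """Every ASTC block is 128 bits, so the rate depends only on the footprint."""
    return 128.0 / (block_x * block_y * block_z)


rates = {
    "4x4": bits_per_texel(4, 4),
    "6x6": bits_per_texel(6, 6),
    "12x12": bits_per_texel(12, 12),
}
```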
Color encoding
--------------
ASTC uses gradients to assign the color values of each texel. Each compressed
block stores the end-point colors for a gradient, and an interpolation weight
for each texel which defines the texel's location along that gradient. During
decompression the color value for each texel is generated by interpolating
between the two end-point colors, based on the per-texel weight.
![One partition gradient storage](./FormatOverviewImg/gradient-1p.svg)
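The gradient lookup can be sketched as a linear interpolation between the two endpoint colors. This is a simplified floating-point model with a hypothetical function name; a real decoder works on quantized integer endpoints and weights rather than floats.

```python
def interpolate_color(endpoint0, endpoint1, weight):
    """Return the color at position `weight` (0.0 to 1.0) along the gradient."""
    return tuple(c0 + (c1 - c0) * weight for c0, c1 in zip(endpoint0, endpoint1))


# A texel halfway between a black endpoint and an orange endpoint.
texel = interpolate_color((0.0, 0.0, 0.0, 1.0), (1.0, 0.5, 0.0, 1.0), 0.5)
```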
In many cases a block will contain a complex distribution of colors, for
example a red ball sitting on green grass. In these scenarios a single color
gradient will not be able to accurately represent all of the texels' values. To
support this ASTC allows a block to define up to four distinct color gradients,
known as partitions, and can assign each texel to a single partition. For our
example we require two partitions, one for our ball texels and one for our
grass texels.
![Two partition gradient storage](./FormatOverviewImg/gradient-2p.svg)
Now that you know the high level operation of the format, we can dive into more
detail.
Integer encoding
================
Initially the idea of fractional bits per texel sounds implausible, or even
impossible, because we're so used to storing numbers as a whole number of bits.
However, it's not quite as strange as it sounds. ASTC uses an encoding
technique called Bounded Integer Sequence Encoding (BISE), which makes heavy
use of storing numbers with a fractional number of bits to pack information
more efficiently.
Storing alphabets
-----------------
Even though color and weight values per texel are notionally floating-point
values, we have far too few bits available to directly store the actual values,
so they must be quantized during compression to reduce the storage size. For
example, if we have a floating-point weight for each texel in the range 0.0 to
1.0 we could choose to quantize it to five values - 0.0, 0.25, 0.5, 0.75, and
1.0 - which we can then represent in storage using the integer values 0 to 4.
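A minimal sketch of this quantization step is shown below, assuming evenly spaced levels and round-to-nearest. The function names are hypothetical, and real ASTC quantization tables are defined by the specification rather than computed like this.

```python
def quantize(value, levels):
    """Map a value in [0.0, 1.0] to the nearest of `levels` evenly spaced steps."""
    if levels < 2:
        raise ValueError("need at least two quantization levels")
    clamped = min(max(value, 0.0), 1.0)
    return int(clamped * (levels - 1) + 0.5)


def dequantize(index, levels):
    """Recover the representative value for a stored integer index."""
    return index / (levels - 1)


stored = quantize(0.3, 5)         # Nearest level to 0.3 is 0.25
restored = dequantize(stored, 5)
```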
In the general case we need to be able to efficiently store characters of an
alphabet containing N symbols if we choose to quantize to N levels. An N symbol
alphabet contains `log2(N)` bits of information per character. If we have an
alphabet of 5 possible symbols then each character contains ~2.32 bits of
information, but simple binary storage would require us to round up to 3 bits.
This wastes 22.6% of our storage capacity. The chart below shows the percentage
of our bit-space wasted when using simple binary encoding to store an arbitrary
N symbol alphabet:
![Binary encoding efficiency](./FormatOverviewImg/binary.png)
... which shows for most alphabet sizes we waste a lot of our storage capacity
when using an integer number of bits per character. Efficiency is of critical
importance to a compression format, so this is something we needed to be able
to improve.
**Note:** We could have chosen to round-up the quantization level to the next
power of two, and at least use the bits we're spending. However, this forces
the encoder to spend bits which could be used elsewhere for a bigger benefit,
so it will reduce image quality and is a sub-optimal solution.
Quints
------
Instead of rounding up a 5 symbol alphabet - called a "quint" in BISE - to
three bits, we could choose to instead pack three quint characters together.
Three characters in a 5-symbol alphabet have 5<sup>3</sup> (125) combinations,
and contain 6.97 bits of information. We can store this in 7 bits and have a
storage waste of only 0.5%.
Trits
-----
We can similarly construct a 3-symbol alphabet - called a "trit" in BISE - and
pack trit characters in groups of five. Each character group has 3<sup>5</sup>
(243) combinations, and contains 7.92 bits of information. We can store this in
8 bits and have a storage waste of only 1%.
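To show that a group of quints or trits really does fit into the stated number of bits, the sketch below packs a group as a single number in base 5 or base 3. This is only an illustration with hypothetical function names; the actual BISE bit layout is defined by the ASTC specification and does not use this plain positional packing.

```python
def pack_symbols(values, base):
    """Pack a group of symbols (each 0 .. base - 1) into one integer."""
    packed = 0
    for value in reversed(values):
        if not 0 <= value < base:
            raise ValueError("symbol out of range for this alphabet")
        packed = packed * base + value
    return packed


def unpack_symbols(packed, base, count):
    """Recover `count` symbols from an integer produced by pack_symbols."""
    values = []
    for _ in range(count):
        values.append(packed % base)
        packed //= base
    return values


largest_quint_group = pack_symbols([4, 4, 4], 5)        # Must fit in 7 bits
largest_trit_group = pack_symbols([2, 2, 2, 2, 2], 3)   # Must fit in 8 bits
```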
BISE
----
The BISE encoding used by ASTC allows storage of character sequences using
arbitrary alphabets of up to 256 symbols, encoding each alphabet size in the
most space-efficient choice of bits, trits, and quints.
* Alphabets with symbol values up to (2<sup>n</sup> - 1) can be encoded using
  n bits per character.
* Alphabets with symbol values up to (3 * 2<sup>n</sup> - 1) can be encoded
  using n bits (m) and a trit (t) per character, and reconstructed using the
  equation (t * 2<sup>n</sup> + m).
* Alphabets with symbol values up to (5 * 2<sup>n</sup> - 1) can be encoded
  using n bits (m) and a quint (q) per character, and reconstructed using the
  equation (q * 2<sup>n</sup> + m).
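The reconstruction equations above can be sketched as a pair of helpers that split a character value into its trit or quint part and its n plain bits, and join them again. The function names are hypothetical; `high` stands for either t or q.

```python
def split_value(value, n):
    """Split a character into (high, m): the trit or quint, and the low n bits."""
    return value >> n, value & ((1 << n) - 1)


def combine_value(high, m, n):
    """Rebuild the character as high * 2^n + m."""
    return (high << n) + m


# With a trit and n = 2 the largest value is 3 * 2^2 - 1 = 11.
trit, low_bits = split_value(11, 2)
```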
When the number of characters in a sequence is not a multiple of three or five
we need to avoid wasting storage at the end of the sequence, so we add another
constraint on the encoding. If the last few values in the sequence to encode
are zero, the last few bits in the encoded bit string must also be zero.
Ideally, the number of non-zero bits should be easily calculated and not depend
on the magnitudes of the previous encoded values. This is a little tricky to
arrange during compression, but it is possible. This means that we do not need
to store any padding after the end of the bit sequence, as we can safely assume
that they are zero bits.
With this constraint in place - and by some smart packing of the bits, trits,
and quints - BISE encodes a string of S characters using a fixed number of
bits, where n is the number of plain bits stored per character:
* S values up to (2<sup>n</sup> - 1) use (nS) bits.
* S values up to (3 * 2<sup>n</sup> - 1) use (nS + ceil(8S / 5)) bits.
* S values up to (5 * 2<sup>n</sup> - 1) use (nS + ceil(7S / 3)) bits.
... and the compressor will choose the one of these which produces the smallest
storage for the alphabet size being stored; some will use binary, some will use
bits and a trit, and some will use bits and a quint. If we compare the storage
efficiency of BISE against simple binary for the range of possible alphabet
sizes we might want to encode we can see that it is much more efficient.
![BISE encoding efficiency](./FormatOverviewImg/bise.png)
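The storage formulas can be evaluated with the sketch below. Here `count` is the number of characters S, `n` is the number of plain bits per character, and `kind` selects whether each character also carries a trit or a quint. The function name and the `kind` strings are hypothetical.

```python
def bise_bits(count, n, kind="bits"):
    """Number of bits BISE needs to store `count` characters."""
    if kind == "bits":
        return n * count
    if kind == "trit":
        # Five trits share 8 bits; integer form of ceil(8 * count / 5).
        return n * count + (8 * count + 4) // 5
    if kind == "quint":
        # Three quints share 7 bits; integer form of ceil(7 * count / 3).
        return n * count + (7 * count + 2) // 3
    raise ValueError("kind must be 'bits', 'trit', or 'quint'")


# Sixteen weights with 6 levels each (one trit plus one bit per weight).
weight_bits = bise_bits(16, 1, "trit")
```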
Block sizes
===========
ASTC always compresses blocks of texels into 128-bit outputs, but allows the
developer to select from a range of block sizes to enable a fine-grained
tradeoff between image quality and size.
| Block footprint | Bits/texel | | Block footprint | Bits/texel |
| --------------- | ---------- | --- | --------------- | ---------- |
| 4x4 | 8.00 | | 10x5 | 2.56 |
| 5x4 | 6.40 | | 10x6 | 2.13 |
| 5x5 | 5.12 | | 8x8 | 2.00 |
| 6x5 | 4.27 | | 10x8 | 1.60 |
| 6x6 | 3.56 | | 10x10 | 1.28 |
| 8x5 | 3.20 | | 12x10 | 1.07 |
| 8x6 | 2.67 | | 12x12 | 0.89 |
Color endpoints
===============
The color data for a block is encoded as a gradient between two color
endpoints, with each texel selecting a position along that gradient which is
then interpolated during decompression. ASTC supports 16 color endpoint
encoding schemes, known as "endpoint modes". Options for endpoint modes
include:
* Varying the number of color channels: e.g. luminance, luminance + alpha, rgb,
and rgba.
* Varying the encoding method: e.g. direct, base+offset, base+scale,
quantization level.
* Varying the data range: e.g. low dynamic range, or high dynamic range
The endpoint modes, and the endpoint color BISE quantization level, can be
chosen on a per-block basis.
Color partitions
================
Colors within a block are often complex, and cannot be accurately captured by a
single color gradient, as discussed earlier with our example of a red ball
lying on green grass. ASTC allows up to four color gradients - known as
"partitions" - to be assigned to a single block. Each texel is then assigned to
a single partition for the purposes of decompression.
Rather than directly storing the partition assignment for each texel, which
would need a lot of decompressor hardware to store it for all block sizes, we
generate it procedurally. Each block only needs to store the partition index -
which is the seed for the procedural generator - and the per texel assignment
can then be generated on-the-fly during decompression. The image below shows
the generated texel assignments for two (top), three (middle), and four
(bottom) partitions for the 8x8 block size.
![ASTC partition table](./FormatOverviewImg/hash.png)
The number of partitions and the partition index can be chosen on a per-block
basis, and a different color endpoint mode can be chosen per partition.
**Note:** ASTC uses a 10-bit seed to drive the partition assignments. The hash
used will introduce horizontal bias in a third of the partitions, vertical bias
in a third, and no bias in the rest. As they are procedurally generated not all
of the partitions are useful, in particular with the smaller block sizes.
* Many partitions are duplicates.
* Many partitions are degenerate (an N partition hash results in at least one
partition assignment that contains no texels).
Texel weights
=============
Each texel requires a weight, which defines the relative contribution of each
color endpoint when interpolating the color gradient.
For smaller block sizes we can choose to store the weight directly, with one
weight per texel, but for the larger block sizes we simply do not have enough
bits of storage to do this. To work around this ASTC allows the weight grid to
be stored at a lower resolution than the texel grid. The per-texel weights are
interpolated from the stored weight grid during decompression using a bilinear
interpolation.
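The idea of a reduced-resolution weight grid can be sketched with an ordinary floating-point bilinear upsample. This is a simplified model with a hypothetical function name; the real ASTC infill uses fixed-point scale factors defined by the specification, so results here are not bit-exact.

```python
def upsample_weights(grid, grid_w, grid_h, block_w, block_h):
    """Bilinearly expand a row-major weight grid to one weight per texel."""

    def axis(texel, texels, points):
        # Map a texel coordinate onto the grid and find the two neighbors.
        if texels == 1 or points == 1:
            return 0, 0, 0.0
        position = texel * (points - 1) / (texels - 1)
        low = min(int(position), points - 1)
        high = min(low + 1, points - 1)
        return low, high, position - low

    weights = []
    for y in range(block_h):
        y0, y1, fy = axis(y, block_h, grid_h)
        for x in range(block_w):
            x0, x1, fx = axis(x, block_w, grid_w)
            top = grid[y0 * grid_w + x0] * (1.0 - fx) + grid[y0 * grid_w + x1] * fx
            bottom = grid[y1 * grid_w + x0] * (1.0 - fx) + grid[y1 * grid_w + x1] * fx
            weights.append(top * (1.0 - fy) + bottom * fy)
    return weights


# A 2x2 stored grid expanded to cover a 3x3 block of texels.
texel_weights = upsample_weights([0.0, 1.0, 0.0, 1.0], 2, 2, 3, 3)
```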
The number of texel weights, and the weight value BISE quantization level, can
be chosen on a per-block basis.
Dual-plane weights
------------------
Using a single weight for all color channels works well when there is good
correlation across the channels, but this is not always the case. Common
examples where we would expect to get low correlation at least some of the time
are textures storing RGBA data - alpha masks are not usually closely
correlated with the color value - or normal data - the X and Y normal values
often change independently.
ASTC allows a dual-plane mode, which uses two separate weight grids for each
texel. A single channel can be assigned to a second plane of weights, while
the other three use the first plane of weights.
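Dual-plane decoding can be sketched as a per-channel choice of which weight to use when interpolating between the endpoints. As with the other sketches this is a floating-point simplification, and the function and parameter names are hypothetical.

```python
def decode_texel_dual_plane(endpoint0, endpoint1, weight1, weight2, plane2_channel):
    """Interpolate one texel, using weight2 only for the second-plane channel."""
    color = []
    for channel, (c0, c1) in enumerate(zip(endpoint0, endpoint1)):
        weight = weight2 if channel == plane2_channel else weight1
        color.append(c0 + (c1 - c0) * weight)
    return tuple(color)


# RGB follows the first weight plane; alpha (channel 3) follows the second.
texel = decode_texel_dual_plane(
    (0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0), 0.25, 0.75, 3
)
```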
The use of dual-plane mode can be chosen on a per-block basis, but its use
prevents the use of four color partitions as we do not have enough bits to
concurrently store both an extra plane of weights and an extra set of color
endpoints.
End results
===========
So, if we pull all of this together what do we end up with?
Adaptive
--------
The first word in the name of ASTC is "adaptive", and it should now hopefully
be clear why. Each block always compresses into 128 bits of storage, but the
developer can choose from a wide range of texel block sizes and the compressor
gets a huge amount of latitude to determine how those 128 bits are used.
The compressor can trade off the number of bits assigned to colors (number of
partitions, endpoint mode, and stored quantization level) and weights (number
of weights per block, use of dual-plane, and stored quantization level) on a
per-block basis to get the best image quality possible.
![ASTC compressed parrot at various bit rates](./FormatOverviewImg/astc-quality.png)
Format support
--------------
The compression scheme used by ASTC effectively compresses arbitrary sequences
of floating point numbers, with a flexible number of channels, across any of
the supported block sizes. There is no real notion of "color format" in the
format itself at all, beyond the color endpoint mode selection, although a
sensible compressor will want to use some format-specific heuristics to drive
an efficient state-space search.
The orthogonal encoding design allows ASTC to provide almost complete coverage
of our desirable format matrix from earlier, across a wide range of bit rates:
![ASTC 2D formats and bit rates](./FormatOverviewImg/coverage-astc.svg)
The only significant omission is the absence of a dedicated two channel
encoding for HDR textures. We simply ran out of entries in the space we had for
encoding color endpoint modes, and this one didn't make the cut.
The flexibility allowed by ASTC ticks the requirement that almost any asset can
be compressed to some degree, at an appropriate bitrate for its quality needs.
This is a powerful enabler for a compression format, because it puts control in
the hands of content creators and not arbitrary format restrictions.
Image quality
-------------
The normal expectation would be that this level of format flexibility would
come at a cost of image quality; it has to cost something, right? Luckily this
isn't true. The high packing efficiency allowed by BISE encoding, and the
ability to dynamically choose where to spend encoding space on a per-block
basis, means that an ASTC compressor is not forced to spend bits on things that
don't help image quality.
This gives some significant improvements in image quality compared to the older
texture formats, even though ASTC also handles a much wider range of options.
* ASTC at 2 bpt outperforms PVRTC at 2 bpt by ~2.0dB.
* ASTC at 3.56 bpt outperforms PVRTC and BC1 at 4 bpt by ~1.5dB, and ETC2 by
~0.7dB, despite a 10% bit rate disadvantage.
* ASTC at 8 bpt for LDR formats is comparable in quality to BC7 at 8 bpt.
* ASTC at 8 bpt for HDR formats is comparable in quality to BC6H at 8 bpt.
Differences as small as 0.25dB are visible to the human eye, and remember that
dB uses a logarithmic scale, so these are significant image quality
improvements.
3D compression
--------------
One of the nice bonus features of ASTC is that the techniques which underpin
the format generalize to compressing volumetric texture data without needing
very much additional decompression hardware.
ASTC is therefore also able to optionally support compression of 3D textures,
which is a unique feature not found in any earlier format, at the following
bit rates:
| Block footprint | Bits/texel | | Block footprint | Bits/texel |
| --------------- | ---------- | --- | --------------- | ---------- |
| 3x3x3 | 4.74 | | 5x5x4 | 1.28 |
| 4x3x3 | 3.56 | | 5x5x5 | 1.02 |
| 4x4x3 | 2.67 | | 6x5x5 | 0.85 |
| 4x4x4 | 2.00 | | 6x6x5 | 0.71 |
| 5x4x4 | 1.60 | | 6x6x6 | 0.59 |
Availability
============
The ASTC functionality is specified as a set of feature profiles, allowing
GPU hardware manufacturers to select which parts of the standard they
implement. There are four commonly seen profiles:
* "LDR":
* 2D blocks.
* LDR and sRGB color space.
* [KHR_texture_compression_astc_ldr][astc_ldr]: KHR OpenGL ES extension.
* "LDR + Sliced 3D":
* 2D blocks and sliced 3D blocks.
* LDR and sRGB color space.
* [KHR_texture_compression_astc_sliced_3d][astc_3d]: KHR OpenGL ES extension.
* "HDR":
* 2D and sliced 3D blocks.
* LDR, sRGB, and HDR color spaces.
* [KHR_texture_compression_astc_hdr][astc_ldr]: KHR OpenGL ES extension.
* "Full":
* 2D, sliced 3D, and volumetric 3D blocks.
* LDR, sRGB, and HDR color spaces.
* [OES_texture_compression_astc][astc_full]: OES OpenGL ES extension.
The LDR profile is mandatory in OpenGL ES 3.2 and a standardized optional
feature for Vulkan, and therefore widely supported on contemporary mobile
devices. The 2D HDR profile is not mandatory, but is widely supported.
3D texturing
------------
The APIs expose 3D textures in two flavors.
The sliced 3D texture support builds a 3D texture from an array of 2D image
slices that have each been individually compressed using 2D ASTC compression.
This is required for the HDR profile, so is also widely supported.
The volumetric 3D texture support uses the native 3D block sizes provided by
ASTC to implement true volumetric compression. This enables a wider choice of
low bitrate options than the 2D blocks, which is particularly important for 3D
textures of any non-trivial size. Volumetric formats are not widely supported,
but are supported on all of the Arm Mali GPUs that support ASTC.
ASTC decode mode
----------------
ASTC is specified to decompress texels into fp16 intermediate values, except
for sRGB which always decompresses into 8-bit UNORM intermediates. For many use
cases this gives more dynamic range and precision than required. This can cause
a reduction in both texture cache efficiency and texture filtering performance
due to the larger decompressed data size.
A pair of extensions exist, and are widely supported on recent mobile GPUs,
which allow applications to reduce the intermediate precision to either UNORM8
(recommended for LDR textures) or RGB9e5 (recommended for HDR textures).
* [EXT_texture_compression_astc_decode_mode][astc_decode]: Allow UNORM8
intermediates
* [EXT_texture_compression_astc_decode_mode_rgb9e5][astc_decode]: Allow RGB9e5
intermediates
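
As a rough illustration of why the intermediate precision matters, the sketch below compares the size of one decompressed block in each intermediate format. How a particular GPU actually stores decompressed texels is implementation-specific, so treat this as back-of-envelope arithmetic; the names are illustrative.

```python
# Bits per decompressed texel for each intermediate format.
INTERMEDIATE_BITS = {
    "fp16 RGBA": 4 * 16,   # four half-float channels
    "UNORM8 RGBA": 4 * 8,  # four 8-bit normalized channels
    "RGB9e5": 3 * 9 + 5,   # three 9-bit mantissas sharing a 5-bit exponent
}


def decoded_block_bytes(block_width, block_height, fmt):
    """Bytes needed to hold one decompressed 2D block in the given format."""
    return block_width * block_height * INTERMEDIATE_BITS[fmt] // 8


for fmt in INTERMEDIATE_BITS:
    print(f"6x6 block as {fmt}: {decoded_block_bytes(6, 6, fmt)} bytes")
```

For a 6x6 block this gives 288 bytes at fp16 and 144 bytes for either reduced-precision format, which is the halving of decompressed data size that the extensions are designed to allow.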
[astc_ldr]: https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_texture_compression_astc_hdr.txt
[astc_3d]: https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_texture_compression_astc_sliced_3d.txt
[astc_full]: https://www.khronos.org/registry/OpenGL/extensions/OES/OES_texture_compression_astc.txt
[astc_decode]: https://www.khronos.org/registry/OpenGL/extensions/EXT/EXT_texture_compression_astc_decode_mode.txt
- - -
_Copyright © 2019-2022, Arm Limited and contributors. All rights reserved._
+79
View File
@@ -0,0 +1,79 @@
# Terminology for the ASTC Encoder
Like most software, the `astcenc` code base has a set of naming conventions
for variables which are used to ensure both accuracy and reasonable brevity.
:construction: These conventions are being used for new patches, so new code
will conform to this, but older code is still being cleaned up to follow
these conventions.
## Counts
For counts of things prefer `<x>_count` rather than `<x>s`. For example:
* `plane_count`
* `weight_count`
* `texel_count`
Where possible aim for descriptive loop variables, as these are more literate
than simple `i` or `j` variables. For example:
* `plane_index`
* `weight_index`
* `texel_index`
## Ideal, Unpacked Quantized, vs Packed Quantized
Variables that are quantized, such as endpoint colors and weights, have
multiple states depending on how they are being used.
**Ideal values** represent unquantized numeric values that can take any value
in the available range. These are often used during compression to work out the
best value before any quantization is applied. For example, integer weights in
the 0-64 range can take any of the 65 values available.
**Quant uvalues** represent the unpacked numeric value after any quantization
rounding has been applied. These are often used during compression to work out
the error for the quantized value compared to the ideal value. For example,
`QUANT_3` weights in the 0-64 range can only take one of `[0, 32, 64]`.
**Quant pvalues** represent the packed numeric value in the quantized alphabet.
This is what ends up encoded in the ASTC data, although note that the encoded
ordering is scrambled to simplify hardware. For example, `QUANT_3` weights
originally in the 0-64 range can only take one of `[0, 1, 2]`.
For example:
* `weights_ideal_value`
* `weights_quant_uvalue`
* `weights_quant_pvalue`
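
The three states can be illustrated with the `QUANT_3` example above. This is a minimal sketch that assumes simple nearest-value rounding; the rounding rules and the scrambled encoding order used by the real encoder are not modeled, and `quantize_weight` is a hypothetical helper.

```python
QUANT_3_UVALUES = [0, 32, 64]  # unpacked values available to QUANT_3 weights


def quantize_weight(weight_ideal_value, uvalues=QUANT_3_UVALUES):
    """Return (quant_uvalue, quant_pvalue) for an ideal weight in 0-64."""
    # Pick the nearest representable unpacked value; its index in the
    # quantized alphabet is the packed value.
    weight_quant_pvalue = min(
        range(len(uvalues)),
        key=lambda index: abs(uvalues[index] - weight_ideal_value),
    )
    weight_quant_uvalue = uvalues[weight_quant_pvalue]
    return weight_quant_uvalue, weight_quant_pvalue


weights_ideal_value = [5, 20, 41, 64]
for weight_ideal_value in weights_ideal_value:
    uvalue, pvalue = quantize_weight(weight_ideal_value)
    print(f"ideal {weight_ideal_value} -> uvalue {uvalue}, pvalue {pvalue}")
```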
## Full vs Decimated interpolation weights
Weight grids have multiple states depending on how they are being used.
**full_weights** represent per-texel weight grids, storing one weight per texel.
**decimated_weights** represent reduced weight grids, which can store fewer
weights and which are bilinearly interpolated to generate the full weight grid.
Full weights have no variable prefix, but decimated weights are stored with
a `dec_` prefix.
* `dec_weights_ideal_value`
* `dec_weights_quant_uvalue`
* `dec_weights_quant_pvalue`
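
The relationship between the two grids can be sketched as a plain bilinear expansion. This is a simplified floating-point illustration of the naming convention only; it does not reproduce the fixed-point infill defined by the ASTC specification, and `infill_weights` is a hypothetical helper.

```python
def infill_weights(dec_weights, dec_width, dec_height, width, height):
    """Bilinearly expand a decimated weight grid to one weight per texel."""
    weights = []
    for texel_y in range(height):
        for texel_x in range(width):
            # Map the texel position onto the decimated grid.
            grid_x = texel_x * (dec_width - 1) / max(width - 1, 1)
            grid_y = texel_y * (dec_height - 1) / max(height - 1, 1)
            x0, y0 = int(grid_x), int(grid_y)
            x1 = min(x0 + 1, dec_width - 1)
            y1 = min(y0 + 1, dec_height - 1)
            frac_x, frac_y = grid_x - x0, grid_y - y0

            # Blend the four surrounding decimated weights.
            top_left = dec_weights[y0 * dec_width + x0]
            top_right = dec_weights[y0 * dec_width + x1]
            bottom_left = dec_weights[y1 * dec_width + x0]
            bottom_right = dec_weights[y1 * dec_width + x1]
            top = top_left * (1 - frac_x) + top_right * frac_x
            bottom = bottom_left * (1 - frac_x) + bottom_right * frac_x
            weights.append(top * (1 - frac_y) + bottom * frac_y)
    return weights


# A 2x2 decimated grid expanded to a 4x4 block of per-texel weights.
dec_weights = [0, 64, 0, 64]
weights = infill_weights(dec_weights, 2, 2, 4, 4)
print(weights[:4])
```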
## Weight vs Significance
The original encoder used "weight" for multiple purposes - texel significance
(weight the error), color channel significance (weight the error), and endpoint
interpolation weights. This gets very confusing in functions using all three!
We are slowly refactoring the code to only use "weight" to mean the endpoint
interpolation weights. The error weighting factors used for other purposes are
being updated to use the term "significance".
- - -
_Copyright © 2020-2022, Arm Limited and contributors. All rights reserved._
+120
View File
@@ -0,0 +1,120 @@
# Testing astcenc
The repository contains a small suite of tests which can be used to sanity
check source code changes to the compressor. It must be noted that this test
suite is relatively limited in scope and does not cover every feature or
bitrate of the standard.
# Required software
Running the tests requires Python 3.7 to be installed on the host machine, and
an `astcenc-avx2` release build to have been previously compiled and installed
into a directory called `astcenc` in the root of the git checkout. This
can be achieved by configuring the CMake build using the install prefix
`-DCMAKE_INSTALL_PREFIX=../` and then running a build with the `install` build
target.
# Running C++ unit tests
We support a small (but growing) number of C++ unit tests, which are written
using the `googletest` framework and integrated in the CMake "CTest" test
framework.
To build unit tests pull the `googletest` git submodule and add
`-DASTCENC_UNITTEST=ON` to the CMake command line when configuring.
To run unit tests use the CMake `ctest` utility from your build directory after
you have built the tests.
```shell
cd build
ctest --verbose
```
# Running command line tests
To run the command line tests, which aim to get coverage of the command line
options and core codec stability without testing the compression quality
itself, run the command line:

    python3 -m unittest discover -s Test -p astc_test*.py -v

# Running image tests
To run the image test suite run the following command from the root directory
of the repository:

    python3 ./Test/astc_test_image.py

This will run through a series of image compression tests, comparing the image
PSNR against a set of reference results from the last stable baseline. The test
will fail if any reduction in PSNR above a set threshold is detected. Note that
performance information is reported, but regressions will not flag a failure.
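
The pass/fail rule described above can be sketched as follows. This is not the test runner's actual code: the function name, the dictionary layout, and the 0.1dB default threshold are all assumptions made for illustration.

```python
def find_psnr_regressions(reference_psnr, measured_psnr, threshold_db=0.1):
    """Return the images whose PSNR dropped by more than threshold_db."""
    failures = []
    for image_name, reference_db in reference_psnr.items():
        measured_db = measured_psnr[image_name]
        # Only reductions count; improvements never fail the test.
        if reference_db - measured_db > threshold_db:
            failures.append(image_name)
    return failures


reference = {"image_a.png": 40.0, "image_b.png": 35.0}
measured = {"image_a.png": 39.5, "image_b.png": 35.2}
print(find_psnr_regressions(reference, measured))
```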
For debug purposes, all decompressed test output images and result CSV files
are stored in the `TestOutput` directory, using the same test set structure as
the `Test/Images` folder.
## Test selection
The runner supports a number of options to filter down what is run, enabling
developers to focus local testing on the parts of the code they are working on.
* `--encoder` selects which encoder to run. By default the `avx2` encoder is
selected. Note that some out-of-tree reference encoders (older encoders, and
some third-party encoders) are supported for comparison purposes. These will
not work without the binaries being manually provided; they are not
distributed here.
* `--test-set` selects which image set to run. By default the `Small` image
test set is selected, which aims to provide basic coverage of many different
color formats and color profiles.
* `--block-size` selects which block size to run. By default a range of
block sizes (2D and 3D) are used.
* `--color-profile` selects which color profiles from the standard should be
used (LDR, LDR sRGB, or HDR) to select images. By default all are selected.
* `--color-format` selects which color formats should be used (L, XY, RGB,
RGBA) to select images. By default all are selected.
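
When scripting local runs it can be convenient to assemble the filter options programmatically. The sketch below builds an argument list from the options listed above; the helper name is hypothetical, and the exact value spellings each option accepts (such as `6x6`) are assumptions that should be checked against the runner's `--help` output.

```python
import sys


def build_image_test_command(encoder=None, test_set=None, block_size=None,
                             color_profile=None, color_format=None):
    """Build an argv list for the image test runner from optional filters."""
    command = [sys.executable, "./Test/astc_test_image.py"]
    options = {
        "--encoder": encoder,
        "--test-set": test_set,
        "--block-size": block_size,
        "--color-profile": color_profile,
        "--color-format": color_format,
    }
    for flag, value in options.items():
        if value is not None:  # omitted filters keep the runner's defaults
            command += [flag, value]
    return command


# "6x6" is an assumed value spelling, used here only for illustration.
command = build_image_test_command(encoder="avx2", test_set="Small",
                                   block_size="6x6")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` from the root directory of the repository.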
## Performance tests
To provide less noisy performance results the test suite supports compressing
each image multiple times and returning the best measured performance. To
enable this mode use the following options:
* `--repeats <M>` : Run M test compression passes which are timed.
**Note:** The reference CSV contains performance results measured on an Intel
Core i5 9600K running at 4.3GHz, running each test 5 times.
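
The "best of M" measurement can be sketched as below. This illustrates the general technique rather than the runner's implementation; `best_time` is a hypothetical helper and the timed task is a stand-in for a compression pass.

```python
import time


def best_time(task, repeats):
    """Run task() `repeats` times and return the fastest wall-clock time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    # The minimum is the run least disturbed by other system activity.
    return min(timings)


elapsed = best_time(lambda: sum(range(100_000)), repeats=5)
print(f"best of 5: {elapsed:.6f} s")
```

Taking the minimum rather than the mean discards runs that were slowed by background activity, which is why it gives less noisy results.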
## Updating reference data
The reference PSNR and performance scores are stored in CSVs committed to the
repository. This data is created by running the tests using the last stable
release on a standard test machine we use for performance testing builds.
It can be useful for developers to rebuild the reference results for their
local machine, in particular for measuring performance improvements. To build
new reference CSVs, download the current reference `astcenc` binary (1.7) from
GitHub for your host OS and place it in the `./Binaries/1.7/` directory.
Once this is done, run the command:

    python3 ./Test/astc_test_image.py --encoder 1.7 --test-set all --repeats 5

... to regenerate the reference CSV files.
**WARNING:** This can take some hours to complete, and it is best done when the
test suite gets exclusive use of the machine to avoid other processing slowing
down the compression and disturbing the performance data. It is recommended to
shutdown or disable any background applications that are running.
## Valgrind memcheck
It is always worth running the Valgrind memcheck tool to validate that we have
not introduced any obvious memory errors. Build a release build with debug
symbols using `-DCMAKE_BUILD_TYPE=RelWithDebInfo` and then run:

    valgrind --tool=memcheck --track-origins=yes <command>

- - -
_Copyright © 2019-2022, Arm Limited and contributors. All rights reserved._