Add ktx
@@ -0,0 +1,315 @@
|
||||
# Building ASTC Encoder
|
||||
|
||||
This page provides instructions for building `astcenc` from the sources in
|
||||
this repository.
|
||||
|
||||
Builds must use CMake 3.15 or higher as the build system generator. The
|
||||
examples on this page show how to use it to generate build systems for NMake
|
||||
(Windows) and Make (Linux and macOS), but CMake supports other build system
|
||||
backends.
|
||||
|
||||
## Windows
|
||||
|
||||
Builds for Windows are tested with CMake 3.17, and Visual Studio 2019 or newer.
|
||||
|
||||
### Configuring the build
|
||||
|
||||
To use CMake you must first configure the build. Create a build directory in
|
||||
the root of the `astcenc` checkout, and then run `cmake` inside that directory
|
||||
to generate the build system.
|
||||
|
||||
```shell
# Create a build directory
mkdir build
cd build

# Configure your build of choice, for example:

# x86-64 using a Visual Studio solution
cmake -G "Visual Studio 16 2019" -T ClangCL -DCMAKE_INSTALL_PREFIX=..\ ^
    -DASTCENC_ISA_AVX2=ON -DASTCENC_ISA_SSE41=ON -DASTCENC_ISA_SSE2=ON ..

# x86-64 using NMake
cmake -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=..\ ^
    -DASTCENC_ISA_AVX2=ON -DASTCENC_ISA_SSE41=ON -DASTCENC_ISA_SSE2=ON ..
```
|
||||
|
||||
A single CMake configure can build multiple binaries for a single target CPU
|
||||
architecture, for example building x64 for both SSE2 and AVX2. Each binary name
|
||||
will include the build variant as a postfix. It is possible to build any set of
|
||||
the supported SIMD variants by enabling only the ones you require.
|
||||
|
||||
Using the Visual Studio Clang-CL LLVM toolchain (`-T ClangCL`) is optional but
|
||||
produces significantly faster binaries than the default toolchain. The C++ LLVM
|
||||
toolchain component must be installed via the Visual Studio installer.
|
||||
|
||||
### Building
|
||||
|
||||
Once you have configured the build you can use NMake to compile the project
|
||||
from your build dir, and install to your target install directory.
|
||||
|
||||
```shell
# Run a build and install build outputs in `${CMAKE_INSTALL_PREFIX}/bin/`
cd build
nmake install
```
|
||||
|
||||
## macOS and Linux using Make
|
||||
|
||||
Builds for macOS and Linux are tested with CMake 3.17, and clang++ 9.0 or
|
||||
newer.
|
||||
|
||||
> Compiling using g++ is supported, but clang++ builds are faster by ~15%.
|
||||
|
||||
### Configuring the build
|
||||
|
||||
To use CMake you must first configure the build. Create a build directory
|
||||
in the root of the astcenc checkout, and then run `cmake` inside that directory
|
||||
to generate the build system.
|
||||
|
||||
```shell
# Select your compiler (clang++ recommended, but g++ works)
export CXX=clang++

# Create a build directory
mkdir build
cd build

# Configure your build of choice, for example:

# Arm aarch64
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../ \
    -DASTCENC_ISA_NEON=ON ..

# x86-64
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../ \
    -DASTCENC_ISA_AVX2=ON -DASTCENC_ISA_SSE41=ON -DASTCENC_ISA_SSE2=ON ..

# macOS universal binary build
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../ ..
```
|
||||
|
||||
A single CMake configure can build multiple binaries for a single target CPU
|
||||
architecture, for example building x64 for both SSE2 and AVX2. Each binary name
|
||||
will include the build variant as a postfix. It is possible to build any set of
|
||||
the supported SIMD variants by enabling only the ones you require.
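
As a rough sketch of selecting variants, the helper below turns a list of ISA names into the matching `-DASTCENC_ISA_<name>=ON` options. The `isa_flags` function is a hypothetical convenience and not part of the project; the option names are the ones used in the examples above. The script only prints the resulting command, so it can be checked before it is run from the build directory.

```shell
# Hypothetical helper: emit one -DASTCENC_ISA_<name>=ON option per argument.
isa_flags() {
    for isa in "$@"; do
        printf '%s ' "-DASTCENC_ISA_${isa}=ON"
    done
}

# Build only the AVX2 and SSE4.1 variants
flags=$(isa_flags AVX2 SSE41)

# Print the configure command rather than running it
echo "cmake -G \"Unix Makefiles\" -DCMAKE_BUILD_TYPE=Release ${flags}.."
```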
|
||||
|
||||
For macOS, we additionally support the ability to build a universal binary.
|
||||
This build includes SSE4.1 (`x86_64`), AVX2 (`x86_64h`), and NEON (`arm64`)
|
||||
build slices in a single output binary. The OS will select the correct variant
|
||||
to run for the machine being used. This is the default build target for a macOS
|
||||
build, but single-target binaries can still be built by setting
|
||||
`-DASTCENC_UNIVERSAL_BINARY=OFF` and then manually selecting the specific ISA
|
||||
variants that are required.
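
One possible way to configure a single-target build is sketched below. The `pick_isa` helper is hypothetical: it maps the machine name reported by `uname -m` to one ISA option, and it assumes that an `x86_64` host supports AVX2, which is not guaranteed on older CPUs. The script prints the configure command instead of running it.

```shell
# Hypothetical helper: map a machine name to an astcenc ISA option suffix.
# Assumes x86_64 hosts support AVX2; pick SSE41 or SSE2 for older CPUs.
pick_isa() {
    case "$1" in
        arm64|aarch64) echo "NEON" ;;
        x86_64)        echo "AVX2" ;;
        *)             echo "NONE" ;;
    esac
}

isa=$(pick_isa "$(uname -m)")

# Print the single-target configure command for this machine
echo "cmake -G \"Unix Makefiles\" -DCMAKE_BUILD_TYPE=Release" \
     "-DCMAKE_INSTALL_PREFIX=../ -DASTCENC_UNIVERSAL_BINARY=OFF" \
     "-DASTCENC_ISA_${isa}=ON .."
```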
|
||||
|
||||
### Building
|
||||
|
||||
Once you have configured the build you can use Make to compile the project from
|
||||
your build dir, and install to your target install directory.
|
||||
|
||||
```shell
# Run a build and install build outputs in `${CMAKE_INSTALL_PREFIX}/bin/`
# for executable binaries and `${CMAKE_INSTALL_PREFIX}/lib/` for libraries
cd build
make install -j16
```
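
The `-j16` value is only an example job count. A minimal sketch of deriving it from the machine is shown below, assuming `getconf _NPROCESSORS_ONLN` is available (it is on typical Linux and macOS systems); the fallback of 4 jobs is an arbitrary choice for this sketch.

```shell
# Query the number of online CPUs; fall back to 4 if the query fails
njobs=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4)

# Print the build command; run it from the build directory
echo "make install -j${njobs}"
```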
|
||||
|
||||
## macOS using Xcode

Builds for macOS are tested with CMake 3.17, and Xcode 14.0 or newer.
|
||||
|
||||
### Configuring the build
|
||||
|
||||
To use CMake you must first configure the build. Create a build directory
|
||||
in the root of the astcenc checkout, and then run `cmake` inside that directory
|
||||
to generate the build system.
|
||||
|
||||
```shell
# Create a build directory
mkdir build
cd build

# Configure a universal build
cmake -G Xcode -DCMAKE_INSTALL_PREFIX=../ ..
```
|
||||
|
||||
### Building
|
||||
|
||||
Once you have configured the build you can use CMake to compile the project
|
||||
from your build dir, and install to your target install directory.
|
||||
|
||||
```shell
cmake --build . --config Release

# Optionally install the binaries to the installation directory
cmake --install . --config Release
```
|
||||
|
||||
## Advanced build options
|
||||
|
||||
For codec developers and power users there are a number of useful features in
|
||||
the build system.
|
||||
|
||||
### Build Types
|
||||
|
||||
We support and test the following `CMAKE_BUILD_TYPE` options.
|
||||
|
||||
| Value            | Description                                              |
| ---------------- | -------------------------------------------------------- |
| Release          | Optimized release build                                  |
| RelWithDebInfo   | Optimized release build with debug info                  |
| Debug            | Unoptimized debug build with debug info                  |
|
||||
|
||||
Note that optimized release builds are compiled with link-time optimization,
which can make profiling more challenging.
|
||||
|
||||
### Shared Libraries
|
||||
|
||||
We support building the core library as a shared object by setting the CMake
|
||||
option `-DASTCENC_SHAREDLIB=ON` at configure time. For macOS build targets the
|
||||
shared library supports the same universal build configuration as the command
|
||||
line utility.
|
||||
|
||||
Note that the command line tool is always statically linked; the shared objects
|
||||
are an extra build output that are not currently used by the command line tool.
|
||||
|
||||
### Constrained block size builds
|
||||
|
||||
All normal builds will support all ASTC block sizes, including the worst case
|
||||
6x6x6 3D block size (216 texels per block). Compressor memory footprint and
|
||||
performance can be improved by limiting the block sizes supported in the build
|
||||
by adding `-DASTCENC_BLOCK_MAX_TEXELS=<texel_count>` to the CMake command line
|
||||
when configuring. Legal block sizes that are unavailable in a restricted build
|
||||
will return the error `ASTCENC_ERR_NOT_IMPLEMENTED` during context creation.
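
To illustrate how a texel count relates to a block size, the sketch below multiplies the block dimensions and prints a configure command that uses the result. The `block_texels` helper is hypothetical, and the choice of 6x6 as the largest required block size is just an example. It relies on POSIX `sh` word splitting, so it is written for `sh` or `bash`.

```shell
# Hypothetical helper: texel count for a block size written as XxY or XxYxZ.
block_texels() {
    # Split the argument on "x" into positional parameters
    old_ifs=$IFS
    IFS=x
    set -- $1
    IFS=$old_ifs
    # 2D blocks have no third dimension, so default it to 1
    echo $(( $1 * $2 * ${3:-1} ))
}

# Example: the largest block size this build needs to support is 6x6
max_texels=$(block_texels 6x6)

# Print the configure command rather than running it
echo "cmake -G \"Unix Makefiles\" -DCMAKE_BUILD_TYPE=Release" \
     "-DASTCENC_BLOCK_MAX_TEXELS=${max_texels} .."
```

For reference, `block_texels 6x6x6` gives the 216 texel worst case mentioned above.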
|
||||
|
||||
### Non-invariant builds
|
||||
|
||||
All normal builds are designed to be invariant, so any build from the same git
|
||||
revision will produce bit-identical results for all compilers and CPU
|
||||
architectures. To achieve this we sacrifice some performance, so if this is
|
||||
not required you can specify `-DASTCENC_INVARIANCE=OFF` to enable additional
|
||||
optimizations. This has most benefit for AVX2 builds where we are able to
|
||||
enable use of the FMA instruction set extensions.
|
||||
|
||||
### No intrinsics builds
|
||||
|
||||
All normal builds will use SIMD accelerated code paths using intrinsics, as all
|
||||
supported target architectures (x86 and arm64) guarantee SIMD availability. For
|
||||
development purposes it is possible to build an intrinsic-free build which uses
|
||||
no explicit SIMD acceleration (the compiler may still auto-vectorize).
|
||||
|
||||
To enable this binary variant add `-DASTCENC_ISA_NONE=ON` to the CMake command
|
||||
line when configuring. It is NOT recommended to use this for production; it is
|
||||
significantly slower than the vectorized SIMD builds.
|
||||
|
||||
### No x86 gather instruction builds
|
||||
|
||||
On many x86 microarchitectures the native AVX gather instructions are slower
|
||||
than simply performing manual scalar loads and combining the results. Gathers
|
||||
are enabled by default, but can be disabled by setting the CMake option
|
||||
`-DASTCENC_X86_GATHERS=OFF` on the command line when configuring.
|
||||
|
||||
Note that we have seen mixed results when compiling the scalar fallback path,
|
||||
so we would recommend testing which option works best for the compiler and
|
||||
microarchitecture pairing that you are targeting.
|
||||
|
||||
### Test builds
|
||||
|
||||
We support building unit tests. These use the `googletest` framework, which is
|
||||
pulled in through a git submodule. On first use, you must fetch the submodule
|
||||
dependency:
|
||||
|
||||
```shell
git submodule init
git submodule update
```
|
||||
|
||||
To build unit tests add `-DASTCENC_UNITTEST=ON` to the CMake command line when
|
||||
configuring.
|
||||
|
||||
To run unit tests use the CMake `ctest` utility from your build directory after
|
||||
you have built the tests.
|
||||
|
||||
```shell
cd build
ctest --verbose
```
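
Putting the steps together, a possible end-to-end flow run from the repository root might look like the sketch below. The `run` wrapper and `DRY_RUN` variable are conventions of this sketch, not project features: by default each command is only printed, and setting `DRY_RUN=0` executes them. The `-DASTCENC_ISA_AVX2=ON` option assumes an x86-64 host with AVX2; substitute the ISA options you need.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Fetch the googletest submodule (init and update in one step)
run git submodule update --init

# Configure with unit tests enabled, then build
run cmake -S . -B build -DCMAKE_BUILD_TYPE=Release \
    -DASTCENC_ISA_AVX2=ON -DASTCENC_UNITTEST=ON
run cmake --build build

# Run the tests from the build directory
run cd build
run ctest --verbose
```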
|
||||
|
||||
### Sanitizer builds
|
||||
|
||||
We support building with sanitizers on Linux and macOS when using Clang.
|
||||
|
||||
To build binaries with ASAN checking enabled add `-DASTCENC_ASAN=ON` to the
|
||||
CMake command line when configuring.
|
||||
|
||||
To build binaries with UBSAN checking enabled add `-DASTCENC_UBSAN=ON` to the
|
||||
CMake command line when configuring.
|
||||
|
||||
### Android builds
|
||||
|
||||
Builds of the command line utility for Android are not officially supported, but can be a useful
|
||||
development build for testing on e.g. different Arm CPU microarchitectures.
|
||||
|
||||
The build script below shows one possible route to building the command line tool for Android. Once
|
||||
built the application can be pushed to e.g. `/data/local/tmp` and executed from an Android shell
|
||||
terminal over `adb`.
|
||||
|
||||
```shell
ANDROID_ABI=arm64-v8a
ANDROID_NDK=/work/tools/android/ndk/22.1.7171670

BUILD_TYPE=RelWithDebInfo

BUILD_DIR=build

mkdir -p ${BUILD_DIR}
cd ${BUILD_DIR}

cmake \
    -DCMAKE_INSTALL_PREFIX=./ \
    -DCMAKE_BUILD_TYPE=${BUILD_TYPE} \
    -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=${ANDROID_ABI} \
    -DANDROID_ARM_NEON=ON \
    -DANDROID_PLATFORM=android-21 \
    -DCMAKE_ANDROID_NDK_TOOLCHAIN_VERSION=clang \
    -DANDROID_TOOLCHAIN=clang \
    -DANDROID_STL=c++_static \
    -DARCH=aarch64 \
    -DASTCENC_ISA_NEON=ON \
    ..

make -j16
```
|
||||
|
||||
## Packaging a release bundle
|
||||
|
||||
We support building a release bundle of all enabled binary configurations in
|
||||
the current CMake configuration using the `package` build target.
|
||||
|
||||
Configure CMake with:
|
||||
|
||||
* `-DASTCENC_PACKAGE=<arch>` to set the package architecture/variant name used
|
||||
to name the package archive (not set by default).
|
||||
|
||||
```shell
# Run a build and package build outputs in `./astcenc-<ver>-<os>-<arch>.<fmt>`
cd build
make package -j16
```
|
||||
|
||||
Windows packages will use the `.zip` format, other packages will use the
|
||||
`.tar.gz` format.
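
As an illustration of the naming pattern, the sketch below assembles the expected archive name from its parts. The `package_name` helper is hypothetical, the version `1.2.3` is a placeholder, and the OS strings `linux` and `windows` are assumptions for this example; the strings that the build actually emits may differ.

```shell
# Hypothetical helper: expected archive name for a version, OS, and
# architecture label, following astcenc-<ver>-<os>-<arch>.<fmt>.
package_name() {
    ver=$1
    os=$2
    arch=$3
    # Windows packages use .zip, all other packages use .tar.gz
    case "$os" in
        windows) fmt="zip" ;;
        *)       fmt="tar.gz" ;;
    esac
    echo "astcenc-${ver}-${os}-${arch}.${fmt}"
}

package_name 1.2.3 linux x64
package_name 1.2.3 windows x64
```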
|
||||
|
||||
## Integrating as a library into another project
|
||||
|
||||
The core codec of `astcenc` is built as a library, and so can be easily
|
||||
integrated into other projects using CMake. An example of the CMake integration
|
||||
and the codec API usage can be found in the `./Utils/Example` directory in the
|
||||
repository. See the [Example Readme](../Utils/Example/README.md) for more
|
||||
details.
|
||||
|
||||
- - -
|
||||
|
||||
_Copyright © 2019-2024, Arm Limited and contributors. All rights reserved._
|
||||
@@ -0,0 +1,328 @@
|
||||
# 2.x series change log
|
||||
|
||||
This page summarizes the major functional and performance changes in each
|
||||
release of the 2.x series.
|
||||
|
||||
All performance data on this page is measured on an Intel Core i5-9600K
|
||||
clocked at 4.2 GHz, running astcenc using 6 threads.
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 2.5
|
||||
|
||||
**Status:** Released, March 2021
|
||||
|
||||
The 2.5 release is the last major release in the 2.x series. After this release
|
||||
a `2.x` branch will provide stable long-term support, and the `main` branch
|
||||
will switch to focusing on more radical changes for the 3.x series.
|
||||
|
||||
Reminder for users of the library interface - the API is not designed to be
|
||||
stable across versions, and this release is not compatible with earlier 2.x
|
||||
releases. Please update and rebuild your client-side code using the updated
|
||||
`astcenc.h` header.
|
||||
|
||||
* **General:**
|
||||
* **Feature:** The `ISA_INVARIANCE` build option is no longer supported, as
|
||||
there is no longer any performance benefit from the variant paths. All
|
||||
builds are now using the equivalent of the `ISA_INVARIANCE=ON` setting, and
|
||||
all builds (except Armv7) are now believed to be invariant across operating
|
||||
systems, compilers, CPU architectures, and SIMD instruction sets.
|
||||
* **Feature:** Armv8 32-bit builds with NEON are now supported, with
|
||||
out-of-the-box support for Arm Linux soft-float and hard-float ABIs. There
|
||||
are no pre-built binaries for these targets; support is included for
|
||||
library users targeting older 32-bit Android and iOS devices.
|
||||
* **Feature:** A compressor mode for encoding HDR textures that have been
|
||||
encoded into LDR RGBM wrapper format is now supported. Note that this
|
||||
encoding has some strong recommendations for how the RGBM encoding is
|
||||
implemented to avoid block artifacts in the compressed image.
|
||||
* **Core API:**
|
||||
* **API Change:** The core API has been changed to be a pure C API, making it
|
||||
easier to wrap the codec in a stable shared library ABI. Some entry points
|
||||
that used to accept references now expect pointers.
|
||||
* **API Change:** The decompression functionality in the core API has been
|
||||
changed to allow use of multiple threads. The design pattern matches the
|
||||
compression functionality, requiring the caller to create the threads,
|
||||
synchronize them between images, and to call the new
|
||||
`astcenc_decompress_reset()` function between images.
|
||||
* **API Feature:** Defines to support exporting public API entry point
|
||||
symbols from a shared object are provided, but not exposed off-the-shelf by
|
||||
the CMake provided by the project.
|
||||
* **API Feature:** New `astcenc_get_block_info()` function added to the core
|
||||
API to allow users to perform high level analysis of compressed data. This
|
||||
API is not implemented in decompressor-only builds.
|
||||
* **API Feature:** Codec configuration structure has been extended to expose
|
||||
the new RGBM compression mode. See the API header for details.
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 2.4
|
||||
|
||||
**Status:** Released, February 2021
|
||||
|
||||
The 2.4 release is the fifth release in the 2.x series. It is primarily a bug
|
||||
fix release for HDR image handling, which impacts all earlier 2.x series
|
||||
releases.
|
||||
|
||||
* **General:**
|
||||
* **Feature:** When using the `-a` option, or the equivalent config option
|
||||
for the API, any 2D blocks that are entirely zero alpha after the alpha
|
||||
filter radius is taken into account are replaced by transparent black
|
||||
constant color blocks. This is an RDO-like technique to improve compression
|
||||
ratios of any additional application packaging compression that is applied.
|
||||
* **Command Line:**
|
||||
* **Bug fix:** The command line wrapper now correctly loads HDR images that
|
||||
have a non-square aspect ratio.
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 2.3
|
||||
|
||||
**Status:** Released, January 2021
|
||||
|
||||
The 2.3 release is the fourth release in the 2.x series. It includes a number
|
||||
of performance improvements and new features.
|
||||
|
||||
Reminder for users of the library interface - the API is not designed to be
|
||||
stable across versions, and this release is not compatible with 2.2. Please
|
||||
recompile your client-side code using the updated `astcenc.h` header.
|
||||
|
||||
* **General:**
|
||||
* **Feature:** Decompressor-only builds of the codec are supported again.
|
||||
While this is primarily a feature for library users who want to shrink
|
||||
binary size, a variant command line tool `astcdec` can be built by
|
||||
specifying `DECOMPRESSOR=ON` on the CMake configure command line.
|
||||
* **Feature:** Diagnostic builds of the codec can now be built. These builds
|
||||
generate a JSON file containing a trace of the compressor execution.
|
||||
Diagnostic builds are only suitable for codec development; they are slower
|
||||
and JSON generation cannot be disabled. Build by setting `DIAGNOSTICS=ON`
|
||||
on the CMake configure command line.
|
||||
* **Feature:** Code compatibility improved with older versions of GCC,
|
||||
earliest compiler now tested is GCC 7.5 (was GCC 9.3).
|
||||
* **Feature:** Code compatibility improved with newer versions of LLVM,
|
||||
latest compiler now tested is Clang 12.0 (was Clang 9.0).
|
||||
* **Feature:** Code compatibility improved with the Visual Studio 2019 LLVM
|
||||
toolset (`clang-cl`). Using the LLVM toolset gives 25% performance
|
||||
improvements and is recommended.
|
||||
* **Command Line:**
|
||||
* **Feature:** Quality level now accepts either a preset (`-fast`, etc.) or a
|
||||
float value between 0 and 100, allowing more control over the compression
|
||||
quality vs performance trade-off. The presets are not evenly spaced in the
|
||||
float range; they have been spaced to give the best distribution of points
|
||||
between the fast and thorough presets.
|
||||
* `-fastest`: 0.0
|
||||
* `-fast`: 10.0
|
||||
* `-medium`: 60.0
|
||||
* `-thorough`: 98.0
|
||||
* `-exhaustive`: 100.0
|
||||
* **Core API:**
|
||||
* **API Change:** Quality level preset enum replaced with a float value
|
||||
between 0 (`-fastest`) and 100 (`-exhaustive`). See above for more info.
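
The preset values listed above can be expressed as a small lookup, sketched here for scripts that need to translate a preset name into the equivalent float quality. The `preset_quality` function is a hypothetical helper; the values are the ones given for the 2.3 release and may differ in other releases.

```shell
# Map a quality preset name to the float quality value listed for the
# 2.3 release. Unknown names report an error and return non-zero.
preset_quality() {
    case "$1" in
        -fastest)    echo "0.0" ;;
        -fast)       echo "10.0" ;;
        -medium)     echo "60.0" ;;
        -thorough)   echo "98.0" ;;
        -exhaustive) echo "100.0" ;;
        *)           echo "unknown preset: $1" >&2; return 1 ;;
    esac
}

preset_quality -medium
```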
|
||||
|
||||
### Performance
|
||||
|
||||
This release includes a number of optimizations to improve performance.
|
||||
|
||||
* New compressor algorithm for handling encoding candidates and refinement.
|
||||
* Vectorized implementation of `compute_error_of_weight_set()`.
|
||||
* Unrolled implementation of `encode_ise()`.
|
||||
* Many other small improvements!
|
||||
|
||||
The most significant change is the change to the compressor path, which now
|
||||
uses an adaptive approach to candidate trials and block refinement.
|
||||
|
||||
In earlier releases the quality level will determine the number of encoding
|
||||
candidates and the number of iterative refinement passes that are used for each
|
||||
major encoding trial. This is a fixed behavior; it will always try the full N
|
||||
candidates and M refinement iterations specified by the quality level for each
|
||||
encoding trial.
|
||||
|
||||
The new approach implements two optimizations for this:
|
||||
|
||||
* Compression will complete when a block candidate hits the specified target
|
||||
quality, after its M refinement iterations have been applied. Later block
|
||||
candidates are simply abandoned.
|
||||
* Block candidates will predict how much refinement can improve them, and
|
||||
abandon refinement if they are unlikely to improve upon the best known
|
||||
encoding already in-hand.
|
||||
|
||||
This pair of optimizations provides significant performance improvement to the
|
||||
high quality modes which use the most block candidates and refinement
|
||||
iterations. A minor loss of image quality is expected, as the blocks we no
|
||||
longer test or refine may have been better coding choices.
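
The control flow of the two optimizations can be sketched with a toy model. This is not the codec's implementation: the `search` function, the `error:gain` candidate encoding, and all numbers are made up for illustration, and refinement is assumed to achieve exactly its predicted best case.

```shell
# Toy model only: each candidate is "error:gain", where error is the
# candidate's error before refinement and gain is the predicted best-case
# error reduction from refinement. Units and values are made up.
search() {
    target=$1
    shift
    best=999999
    tried=0
    for cand in "$@"; do
        err=${cand%%:*}
        gain=${cand##*:}
        tried=$((tried + 1))

        # Second optimization: skip refinement if even the predicted best
        # case cannot beat the best encoding already in hand
        predicted=$((err - gain))
        if [ "$predicted" -ge "$best" ]; then
            continue
        fi

        # Pretend refinement achieves exactly the predicted error
        best=$predicted

        # First optimization: stop once the target quality is reached and
        # abandon the remaining candidates
        if [ "$best" -le "$target" ]; then
            break
        fi
    done
    echo "best=$best tried=$tried"
}

# prints: best=5 tried=4
search 10 "50:5" "40:20" "45:10" "30:25" "8:1"
```

In this run the third candidate is skipped without refinement, and the fifth is never examined because the fourth already meets the target.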
|
||||
|
||||
**Absolute performance vs 2.2 release:**
|
||||
|
||||

|
||||
|
||||
**Relative performance vs 2.2 release:**
|
||||
|
||||

|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 2.2
|
||||
|
||||
**Status:** Released, January 2021
|
||||
|
||||
The 2.2 release is the third release in the 2.x series. It includes a number
|
||||
of performance improvements and new features.
|
||||
|
||||
Reminder for users of the library interface - the API is not designed to be
|
||||
stable across versions, and this release is not compatible with 2.1. Please
|
||||
recompile your client-side code using the updated `astcenc.h` header.
|
||||
|
||||
* **General:**
|
||||
* **Feature:** New Arm aarch64 NEON accelerated vector library support.
|
||||
* **Improvement:** New CMake build system for all platforms.
|
||||
* **Improvement:** SSE4.2 feature profile changed to SSE4.1, which more
|
||||
accurately reflects the feature set used.
|
||||
* **Binary releases:**
|
||||
* **Improvement:** Linux binaries changed to use Clang 9.0, which gives
|
||||
up to 15% performance improvement.
|
||||
* **Improvement:** Windows binaries are now code signed.
|
||||
* **Improvement:** macOS binaries for Apple silicon platforms now provided.
|
||||
* **Improvement:** macOS binaries are now code signed and notarized.
|
||||
* **Command Line:**
|
||||
* **Feature:** New image preprocess `-pp-normalize` option added. This forces
|
||||
normal vectors to be unit length, which is useful when compressing source
|
||||
textures that use normal length to encode an NDF, which is incompatible
|
||||
with ASTC's two channel encoding.
|
||||
* **Feature:** New image preprocess `-pp-premultiply` option added. This
|
||||
scales RGB values by the alpha value. This can be useful to minimize
|
||||
cross-channel color bleed caused by GPU post-multiply filtering/blending.
|
||||
* **Improvement:** Command line tool cleanly traps and reports errors for
|
||||
corrupt input images rather than relying on standard library `assert()`
|
||||
calls in release builds.
|
||||
* **Core API:**
|
||||
* **API Change:** Images using region-based metrics no longer need to include
|
||||
padding; all input images should be tightly packed and `dim_pad` is removed
|
||||
from the `astcenc_image` structure. This makes it easier to directly use
|
||||
images loaded from other libraries.
|
||||
* **API Change:** Image `data` is no longer a 3D array accessed using
|
||||
`data[z][y][x]` indexing, it's an array of 2D slices. This makes it easier
|
||||
to directly use images loaded from other libraries.
|
||||
* **API Change:** New `ASTCENC_FLG_SELF_DECOMPRESS_ONLY` flag added to the
|
||||
codec config. Using this flag enables additional optimizations that
|
||||
aggressively exploit implementation- and configuration-specific behavior
|
||||
to gain performance. When using this flag the codec can only reliably
|
||||
decompress images that were compressed in the same context session. Images
|
||||
produced via other means may fail to decompress correctly, even if they are
|
||||
otherwise valid ASTC files.
|
||||
|
||||
### Performance
|
||||
|
||||
There is one major set of optimizations in this release, related to the new
|
||||
`ASTCENC_FLG_SELF_DECOMPRESS_ONLY` mode. These allow the compressor to only
|
||||
create data tables it knows that it is going to use, based on its current set
|
||||
of heuristics, rather than needing the full set the format allows.
|
||||
|
||||
The first benefit of these changes is a reduced context creation time, which
|
||||
can be reduced by up to 250ms on our test machine. This is a significant
|
||||
percentage of the command line utility runtime for a small image when using a
|
||||
quick search preset. Compressing the whole Kodak test suite using the command
|
||||
line utility and the `-fastest` preset is ~30% faster with this release, which
|
||||
is mostly due to faster startup.
|
||||
|
||||
The reduction in the data table size in this mode also improves the core codec
|
||||
speed. Our test sets show an average of 12% improvement in the codec for
|
||||
`-fastest` mode, and an average of 3% for `-medium` mode.
|
||||
|
||||
Key for performance charts:
|
||||
|
||||
* Color = block size (see legend).
|
||||
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
|
||||
|
||||
**Absolute performance vs 2.1 release:**
|
||||
|
||||

|
||||
|
||||
**Relative performance vs 2.1 release:**
|
||||
|
||||

|
||||
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 2.1
|
||||
|
||||
**Status:** Released, November 2020
|
||||
|
||||
The 2.1 release is the second release in the 2.x series. It includes a number
|
||||
of performance optimizations and new features.
|
||||
|
||||
Reminder for users of the library interface - the API is not designed to be
|
||||
stable across versions, and this release is not compatible with 2.0. Please
|
||||
recompile your client-side code using the updated `astcenc.h` header.
|
||||
|
||||
### Features:
|
||||
|
||||
* **Command line:**
|
||||
* **Bug fix:** The meaning of the `-tH\cH\dH` and `-th\ch\dh` compression
|
||||
modes was inverted. They now match the documentation; use `-*H` for HDR
|
||||
RGBA, and `-*h` for HDR RGB with LDR alpha.
|
||||
* **Feature:** A new `-fastest` quality preset is now available. This is
|
||||
designed for fast "roughing out" of new content, and sacrifices significant
|
||||
image quality compared to `-fast`. We do not recommend its use for
|
||||
production builds.
|
||||
* **Feature:** A new `-candidatelimit` compression tuning option is now
|
||||
available. This is a power-user control to determine how many candidates
|
||||
are returned for each block mode encoding trial. This feature is used
|
||||
automatically by the search presets; see `-help` for details.
|
||||
* **Improvement:** The compression test modes (`-tl\ts\th\tH`) now emit a
|
||||
MTex/s performance metric, in addition to coding time.
|
||||
* **Core API:**
|
||||
* **Feature:** A new quality preset `ASTCENC_PRE_FASTEST` is available. See
|
||||
`-fastest` above for details.
|
||||
* **Feature:** A new tuning option `tune_candidate_limit` is available in
|
||||
the config structure. See `-candidatelimit` above for details.
|
||||
* **Feature:** Image input/output can now use `ASTCENC_TYPE_F32` data types.
|
||||
* **Stability:**
|
||||
* **Feature:** The SSE2, SSE4.2, and AVX2 variants now produce identical
|
||||
compressed output when run on the same CPU when compiled with the
|
||||
preprocessor define `ASTCENC_ISA_INVARIANCE=1`. For Make builds this can
|
||||
be set on the command line by setting `ISA_INV=1`. ISA invariance is off
|
||||
by default; it reduces performance by 1-3%.
|
||||
|
||||
### Performance
|
||||
|
||||
Key for performance charts:
|
||||
|
||||
* Color = block size (see legend).
|
||||
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
|
||||
|
||||
**Absolute performance vs 2.0 release:**
|
||||
|
||||

|
||||
|
||||
**Relative performance vs 2.0 release:**
|
||||
|
||||

|
||||
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 2.0
|
||||
|
||||
**Status:** Released, August 2020
|
||||
|
||||
The 2.0 release is the first release in the 2.x series. It includes a number of
|
||||
major changes over the earlier 1.7 series, and is not command-line compatible.
|
||||
|
||||
### Features:
|
||||
|
||||
* The core codec can be built as a library, exposed via a new codec API.
|
||||
* The core codec supports accelerated SIMD paths for SSE2, SSE4.2, and AVX2.
|
||||
* The command line syntax has a clearer mapping to Khronos feature profiles.
|
||||
|
||||
### Performance:
|
||||
|
||||
Key for performance charts:
|
||||
|
||||
* Color = block size (see legend).
|
||||
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
|
||||
|
||||
**Absolute performance vs 1.7 release:**
|
||||
|
||||

|
||||
|
||||
**Relative performance vs 1.7 release:**
|
||||
|
||||

|
||||
|
||||
- - -
|
||||
|
||||
_Copyright © 2020-2022, Arm Limited and contributors. All rights reserved._
|
||||
@@ -0,0 +1,308 @@
|
||||
# 3.x series change log
|
||||
|
||||
This page summarizes the major functional and performance changes in each
|
||||
release of the 3.x series.
|
||||
|
||||
All performance data on this page is measured on an Intel Core i5-9600K
|
||||
clocked at 4.2 GHz, running `astcenc` using AVX2 and 6 threads.
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 3.7
|
||||
|
||||
**Status:** April 2022
|
||||
|
||||
The 3.7 release contains another round of performance optimizations, including
|
||||
significant improvements to the command line front-end (faster PNG loader) and
|
||||
the arm64 build of the codec (faster NEON implementation).
|
||||
|
||||
* **General:**
|
||||
* **Feature:** The command line tool PNG loader has been switched to use
|
||||
the Wuffs library, which is robust and significantly faster than the
|
||||
current stb_image implementation.
|
||||
* **Feature:** Support for non-invariant builds returns. Opt-in to slightly
|
||||
faster, but not bit-exact, builds by setting `-DNO_INVARIANCE=ON` for the
|
||||
CMake configuration. This improves performance by around 2%.
|
||||
* **Optimization:** Changed SIMD `select()` so that it matches the default
|
||||
NEON behavior (bitwise select), rather than the default x86-64 behavior
|
||||
(lane select on MSB). Specialization `select_msb()` added for the one case
|
||||
we want to select on a sign-bit, where NEON needs a different
|
||||
implementation. This provides a significant (>25%) performance uplift on
|
||||
NEON implementations.
|
||||
|
||||
### Performance:
|
||||
|
||||
Key for charts:
|
||||
|
||||
* Color = block size (see legend).
|
||||
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
|
||||
|
||||
**Relative performance vs 3.6 release:**
|
||||
|
||||

|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 3.6
|
||||
|
||||
**Status:** April 2022
|
||||
|
||||
The 3.6 release contains another round of performance optimizations.
|
||||
|
||||
There are no interface changes in this release, but in general the API is not
|
||||
designed to be binary compatible across versions. We always recommend
|
||||
rebuilding your client-side code using the updated `astcenc.h` header.
|
||||
|
||||
* **General:**
|
||||
* **Feature:** Data tables are now optimized for contexts without the
|
||||
`SELF_DECOMPRESS_ONLY` flag set. The flag therefore no longer improves
|
||||
compression performance, but still reduces context creation time and
|
||||
context data table memory footprint.
|
||||
* **Feature:** Image quality for 4x4 `-fastest` configuration has been
|
||||
improved.
|
||||
* **Optimization:** Decimation modes are reliably excluded from processing
|
||||
when they are only partially selected in the compressor configuration (e.g.
|
||||
if used for single plane, but not dual plane modes). This is a significant
|
||||
performance optimization for all quality levels.
|
||||
* **Optimization:** Fast-path block load function variant added for 2D LDR
|
||||
images with no swizzle. This is a moderate performance optimization for the
|
||||
fast and fastest quality levels.
|
||||
|
||||
### Performance:
|
||||
|
||||
Key for charts:
|
||||
|
||||
* Color = block size (see legend).
|
||||
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
|
||||
|
||||
**Relative performance vs 3.5 release:**
|
||||
|
||||

|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 3.5
|
||||
|
||||
**Status:** March 2022
|
||||
|
||||
The 3.5 release contains another round of performance optimizations.
|
||||
|
||||
There are no interface changes in this release, but in general the API is not
|
||||
designed to be binary compatible across versions. We always recommend
|
||||
rebuilding your client-side code using the updated `astcenc.h` header.
|
||||
|
||||
* **General:**
|
||||
* **Feature:** Compressor configurations using `SELF_DECOMPRESS_ONLY` mode
|
||||
store compacted partition tables, which significantly improves both
|
||||
context create time and runtime performance.
|
||||
* **Feature:** Bilinear infill for decimated weight grids supports a new
|
||||
variant for half-decimated grids which are only decimated in one axis.
|
||||
|
||||
### Performance:
|
||||
|
||||
Key for charts:
|
||||
|
||||
* Color = block size (see legend).
|
||||
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
|
||||
|
||||
**Relative performance vs 3.4 release:**
|
||||
|
||||

|
||||
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 3.4
|
||||
|
||||
**Status:** February 2022
|
||||
|
||||
The 3.4 release introduces another round of optimizations, removing a number
|
||||
of power-user configuration options to simplify the core compressor data path.
|
||||
|
||||
Reminder for users of the library interface - the API is not designed to be
|
||||
binary compatible across versions, and this release is not compatible with
|
||||
earlier releases. Please update and rebuild your client-side code using the
|
||||
updated `astcenc.h` header.
|
||||
|
||||
* **General:**
|
||||
* **Feature:** Many memory allocations have been moved off the stack into
|
||||
dynamically allocated working memory. This significantly reduces the peak
|
||||
stack usage, allowing the compressor to run in systems with 128KB stack
|
||||
limits.
|
||||
* **Feature:** Builds now support `-DBLOCK_MAX_TEXELS=<count>` to allow a
|
||||
compressor to support a subset of block sizes. This can reduce binary size
|
||||
and runtime memory footprint, and improve performance.
|
||||
* **Feature:** The `-v` and `-va` options to set a per-texel error weight
|
||||
function are no longer supported.
|
||||
* **Feature:** The `-b` option to set a per-texel error weight boost for
|
||||
block border texels is no longer supported.
|
||||
* **Feature:** The `-a` option to set a per-texel error weight based on texel
|
||||
alpha value is no longer supported as an error weighting tool, but is still
|
||||
supported for providing sprite-sheet RDO.
|
||||
* **Feature:** The `-mask` option to set an error metric for mask map
|
||||
textures is still supported, but is currently a no-op in the compressor.
|
||||
* **Feature:** The `-perceptual` option to set a perceptual error metric is
|
||||
still supported, but is currently a no-op in the compressor for mask map
|
||||
and normal map textures.
|
||||
* **Bug-fix:** Corrected decompression of error blocks in some cases, so now
|
||||
returning the expected error color (magenta for LDR, NaN for HDR). Note
|
||||
that astcenc determines the error color to use based on the output image
|
||||
data type not the decoder profile.
|
||||
* **Binary releases:**
|
||||
* **Improvement:** Windows binaries changed to use ClangCL 12.0, which gives
|
||||
up to 10% performance improvement.
|
||||
|
||||
### Performance:
|
||||
|
||||
Key for charts:
|
||||
|
||||
* Color = block size (see legend).
|
||||
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
|
||||
|
||||
**Relative performance vs 3.3 release:**
|
||||
|
||||

|
||||
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 3.3
|
||||
|
||||
**Status:** November 2021
|
||||
|
||||
The 3.3 release improves image quality for normal maps, and two component
|
||||
textures. Normal maps are expected to compress 25% slower than the 3.2
|
||||
release, although it should be noted that they are still faster to compress
|
||||
in 3.3 than when using the 2.5 series. This release also fixes one reported
|
||||
stability issue.
|
||||
|
||||
* **General:**
|
||||
* **Feature:** Normal map image quality has been improved.
|
||||
* **Feature:** Two component image quality has been improved, provided
|
||||
that unused components are correctly zero-weighted using e.g. `-cw` on the
|
||||
command line.
|
||||
* **Bug-fix:** Improved stability when trying to compress complex blocks that
|
||||
could not beat even the starting quality threshold. These will now always
|
||||
compress into constant color blocks.
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 3.2
|
||||
|
||||
**Status:** August 2021
|
||||
|
||||
The 3.2 release is a bugfix release; no significant image quality or
|
||||
performance differences are expected.
|
||||
|
||||
* **General:**
|
||||
* **Bug-fix:** Improved stability when new contexts were created while other
|
||||
contexts were compressing or decompressing an image.
|
||||
* **Bug-fix:** Improved stability when decompressing blocks with invalid
|
||||
block encodings.
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 3.1
|
||||
|
||||
**Status:** July 2021
|
||||
|
||||
The 3.1 release gives another performance boost, typically between 5 and 20%
|
||||
faster than the 3.0 release, as well as further incremental improvements to
|
||||
image quality. A number of build system improvements make astcenc easier and
|
||||
faster to integrate into other projects as a library, including support for
|
||||
building universal binaries on macOS. Full change list is shown below.
|
||||
|
||||
Reminder for users of the library interface - the API is not designed to be
|
||||
binary compatible across versions, and this release is not compatible with
|
||||
earlier releases. Please update and rebuild your client-side code using the
|
||||
updated `astcenc.h` header.
|
||||
|
||||
* **General:**
|
||||
* **Feature:** RGB color data now supports `-perceptual` operation. The
|
||||
current implementation is simple, weighting color channel errors by their
|
||||
contribution to perceived luminance. This mimics the behavior of the human
|
||||
visual system, which is most sensitive to green, then red, then blue.
|
||||
* **Feature:** Codec supports a new low weight search mode, which is a
|
||||
simpler weight assignment for encodings with a low number of weights in the
|
||||
weight grid. The weight threshold can be overridden using the new
|
||||
`-lowweightmodelimit` command line option.
|
||||
* **Feature:** All platform builds now support building a native binary.
|
||||
Native binaries automatically select the SIMD level based on the default
|
||||
configuration of the compiler in use. Native binaries built on one machine
|
||||
may use different SIMD options than native binaries built on another.
|
||||
* **Feature:** macOS platform builds now support building universal binaries
|
||||
containing both `x86_64` and `arm64` target support.
|
||||
* **Feature:** Building the command line can be disabled when using as a
|
||||
library in another project. Set `-DCLI=OFF` during the CMake configure
|
||||
step.
|
||||
* **Feature:** A standalone minimal example of the core codec API usage has
|
||||
been added in the `./Utils/Example/` directory.
|
||||
* **Core API:**
|
||||
* **Feature:** Config flag `ASTCENC_FLG_USE_PERCEPTUAL` works for color data.
|
||||
* **Feature:** Config option `tune_low_weight_count_limit` added.
|
||||
* **Feature:** New heuristic added which prunes dual weight plane searches if
|
||||
they are unlikely to help. This heuristic is not user controllable.
|
||||
* **Feature:** Image quality has been improved. In general we see significant
|
||||
improvements (up to 0.2dB) for high bitrate encodings (4x4, 5x4), and a
|
||||
smaller improvement (up to 0.1dB) for lower bitrate encodings.
|
||||
* **Bug fix:** Arm "none" SIMD builds could be non-invariant with other builds.
|
||||
This fix has also been back-ported to the 2.x LTS branch.
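
The `-perceptual` weighting for RGB data described above can be sketched as a luminance-weighted squared error. The weights below are the Rec. 709 luma coefficients, chosen as an assumption for illustration; the compressor's real weights may differ, and `perceptual_sq_error` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>

// ASSUMPTION: Rec. 709 luma coefficients stand in for the codec's real weights.
// Green contributes most to perceived luminance, then red, then blue.
constexpr std::array<float, 3> kLumaWeights = {0.2126f, 0.7152f, 0.0722f};

// Sum of squared per-channel differences, each scaled by its luminance weight.
float perceptual_sq_error(const std::array<float, 3>& original,
                          const std::array<float, 3>& decoded)
{
    float error = 0.0f;
    for (std::size_t i = 0; i < original.size(); i++)
    {
        float diff = original[i] - decoded[i];
        error += kLumaWeights[i] * diff * diff;
    }
    return error;
}
```

With this metric an error of a given size in green costs more than the same error in red, which in turn costs more than in blue, so the encoder is steered toward preserving the channels the eye notices most.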
|
||||
|
||||
### Performance:
|
||||
|
||||
Key for charts:
|
||||
|
||||
* Color = block size (see legend).
|
||||
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
|
||||
|
||||
**Relative performance vs 3.0 release:**
|
||||
|
||||

|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 3.0
|
||||
|
||||
**Status:** June 2021
|
||||
|
||||
The 3.0 release is the first in a series of updates to the compressor that are
|
||||
making more radical changes than we felt we could make with the 2.x series.
|
||||
The primary goals of the 3.x series are to keep the image quality ~static or
|
||||
better compared to the 2.5 release, but continue to improve performance.
|
||||
|
||||
Reminder for users of the library interface - the API is not designed to be
|
||||
binary compatible across versions, and this release is not compatible with
|
||||
earlier releases. Please update and rebuild your client-side code using the
|
||||
updated `astcenc.h` header.
|
||||
|
||||
* **General:**
|
||||
* **Feature:** The code has been significantly cleaned up, with improved
|
||||
comments, API documentation, function naming, and variable naming.
|
||||
* **Core API:**
|
||||
* **API Change:** The core APIs for `astcenc_compress_image()` and for
|
||||
`astcenc_decompress_image()` now accept swizzle structures by `const`
|
||||
pointer, instead of pass-by-value.
|
||||
* **API Change:** Calling the `astcenc_compress_reset()` and the
|
||||
`astcenc_decompress_reset()` functions between images is no longer required
|
||||
if the context was created for use by a single thread.
|
||||
* **Feature:** New heuristics have been added for controlling when to search
|
||||
beyond 2 partitions and 1 plane, and when to search beyond 3 partitions and
|
||||
1 plane. The previous `tune_partition_early_out_limit` config option has
|
||||
been removed, and replaced with two new options
|
||||
`tune_2_partition_early_out_limit_factor` and
|
||||
`tune_3_partition_early_out_limit_factor`. See command line help for more
|
||||
detailed documentation.
|
||||
* **Feature:** New heuristics have been added for controlling when to use
|
||||
dual weight planes. The previous `tune_two_plane_early_out_limit` has been
|
||||
renamed to `tune_2_plane_early_out_limit_correlation`. See command line help
|
||||
for more detailed documentation.
|
||||
* **Feature:** Support for using dual weight planes has been restricted to
|
||||
single partition blocks; it rarely helps blocks with 2 or more partitions
|
||||
and takes considerable compression search time.
|
||||
|
||||
### Performance:
|
||||
|
||||
Key for charts:
|
||||
|
||||
* Color = block size (see legend).
|
||||
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
|
||||
|
||||
**Relative performance vs 2.5 release:**
|
||||
|
||||

|
||||
|
||||
- - -
|
||||
|
||||
_Copyright © 2021-2022, Arm Limited and contributors. All rights reserved._
|
||||
@@ -0,0 +1,416 @@
|
||||
# 4.x series change log
|
||||
|
||||
This page summarizes the major functional and performance changes in each
|
||||
release of the 4.x series.
|
||||
|
||||
All performance data on this page is measured on an Intel Core i5-9600K
|
||||
clocked at 4.2 GHz, running `astcenc` using AVX2 and 6 threads.
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 4.8.0
|
||||
|
||||
**Status:** May 2024
|
||||
|
||||
The 4.8.0 release is a minor maintenance release.
|
||||
|
||||
* **General:**
|
||||
* **Bug fix:** Native builds on macOS will now correctly build for arm64 when
|
||||
run outside of Rosetta on an Apple silicon device.
|
||||
* **Bug fix:** Multiple small improvements to remove use of undefined
|
||||
language behavior, to improve support for deployment using Emscripten.
|
||||
* **Feature:** Builds using Clang can now build with undefined behavior
|
||||
sanitizer by setting `-DASTCENC_UBSAN=ON` on the CMake configure line.
|
||||
* **Feature:** Updated to Wuffs library 0.3.4, which ignores tRNS alpha
|
||||
chunks for type 4 (LA) and 6 (RGBA) PNGs, to improve compatibility with
|
||||
libpng.
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 4.7.0
|
||||
|
||||
**Status:** January 2024
|
||||
|
||||
The 4.7.0 release is a major maintenance release, fixing rounding behavior in
|
||||
the decompressor to match the Khronos specification. This fix includes the
|
||||
addition of explicit support for optimizing for `decode_unorm8` rounding.
|
||||
|
||||
Reminder - the codec library API is not designed to be binary compatible across
|
||||
versions. We always recommend rebuilding your client-side code using the
|
||||
updated `astcenc.h` header.
|
||||
|
||||
* **General:**
|
||||
* **Bug fix:** sRGB LDR decompression now uses the correct endpoint expansion
|
||||
method to create the 16-bit RGB endpoint colors, and removes the previous
|
||||
correction code from the interpolation function. This bug could result in
|
||||
LSB bit flips relative to the standard specification.
|
||||
* **Bug fix:** Decompressing to an 8-bit per component output image now
|
||||
matches the `decode_unorm8` extension rounding rules. This bug could result
|
||||
in LSB bit flips relative to the standard specification.
|
||||
* **Bug fix:** Code now avoids using `alignas()` in the reference C
|
||||
implementation, as the default `alignas(16)` is narrower than the
|
||||
native minimum alignment requirement on some CPUs.
|
||||
* **Feature:** Library configuration supports a new flag,
|
||||
`ASTCENC_FLG_USE_DECODE_UNORM8`. This flag indicates that the image will be
|
||||
used with the `decode_unorm8` decode mode. When set during compression
|
||||
this allows the compressor to use the correct rounding when determining the
|
||||
best encoding.
|
||||
* **Feature:** Command line tool supports a new option, `-decode_unorm8`.
|
||||
This option indicates that the image will be used with the `decode_unorm8`
|
||||
decode mode. This option will automatically be set for decompression
|
||||
(`-d*`) and trial (`-t*`) tool operation if the decompressed output image
|
||||
is stored to an 8-bit per component file format. This option must be set
|
||||
manually for compression (`-c*`) tool operation, as the desired decode mode
|
||||
cannot be reliably determined.
|
||||
* **Feature:** Library configuration supports a new optional progress
|
||||
reporting callback. This is called during compression to
|
||||
allow interactive tooling use cases to display incremental progress. The
|
||||
command line tool uses this feature to show compression progress unless
|
||||
`-silent` is used.
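
To see why the rounding rule matters, the sketch below compares two plausible ways of reducing a 16-bit UNORM value to 8 bits. It assumes that `decode_unorm8` keeps the top 8 bits of the 16-bit result, which is how I read the extension. Neither function is taken from the codec, and both names are hypothetical.

```cpp
#include <cstdint>

// ASSUMPTION: models decode_unorm8-style conversion by keeping the top 8 bits.
uint8_t unorm16_to_unorm8_top_bits(uint16_t value)
{
    return static_cast<uint8_t>(value >> 8);
}

// Round-to-nearest rescale from the [0, 65535] range to the [0, 255] range.
uint8_t unorm16_to_unorm8_rescale(uint16_t value)
{
    uint32_t scaled = static_cast<uint32_t>(value) * 255u + 32767u;
    return static_cast<uint8_t>(scaled / 65535u);
}
```

The two conversions agree at the ends of the range but can differ by one LSB in between. For the input 129, keeping the top bits gives 0 while the rescale gives 1. A compressor that scores candidate encodings with one rule while the decoder applies the other can therefore pick a slightly worse encoding, which is the gap the `ASTCENC_FLG_USE_DECODE_UNORM8` flag is described as closing.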
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 4.6.1
|
||||
|
||||
**Status:** November 2023
|
||||
|
||||
The 4.6.1 release is a minor maintenance release to fix a scaling bug on
|
||||
large core count Windows systems.
|
||||
|
||||
* **General:**
|
||||
* **Optimization:** Windows builds of the `astcenc` command line tool can now
|
||||
use more than 64 cores on large core count systems. This change doubled
|
||||
command line performance for `-exhaustive` compression when testing on a
|
||||
96 core/192 thread system.
|
||||
* **Feature:** Windows Arm64 native builds of the `astcenc` command line tool
|
||||
are now included in the prebuilt release binaries.
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 4.6.0
|
||||
|
||||
**Status:** November 2023
|
||||
|
||||
The 4.6.0 release retunes the compressor heuristics to give improvements to
|
||||
performance for trivial losses to image quality. It also includes some minor
|
||||
bug fixes and code quality improvements.
|
||||
|
||||
Reminder - the codec library API is not designed to be binary compatible across
|
||||
versions. We always recommend rebuilding your client-side code using the updated
|
||||
`astcenc.h` header.
|
||||
|
||||
* **General:**
|
||||
* **Bug-fix:** Fixed context allocation for contexts allocated with the
|
||||
`ASTCENC_FLG_DECOMPRESS_ONLY` flag.
|
||||
* **Bug-fix:** Reduced use of `reinterpret_cast` in the core codec to
|
||||
avoid strict aliasing violations.
|
||||
* **Optimization:** `-medium` search quality no longer tests 4 partition
|
||||
encodings for block sizes between 25 and 83 texels (inclusive). This
|
||||
improves performance for a tiny drop in image quality.
|
||||
* **Optimization:** `-thorough` and higher search qualities no longer test the
|
||||
mode0 first search for block sizes between 25 and 83 texels (inclusive).
|
||||
This improves performance for a tiny drop in image quality.
|
||||
* **Optimization:** `TUNE_MAX_PARTITIONING_CANDIDATES` reduced from 32 to 8
|
||||
to reduce the size of stack allocated data structures. This causes a tiny
|
||||
drop in image quality for the `-verythorough` and `-exhaustive` presets.
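
The "25 to 83 texels (inclusive)" range used by the heuristics above can be mapped back to concrete 2D block sizes. The sketch below lists the 2D ASTC block footprints and filters them by texel count. The helper name is hypothetical, and 3D block sizes are left out of this example.

```cpp
#include <utility>
#include <vector>

// The 2D block footprints defined by the ASTC format, as (width, height).
const std::vector<std::pair<int, int>> kAstc2dBlocks = {
    {4, 4},  {5, 4},  {5, 5},  {6, 5},   {6, 6},   {8, 5},  {8, 6},
    {8, 8},  {10, 5}, {10, 6}, {10, 8},  {10, 10}, {12, 10}, {12, 12}};

// Return the 2D block sizes whose texel count lies in [min_texels, max_texels].
std::vector<std::pair<int, int>> blocks_in_texel_range(int min_texels, int max_texels)
{
    std::vector<std::pair<int, int>> result;
    for (const auto& block : kAstc2dBlocks)
    {
        int texels = block.first * block.second;
        if (texels >= min_texels && texels <= max_texels)
        {
            result.push_back(block);
        }
    }
    return result;
}
```

Calling `blocks_in_texel_range(25, 83)` selects the nine sizes from 5x5 (25 texels) up to 10x8 (80 texels). 4x4 and 5x4 fall below the range, and 10x10 and larger fall above it.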
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 4.5.0
|
||||
|
||||
**Status:** June 2023
|
||||
|
||||
The 4.5.0 release is a maintenance release with small image quality
|
||||
improvements, and a number of build system quality of life improvements.
|
||||
|
||||
* **General:**
|
||||
* **Bug-fix:** Improved handling of compiler arguments in CMake, including
|
||||
consistent use of MSVC-style command line arguments for ClangCL.
|
||||
* **Bug-fix:** Invariant Clang builds now use `-ffp-model=precise` with
|
||||
`-ffp-contract=off` which is needed to restore invariance due to recent
|
||||
changes in compiler defaults.
|
||||
* **Change:** macOS binary releases are now distributed as a single universal
|
||||
binary for all platforms.
|
||||
* **Change:** Windows binary releases are now compiled with VS2022.
|
||||
* **Change:** Invariant MSVC builds for VS2022 now use `/fp:precise` instead
|
||||
of `/fp:strict`, which is now possible because precise no longer implies
|
||||
contraction. This should improve performance for MSVC builds.
|
||||
* **Change:** Non-invariant Clang builds now use `-ffp-model=precise` with
|
||||
`-ffp-contract=on`. This should improve performance on older Clang
|
||||
versions which defaulted to no contraction.
|
||||
* **Change:** Non-invariant MSVC builds for VS2022 now use `/fp:precise`
|
||||
with `/fp:contract`. This should improve performance for MSVC builds.
|
||||
* **Change:** CMake config variables now use an `ASTCENC_` prefix to add a
|
||||
namespace and group options when the library is used in a larger project.
|
||||
* **Change:** CMake config `ASTCENC_UNIVERSAL_BUILD` for building macOS
|
||||
universal binaries has been improved to include the `x86_64h` slice for
|
||||
AVX2 builds. Universal builds are now on by default for macOS, and always
|
||||
include NEON (arm64), SSE4.1 (x86_64), and AVX2 (x86_64h) variants.
|
||||
* **Change:** CMake config `ASTCENC_NO_INVARIANCE` has been inverted to
|
||||
remove the negated option, and is now `ASTCENC_INVARIANCE` with a default
|
||||
of `ON`. Disabling this option can substantially improve performance, but
|
||||
images can differ across platforms and compilers.
|
||||
* **Optimization:** Color quantization and packing for LDR RGB and RGBA has
|
||||
been vectorized to improve performance.
|
||||
* **Change:** Color quantization for LDR RGB and RGBA endpoints will now try
|
||||
multiple quantization packing methods, and pick the one with the lowest
|
||||
endpoint encoding error. This gives a minor image quality improvement, for
|
||||
no significant performance impact when combined with the vectorization
|
||||
optimizations.
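
The link between floating-point contraction and invariance in the entries above comes down to rounding. A fused multiply-add rounds once, while a separate multiply and add rounds twice, so the same source expression can give different bits depending on compiler settings. The sketch below is a minimal demonstration using only the standard library. The function names are hypothetical, and the `volatile` store is just a way to force the intermediate rounding in this example.

```cpp
#include <cmath>

// a * b + c with the product rounded to float before the add (no contraction).
float mul_add_unfused(float a, float b, float c)
{
    // The volatile store forces the product to be rounded to float,
    // so the compiler cannot contract the expression into a fused operation.
    volatile float product = a * b;
    return product + c;
}

// a * b + c computed as one fused operation with a single final rounding.
float mul_add_fused(float a, float b, float c)
{
    return std::fma(a, b, c);
}
```

For example, with `a = b = 1 + 2^-12` and `c = -(1 + 2^-11)`, the unfused version returns exactly zero because the product rounds to `1 + 2^-11`, while the fused version returns `2^-24`. Differences like this are why invariant builds turn contraction off.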
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 4.4.0
|
||||
|
||||
**Status:** March 2023
|
||||
|
||||
The 4.4.0 release is a minor release with image quality improvements, a small
|
||||
performance boost, and a few new quality-of-life features.
|
||||
|
||||
* **General:**
|
||||
* **Change:** Core library no longer checks availability of required
|
||||
instruction set extensions, such as SSE4.1 or AVX2. Checking compatibility
|
||||
is now the responsibility of the caller. See `astcenccli_entry.cpp` for
|
||||
an example of code performing this check.
|
||||
* **Change:** Core library can be built as a shared object by setting the
|
||||
`-DSHAREDLIB=ON` CMake option, resulting in e.g. `libastcenc-avx2-shared.so`.
|
||||
Note that the command line tool is always statically linked.
|
||||
* **Change:** Decompressed 3D images will now write one output file per
|
||||
slice, if the target format is a 2D image format.
|
||||
* **Change:** Command line errors print to stderr instead of stdout.
|
||||
* **Change:** Color encoding uses new quantization tables, that now factor
|
||||
in floating-point rounding if a distance tie is found when using the
|
||||
integer quant256 value. This improves image quality for 4x4 and 5x5 block
|
||||
sizes.
|
||||
* **Optimization:** Partition selection uses a simplified line calculation
|
||||
with a faster approximation. This improves performance for all block sizes.
|
||||
* **Bug-fix:** Fixed missing symbol error in decompressor-only builds.
|
||||
* **Bug-fix:** Fixed infinity handling in debug trace JSON files.
|
||||
|
||||
### Performance:
|
||||
|
||||
Key for charts:
|
||||
|
||||
* Color = block size (see legend).
|
||||
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
|
||||
|
||||
**Relative performance vs 4.3 release:**
|
||||
|
||||

|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 4.3.1
|
||||
|
||||
**Status:** January 2023
|
||||
|
||||
The 4.3.1 release is a minor maintenance release. No performance or image
|
||||
quality changes are expected.
|
||||
|
||||
* **General:**
|
||||
* **Bug-fix:** Fixed typo in `-2/3/4partitioncandidatelimit` CLI options.
|
||||
* **Bug-fix:** Fixed handling for `-3/4partitionindexlimit` CLI options.
|
||||
* **Bug-fix:** Updated to `stb_image.h` v2.28, which includes multiple fixes
|
||||
and improvements for image loading.
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 4.3.0
|
||||
|
||||
**Status:** January 2023
|
||||
|
||||
The 4.3.0 release is an optimization release. There are minor performance
|
||||
and image quality improvements in this release.
|
||||
|
||||
Reminder - the codec library API is not designed to be binary compatible across
|
||||
versions. We always recommend rebuilding your client-side code using the updated
|
||||
`astcenc.h` header.
|
||||
|
||||
* **General:**
|
||||
* **Bug-fix:** Use lower case `windows.h` include for MinGW compatibility.
|
||||
* **Change:** The `-mask` command line option, `ASTCENC_FLG_MAP_MASK` in the
|
||||
library API, has been removed.
|
||||
* **Optimization:** Always skip blue-contraction for `QUANT_256` encodings.
|
||||
This gives a small image quality improvement for the 4x4 block size.
|
||||
* **Optimization:** Always skip RGBO vector calculation for LDR encodings.
|
||||
* **Optimization:** Defer color packing and scrambling to physical layer.
|
||||
* **Optimization:** Remove folded `decimation_info` lookup tables. This
|
||||
significantly reduces compressor memory footprint and improves context
|
||||
creation time. Impact increases with the active block size.
|
||||
* **Optimization:** Increased trial and refinement pruning by using stricter
|
||||
target errors when determining whether to skip iterations.
|
||||
|
||||
### Performance:
|
||||
|
||||
Key for charts:
|
||||
|
||||
* Color = block size (see legend).
|
||||
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
|
||||
|
||||
**Relative performance vs 4.2 release:**
|
||||
|
||||

|
||||
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 4.2.0
|
||||
|
||||
**Status:** November 2022
|
||||
|
||||
The 4.2.0 release is an optimization release. There are significant performance
|
||||
improvements, minor image quality improvements, and library interface changes in
|
||||
this release.
|
||||
|
||||
Reminder - the codec library API is not designed to be binary compatible across
|
||||
versions. We always recommend rebuilding your client-side code using the updated
|
||||
`astcenc.h` header.
|
||||
|
||||
* **General:**
|
||||
* **Bug-fix:** Compression for RGB and RGBA base+offset encodings no
|
||||
longer generates endpoints with incorrect blue-contraction behavior.
|
||||
* **Bug-fix:** Lowest channel correlation calculation now correctly ignores
|
||||
constant color channels for the purposes of filtering 2 plane encodings.
|
||||
On average this improves both performance and image quality.
|
||||
* **Bug-fix:** ISA compatibility now checked in `config_init()` as well as
|
||||
in `context_alloc()`.
|
||||
* **Change:** Removed the low-weight count optimization, as more recent
|
||||
changes had significantly reduced its performance benefit. Option removed
|
||||
from both command line and configuration structure.
|
||||
* **Feature:** The `-exhaustive` mode now runs full trials on more
|
||||
partitioning candidates and block candidates. This improves image quality
|
||||
by 0.1 to 0.25 dB, but slows down compression by 3x. The `-verythorough`
|
||||
and `-thorough` modes also test more candidates.
|
||||
* **Feature:** A new preset, `-verythorough`, has been introduced to provide
|
||||
a standard performance point between `-thorough` and the re-tuned
|
||||
`-exhaustive` mode. This new mode is faster and higher quality than the
|
||||
`-exhaustive` preset in the 4.1 release.
|
||||
* **Feature:** The compressor can now independently vary the number of
|
||||
partitionings considered for error estimation for 2/3/4 partitions. This
|
||||
allows heuristics to put more effort into 2 partitions, and less into
|
||||
3/4 partitions.
|
||||
* **Feature:** The compressor can now run trials on a variable number of
|
||||
candidate partitionings, allowing high quality modes to explore more of the
|
||||
search space at the expense of slower compression. The number of trials is
|
||||
independently configurable for 2/3/4 partition cases.
|
||||
* **Optimization:** Introduce early-out threshold for 2/3/4 partition
|
||||
searches based on the results after 1 of 2 trials. This significantly
|
||||
improves performance for `-medium` and `-thorough` searches, for a minor
|
||||
loss in image quality.
|
||||
* **Optimization:** Reduce early-out threshold for 3/4 partition searches
|
||||
based on 2/3 partition results. This significantly improves performance,
|
||||
especially for `-thorough` searches, for a minor loss in image quality.
|
||||
* **Optimization:** Use direct vector compare to create a SIMD mask instead
|
||||
of a scalar compare that is broadcast to a vector mask.
|
||||
* **Optimization:** Remove obsolete partition validity masks from the
|
||||
partition selection algorithm.
|
||||
* **Optimization:** Removed obsolete channel scaling from partition
|
||||
`avgs_and_dirs()` calculation.
|
||||
|
||||
### Performance:
|
||||
|
||||
Key for charts:
|
||||
|
||||
* Color = block size (see legend).
|
||||
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
|
||||
|
||||
**Relative performance vs 4.0 and 4.1 releases:**
|
||||
|
||||

|
||||
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 4.1.0
|
||||
|
||||
**Status:** August 2022
|
||||
|
||||
The 4.1.0 release is a maintenance release. There is no performance or image
|
||||
quality change in this release.
|
||||
|
||||
* **General:**
|
||||
* **Change:** Command line decompressor no longer uses the legacy
|
||||
`GL_LUMINANCE` or `GL_LUMINANCE_ALPHA` format enums when writing KTX
|
||||
output files. Luminance textures now use the `GL_RED` format and
|
||||
luminance_alpha textures now use the `GL_RG` format.
|
||||
* **Change:** Command line tool gains a new `-dimage` option to generate
|
||||
diagnostic images showing aspects of the compression encoding. The output
|
||||
file name with its extension stripped is used as the stem of the diagnostic
|
||||
image file names.
|
||||
* **Bug-fix:** Library decompressor builds for SSE no longer use masked store
|
||||
`maskmovdqu` instructions, as they can generate faults on masked lanes.
|
||||
* **Bug-fix:** Command line decompressor now correctly uses sized type enums
|
||||
for the internal format when writing output KTX files.
|
||||
* **Bug-fix:** Command line compressor now correctly loads 16 and 32-bit per
|
||||
component input KTX files.
|
||||
* **Bug-fix:** Fixed GCC9 compiler warnings on Arm aarch64.
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 4.0.0
|
||||
|
||||
**Status:** July 2022
|
||||
|
||||
The 4.0.0 release introduces some major performance enhancements, and a number
|
||||
of larger changes to the heuristics used in the codec to find a more effective
|
||||
cost:quality trade off.
|
||||
|
||||
* **General:**
|
||||
* **Change:** The `-array` option for specifying the number of image planes
|
||||
for ASTC 3D volumetric block compression has been renamed to `-zdim`.
|
||||
* **Change:** The build root package directory is now `bin` instead of
|
||||
`astcenc`, allowing the CMake install step to write binaries into
|
||||
`/usr/local/bin` if the user wishes to do so.
|
||||
* **Feature:** A new `-ssw` option for specifying the shader sampling swizzle
|
||||
has been added as a convenience alternative to the `-cw` option. This is
|
||||
needed to correct error weighting during compression if not all components
|
||||
are read in the shader. For example, to extract and compress two components
|
||||
from an RGBA input image, weighting the two components equally when
|
||||
sampling through .ra in the shader, use `-esw ggga -ssw ra`. In this
|
||||
example `-ssw ra` is equivalent to the alternative `-cw 1 0 0 1` encoding.
|
||||
* **Feature:** The `-a` alpha weighting option has been re-enabled in the
|
||||
backend, and now again applies alpha scaling to the RGB error metrics when
|
||||
encoding. This is based on the maximum alpha in each block, not the
|
||||
individual texel alpha values used in the earlier implementation.
|
||||
* **Feature:** The command line tool now has `-repeats <count>` for testing,
|
||||
which will iterate around compression and decompression `count` times.
|
||||
Reported performance metrics also now separate compression and
|
||||
decompression scores.
|
||||
* **Feature:** The core codec is now warning clean up to /W4 for both MSVC
|
||||
`cl.exe` and `clangcl.exe` compilers.
|
||||
* **Feature:** The core codec now supports arm64 for both MSVC `cl.exe` and
|
||||
`clangcl.exe` compilers.
|
||||
* **Feature:** `NO_INVARIANCE` builds will enable the `-ffp-contract=fast`
|
||||
option for all targets when using Clang or GCC. In addition AVX2 targets
|
||||
will also set the `-mfma` option. This reduces image quality by up to 0.2dB
|
||||
(normally much less), but improves performance by 5-20%.
|
||||
* **Optimization:** Angular endpoint min/max weight selection is restricted
|
||||
to weight `QUANT_11` or lower. Higher quantization levels assume default
|
||||
0-1 range, which is less accurate but much faster.
|
||||
* **Optimization:** Maximum weight quantization for later trials is selected
|
||||
based on the weight quantization of the best encoding from the 1 plane 1
|
||||
partition trial. This significantly reduces the search space for the later
|
||||
trials with more planes or partitions.
|
||||
* **Optimization:** Small data tables now use in-register SIMD permutes
|
||||
rather than gathers (AVX2) or unrolled scalar lookups (SSE/NEON). This can
|
||||
be a significant optimization for paths that are load unit limited.
|
||||
* **Optimization:** Decompressed image block writes in the decompressor now
|
||||
use a vectorized approach to writing each row of texels in the block,
|
||||
including the ability to exploit masked stores if the target supports them.
|
||||
* **Optimization:** Weight scrambling has been moved into the physical layer;
|
||||
the rest of the codec now uses linear order weights.
|
||||
* **Optimization:** Weight packing has been moved into the physical layer;
|
||||
the rest of the codec now uses unpacked weights in the 0-64 range.
|
||||
* **Optimization:** Consistently vectorize the creation of unquantized weight
|
||||
grids when they are needed.
|
||||
* **Optimization:** Remove redundant per-decimation mode copies of endpoint
|
||||
and weight structures, which were really read-only duplicates.
|
||||
* **Optimization:** Early-out the same endpoint mode color calculation if it
|
||||
cannot be applied.
|
||||
* **Optimization:** Numerous type size reductions applied to arrays to reduce
|
||||
both context working buffer size usage and stack usage.
|
||||
|
||||
### Performance:
|
||||
|
||||
Key for charts:
|
||||
|
||||
* Color = block size (see legend).
|
||||
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
|
||||
|
||||
**Relative performance vs 3.7 release:**
|
||||
|
||||

|
||||
|
||||
|
||||
- - -
|
||||
|
||||
_Copyright © 2022-2024, Arm Limited and contributors. All rights reserved._
|
||||
@@ -0,0 +1,105 @@
|
||||
# 5.x series change log
|
||||
|
||||
This page summarizes the major functional and performance changes in each
|
||||
release of the 5.x series.
|
||||
|
||||
All performance data on this page is measured on an Intel Core i5-9600K
|
||||
clocked at 4.2 GHz, running `astcenc` using AVX2 and 6 threads.
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 5.3.0
|
||||
|
||||
**Status:** March 2025
|
||||
|
||||
The 5.3.0 release is a minor maintenance release.
|
||||
|
||||
* **General:**
|
||||
* **Feature:** Reference C builds (`ASTCENC_ISA_NONE`) now support compiling
|
||||
for big-endian CPUs. Compile with `-DASTCENC_BIG_ENDIAN=ON` when compiling
|
||||
for a big-endian target; it is not auto-detected.
|
||||
* **Improvement:** Builds using GCC now specify `-flto=auto` to allow
|
||||
parallel link steps, and remove the log warnings about not setting a CPU
|
||||
count parameter value.
|
||||
* **Bug fix:** Builds using MSVC `cl.exe` that do not specify an explicit
|
||||
ISA using the preprocessor configuration defines will now correctly
|
||||
default to the SSE2 backend on x86-64 and the NEON backend on Arm64. Previously they would have defaulted to the reference C implementation,
|
||||
which is around 3.25 times slower.
|
||||
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 5.2.0
|
||||
|
||||
**Status:** February 2025
|
||||
|
||||
The 5.2.0 release is a minor maintenance release.
|
||||
|
||||
This release includes changes to the public interface in the `astcenc.h`
|
||||
header. We always recommend rebuilding your client-side code using the
|
||||
header from the same release to avoid compatibility issues.
|
||||
|
||||
* **General:**
|
||||
* **Change:** Changed sRGB alpha channel endpoint expansion to match the
|
||||
revised Khronos Data Format Specification (v1.4.0), which reverts an
|
||||
unintended specification change. Compared to previous releases, this change
|
||||
can cause LSB bit differences in the alpha channel of compressed images.
|
||||
* **Feature:** Arm64 builds for Linux added to the GitHub Actions builds, and
|
||||
Arm64 binaries for NEON, 128-bit SVE, and 256-bit SVE added to release
|
||||
builds.
|
||||
* **Feature:** Added a new codec API, `astcenc_compress_cancel()`, which can
|
||||
be used to cancel an in-flight compression. This is designed to help make
|
||||
it easier to integrate the codec into an interactive user interface that
|
||||
can respond to user events with low latency.
|
||||
* **Bug fix:** Removed incorrect `static` variable qualifier, which could
|
||||
result in an incorrect `tune_mse_overshoot` heuristic threshold being used
|
||||
if a user ran multiple concurrent compressions with different settings.
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 5.1.0
|
||||
|
||||
**Status:** November 2024
|
||||
|
||||
The 5.1.0 release is an optimization release, giving moderate performance
|
||||
improvements on all platforms. There are no image quality differences.
|
||||
|
||||
* **General:**
|
||||
* **Feature:** Added a new CMake build option to control use of native
|
||||
gathers, as they can be slower than scalar loads on some common x86
|
||||
microarchitectures. Build with `-DASTCENC_X86_GATHERS=OFF` to disable use
|
||||
of native gathers in AVX2 builds.
|
||||
* **Optimization:** Added new `gather()` abstraction for gathers using byte
|
||||
indices, allowing implementations without gather hardware to skip the
|
||||
byte-to-int index conversion.
|
||||
* **Optimization:** Optimized `compute_lowest_and_highest_weight()` to
|
||||
pre-compute min/max outside of the main loop.
|
||||
* **Optimization:** Added improved intrinsics sequence for SSE and AVX2
|
||||
integer `hmin()` and `hmax()`.
|
||||
* **Optimization:** Added improved intrinsics sequence for `vint4(uint8_t*)`
|
||||
on systems implementing Arm SVE.
|
||||
|
||||
<!-- ---------------------------------------------------------------------- -->
|
||||
## 5.0.0
|
||||
|
||||
**Status:** November 2024
|
||||
|
||||
The 5.0.0 release is the first stable release in the 5.x series. The main new
|
||||
feature is support for the Arm Scalable Vector Extensions (SVE) SIMD instruction
|
||||
set.
|
||||
|
||||
* **General:**
|
||||
* **Bug fix:** Fixed incorrect return type in "None" vector library
|
||||
reference implementation.
|
||||
* **Bug fix:** Fixed sincos table index under/overflow.
|
||||
* **Feature:** Changed `ASTCENC_ISA_NATIVE` builds to use `-march=native` and
|
||||
`-mcpu=native`.
|
||||
* **Feature:** Added backend for Arm SVE fixed-width 256-bit builds. These
|
||||
can only run on hardware implementing 256-bit SVE.
|
||||
* **Feature:** Added backend for Arm SVE 128-bit builds. These are portable
|
||||
builds and can run on hardware implementing any SVE vector length, but the
|
||||
explicit SVE use is augmented NEON and will only use the bottom 128-bits of
|
||||
each SVE vector.
|
||||
* **Feature:** Optimized NEON mask `any()` and `all()` functions.
|
||||
* **Feature:** Migrated build and test to GitHub Actions pipelines.
|
||||
|
||||
- - -
|
||||
|
||||
_Copyright © 2022-2025, Arm Limited and contributors. All rights reserved._
|
||||
|
@@ -0,0 +1,235 @@
|
||||
# Effective ASTC Encoding
|
||||
|
||||
Most texture compression schemes encode a single color format at a single
|
||||
bitrate, so there are relatively few configuration options available to content
|
||||
creators beyond selecting which compressed format to use.
|
||||
|
||||
ASTC on the other hand is an extremely flexible container format which can
|
||||
compress multiple color formats at multiple bit rates. Inevitably this
|
||||
flexibility gives rise to questions about how to best use ASTC to encode a
|
||||
specific color format, or what the equivalent settings are to get a close
|
||||
match to another compression format.
|
||||
|
||||
This page aims to give some guidelines, but note that they are only guidelines
|
||||
and are not exhaustive so please deviate from them as needed.
|
||||
|
||||
## Traditional format reference
|
||||
|
||||
The most commonly used non-ASTC compressed formats, their color format, and
|
||||
their compressed bitrate are shown in the table below.
|
||||
|
||||
| Name     | Color Format | Bits/Pixel | Notes            |
| -------- | ------------ | ---------- | ---------------- |
| BC1      | RGB+A        | 4          | RGB565 + 1-bit A |
| BC3      | RGB+A        | 8          | BC1 RGB + BC4 A  |
| BC3nm    | G+R          | 8          | BC1 G + BC4 R    |
| BC4      | R            | 4          | L8               |
| BC5      | R+G          | 8          | BC4 R + BC4 G    |
| BC6H     | RGB (HDR)    | 8          |                  |
| BC7      | RGB / RGBA   | 8          |                  |
| EAC_R11  | R            | 4          | R11              |
| EAC_RG11 | RG           | 8          | RG11             |
| ETC1     | RGB          | 4          | RGB565           |
| ETC2     | RGB+A        | 4          | RGB565 + 1-bit A |
| ETC2+EAC | RGB+A        | 8          | RGB565 + EAC A   |
| PVRTC    | RGBA         | 2 or 4     |                  |
|
||||
|
||||
**Note:** BC2 (RGB+A) is not included in the table because it's rarely used in
|
||||
practice due to poor quality alpha encoding; BC3 is nearly always used instead.
|
||||
|
||||
**Note:** Color representations shown with a `+` symbol indicate non-correlated
|
||||
compression groups; e.g. an `RGB + A` format compresses `RGB` and `A`
|
||||
independently and does not assume the two signals are correlated. This can be
|
||||
a strength (it improves quality when compressing non-correlated signals), but
|
||||
also a weakness (it reduces quality when compressing correlated signals).
|
||||
|
||||
# ASTC Format Mapping
|
||||
|
||||
The main question which arises with the mapping of another format on to ASTC
|
||||
is how to handle cases where the input isn't a 4 component RGBA input. ASTC is
|
||||
a container format which always decompresses into a 4 component RGBA result.
|
||||
However, the internal compressed representation is very flexible and can store
|
||||
1-4 components as needed on a per-block basis.
|
||||
|
||||
To get the best quality for a given bitrate, or the lowest bitrate for a given
|
||||
quality, it is important that as few components as possible are stored in the
|
||||
internal representation to avoid wasting coding space.
|
||||
|
||||
Specific optimizations in the ASTC coding scheme exist for:
|
||||
|
||||
* Encoding the RGB components as a single luminance component, so only a single
|
||||
value needs to be stored in the coding instead of three.
|
||||
* Encoding the A component as a constant 1.0 value, so the coding doesn't
|
||||
actually need to store a per-pixel alpha value at all.
|
||||
|
||||
... so mapping your inputs given to the compressor to hit these paths is
|
||||
really important if you want to get the best output quality for your chosen
|
||||
bitrate.
|
||||
|
||||
## Encoding 1-4 component data
|
||||
|
||||
The table below shows the recommended component usage for data with different
|
||||
numbers of color components present in the data.
|
||||
|
||||
The coding swizzle should be applied when compressing an image. This can be
|
||||
handled by the compressor when reading an uncompressed input image by
|
||||
specifying the swizzle using the `-esw` command line option.
|
||||
|
||||
The sampling swizzle is what you should use in your shader programs to read
|
||||
the data from the compressed texture, assuming no additional API-level
|
||||
component swizzling is specified by the application.
|
||||
|
||||
| Input components | ASTC Endpoint | Coding Swizzle | Sampling Swizzle   |
| ---------------- | ------------- | -------------- | ------------------ |
| 1                | L + 1         | `rrr1`         | `.g` <sup>1</sup>  |
| 2                | L + A         | `rrrg`         | `.ga` <sup>1</sup> |
| 3                | RGB + 1       | `rgb1`         | `.rgb`             |
| 4                | RGB + A       | `rgba`         | `.rgba`            |
|
||||
|
||||
**1:** Sampling from `g` is preferred to sampling from `r` because it allows a
|
||||
single shader to be compatible with ASTC, BC1, or ETC formats. BC1 and ETC1
|
||||
store color endpoints as RGB565 data, so the `g` component will have higher
|
||||
precision. For ASTC it doesn't actually make any difference; the same single
|
||||
component luminance will be returned for all three of the `.rgb` components.
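
The effect of a coding swizzle on a single texel can be sketched in C as below. This is only an illustration of what the table describes, not the `astcenc` implementation: the function name `apply_swizzle` is hypothetical, the texel is assumed to be 8-bit RGBA, and the selector set (`r`, `g`, `b`, `a`, `0`, `1`) is assumed to match what `-esw` accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Apply a 4-character coding swizzle such as "rrrg" to one RGBA8 texel.
 * Returns 0 on success, or -1 if the swizzle string is malformed.
 * Hypothetical helper for illustration; not part of the astcenc API. */
int apply_swizzle(const char *swz, const uint8_t in[4], uint8_t out[4])
{
    assert(swz != NULL && in != NULL && out != NULL);

    /* Build the result in a temporary so that in and out may alias. */
    uint8_t tmp[4];
    for (int i = 0; i < 4; i++) {
        switch (swz[i]) {
        case 'r': tmp[i] = in[0]; break;
        case 'g': tmp[i] = in[1]; break;
        case 'b': tmp[i] = in[2]; break;
        case 'a': tmp[i] = in[3]; break;
        case '0': tmp[i] = 0;     break;
        case '1': tmp[i] = 255;   break;
        default:  return -1; /* unknown selector, or string shorter than 4 */
        }
    }
    if (swz[4] != '\0') {
        return -1; /* string longer than 4 characters */
    }

    for (int i = 0; i < 4; i++) {
        out[i] = tmp[i];
    }
    return 0;
}
```

With `rrrg`, a two component input stored in the red and green components ends up replicated across RGB with the second component in alpha, which is the layout the L + A endpoint expects.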
|
||||
|
||||
## Equivalence with other formats
|
||||
|
||||
Based on these component encoding requirements we can now derive the ASTC
|
||||
coding equivalents for most of the other texture compression formats in common
|
||||
use today.
|
||||
|
||||
| Format   | ASTC Coding Swizzle | ASTC Sampling Swizzle | Notes            |
| -------- | ------------------- | --------------------- | ---------------- |
| BC1      | `rgba` <sup>1</sup> | `.rgba`               |                  |
| BC3      | `rgba`              | `.rgba`               |                  |
| BC3nm    | `gggr`              | `.ag`                 |                  |
| BC4      | `rrr1`              | `.r`                  |                  |
| BC5      | `rrrg`              | `.ra` <sup>2</sup>    |                  |
| BC6H     | `rgb1`              | `.rgb` <sup>3</sup>   | HDR profile only |
| BC7      | `rgba`              | `.rgba`               |                  |
| EAC_R11  | `rrr1`              | `.r`                  |                  |
| EAC_RG11 | `rrrg`              | `.ra` <sup>2</sup>    |                  |
| ETC1     | `rgb1`              | `.rgb`                |                  |
| ETC2     | `rgba` <sup>1</sup> | `.rgba`               |                  |
| ETC2+EAC | `rgba`              | `.rgba`               |                  |
|
||||
|
||||
**1:** ASTC has no equivalent of the 1-bit punch-through alpha encoding
|
||||
supported by BC1 or ETC2; if alpha is present it will be a full alpha
|
||||
component.
|
||||
|
||||
**2:** ASTC relies on using the L+A color endpoint type for coding efficiency
|
||||
for two component data. It therefore has no direct equivalent of a two-plane
|
||||
format sampled through the `.rg` components such as BC5 or EAC_RG11. This can
|
||||
be emulated by setting texture component swizzles in the runtime API - e.g. via
|
||||
`glTexParameteri()` for OpenGL ES - although it has been noted that API
|
||||
controlled swizzles are not available in WebGL.
|
||||
|
||||
**3:** ASTC can only store unsigned values, and has no equivalent of the BC6H
|
||||
signed endpoint mode.
|
||||
|
||||
# Other Considerations
|
||||
|
||||
This section outlines some of the other things to consider when encoding
|
||||
textures using ASTC.
|
||||
|
||||
## Decode mode extensions
|
||||
|
||||
ASTC is specified to decompress into a 16-bit per component RGBA output by
|
||||
default, with the exception of the sRGB format which uses an 8-bit value for the
|
||||
RGB components.
|
||||
|
||||
Decompressing into a 16-bit per component output format often gives more precision than
|
||||
many use cases require, especially for LDR textures which originally came from
|
||||
an 8-bit per component source image. Most implementations of ASTC support the
|
||||
decode mode extensions, which allow an application to opt-in to a lower
|
||||
precision decompressed format (RGBA8 for LDR, RGB9E5 for HDR). Using these
|
||||
extensions can improve GPU texture cache efficiency, and even improve texture
|
||||
filtering throughput, for use cases that do not need the higher precision.
|
||||
|
||||
The ASTC format uses different data rounding rules when the decode mode
|
||||
extensions are used. To ensure that the compressor chooses the best encodings
|
||||
for the RGBA8 rounding rules, you can specify `-decode_unorm8` when compressing
|
||||
textures that will be decompressed into the RGBA8 intermediate. This gives a
|
||||
small image quality boost.
|
||||
|
||||
**Note:** This mode is automatically enabled if you use the `astcenc`
|
||||
decompressor to write an 8-bit per component output image.
|
||||
|
||||
## Encoding non-correlated components
|
||||
|
||||
Most other texture compression formats have a static component assignment in
|
||||
terms of the expected data correlation. For example, ETC2+EAC assumes that RGB
|
||||
are always correlated and that alpha is non-correlated. ASTC can automatically
|
||||
encode data as either fully correlated across all 4 components, or with any one
|
||||
component assigned to a separate non-correlated partition to the other three.
|
||||
|
||||
The non-correlated component can be changed on a block-by-block basis, so the
|
||||
compressor can dynamically adjust the coding based on the data present in the
|
||||
image. This means that there is no need for non-correlated data to be stored
|
||||
in a specific component in the input image.
|
||||
|
||||
It is however worth noting that the alpha component is treated differently to
|
||||
the RGB color components in some circumstances:
|
||||
|
||||
* When coding for sRGB the alpha component will always be stored in linear
|
||||
space.
|
||||
* When coding for HDR the alpha component can optionally be kept as LDR data.
|
||||
|
||||
## Encoding normal maps
|
||||
|
||||
The best way to store normal maps using ASTC is similar to the scheme used by
|
||||
BC5; store the X and Y components of a unit-length normal. The Z component of
|
||||
the normal can be reconstructed in shader code based on the knowledge that the
|
||||
vector is unit length.
|
||||
|
||||
To encode this we need to store only two input components in the compressed
|
||||
data, and therefore use the `rrrg` coding swizzle to align the data with the
|
||||
ASTC luminance+alpha endpoint. We can sample this in shader code using the
|
||||
`.ga` sampling swizzle, and reconstruct the Z value with:
|
||||
|
||||
    vec3 nml;
    nml.xy = texture(...).ga;                // Load normals (range 0 to 1)
    nml.xy = nml.xy * 2.0 - 1.0;             // Unpack normals (range -1 to +1)
    nml.z = sqrt(1.0 - dot(nml.xy, nml.xy)); // Compute Z, given unit length
|
||||
|
||||
The encoding swizzle and appropriate component weighting is enabled by using
|
||||
the `-normal` command line option. If you wish to use a different pair of
|
||||
components you can specify a custom swizzle after setting the `-normal`
|
||||
parameter. For example, to match BC5n component ordering use
|
||||
`-normal -esw gggr` for compression and `-normal -dsw arz1` for decompression.
|
||||
|
||||
## Encoding sRGB data
|
||||
|
||||
The ASTC LDR profile can compress sRGB encoded color, which is a more
|
||||
efficient use of bits than storing linear encoded color because the gamma
|
||||
corrected value distribution more closely matches human perception of
|
||||
luminance.
|
||||
|
||||
For color data it is nearly always a perceptual quality win to use sRGB input
|
||||
source textures that are then compressed using the ASTC sRGB compression mode
|
||||
(compress using the `-cs` command line option rather than the `-cl` command
|
||||
line option). Note that sRGB gamma correction is only applied to the RGB
|
||||
components during decode; the alpha component is always treated as linear
|
||||
encoded data.
|
||||
|
||||
*Important:* The uncompressed input texture provided on the command line must
|
||||
be stored in the sRGB color space for `-cs` to function correctly.
|
||||
|
||||
## Encoding HDR data
|
||||
|
||||
HDR data can be encoded just like LDR data, but with some caveats around
|
||||
handling the alpha component.
|
||||
|
||||
For many use cases the alpha component is an actual alpha opacity component and
|
||||
is therefore used for storing an LDR value between 0 and 1. For these cases use
|
||||
the `-ch` compressor option which will treat the RGB components as HDR, but the
|
||||
A component as LDR.
|
||||
|
||||
For other use cases the alpha component is simply a fourth data component which
|
||||
is also storing an HDR value. For these cases use the `-cH` compressor option
|
||||
which will treat all components as HDR data.
|
||||
|
||||
- - -
|
||||
|
||||
_Copyright © 2019-2024, Arm Limited and contributors. All rights reserved._
|
||||
@@ -0,0 +1,71 @@
|
||||
# The .astc File Format
|
||||
|
||||
The default file format for compressed textures generated by `astcenc`, as well
|
||||
as from many other ASTC compressors, is the `.astc` format. This is a very
|
||||
simple format consisting of a small header followed immediately by the binary
|
||||
payload for a single image surface.
|
||||
|
||||
Header
|
||||
======
|
||||
|
||||
The header is a fixed 16 byte structure, defined as storing only bytes to avoid
any endianness issues and any padding overhead.
|
||||
|
||||
```
struct astc_header
{
    uint8_t magic[4];
    uint8_t block_x;
    uint8_t block_y;
    uint8_t block_z;
    uint8_t dim_x[3];
    uint8_t dim_y[3];
    uint8_t dim_z[3];
};
```
|
||||
|
||||
Magic number
|
||||
------------
|
||||
|
||||
The 4 byte magic number at the start of the file acts as a format identifier.
|
||||
|
||||
```
magic[0] = 0x13;
magic[1] = 0xAB;
magic[2] = 0xA1;
magic[3] = 0x5C;
```
|
||||
|
||||
Block size
|
||||
----------
|
||||
|
||||
The `block_*` fields store the ASTC block dimensions in texels. For 2D images
|
||||
the Z dimension must be set to 1.
|
||||
|
||||
Image dimensions
|
||||
----------------
|
||||
|
||||
The `dim_*` fields store the image dimensions in texels. For 2D images the
|
||||
Z dimension must be set to 1.
|
||||
|
||||
Note that the image is not required to be an exact multiple of the compressed
|
||||
block size; the compressed data may include padding that is discarded during
|
||||
decompression.
|
||||
|
||||
Each dimension is a 24 bit unsigned value that is reconstructed from the stored
|
||||
byte values as:
|
||||
|
||||
```
decoded_dim = dim[0] + (dim[1] << 8) + (dim[2] << 16);
```
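
As a minimal sketch of how a reader might use these fields, the C code below checks the magic number and rebuilds a dimension from its three bytes. The struct repeats the layout given earlier in this document; the function names `astc_header_is_valid` and `astc_decode_dim` are hypothetical and are not part of any `astcenc` API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the header described in this document. */
struct astc_header
{
    uint8_t magic[4];
    uint8_t block_x;
    uint8_t block_y;
    uint8_t block_z;
    uint8_t dim_x[3];
    uint8_t dim_y[3];
    uint8_t dim_z[3];
};

/* A byte-only struct has no padding, so it is exactly 16 bytes. */
_Static_assert(sizeof(struct astc_header) == 16, "unexpected header size");

/* Returns 1 if the header starts with the .astc magic number, else 0. */
int astc_header_is_valid(const struct astc_header *hdr)
{
    static const uint8_t magic[4] = { 0x13, 0xAB, 0xA1, 0x5C };
    assert(hdr != NULL);
    return memcmp(hdr->magic, magic, sizeof(magic)) == 0;
}

/* Rebuild a 24-bit dimension; the least significant byte is stored first. */
uint32_t astc_decode_dim(const uint8_t dim[3])
{
    assert(dim != NULL);
    return (uint32_t)dim[0] | ((uint32_t)dim[1] << 8) | ((uint32_t)dim[2] << 16);
}
```

Because every field is a byte, the header can be read from a file directly into this struct on any host, whatever its endianness.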
|
||||
|
||||
Binary payload
|
||||
==============
|
||||
|
||||
The binary payload is a byte stream that immediately follows the header. It
|
||||
contains 16 bytes per compressed block. The number of compressed blocks is
|
||||
determined from the header information.
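
The expected payload size can be sketched as follows, assuming that each axis is rounded up to a whole number of blocks (which is what the padding note above implies). The function names are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Number of blocks needed to cover dim texels with blocks of block texels,
 * rounding up so that a partial block at the edge still gets stored. */
uint32_t astc_block_count(uint32_t dim, uint32_t block)
{
    assert(block > 0);
    return (dim + block - 1) / block;
}

/* Payload size in bytes: 16 bytes per compressed block. */
uint64_t astc_payload_size(uint32_t dim_x, uint32_t dim_y, uint32_t dim_z,
                           uint32_t block_x, uint32_t block_y, uint32_t block_z)
{
    uint64_t blocks = (uint64_t)astc_block_count(dim_x, block_x)
                    * astc_block_count(dim_y, block_y)
                    * astc_block_count(dim_z, block_z);
    return blocks * 16;
}
```

For example, a 256x300 2D image using 6x6 blocks needs 43 x 50 blocks, so the payload is 34400 bytes. Comparing this value with the number of bytes remaining in the file is a cheap sanity check for a loader.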
|
||||
|
||||
- - -
|
||||
|
||||
_Copyright © 2020-2022, Arm Limited and contributors. All rights reserved._
|
||||
@@ -0,0 +1,488 @@
|
||||
# ASTC Format Overview
|
||||
|
||||
Adaptive Scalable Texture Compression (ASTC) is an advanced lossy texture
|
||||
compression technology developed by Arm and AMD. It has been adopted as an
|
||||
official Khronos extension to the OpenGL and OpenGL ES APIs, and as a standard
|
||||
optional feature for the Vulkan API.
|
||||
|
||||
ASTC offers a number of advantages over earlier texture compression formats:
|
||||
|
||||
* **Format flexibility:** ASTC supports compressing between 1 and 4 channels of
|
||||
data, including support for one non-correlated channel such as RGB+A
|
||||
(correlated RGB, non-correlated alpha).
|
||||
* **Bit rate flexibility:** ASTC supports compressing images with a fine
|
||||
grained choice of bit rates between 0.89 and 8 bits per texel (bpt). The bit
|
||||
rate choice is independent of the color format choice.
|
||||
* **Advanced format support:** ASTC supports compressing images in either low
|
||||
dynamic range (LDR), LDR sRGB, or high dynamic range (HDR) color spaces, as
|
||||
well as support for compressing 3D volumetric textures.
|
||||
* **Improved image quality:** Despite the high degree of format flexibility,
|
||||
ASTC manages to beat nearly all legacy texture compression formats -- such as
|
||||
ETC2, PVRTC, and the BC formats -- on image quality at equivalent bit
|
||||
rates.
|
||||
|
||||
This article explores the ASTC format, and how it manages to generate the
|
||||
flexibility and quality improvements that it achieves.
|
||||
|
||||
|
||||
Why ASTC?
|
||||
=========
|
||||
|
||||
Before the creation of ASTC, the format and bit rate coverage of the available
|
||||
formats was very sparse:
|
||||
|
||||

|
||||
|
||||
In reality the situation is even worse than this diagram shows, as many of
|
||||
these formats are proprietary or simply not available on some operating
|
||||
systems, so any single platform will have very limited compression choices.
|
||||
|
||||
For developers this situation makes developing content which is portable across
|
||||
multiple platforms a tricky proposition. It's almost certain that differently
|
||||
compressed assets will be needed for different platforms. Each asset pack would
|
||||
likely then need to use different levels of compression, and may even have to
|
||||
fall back to no compression for some assets on some platforms, which leaves
|
||||
either some image quality or some memory bandwidth efficiency untapped.
|
||||
|
||||
It was clear a better way was needed, so the Khronos Group asked members to
|
||||
submit proposals for a new compression algorithm to be adopted in the same
|
||||
manner that the earlier ETC algorithm was adopted for OpenGL ES. ASTC was the
|
||||
result of this, and has been adopted as an official algorithm for OpenGL,
|
||||
OpenGL ES, and Vulkan.
|
||||
|
||||
|
||||
Format overview
|
||||
===============
|
||||
|
||||
Given the fragmentation issues with the existing compression formats, it should
|
||||
be no surprise that the high level design objectives for ASTC were to have
|
||||
something which could be used across the whole range of art assets found in
|
||||
modern content, and which allows artists to have more control over the quality
|
||||
to bit rate tradeoff.
|
||||
|
||||
There are quite a few technical components which make up the ASTC format, so
|
||||
before we dive into detail it will be useful to give an overview of how ASTC
|
||||
works at a higher level.
|
||||
|
||||
|
||||
Block compression
|
||||
-----------------
|
||||
|
||||
Compression formats for real-time graphics need the ability to quickly and
|
||||
efficiently make random samples into a texture. This places two technical
|
||||
requirements on any compression format:
|
||||
|
||||
* It must be possible to compute the address of data in memory given only a
|
||||
sample coordinate.
|
||||
* It must be possible to decompress random samples without decompressing too
|
||||
much surrounding data.
|
||||
|
||||
The standard solution for this used by all contemporary real-time formats,
|
||||
including ASTC, is to divide the image into fixed-size blocks of texels, each
|
||||
of which is compressed into a fixed number of output bits. This feature makes
|
||||
it possible to access texels quickly, in any order, and with a well-bounded
|
||||
decompression cost.
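
The first requirement can be illustrated with a short C sketch that maps a texel coordinate to the byte offset of its block. It assumes the blocks are stored in simple row-major order with 16 bytes per block; real GPU memory layouts are implementation specific and are often tiled, so treat this layout and the name `astc_block_offset` as assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of the 128-bit block that contains texel (x, y), assuming
 * blocks are laid out row by row with no gaps between them. */
uint64_t astc_block_offset(uint32_t x, uint32_t y, uint32_t image_w,
                           uint32_t block_w, uint32_t block_h)
{
    assert(block_w > 0 && block_h > 0 && x < image_w);

    /* Partial blocks at the right edge still occupy a full block. */
    uint32_t blocks_per_row = (image_w + block_w - 1) / block_w;
    uint32_t block_col = x / block_w;
    uint32_t block_row = y / block_h;

    return ((uint64_t)block_row * blocks_per_row + block_col) * 16;
}
```

The calculation uses only the sample coordinate and values that are constant for the whole image, which is what makes random access cheap.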
|
||||
|
||||
The 2D block footprints in ASTC range from 4x4 texels up to 12x12 texels, which
|
||||
all compress into 128-bit output blocks. By dividing 128 bits by the number of
|
||||
texels in the footprint, we derive the format bit rates which range from 8 bpt
|
||||
(`128/(4*4)`) down to 0.89 bpt (`128/(12*12)`).
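
The same arithmetic as a small C helper, shown only to make the calculation concrete; the function name is hypothetical.

```c
#include <assert.h>

/* Bit rate of a 2D ASTC block footprint: every block is 128 bits,
 * shared between block_w * block_h texels. */
double astc_bits_per_texel(unsigned block_w, unsigned block_h)
{
    assert(block_w > 0 && block_h > 0);
    return 128.0 / (double)(block_w * block_h);
}
```

Calling it with the 4x4 footprint gives 8 bpt, and the 12x12 footprint gives roughly 0.89 bpt, matching the range quoted above.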
|
||||
|
||||
|
||||
Color encoding
|
||||
--------------
|
||||
|
||||
ASTC uses gradients to assign the color values of each texel. Each compressed
|
||||
block stores the end-point colors for a gradient, and an interpolation weight
|
||||
for each texel which defines the texel's location along that gradient. During
|
||||
decompression the color value for each texel is generated by interpolating
|
||||
between the two end-point colors, based on the per-texel weight.
|
||||
|
||||

|
||||
|
||||
In many cases a block will contain a complex distribution of colors, for
|
||||
example a red ball sitting on green grass. In these scenarios a single color
|
||||
gradient will not be able to accurately represent all of the texels' values. To
|
||||
support this ASTC allows a block to define up to four distinct color gradients,
|
||||
known as partitions, and can assign each texel to a single partition. For our
|
||||
example we require two partitions, one for our ball texels and one for our
|
||||
grass texels.
|
||||
|
||||

|
||||
|
||||
Now that you know the high level operation of the format, we can dive into more
|
||||
detail.
|
||||
|
||||
|
||||
Integer encoding
|
||||
================
|
||||
|
||||
Initially the idea of fractional bits per texel sounds implausible, or even
|
||||
impossible, because we're so used to storing numbers as a whole number of bits.
|
||||
However, it's not quite as strange as it sounds. ASTC uses an encoding
|
||||
technique called Bounded Integer Sequence Encoding (BISE), which makes heavy
|
||||
use of storing numbers with a fractional number of bits to pack information
|
||||
more efficiently.
|
||||
|
||||
|
||||
Storing alphabets
|
||||
-----------------
|
||||
|
||||
Even though color and weight values per texel are notionally floating-point
|
||||
values, we have far too few bits available to directly store the actual values,
|
||||
so they must be quantized during compression to reduce the storage size. For
|
||||
example, if we have a floating-point weight for each texel in the range 0.0 to
|
||||
1.0 we could choose to quantize it to five values - 0.0, 0.25, 0.5, 0.75, and
|
||||
1.0 - which we can then represent in storage using the integer values 0 to 4.
|
||||
|
||||
In the general case we need to be able to efficiently store characters of an
|
||||
alphabet containing N symbols if we choose to quantize to N levels. An N symbol
|
||||
alphabet contains `log2(N)` bits of information per character. If we have an
|
||||
alphabet of 5 possible symbols then each character contains ~2.32 bits of
|
||||
information, but simple binary storage would require us to round up to 3 bits.
|
||||
This wastes 22.6% of our storage capacity. The chart below shows the percentage
|
||||
of our bit-space wasted when using simple binary encoding to store an arbitrary
|
||||
N symbol alphabet:
|
||||
|
||||

|
||||
|
||||
... which shows for most alphabet sizes we waste a lot of our storage capacity
|
||||
when using an integer number of bits per character. Efficiency is of critical
|
||||
importance to a compression format, so this is something we needed to be able
|
||||
to improve.
|
||||
|
||||
**Note:** We could have chosen to round-up the quantization level to the next
|
||||
power of two, and at least use the bits we're spending. However, this forces
|
||||
the encoder to spend bits which could be used elsewhere for a bigger benefit,
|
||||
so it will reduce image quality and is a sub-optimal solution.
|
||||
|
||||
|
||||
Quints
|
||||
------
|
||||
|
||||
Instead of rounding up a 5 symbol alphabet - called a "quint" in BISE - to
|
||||
three bits, we could choose to instead pack three quint characters together.
|
||||
Three characters in a 5-symbol alphabet have 5<sup>3</sup> (125) combinations,
|
||||
and contain 6.97 bits of information. We can store this in 7 bits and have a
|
||||
storage waste of only 0.5%.
|
||||
|
||||
|
||||
Trits
|
||||
-----
|
||||
|
||||
We can similarly construct a 3-symbol alphabet - called a "trit" in BISE - and
|
||||
pack trit characters in groups of five. Each character group has 3<sup>5</sup>
|
||||
(243) combinations, and contains 7.92 bits of information. We can store this in
|
||||
8 bits and have a storage waste of only 1%.
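
The packing argument for both quints and trits can be checked numerically. The sketch below packs a group of characters as a plain base-N integer, which is enough to show that the group fits in the stated number of bits. It is an illustration only: the real BISE bit layout used by ASTC is different, and the function names are hypothetical.

```python
import math

def pack_group(chars, symbol_count):
    """Pack characters (each in 0..symbol_count-1) into one integer,
    treating them as digits of a base-symbol_count number."""
    packed = 0
    for char in reversed(chars):
        assert 0 <= char < symbol_count
        packed = packed * symbol_count + char
    return packed

def unpack_group(packed, symbol_count, char_count):
    """Inverse of pack_group: recover char_count characters."""
    chars = []
    for _ in range(char_count):
        packed, char = divmod(packed, symbol_count)
        chars.append(char)
    return chars

for name, symbol_count, char_count in (("quint", 5, 3), ("trit", 3, 5)):
    combinations = symbol_count ** char_count
    bits = math.ceil(math.log2(combinations))
    largest = pack_group([symbol_count - 1] * char_count, symbol_count)
    print(f"{char_count} {name}s: {combinations} combinations, "
          f"{math.log2(combinations):.2f} bits of information, "
          f"stored in {bits} bits (largest packed value {largest})")
```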
|
||||
|
||||
|
||||
BISE
|
||||
----
|
||||
|
||||
The BISE encoding used by ASTC allows storage of character sequences using
|
||||
arbitrary alphabets of up to 256 symbols, encoding each alphabet size in the
|
||||
most space-efficient choice of bits, trits, and quints.
|
||||
|
||||
* Alphabets with values up to (2<sup>n</sup> - 1) can be encoded using n bits
|
||||
per character.
|
||||
* Alphabets with values up to (3 * 2<sup>n</sup> - 1) can be encoded using n bits
|
||||
(m) and a trit (t) per character, and reconstructed using the equation
|
||||
(t * 2<sup>n</sup> + m).
|
||||
* Alphabets with values up to (5 * 2<sup>n</sup> - 1) can be encoded using n
|
||||
bits (m) and a quint (q) per character, and reconstructed using the equation
|
||||
(q * 2<sup>n</sup> + m).
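
The reconstruction equations in the list above can be sketched as follows. The helper names are hypothetical, and this shows only the arithmetic relationship between a value and its bit and trit or quint parts, not how ASTC lays them out in the block.

```python
def split_value(value, bit_count):
    """Split value into (high, m): high is the trit or quint,
    m is the low bit_count bits."""
    m = value & ((1 << bit_count) - 1)
    high = value >> bit_count
    return high, m

def join_value(high, m, bit_count):
    """Reconstruct the value as high * 2^n + m."""
    return high * (1 << bit_count) + m

# An alphabet with values 0 to 11 uses a trit plus n = 2 bits.
for value in (0, 5, 11):
    t, m = split_value(value, 2)
    assert 0 <= t < 3
    print(value, "-> trit", t, "bits", format(m, "02b"))
```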
|
||||
|
||||
When the number of characters in a sequence is not a multiple of three or five
|
||||
we need to avoid wasting storage at the end of the sequence, so we add another
|
||||
constraint on the encoding. If the last few values in the sequence to encode
|
||||
are zero, the last few bits in the encoded bit string must also be zero.
|
||||
Ideally, the number of non-zero bits should be easily calculated and not depend
|
||||
on the magnitudes of the previous encoded values. This is a little tricky to
|
||||
arrange during compression, but it is possible. This means that we do not need
|
||||
to store any padding after the end of the bit sequence, as we can safely assume
|
||||
that they are zero bits.
|
||||
|
||||
With this constraint in place - and by some smart packing of the bits, trits, and
|
||||
quints - BISE encodes a string of S characters in an N symbol alphabet using a
|
||||
fixed number of bits:
|
||||
|
||||
* S values up to (2<sup>n</sup> - 1) use (nS) bits.
|
||||
* S values up to (3 * 2<sup>n</sup> - 1) use (nS + ceil(8S / 5)) bits.
|
||||
* S values up to (5 * 2<sup>n</sup> - 1) use (nS + ceil(7S / 3)) bits.
|
||||
|
||||
... and the compressor will choose the one of these which produces the smallest
|
||||
storage for the alphabet size being stored; some will use binary, some will use
|
||||
bits and a trit, and some will use bits and a quint. If we compare the storage
|
||||
efficiency of BISE against simple binary for the range of possible alphabet
|
||||
sizes we might want to encode we can see that it is much more efficient.
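
The bit counts in the list above can be sketched as a small function. The function and parameter names are hypothetical; `bit_count` is the number of plain bits per character (n) and `char_count` is the number of characters (S).

```python
def bise_bit_count(char_count, bit_count, trits=False, quints=False):
    """Total storage bits for char_count values, each using bit_count plain
    bits plus an optional trit or quint."""
    assert not (trits and quints)
    total = bit_count * char_count
    if trits:
        total += (8 * char_count + 4) // 5   # ceil(8S / 5)
    elif quints:
        total += (7 * char_count + 2) // 3   # ceil(7S / 3)
    return total

# 16 characters with values 0 to 11: a trit plus 2 bits, versus 4 plain bits.
print("bits + trit:", bise_bit_count(16, 2, trits=True))
print("plain binary:", bise_bit_count(16, 4))
```

Because the total is a closed-form function of the character count, a decoder can work out where the sequence ends without any stored length or padding.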
|
||||
|
||||

|
||||
|
||||
|
||||
Block sizes
|
||||
===========
|
||||
|
||||
ASTC always compresses blocks of texels into 128-bit outputs, but allows the
|
||||
developer to select from a range of block sizes to enable a fine-grained
|
||||
tradeoff between image quality and size.
|
||||
|
||||
| Block footprint | Bits/texel | | Block footprint | Bits/texel |
|
||||
| --------------- | ---------- | --- | --------------- | ---------- |
|
||||
| 4x4 | 8.00 | | 10x5 | 2.56 |
|
||||
| 5x4 | 6.40 | | 10x6 | 2.13 |
|
||||
| 5x5 | 5.12 | | 8x8 | 2.00 |
|
||||
| 6x5 | 4.27 | | 10x8 | 1.60 |
|
||||
| 6x6 | 3.56 | | 10x10 | 1.28 |
|
||||
| 8x5 | 3.20 | | 12x10 | 1.07 |
|
||||
| 8x6 | 2.67 | | 12x12 | 0.89 |
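
The bit rates in the table follow directly from dividing the fixed 128-bit block by the number of texels it covers. A minimal sketch (the names are ours):

```python
BLOCK_BITS = 128  # every ASTC block compresses to 128 bits

def bits_per_texel(width, height, depth=1):
    """Bit rate for a block footprint of width x height (x depth) texels."""
    return BLOCK_BITS / (width * height * depth)

for width, height in ((4, 4), (6, 6), (8, 8), (12, 12)):
    print(f"{width}x{height}: {bits_per_texel(width, height):.2f} bits/texel")
```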
|
||||
|
||||
|
||||
|
||||
Color endpoints
|
||||
===============
|
||||
|
||||
The color data for a block is encoded as a gradient between two color
|
||||
endpoints, with each texel selecting a position along that gradient which is
|
||||
then interpolated during decompression. ASTC supports 16 color endpoint
|
||||
encoding schemes, known as "endpoint modes". Options for endpoint modes
|
||||
include:
|
||||
|
||||
* Varying the number of color channels: e.g. luminance, luminance + alpha, rgb,
|
||||
and rgba.
|
||||
* Varying the encoding method: e.g. direct, base+offset, base+scale,
|
||||
quantization level.
|
||||
* Varying the data range: e.g. low dynamic range, or high dynamic range.
|
||||
|
||||
The endpoint modes, and the endpoint color BISE quantization level, can be
|
||||
chosen on a per-block basis.
|
||||
|
||||
|
||||
Color partitions
|
||||
================
|
||||
|
||||
Colors within a block are often complex, and cannot be accurately captured by a
|
||||
single color gradient, as discussed earlier with our example of a red ball
|
||||
lying on green grass. ASTC allows up to four color gradients - known as
|
||||
"partitions" - to be assigned to a single block. Each texel is then assigned to
|
||||
a single partition for the purposes of decompression.
|
||||
|
||||
Rather than directly storing the partition assignment for each texel, which
|
||||
would need a lot of decompressor hardware to store it for all block sizes, we
|
||||
generate it procedurally. Each block only needs to store the partition index -
|
||||
which is the seed for the procedural generator - and the per texel assignment
|
||||
can then be generated on-the-fly during decompression. The image below shows
|
||||
the generated texel assignments for two (top), three (middle), and four
|
||||
(bottom) partitions for the 8x8 block size.
|
||||
|
||||

|
||||
|
||||
The number of partitions and the partition index can be chosen on a per-block
|
||||
basis, and a different color endpoint mode can be chosen per partition.
|
||||
|
||||
**Note:** ASTC uses a 10-bit seed to drive the partition assignments. The hash
|
||||
used will introduce horizontal bias in a third of the partitions, vertical bias
|
||||
in a third, and no bias in the rest. As they are procedurally generated not all
|
||||
of the partitions are useful, in particular with the smaller block sizes.
|
||||
|
||||
* Many partitions are duplicates.
|
||||
* Many partitions are degenerate (an N partition hash results in at least one
|
||||
partition assignment that contains no texels).
|
||||
|
||||
|
||||
Texel weights
|
||||
=============
|
||||
|
||||
Each texel requires a weight, which defines the relative contribution of each
|
||||
color endpoint when interpolating the color gradient.
|
||||
|
||||
For smaller block sizes we can choose to store the weight directly, with one
|
||||
weight per texel, but for the larger block sizes we simply do not have enough
|
||||
bits of storage to do this. To work around this ASTC allows the weight grid to
|
||||
be stored at a lower resolution than the texel grid. The per-texel weights are
|
||||
interpolated from the stored weight grid during decompression using a bilinear
|
||||
interpolation.
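
As a rough illustration of the idea, the sketch below expands a low resolution weight grid to one weight per texel using floating-point bilinear interpolation. It is not the ASTC infill procedure, which the specification defines in fixed-point arithmetic; the function name and the mapping from texel positions to grid positions are assumptions made for this example.

```python
def expand_weights(grid, texels_x, texels_y):
    """Bilinearly interpolate a small 2D weight grid (a list of rows, at
    least 2x2) up to texels_x by texels_y per-texel weights."""
    grid_y, grid_x = len(grid), len(grid[0])
    full = []
    for y in range(texels_y):
        # Map the texel row onto the weight grid so the corners line up.
        gy = y * (grid_y - 1) / (texels_y - 1)
        y0 = min(int(gy), grid_y - 2)
        fy = gy - y0
        row = []
        for x in range(texels_x):
            gx = x * (grid_x - 1) / (texels_x - 1)
            x0 = min(int(gx), grid_x - 2)
            fx = gx - x0
            top = grid[y0][x0] * (1 - fx) + grid[y0][x0 + 1] * fx
            bottom = grid[y0 + 1][x0] * (1 - fx) + grid[y0 + 1][x0 + 1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        full.append(row)
    return full

# A 2x2 stored grid expanded to cover a 4x4 texel block.
for row in expand_weights([[0, 64], [64, 0]], 4, 4):
    print([round(weight, 1) for weight in row])
```

Here four stored weights stand in for sixteen per-texel weights, which is the storage saving that makes the larger block sizes feasible.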
|
||||
|
||||
The number of texel weights, and the weight value BISE quantization level, can
|
||||
be chosen on a per-block basis.
|
||||
|
||||
|
||||
Dual-plane weights
|
||||
------------------
|
||||
|
||||
Using a single weight for all color channels works well when there is good
|
||||
correlation across the channels, but this is not always the case. Common
|
||||
examples where we would expect to get low correlation at least some of the time
|
||||
are textures storing RGBA data - alpha masks are not usually closely
|
||||
correlated with the color value - or normal data - the X and Y normal values
|
||||
often change independently.
|
||||
|
||||
ASTC allows a dual-plane mode, which uses two separate weight grids for each
|
||||
texel. A single channel can be assigned to a second plane of weights, while
|
||||
the other three use the first plane of weights.
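
A minimal sketch of the idea, assuming weights in the 0 to 64 range and simple integer rounding; the exact interpolation arithmetic is defined by the ASTC specification and is not reproduced here. The function name and the `plane2_channel` parameter are hypothetical.

```python
def interpolate_texel(endpoint0, endpoint1, weight1,
                      weight2=None, plane2_channel=None):
    """Blend two RGBA endpoints. All channels use weight1 unless a second
    plane is present, in which case plane2_channel uses weight2 instead."""
    color = []
    for channel, (c0, c1) in enumerate(zip(endpoint0, endpoint1)):
        dual = weight2 is not None and channel == plane2_channel
        weight = weight2 if dual else weight1
        color.append((c0 * (64 - weight) + c1 * weight + 32) // 64)
    return tuple(color)

# RGB follows the first plane while alpha (channel 3) follows the second.
print(interpolate_texel((255, 0, 0, 0), (0, 255, 0, 255),
                        weight1=16, weight2=64, plane2_channel=3))
```

With a single plane the alpha channel would be forced to follow the same gradient position as the color channels, which is what hurts quality when the two are uncorrelated.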
|
||||
|
||||
The use of dual-plane mode can be chosen on a per-block basis, but its use
|
||||
prevents the use of four color partitions as we do not have enough bits to
|
||||
concurrently store both an extra plane of weights and an extra set of color
|
||||
endpoints.
|
||||
|
||||
|
||||
End results
|
||||
===========
|
||||
|
||||
So, if we pull all of this together what do we end up with?
|
||||
|
||||
|
||||
Adaptive
|
||||
--------
|
||||
|
||||
The first word in the name of ASTC is "adaptive", and it should now hopefully
|
||||
be clear why. Each block always compresses into 128 bits of storage, but the
|
||||
developer can choose from a wide range of texel block sizes and the compressor
|
||||
gets a huge amount of latitude to determine how those 128 bits are used.
|
||||
|
||||
The compressor can trade off the number of bits assigned to colors (number of
|
||||
partitions, endpoint mode, and stored quantization level) and weights (number
|
||||
of weights per block, use of dual-plane, and stored quantization level) on a
|
||||
per-block basis to get the best image quality possible.
|
||||
|
||||

|
||||
|
||||
|
||||
Format support
|
||||
--------------
|
||||
|
||||
The compression scheme used by ASTC effectively compresses arbitrary sequences
|
||||
of floating point numbers, with a flexible number of channels, across any of
|
||||
the supported block sizes. There is no real notion of "color format" in the
|
||||
format itself at all, beyond the color endpoint mode selection, although a
|
||||
sensible compressor will want to use some format-specific heuristics to drive
|
||||
an efficient state-space search.
|
||||
|
||||
The orthogonal encoding design allows ASTC to provide almost complete coverage
|
||||
of our desirable format matrix from earlier, across a wide range of bit rates:
|
||||
|
||||

|
||||
|
||||
The only significant omission is the absence of a dedicated two channel
|
||||
encoding for HDR textures. We simply ran out of entries in the space we had for
|
||||
encoding color endpoint modes, and this one didn't make the cut.
|
||||
|
||||
The flexibility allowed by ASTC ticks the requirement that almost any asset can
|
||||
be compressed to some degree, at an appropriate bitrate for its quality needs.
|
||||
This is a powerful enabler for a compression format, because it puts control in
|
||||
the hands of content creators and not arbitrary format restrictions.
|
||||
|
||||
|
||||
Image quality
|
||||
-------------
|
||||
|
||||
The normal expectation would be that this level of format flexibility would
|
||||
come at a cost of image quality; it has to cost something, right? Luckily this
|
||||
isn't true. The high packing efficiency allowed by BISE encoding, and the
|
||||
ability to dynamically choose where to spend encoding space on a per-block
|
||||
basis, means that an ASTC compressor is not forced to spend bits on things that
|
||||
don't help image quality.
|
||||
|
||||
This gives some significant improvements in image quality compared to the older
|
||||
texture formats, even though ASTC also handles a much wider range of options.
|
||||
|
||||
* ASTC at 2 bpt outperforms PVRTC at 2 bpt by ~2.0dB.
|
||||
* ASTC at 3.56 bpt outperforms PVRTC and BC1 at 4 bpt by ~1.5dB, and ETC2 by
|
||||
~0.7dB, despite a 10% bit rate disadvantage.
|
||||
* ASTC at 8 bpt for LDR formats is comparable in quality to BC7 at 8 bpt.
|
||||
* ASTC at 8 bpt for HDR formats is comparable in quality to BC6H at 8 bpt.
|
||||
|
||||
Differences as small as 0.25dB are visible to the human eye, and remember that
|
||||
dB uses a logarithmic scale, so these are significant image quality
|
||||
improvements.
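
To put the logarithmic scale in context, and assuming the figures above are PSNR differences, a fixed dB gain corresponds to a fixed multiplicative reduction in mean squared error. A short sketch using the standard PSNR definition (the helper names are ours):

```python
import math

def psnr_db(mse, peak=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(peak * peak / mse)

def mse_ratio(db_gain):
    """Factor by which mean squared error shrinks for a given PSNR gain."""
    return 10.0 ** (db_gain / 10.0)

for db_gain in (0.25, 0.7, 1.5, 2.0):
    print(f"+{db_gain} dB PSNR -> MSE reduced by a factor of "
          f"{mse_ratio(db_gain):.2f}")
```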
|
||||
|
||||
|
||||
3D compression
|
||||
--------------
|
||||
|
||||
One of the nice bonus features of ASTC is that the techniques which underpin
|
||||
the format generalize to compressing volumetric texture data without needing
|
||||
very much additional decompression hardware.
|
||||
|
||||
ASTC is therefore also able to optionally support compression of 3D textures,
|
||||
which is a unique feature not found in any earlier format, at the following
|
||||
bit rates:
|
||||
|
||||
| Block footprint | Bits/texel | | Block footprint | Bits/texel |
|
||||
| --------------- | ---------- | --- | --------------- | ---------- |
|
||||
| 3x3x3 | 4.74 | | 5x5x4 | 1.28 |
|
||||
| 4x3x3 | 3.56 | | 5x5x5 | 1.02 |
|
||||
| 4x4x3 | 2.67 | | 6x5x5 | 0.85 |
|
||||
| 4x4x4 | 2.00 | | 6x6x5 | 0.71 |
|
||||
| 5x4x4 | 1.60 | | 6x6x6 | 0.59 |
|
||||
|
||||
|
||||
Availability
|
||||
============
|
||||
|
||||
The ASTC functionality is specified as a set of feature profiles, allowing
|
||||
GPU hardware manufacturers to select which parts of the standard they
|
||||
implement. There are four commonly seen profiles:
|
||||
|
||||
* "LDR":
|
||||
  * 2D blocks.
|
||||
  * LDR and sRGB color space.
|
||||
  * [KHR_texture_compression_astc_ldr][astc_ldr]: KHR OpenGL ES extension.
|
||||
* "LDR + Sliced 3D":
|
||||
  * 2D blocks and sliced 3D blocks.
|
||||
  * LDR and sRGB color space.
|
||||
  * [KHR_texture_compression_astc_sliced_3d][astc_3d]: KHR OpenGL ES extension.
|
||||
* "HDR":
|
||||
  * 2D and sliced 3D blocks.
|
||||
  * LDR, sRGB, and HDR color spaces.
|
||||
  * [KHR_texture_compression_astc_hdr][astc_ldr]: KHR OpenGL ES extension.
|
||||
* "Full":
|
||||
  * 2D, sliced 3D, and volumetric 3D blocks.
|
||||
  * LDR, sRGB, and HDR color spaces.
|
||||
  * [OES_texture_compression_astc][astc_full]: OES OpenGL ES extension.
|
||||
|
||||
The LDR profile is mandatory in OpenGL ES 3.2 and a standardized optional
|
||||
feature for Vulkan, and therefore widely supported on contemporary mobile
|
||||
devices. The 2D HDR profile is not mandatory, but is widely supported.
|
||||
|
||||
3D texturing
|
||||
------------
|
||||
|
||||
The APIs expose 3D textures in two flavors.
|
||||
|
||||
The sliced 3D texture support builds a 3D texture from an array of 2D image
|
||||
slices that have each been individually compressed using 2D ASTC compression.
|
||||
This is required for the HDR profile, so is also widely supported.
|
||||
|
||||
The volumetric 3D texture support uses the native 3D block sizes provided by
|
||||
ASTC to implement true volumetric compression. This enables a wider choice of
|
||||
low bitrate options than the 2D blocks, which is particularly important for 3D
|
||||
textures of any non-trivial size. Volumetric formats are not widely supported,
|
||||
but are supported on all of the Arm Mali GPUs that support ASTC.
|
||||
|
||||
ASTC decode mode
|
||||
----------------
|
||||
|
||||
ASTC is specified to decompress texels into fp16 intermediate values, except
|
||||
for sRGB which always decompresses into 8-bit UNORM intermediates. For many use
|
||||
cases this gives more dynamic range and precision than required. This can cause
|
||||
a reduction in both texture cache efficiency and texture filtering performance
|
||||
due to the larger decompressed data size.
|
||||
|
||||
A pair of extensions exist, and are widely supported on recent mobile GPUs,
|
||||
which allow applications to reduce the intermediate precision to either UNORM8
|
||||
(recommended for LDR textures) or RGB9e5 (recommended for HDR textures).
|
||||
|
||||
* [EXT_texture_compression_astc_decode_mode][astc_decode]: Allow UNORM8
|
||||
intermediates
|
||||
* [EXT_texture_compression_astc_decode_mode_rgb9e5][astc_decode]: Allow RGB9e5
|
||||
intermediates
|
||||
|
||||
[astc_ldr]: https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_texture_compression_astc_hdr.txt
|
||||
[astc_3d]: https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_texture_compression_astc_sliced_3d.txt
|
||||
[astc_full]: https://www.khronos.org/registry/OpenGL/extensions/OES/OES_texture_compression_astc.txt
|
||||
[astc_decode]: https://www.khronos.org/registry/OpenGL/extensions/EXT/EXT_texture_compression_astc_decode_mode.txt
|
||||
|
||||
- - -
|
||||
|
||||
_Copyright © 2019-2022, Arm Limited and contributors. All rights reserved._
|
||||
|
@@ -0,0 +1,79 @@
|
||||
# Terminology for the ASTC Encoder
|
||||
|
||||
Like most software, the `astcenc` code base has a set of naming conventions
|
||||
for variables which are used to ensure both accuracy and reasonable brevity.
|
||||
|
||||
:construction: These conventions are being used for new patches, so new code
|
||||
will conform to this, but older code is still being cleaned up to follow
|
||||
these conventions.
|
||||
|
||||
## Counts
|
||||
|
||||
For counts of things prefer `<x>_count` rather than `<x>s`. For example:
|
||||
|
||||
* `plane_count`
|
||||
* `weight_count`
|
||||
* `texel_count`
|
||||
|
||||
Where possible aim for descriptive loop variables, as these are more literate
|
||||
than simple `i` or `j` variables. For example:
|
||||
|
||||
* `plane_index`
|
||||
* `weight_index`
|
||||
* `texel_index`
|
||||
|
||||
## Ideal, Unpacked Quantized, vs Packed Quantized
|
||||
|
||||
Variables that are quantized, such as endpoint colors and weights, have
|
||||
multiple states depending on how they are being used.
|
||||
|
||||
**Ideal values** represent arbitrary numeric values that can take any value.
|
||||
These are often used during compression to work out the best value before
|
||||
any quantization is applied. For example, integer weights in the 0-64 range can
|
||||
take any of the 65 values available.
|
||||
|
||||
**Quant uvalues** represent the unpacked numeric value after any quantization
|
||||
rounding has been applied. These are often used during compression to work out
|
||||
the error for the quantized value compared to the ideal value. For example,
|
||||
`QUANT_3` weights in the 0-64 range can only take one of `[0, 32, 64]`.
|
||||
|
||||
**Quant pvalues** represent the packed numeric value in the quantized alphabet.
|
||||
This is what ends up encoded in the ASTC data, although note that the encoded
|
||||
ordering is scrambled to simplify hardware. For example, `QUANT_3` weights
|
||||
originally in the 0-64 range can only take one of `[0, 1, 2]`.
|
||||
|
||||
For example:
|
||||
|
||||
* `weights_ideal_value`
|
||||
* `weights_quant_uvalue`
|
||||
* `weights_quant_pvalue`
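
The three states can be illustrated with a minimal sketch for `QUANT_3` weights. The table and function names are hypothetical and are not taken from the `astcenc` sources; the pvalue here is the plain index into the alphabet, before the scrambling mentioned above.

```python
QUANT_3_UVALUES = [0, 32, 64]

def quantize_weight(weight_ideal_value):
    """Return (quant uvalue, quant pvalue) for an ideal weight in 0..64."""
    weight_quant_pvalue = min(
        range(len(QUANT_3_UVALUES)),
        key=lambda index: abs(QUANT_3_UVALUES[index] - weight_ideal_value),
    )
    weight_quant_uvalue = QUANT_3_UVALUES[weight_quant_pvalue]
    return weight_quant_uvalue, weight_quant_pvalue

for weight_ideal_value in (5, 40, 60):
    uvalue, pvalue = quantize_weight(weight_ideal_value)
    error = uvalue - weight_ideal_value
    print(f"ideal {weight_ideal_value} -> uvalue {uvalue}, "
          f"pvalue {pvalue}, error {error}")
```

The uvalue is what error calculations compare against the ideal value, while the pvalue is the compact symbol that would be handed to the encoder.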
|
||||
|
||||
## Full vs Decimated interpolation weights
|
||||
|
||||
Weight grids have multiple states depending on how they are being used.
|
||||
|
||||
**full_weights** represent per texel weight grids, storing one weight per texel.
|
||||
|
||||
**decimated_weights** represent reduced weight grids, which can store fewer
|
||||
weights and which are bilinearly interpolated to generate the full weight grid.
|
||||
|
||||
Full weights have no variable prefix, but decimated weights are stored with
|
||||
a `dec_` prefix.
|
||||
|
||||
* `dec_weights_ideal_value`
|
||||
* `dec_weights_quant_uvalue`
|
||||
* `dec_weights_quant_pvalue`
|
||||
|
||||
## Weight vs Significance
|
||||
|
||||
The original encoder used "weight" for multiple purposes - texel significance
|
||||
(weight the error), color channel significance (weight the error), and endpoint
|
||||
interpolation weights. This gets very confusing in functions using all three!
|
||||
|
||||
We are slowly refactoring the code to only use "weight" to mean the endpoint
|
||||
interpolation weights. The error weighting factors used for other purposes are
|
||||
being updated to use the term "significance".
|
||||
|
||||
- - -
|
||||
|
||||
_Copyright © 2020-2022, Arm Limited and contributors. All rights reserved._
|
||||
@@ -0,0 +1,120 @@
|
||||
# Testing astcenc
|
||||
|
||||
The repository contains a small suite of tests which can be used to sanity
|
||||
check source code changes to the compressor. It must be noted that this test
|
||||
suite is relatively limited in scope and does not cover every feature or
|
||||
bitrate of the standard.
|
||||
|
||||
# Required software
|
||||
|
||||
Running the tests requires Python 3.7 to be installed on the host machine, and
|
||||
an `astcenc-avx2` release build to have been previously compiled and installed
|
||||
into a directory called `astcenc` in the root of the git checkout. This
|
||||
can be achieved by configuring the CMake build using the install prefix
|
||||
`-DCMAKE_INSTALL_PREFIX=../` and then running a build with the `install` build
|
||||
target.
|
||||
|
||||
# Running C++ unit tests
|
||||
|
||||
We support a small (but growing) number of C++ unit tests, which are written
|
||||
using the `googletest` framework and integrated in the CMake "CTest" test
|
||||
framework.
|
||||
|
||||
To build unit tests pull the `googletest` git submodule and add
|
||||
`-DASTCENC_UNITTEST=ON` to the CMake command line when configuring.
|
||||
|
||||
To run unit tests use the CMake `ctest` utility from your build directory after
|
||||
you have built the tests.
|
||||
|
||||
```shell
|
||||
cd build
|
||||
ctest --verbose
|
||||
```
|
||||
|
||||
# Running command line tests
|
||||
|
||||
To run the command line tests, which aim to get coverage of the command line
|
||||
options and core codec stability without testing the compression quality
|
||||
itself, run the command line:
|
||||
|
||||
    python3 -m unittest discover -s Test -p astc_test*.py -v
|
||||
|
||||
# Running image tests
|
||||
|
||||
To run the image test suite run the following command from the root directory
|
||||
of the repository:
|
||||
|
||||
    python3 ./Test/astc_test_image.py
|
||||
|
||||
This will run through a series of image compression tests, comparing the image
|
||||
PSNR against a set of reference results from the last stable baseline. The test
|
||||
will fail if any reduction in PSNR above a set threshold is detected. Note that
|
||||
performance information is reported, but regressions will not flag a failure.
|
||||
|
||||
For debug purposes, all decompressed test output images and result CSV files
|
||||
are stored in the `TestOutput` directory, using the same test set structure as
|
||||
the `Test/Images` folder.
|
||||
|
||||
## Test selection
|
||||
|
||||
The runner supports a number of options to filter down what is run, enabling
|
||||
developers to focus local testing on the parts of the code they are working on.
|
||||
|
||||
* `--encoder` selects which encoder to run. By default the `avx2` encoder is
|
||||
selected. Note that some out-of-tree reference encoders (older encoders, and
|
||||
some third-party encoders) are supported for comparison purposes. These will
|
||||
not work without the binaries being manually provided; they are not
|
||||
distributed here.
|
||||
* `--test-set` selects which image set to run. By default the `Small` image
|
||||
test set is selected, which aims to provide basic coverage of many different
|
||||
color formats and color profiles.
|
||||
* `--block-size` selects which block size to run. By default a range of
|
||||
block sizes (2D and 3D) are used.
|
||||
* `--color-profile` selects which color profiles from the standard should be
|
||||
used (LDR, LDR sRGB, or HDR) to select images. By default all are selected.
|
||||
* `--color-format` selects which color formats should be used (L, XY, RGB,
|
||||
RGBA) to select images. By default all are selected.
|
||||
|
||||
## Performance tests
|
||||
|
||||
To provide less noisy performance results the test suite supports compressing
|
||||
each image multiple times and returning the best measured performance. To
|
||||
enable this mode use the following options:
|
||||
|
||||
* `--repeats <M>` : Run M test compression passes which are timed.
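
The idea behind `--repeats` can be sketched as follows: time several passes and keep the fastest, since the minimum is the measurement least disturbed by other activity on the machine. This is an illustration with hypothetical names, not the code used by the test runner.

```python
import time

def best_time(compress, repeats):
    """Run compress() repeats times and return the shortest wall-clock
    time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        compress()
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in workload; a real run would invoke the encoder on an image.
print(f"best of 5: {best_time(lambda: sum(range(100000)), 5):.6f} s")
```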
|
||||
|
||||
**Note:** The reference CSV contains performance results measured on an Intel
|
||||
Core i5 9600K running at 4.3GHz, running each test 5 times.
|
||||
|
||||
## Updating reference data
|
||||
|
||||
The reference PSNR and performance scores are stored in CSVs committed to the
|
||||
repository. This data is created by running the tests using the last stable
|
||||
release on a standard test machine we use for performance testing builds.
|
||||
|
||||
It can be useful for developers to rebuild the reference results for their
|
||||
local machine, in particular for measuring performance improvements. To build
|
||||
new reference CSVs, download the current reference `astcenc` binary (1.7) from
|
||||
GitHub for your host OS and place it in to the `./Binaries/1.7/` directory.
|
||||
Once this is done, run the command:
|
||||
|
||||
    python3 ./Test/astc_test_image.py --encoder 1.7 --test-set all --repeats 5
|
||||
|
||||
... to regenerate the reference CSV files.
|
||||
|
||||
**WARNING:** This can take some hours to complete, and it is best done when the
|
||||
test suite gets exclusive use of the machine to avoid other processing slowing
|
||||
down the compression and disturbing the performance data. It is recommended to
|
||||
shut down or disable any background applications that are running.
|
||||
|
||||
## Valgrind memcheck
|
||||
|
||||
It is always worth running the Valgrind memcheck tool to validate that we have
|
||||
not introduced any obvious memory errors. Build a release build with symbol
|
||||
information with `-DCMAKE_BUILD_TYPE=RelWithDebInfo` and then run:
|
||||
|
||||
    valgrind --tool=memcheck --track-origins=yes <command>
|
||||
|
||||
- - -
|
||||
|
||||
_Copyright © 2019-2022, Arm Limited and contributors. All rights reserved._
|
||||