Add ktx
# 4.x series change log

This page summarizes the major functional and performance changes in each
release of the 4.x series.

All performance data on this page is measured on an Intel Core i5-9600K
clocked at 4.2 GHz, running `astcenc` using AVX2 and 6 threads.

<!-- ---------------------------------------------------------------------- -->

## 4.8.0

**Status:** May 2024

The 4.8.0 release is a minor maintenance release.

* **General:**
  * **Bug fix:** Native builds on macOS will now correctly build for arm64 when
    run outside of Rosetta on an Apple silicon device.
  * **Bug fix:** Multiple small improvements to remove use of undefined
    language behavior, to improve support for deployment using Emscripten.
  * **Feature:** Builds using Clang can now build with undefined behavior
    sanitizer by setting `-DASTCENC_UBSAN=ON` on the CMake configure line.
  * **Feature:** Updated to Wuffs library 0.3.4, which ignores tRNS alpha
    chunks for type 4 (LA) and 6 (RGBA) PNGs, to improve compatibility with
    libpng.

<!-- ---------------------------------------------------------------------- -->

## 4.7.0

**Status:** January 2024

The 4.7.0 release is a major maintenance release, fixing rounding behavior in
the decompressor to match the Khronos specification. This fix includes the
addition of explicit support for optimizing for `decode_unorm8` rounding.

Reminder - the codec library API is not designed to be binary compatible across
versions. We always recommend rebuilding your client-side code using the
updated `astcenc.h` header.

* **General:**
  * **Bug fix:** sRGB LDR decompression now uses the correct endpoint expansion
    method to create the 16-bit RGB endpoint colors, and removes the previous
    correction code from the interpolation function. This bug could result in
    LSB bit flips relative to the standard specification.
  * **Bug fix:** Decompressing to an 8-bit per component output image now
    matches the `decode_unorm8` extension rounding rules. This bug could result
    in LSB bit flips relative to the standard specification.
  * **Bug fix:** Code now avoids using `alignas()` in the reference C
    implementation, as the default `alignas(16)` is narrower than the
    native minimum alignment requirement on some CPUs.
  * **Feature:** Library configuration supports a new flag,
    `ASTCENC_FLG_USE_DECODE_UNORM8`. This flag indicates that the image will be
    used with the `decode_unorm8` decode mode. When set during compression
    this allows the compressor to use the correct rounding when determining the
    best encoding.
  * **Feature:** Command line tool supports a new option, `-decode_unorm8`.
    This option indicates that the image will be used with the `decode_unorm8`
    decode mode. This option will automatically be set for decompression
    (`-d*`) and trial (`-t*`) tool operation if the decompressed output image
    is stored to an 8-bit per component file format. This option must be set
    manually for compression (`-c*`) tool operation, as the desired decode mode
    cannot be reliably determined.
  * **Feature:** Library configuration supports a new optional progress
    reporting callback. This is called during compression to allow
    interactive tooling use cases to display incremental progress. The
    command line tool uses this feature to show compression progress unless
    `-silent` is used.

<!-- ---------------------------------------------------------------------- -->

## 4.6.1

**Status:** November 2023

The 4.6.1 release is a minor maintenance release to fix a scaling bug on
large core count Windows systems.

* **General:**
  * **Optimization:** Windows builds of the `astcenc` command line tool can now
    use more than 64 cores on large core count systems. This change doubled
    command line performance for `-exhaustive` compression when testing on a
    96 core/192 thread system.
  * **Feature:** Windows Arm64 native builds of the `astcenc` command line tool
    are now included in the prebuilt release binaries.

<!-- ---------------------------------------------------------------------- -->

## 4.6.0

**Status:** November 2023

The 4.6.0 release retunes the compressor heuristics to give improvements to
performance for trivial losses to image quality. It also includes some minor
bug fixes and code quality improvements.

Reminder - the codec library API is not designed to be binary compatible across
versions. We always recommend rebuilding your client-side code using the updated
`astcenc.h` header.

* **General:**
  * **Bug-fix:** Fixed context allocation for contexts allocated with the
    `ASTCENC_FLG_DECOMPRESS_ONLY` flag.
  * **Bug-fix:** Reduced use of `reinterpret_cast` in the core codec to
    avoid strict aliasing violations.
  * **Optimization:** `-medium` search quality no longer tests 4 partition
    encodings for block sizes between 25 and 83 texels (inclusive). This
    improves performance for a tiny drop in image quality.
  * **Optimization:** `-thorough` and higher search qualities no longer test the
    mode0 first search for block sizes between 25 and 83 texels (inclusive).
    This improves performance for a tiny drop in image quality.
  * **Optimization:** `TUNE_MAX_PARTITIONING_CANDIDATES` reduced from 32 to 8
    to reduce the size of stack allocated data structures. This causes a tiny
    drop in image quality for the `-verythorough` and `-exhaustive` presets.

<!-- ---------------------------------------------------------------------- -->

## 4.5.0

**Status:** June 2023

The 4.5.0 release is a maintenance release with small image quality
improvements, and a number of build system quality of life improvements.

* **General:**
  * **Bug-fix:** Improved handling of compiler arguments in CMake, including
    consistent use of MSVC-style command line arguments for ClangCL.
  * **Bug-fix:** Invariant Clang builds now use `-ffp-model=precise` with
    `-ffp-contract=off` which is needed to restore invariance due to recent
    changes in compiler defaults.
  * **Change:** macOS binary releases are now distributed as a single universal
    binary for all platforms.
  * **Change:** Windows binary releases are now compiled with VS2022.
  * **Change:** Invariant MSVC builds for VS2022 now use `/fp:precise` instead
    of `/fp:strict`, which is now possible because precise no longer implies
    contraction. This should improve performance for MSVC builds.
  * **Change:** Non-invariant Clang builds now use `-ffp-model=precise` with
    `-ffp-contract=on`. This should improve performance on older Clang
    versions which defaulted to no contraction.
  * **Change:** Non-invariant MSVC builds for VS2022 now use `/fp:precise`
    with `/fp:contract`. This should improve performance for MSVC builds.
  * **Change:** CMake config variables now use an `ASTCENC_` prefix to add a
    namespace and group options when the library is used in a larger project.
  * **Change:** CMake config `ASTCENC_UNIVERSAL_BUILD` for building macOS
    universal binaries has been improved to include the `x86_64h` slice for
    AVX2 builds. Universal builds are now on by default for macOS, and always
    include NEON (arm64), SSE4.1 (x86_64), and AVX2 (x86_64h) variants.
  * **Change:** CMake config `ASTCENC_NO_INVARIANCE` has been inverted to
    remove the negated option, and is now `ASTCENC_INVARIANCE` with a default
    of `ON`. Disabling this option can substantially improve performance, but
    images can differ across platforms and compilers.
  * **Optimization:** Color quantization and packing for LDR RGB and RGBA has
    been vectorized to improve performance.
  * **Change:** Color quantization for LDR RGB and RGBA endpoints will now try
    multiple quantization packing methods, and pick the one with the lowest
    endpoint encoding error. This gives a minor image quality improvement, for
    no significant performance impact when combined with the vectorization
    optimizations.

<!-- ---------------------------------------------------------------------- -->

## 4.4.0

**Status:** March 2023

The 4.4.0 release is a minor release with image quality improvements, a small
performance boost, and a few new quality-of-life features.

* **General:**
  * **Change:** Core library no longer checks availability of required
    instruction set extensions, such as SSE4.1 or AVX2. Checking compatibility
    is now the responsibility of the caller. See `astcenccli_entry.cpp` for
    an example of code performing this check.
  * **Change:** Core library can be built as a shared object by setting the
    `-DSHAREDLIB=ON` CMake option, resulting in e.g. `libastcenc-avx2-shared.so`.
    Note that the command line tool is always statically linked.
  * **Change:** Decompressed 3D images will now write one output file per
    slice, if the target format is a 2D image format.
  * **Change:** Command line errors print to stderr instead of stdout.
  * **Change:** Color encoding uses new quantization tables, that now factor
    in floating-point rounding if a distance tie is found when using the
    integer quant256 value. This improves image quality for 4x4 and 5x5 block
    sizes.
  * **Optimization:** Partition selection uses a simplified line calculation
    with a faster approximation. This improves performance for all block sizes.
  * **Bug-fix:** Fixed missing symbol error in decompressor-only builds.
  * **Bug-fix:** Fixed infinity handling in debug trace JSON files.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 4.3 release:**



<!-- ---------------------------------------------------------------------- -->

## 4.3.1

**Status:** January 2023

The 4.3.1 release is a minor maintenance release. No performance or image
quality changes are expected.

* **General:**
  * **Bug-fix:** Fixed typo in `-2/3/4partitioncandidatelimit` CLI options.
  * **Bug-fix:** Fixed handling for `-3/4partitionindexlimit` CLI options.
  * **Bug-fix:** Updated to `stb_image.h` v2.28, which includes multiple fixes
    and improvements for image loading.

<!-- ---------------------------------------------------------------------- -->

## 4.3.0

**Status:** January 2023

The 4.3.0 release is an optimization release. There are minor performance
and image quality improvements in this release.

Reminder - the codec library API is not designed to be binary compatible across
versions. We always recommend rebuilding your client-side code using the updated
`astcenc.h` header.

* **General:**
  * **Bug-fix:** Use lower case `windows.h` include for MinGW compatibility.
  * **Change:** The `-mask` command line option, `ASTCENC_FLG_MAP_MASK` in the
    library API, has been removed.
  * **Optimization:** Always skip blue-contraction for `QUANT_256` encodings.
    This gives a small image quality improvement for the 4x4 block size.
  * **Optimization:** Always skip RGBO vector calculation for LDR encodings.
  * **Optimization:** Defer color packing and scrambling to physical layer.
  * **Optimization:** Remove folded `decimation_info` lookup tables. This
    significantly reduces compressor memory footprint and improves context
    creation time. Impact increases with the active block size.
  * **Optimization:** Increased trial and refinement pruning by using stricter
    target errors when determining whether to skip iterations.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 4.2 release:**



<!-- ---------------------------------------------------------------------- -->

## 4.2.0

**Status:** November 2022

The 4.2.0 release is an optimization release. There are significant performance
improvements, minor image quality improvements, and library interface changes in
this release.

Reminder - the codec library API is not designed to be binary compatible across
versions. We always recommend rebuilding your client-side code using the updated
`astcenc.h` header.

* **General:**
  * **Bug-fix:** Compression for RGB and RGBA base+offset encodings no
    longer generates endpoints with the incorrect blue-contract behavior.
  * **Bug-fix:** Lowest channel correlation calculation now correctly ignores
    constant color channels for the purposes of filtering 2 plane encodings.
    On average this improves both performance and image quality.
  * **Bug-fix:** ISA compatibility now checked in `config_init()` as well as
    in `context_alloc()`.
  * **Change:** Removed the low-weight count optimization, as more recent
    changes had significantly reduced its performance benefit. Option removed
    from both command line and configuration structure.
  * **Feature:** The `-exhaustive` mode now runs full trials on more
    partitioning candidates and block candidates. This improves image quality
    by 0.1 to 0.25 dB, but slows down compression by 3x. The `-verythorough`
    and `-thorough` modes also test more candidates.
  * **Feature:** A new preset, `-verythorough`, has been introduced to provide
    a standard performance point between `-thorough` and the re-tuned
    `-exhaustive` mode. This new mode is faster and higher quality than the
    `-exhaustive` preset in the 4.1 release.
  * **Feature:** The compressor can now independently vary the number of
    partitionings considered for error estimation for 2/3/4 partitions. This
    allows heuristics to put more effort into 2 partitions, and less into
    3/4 partitions.
  * **Feature:** The compressor can now run trials on a variable number of
    candidate partitionings, allowing high quality modes to explore more of the
    search space at the expense of slower compression. The number of trials is
    independently configurable for 2/3/4 partition cases.
  * **Optimization:** Introduce early-out threshold for 2/3/4 partition
    searches based on the results after 1 of 2 trials. This significantly
    improves performance for `-medium` and `-thorough` searches, for a minor
    loss in image quality.
  * **Optimization:** Reduce early-out threshold for 3/4 partition searches
    based on 2/3 partition results. This significantly improves performance,
    especially for `-thorough` searches, for a minor loss in image quality.
  * **Optimization:** Use direct vector compare to create a SIMD mask instead
    of a scalar compare that is broadcast to a vector mask.
  * **Optimization:** Remove obsolete partition validity masks from the
    partition selection algorithm.
  * **Optimization:** Removed obsolete channel scaling from partition
    `avgs_and_dirs()` calculation.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 4.0 and 4.1 release:**



<!-- ---------------------------------------------------------------------- -->

## 4.1.0

**Status:** August 2022

The 4.1.0 release is a maintenance release. There is no performance or image
quality change in this release.

* **General:**
  * **Change:** Command line decompressor no longer uses the legacy
    `GL_LUMINANCE` or `GL_LUMINANCE_ALPHA` format enums when writing KTX
    output files. Luminance textures now use the `GL_RED` format and
    luminance_alpha textures now use the `GL_RG` format.
  * **Change:** Command line tool gains a new `-dimage` option to generate
    diagnostic images showing aspects of the compression encoding. The output
    file name with its extension stripped is used as the stem of the diagnostic
    image file names.
  * **Bug-fix:** Library decompressor builds for SSE no longer use masked store
    `maskmovdqu` instructions, as they can generate faults on masked lanes.
  * **Bug-fix:** Command line decompressor now correctly uses sized type enums
    for the internal format when writing output KTX files.
  * **Bug-fix:** Command line compressor now correctly loads 16 and 32-bit per
    component input KTX files.
  * **Bug-fix:** Fixed GCC9 compiler warnings on Arm aarch64.

<!-- ---------------------------------------------------------------------- -->

## 4.0.0

**Status:** July 2022

The 4.0.0 release introduces some major performance enhancements, and a number
of larger changes to the heuristics used in the codec to find a more effective
cost:quality trade off.

* **General:**
  * **Change:** The `-array` option for specifying the number of image planes
    for ASTC 3D volumetric block compression has been renamed to `-zdim`.
  * **Change:** The build root package directory is now `bin` instead of
    `astcenc`, allowing the CMake install step to write binaries into
    `/usr/local/bin` if the user wishes to do so.
  * **Feature:** A new `-ssw` option for specifying the shader sampling swizzle
    has been added as a convenience alternative to the `-cw` option. This is
    needed to correct error weighting during compression if not all components
    are read in the shader. For example, to extract and compress two components
    from an RGBA input image, weighting the two components equally when
    sampling through .ra in the shader, use `-esw ggga -ssw ra`. In this
    example `-ssw ra` is equivalent to the alternative `-cw 1 0 0 1` encoding.
  * **Feature:** The `-a` alpha weighting option has been re-enabled in the
    backend, and now again applies alpha scaling to the RGB error metrics when
    encoding. This is based on the maximum alpha in each block, not the
    individual texel alpha values used in the earlier implementation.
  * **Feature:** The command line tool now has `-repeats <count>` for testing,
    which will iterate around compression and decompression `count` times.
    Reported performance metrics also now separate compression and
    decompression scores.
  * **Feature:** The core codec is now warning clean up to /W4 for both MSVC
    `cl.exe` and `clangcl.exe` compilers.
  * **Feature:** The core codec now supports arm64 for both MSVC `cl.exe` and
    `clangcl.exe` compilers.
  * **Feature:** `NO_INVARIANCE` builds will enable the `-ffp-contract=fast`
    option for all targets when using Clang or GCC. In addition AVX2 targets
    will also set the `-mfma` option. This reduces image quality by up to 0.2dB
    (normally much less), but improves performance by 5-20%.
  * **Optimization:** Angular endpoint min/max weight selection is restricted
    to weight `QUANT_11` or lower. Higher quantization levels assume default
    0-1 range, which is less accurate but much faster.
  * **Optimization:** Maximum weight quantization for later trials is selected
    based on the weight quantization of the best encoding from the 1 plane 1
    partition trial. This significantly reduces the search space for the later
    trials with more planes or partitions.
  * **Optimization:** Small data tables now use in-register SIMD permutes
    rather than gathers (AVX2) or unrolled scalar lookups (SSE/NEON). This can
    be a significant optimization for paths that are load unit limited.
  * **Optimization:** Decompressed image block writes in the decompressor now
    use a vectorized approach to writing each row of texels in the block,
    including the ability to exploit masked stores if the target supports them.
  * **Optimization:** Weight scrambling has been moved into the physical layer;
    the rest of the codec now uses linear order weights.
  * **Optimization:** Weight packing has been moved into the physical layer;
    the rest of the codec now uses unpacked weights in the 0-64 range.
  * **Optimization:** Consistently vectorize the creation of unquantized weight
    grids when they are needed.
  * **Optimization:** Remove redundant per-decimation mode copies of endpoint
    and weight structures, which were really read-only duplicates.
  * **Optimization:** Early-out the same endpoint mode color calculation if it
    cannot be applied.
  * **Optimization:** Numerous type size reductions applied to arrays to reduce
    both context working buffer size usage and stack usage.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.7 release:**



- - -

_Copyright © 2022-2024, Arm Limited and contributors. All rights reserved._