This pull request aims to dramatically improve the performance of `BlurHashDecoder` while also reducing its memory allocations.

- Precompute the cosine tables before composing the image, so that each cosine value is only computed once (a sketch of this technique follows at the end of this description).
- Compute the cosine tables only once when both are identical (for square images with the same number of colors in both dimensions).
- Store colors in a one-dimensional array instead of a two-dimensional array to reduce memory allocations.
- Use a simple `String.indexOf()` to find the index of a Base83 char, which is both faster and uses less memory than a `HashMap` thanks to better locality and no boxing of chars (also sketched at the end).
- No cache is used, so computations may be performed in parallel on background threads without the need for synchronization, which would limit throughput.

## Benchmarks

Simple: 4x4 colors, 32x32 pixels output. (This is what Mastodon and Tusky currently use.)
Complex: 9x9 colors, 256x256 pixels output.

**Pixel 7 (Android 14)**

```
        365 738 ns    23 allocs    Trace    BlurHashDecoderBenchmark.tuskySimple
        109 577 ns     8 allocs    Trace    BlurHashDecoderBenchmark.newSimple
    108 771 647 ns    88 allocs    Trace    BlurHashDecoderBenchmark.tuskyComplex
     12 932 076 ns     8 allocs    Trace    BlurHashDecoderBenchmark.newComplex
```

**Nexus 5 (Android 6)**

```
      4 600 937 ns    22 allocs    Trace    BlurHashDecoderBenchmark.tuskySimple
      1 391 487 ns     7 allocs    Trace    BlurHashDecoderBenchmark.newSimple
  1 260 644 948 ns    87 allocs    Trace    BlurHashDecoderBenchmark.tuskyComplex
    125 274 063 ns     7 allocs    Trace    BlurHashDecoderBenchmark.newComplex
```

Conclusion: the new implementation is **3 times faster** than the old one for the current usage and up to **9 times faster** if we decide to increase the BlurHash quality in the future.

The source code of the benchmark comparing the original untouched Kotlin implementation to the new one can be found [here](https://github.com/cbeyls/BlurHashAndroidBenchmark).
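## Implementation sketches

To illustrate the cosine-table precomputation, here is a minimal Kotlin sketch. It assumes the standard BlurHash basis function `cos(π · c · p / pixels)` per axis; `createCosineTable` and the parameter names are illustrative, not the actual code from this patch.

```kotlin
import kotlin.math.PI
import kotlin.math.cos

// Builds a flat lookup table of every cosine value needed for one axis:
// table[p * components + c] == cos(PI * c * p / pixels).
fun createCosineTable(pixels: Int, components: Int): DoubleArray {
    val table = DoubleArray(pixels * components)
    for (p in 0 until pixels) {
        for (c in 0 until components) {
            table[p * components + c] = cos(PI * c * p / pixels)
        }
    }
    return table
}

fun main() {
    val width = 32
    val height = 32
    val componentsX = 4
    val componentsY = 4
    val cosinesX = createCosineTable(width, componentsX)
    // Reuse the x table when both tables would be identical
    // (square output with the same number of colors in both dimensions).
    val cosinesY = if (width == height && componentsX == componentsY) {
        cosinesX
    } else {
        createCosineTable(height, componentsY)
    }
    println("x entries: ${cosinesX.size}, y table reused: ${cosinesY === cosinesX}")
}
```

The point of the tables: a naive decode loop calls `cos()` twice for every (pixel, component) pair, on the order of `width · height · cx · cy` calls, while the precomputed tables need only `width · cx + height · cy` calls in total.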
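The Base83 lookup can be sketched the same way, assuming the standard BlurHash Base83 alphabet; `decodeBase83` is an illustrative name, not necessarily the function in the patch.

```kotlin
// The standard BlurHash Base83 alphabet, in digit order.
private const val BASE83_CHARS =
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
    "abcdefghijklmnopqrstuvwxyz#\$%*+,-.:;=?@[]^_{|}~"

// String.indexOf() scans a small contiguous character sequence, so it has
// good cache locality and avoids the hashing and Char boxing that a
// HashMap<Char, Int> lookup table incurs on every call.
fun decodeBase83(blurHash: String, from: Int, to: Int): Int {
    var result = 0
    for (i in from until to) {
        val digit = BASE83_CHARS.indexOf(blurHash[i])
        require(digit >= 0) { "Invalid Base83 character: ${blurHash[i]}" }
        result = result * 83 + digit
    }
    return result
}
```

A linear scan over 83 chars is cheap enough that, for strings this short, it beats the constant-time but allocation-heavy map lookup, and it needs no shared mutable state, so it stays thread-safe for free.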