* Also use separate upload and download buffers optimized for write and readback respectively. This gives a huge 20+ FPS boost in most games which were bottlenecked by slow reads
* RGBA8 surfaces now expose an additional R32Uint view used for storage descriptors. The format is guaranteed by the spec to support atomic loads/stores. This requires the mutable flag which incurs a performance cost, but might be better than breaking the current renderpass each draw when rendering shadows, especially on mobile
* Color mask is also implemented which fixes Street Fighter and Monster Hunter Stories
* This commit includes large changes to have textures are handling. Instead of using ImageAlloc, Surface is used instead which provides multiple benefits: automatic recycling on destruction and ability to use the TextureRuntime interface to simplify operations
* Layout tracking is also implemented which allows transitioning of individual mip levels without errors
* This fixes graphical errors in multiple games which relied on framebuffer uploads
* The cache didn't take into account the framebuffer and render area used, so if these changed the renderpass wouldn't restart. This caused graphical bugs in Pokemon X/Y
* Intel iGPUs don't support blit on all depth/stencil formats which caused issues since the runtime checks for this while the renderpass cache does not
* Integrate format convertion to the morton copy function, removing the need for an intermediate copy and convertion pass. This should be beneficial for performance especially since most games use tiled textures
* Also bump vertex buffer size to avoid crashes with hardware shaders and provide correct offset on normal draws which fixes glitches in pokemon Y
* Reduce the local group size to 8 in the D24S8 compute shader which fixes graphical issues in the afformentioned pokemon games at native resolution
* Set LOD to 0 instead of 0.25 to fix another glitch in pokemon y