2017-11-09 17:40:57 +01:00
+++
date = "2017-11-09T09:46:00-04:00"
title = "Citra Progress Report - 2017 September"
tags = [ "progress-report" ]
author = "anodium"
forum = 5005
+++
Winter arrives once more, and like I mentioned in [August's progress report ](/entry/citra-progress-report-2017-august ),
I am extremely excited for what's in store. In fact, many of the really big goodies
I've decided to seperate to their own articles, which should be coming up in the
next few weeks.
There's also been many changes this month that improve the speed of emulation across
the board, on top of the usual improvements in accuracy and features. And because
of that, I've dubbed this month #Speedtember . Let's dive right in.
2019-08-17 22:50:53 +02:00
{{% alert info %}}
2017-11-09 17:40:57 +01:00
Hello everyone! We're all terribly sorry for the delay in getting this progress
report out the door, but our main technical writer [anodium ](https://github.com/anodium ),
was just a bit busy surviving both Hurricane Irma and Maria. Although she's a
trooper and claims it's not an excuse for the delay, we find that her personal
safety is a tad more important. We're all glad that she's safe and sound, and in
a state where she can keep pumping out quality articles for Citra!
2019-08-17 22:50:53 +02:00
{{% /alert %}}
2017-11-09 17:40:57 +01:00
## [Switchable Page Tables](https://github.com/citra-emu/citra/pull/2952) by [MerryMage](https://github.com/MerryMage)
Citra has a component called [dynarmic ](https://github.com/MerryMage/dynarmic ),
which recompiles ARM11 code to x86-64 code at run time, and then executes that
generated code, rather than interpreting the ARM11 instructions directly.
Because the 3DS has a 32 bit address bus, it can address 2^32 unique memory locations.
And because the 3DS can address data down to a byte, it can address up to 2^32
unique bytes, or about 4 gigabytes of memory. When considering that no 3DS has ever
been released with more than 256 *mega*bytes of memory, this sounds absurd! And
it is... unless you consider that a 3DS uses chunks of that huge address space to
address peripherals, among other things. This is called memory-mapped input/output
(MMIO), and is a great use of millions of addresses that would otherwise have
been ignored, plus it also allows handling IO the exact same way memory is handled,
so the design can be a bit simpler as it doesn't need special circuitry to handle IO.
Herein lies our problem. Because that code is now being run on a PC, those MMIO
devices don't actually exist anymore, so Citra needs to handle those reads and
writes itself. There's a few ways to go about it, but the simplest and most naï ve
is to replace every memory read or write with a function that checks if that address
is mapped to memory or IO. Unfortunately, this is extremely slow, and we can't
afford to have extremely slow address translation when games can access memory
upwards of a few hundred thousand times per second.
With this, [MerryMage ](https://github.com/MerryMage ) has changed this behaviour so
that rather than replacing a read/write with a function, it instead translates the
address using a page table, and then tries to access that address directly. On the
page table, all addresses that map to memory simply have a memory address written down.
But on addresses that map to IO, it has address 0 written down. Trying to read or
write to memory address 0 on x86 is illegal for every process except the
operating system... and Citra tries to do it anyways!
When an invalid memory address (or a memory address that that process doesn't have
permission to access) is read from or written to, x86 CPUs throw a page fault exception.
Citra takes advantage of this behaviour by also registering an exception handler
for page faults. If a page fault is thrown, Citra knows the game tried to access IO,
and thus recompiles the memory read/write to a direct call to Citra's IO functions.
This makes the usual case (memory access) extremely fast, and the less usual case
slow, but only the first time it happens. Subsequent IO accesses use the recompiled
functions which are faster.
This technique is called fastmem, and is not new at all. In fact, Dolphin uses
it extensively in its JIT recompiler to speed up memory access as well. And thanks
to [MerryMage ](https://github.com/MerryMage )'s hard work, this same technique is
now used extensively by Citra.
## [Give each process its own page table](https://github.com/citra-emu/citra/pull/2842) by [Subv](https://github.com/Subv)
In order to support running multiple processes at the same time, like your computer,
Citra implements virtual memory, in which each process has its own page table.
The page table represents a translation from the process' virtual addresses, to
the 3DS' physical (or "real") addresses.
Before this, because Citra did not support multiple page tables, it also didn't
support running multiple processes at once, such as a game and the software keyboard
applet. Now, thanks to [Subv ](https://github.com/Subv ), Citra has an important
building block in place.
## [Add support for loading application updates](https://github.com/citra-emu/citra/pull/2927) by [shinyquagsire23](https://github.com/shinyquagsire23)
Nintendo 3DS titles are contained within `*.app` files on the SD card or on the
game cartridge, in the [NCCH container format ](https://www.3dbrew.org/wiki/NCCH ).
This format is further divided into two formats, CXI and CFA, which stand for
__C__TR e__X__ecutable __I__mage and __C__TR __F__ile __A__rchive, respectively.
CXIs contain executable code, whereas CFAs cannot. CFAs usually accompany a CXI
to provide other features such as the digital instruction manual, the Download Play
child application, or in the case of game cartridges, system updates.
Both types of NCCH start with a header, and then followed by either an ExeFS
image, a RomFS image, or both. The entire structure of an NCCH header may be best
explained by a diagram:
{{< figure src = "/images/entry/citra-progress-report-2017-september/ncch.png"
title="Solid lines are required sections, dashed lines cannot be used in some cases, and dotted lines are optional sections." >}}
Now, games and applications need updates from time to time, and 3DSes handle these
by installing the update as a seperate title from the base game. From that point
on, whenever the user tries to launch the game, instead of loading the
extended header (or [ExHeader ](https://www.3dbrew.org/wiki/ExHeader ) for short)
and ExeFS image from the base game's NCCH, it replaces them with the update's
ExHeader and ExeFS on launch. As for RomFS, the 3DS System Software will actually
load both the base game's and the update's RomFS image, rather than replacing one
with the other. Games are left to their own devices on how to handle these, and
so the methods used per game can vary, though they usually just replace changed
files, picking files from the base game RomFS if they haven't been modified.
Citra, before this PR, had the code for loading games and reading NCCH files all
mixed into one big piece that fit in with everything else. With this patch,
[shinyquagsire23 ](https://github.com/shinyquagsire23 ) has seperated the loader
from the NCCH reader, allowing the loader to read multiple NCCHs at once. Additionally,
whenever a game is loaded, the loader would also check if there is an update title
installed on Citra's [virtual SD card ](/wiki/user-directory/ ). If there is, it
would replace the update ExHeader and ExeFS, and load the update RomFS as well.
Just like a real console!
Most games worked out of the box with updates, and because they wrote the code
with accuracy in mind, this very same PR has also laid part of the foundation
needed to handle other features such as DLC support or even using real 3DS SD cards!
Though, do note that we don't have any estimates on either those or any other
features, as no one is actively working on either.
## [Implement geometry shader](https://github.com/citra-emu/citra/pull/2865) by [wwylele](https://github.com/wwylele)
The PICA200 GPU has a pipeline similar to [OpenGL's pipeline ](https://www.khronos.org/opengl/wiki/Rendering_Pipeline_Overview )
for rendering 3D objects into a 2D display. I won't go through them all here,
only the optional geometry shader step. Just after the vertex shader step, if
enabled, all the vertices are processed by a shader kernel (which is a small
program that runs directly on a GPU), taking as many vertices as the kernel
wants as input, and outputting as many vertices as the kernel wants.
Because the kernel in the geometry shader is allowed as many inputs and outputs
as it wants, it is significantly more powerful and flexible than the vertex shader,
whose kernel is restricted to only one vertex at a time, both for input and output.
But for that same reason, geometry shaders are much more complex to program, and
so many games simply disable it. The games that do not disable it though, tend to
use it very extensively, to the point of completely breaking graphics if it's not
implemented.
Multiple uses have been found in the wild for geometry shaders, including but absolutely
not limited to:
- Taking one vertex as input, and outputting a rectangle of vertices which can
be textured with a sprite. Poké mon uses this extensively to render particles
whenever a move is used. Monster Hunter takes it a step further and renders
*all of its HUD and GUI* with this kernel.
- Taking a handful of vertices as input, and outputting even more vertices which
are interpolations between the inputs, thus making the resulting mesh look smoother
and less jagged when rendered.
At first glance, geometry shaders looked like an easy problem, since they use the
same instruction set and format as vertex shaders, so a lot of the same code could
be reused. At *second* glance, it turned out that configuring inputs and outputs
for geometry shaders is much more complex than it is for vertex shaders.
There were actually three attempts to implement geometry shaders in Citra. The first
was written by [ds84182 ](https://github.com/ds84182 ) about two years ago, only to
be abandoned due to not knowing how the configuration of them was done. The second
attempt was written by [JayFoxRox ](https://github.com/JayFoxRox ), but was also
abandoned for the same reason.
But, after extensive research on geometry shaders was made by [fincs ](https://github.com/fincs ),
the API was implemented in [ctrulib ](https://github.com/smealum/ctrulib ) and
[citro3d ](https://github.com/fincs/citro3d ), and examples were written to demonstrate
how to use it. Now that the community knew exactly how they worked, [wwylele ](https://github.com/wwylele )
picked up where [JayFoxRox ](https://github.com/JayFoxRox ) left off, cleaned up
the code he wrote, and added the missing pieces.
After almost three years, and three different attempts to make it work, Citra now
has a full, complete, and correct implementation of geometry shaders!
## [Implement custom clip plane](https://github.com/citra-emu/citra/pull/2900) by [wwylele](https://github.com/wwylele)
After the geometry shader (or the vertex shader, if it wasn't enabled), the vertices
are "assembled" into a collection of triangles. After *that* , to make rendering
more efficient, the triangles are then compared to 6 planes that make up the cube
in which objects are actually visible by the camera. Any triangles outside of that
cube are deleted, and any triangles that are partially inside the cube are split
by the sides of the cube, and the resulting triangle outside of the cube is also
deleted.
But the 3DS allows games to add a 7th plane whose position is fully customizable.
Although no games are known to use this feature right now, it is indeed a feature
of the 3DS' GPU. Because implementing it was fairly straightforward,
[wwylele ](https://github.com/wwylele ) decided to just go ahead and implement it,
in case someone decided to use it in the future.
## [Optimized Morton](https://github.com/citra-emu/citra/pull/2951) by [huwpascoe](https://github.com/huwpascoe)
Morton code is a function that interleaves multi-dimensional numbers into a one-dimensional
number. Although it may seem like a very esoteric function, it's actually extremely
useful in fields like linear algebra, databases, and what the 3DS uses it for:
texture mapping.
Computers have an intermediate chunk of memory between RAM and the CPU called a
cache. Caches are seperated into lines, each of which can hold one data item. GPUs
also have a cache, also seperated into lines. Because they are seperated like this,
if a texture is loaded into the cache, it would have to span multiple cache lines,
or even not fit into the cache completely, thus making transformations on it slow,
as it would have to load and store pieces of it from RAM multiple times.
To avoid this, GPUs can Morton encode textures so that two-dimensional manipulations
are more likely to only need data already in the cache. Textures that have been
Morton coded are usually referred to as swizzled or twiddled textures.
{{< figure src = "/images/entry/citra-progress-report-2017-september/morton-koopa.png#floatright"
title="Not this Morton!" >}}
In the function that Morton is implemented, there was a lookup table on Morton
codes in the comments, and [huwpascoe ](https://github.com/huwpascoe ) thought it'd
be best if we just use the lookup table directly. It worked just as well as before,
but required less than a third of the math. Because this function is called so
often during emulation (a rough estimate from them is about "millions of times a
second"), this change although small, made very big changes in CPU performance.
## [Add draw for immediate and batch modes](https://github.com/citra-emu/citra/pull/2921) by [jroweboy](https://github.com/jroweboy)
The 3DS' GPU has two main modes for drawing to the screen, immediate and batch
mode. In the former, the GPU takes and immediately draws every vertex as it is
handed to it. In the latter, the GPU accepts vertices given to it, but doesn't
actually bother drawing them until absolutely necessary, saving a bit of time
from not having to go through the drawing procedure for every individual vertex.
Although most games don't use immediate mode at all due to it being extremely
slow, a handful do use it for a handful of visual effects, like New Super Mario
Bros. 2.
About a year ago when the GPU code on Citra was rewritten, a handful of calls to
the drawing routine were removed, as it was believed they were unnecessary. Turns
out, one of the calls was actually needed for some effects in games, as it handled
immediate mode drawing. This wasn't noticed for a very long time, as most games
appeared to carry on with no side-effects at all from the rewrite, but was eventually
found after some research courtesy of [ds84182 ](https://github.com/ds84182 ).
## [Interpolate audio samples on a frame-by-frame basis](https://github.com/citra-emu/citra/pull/2858) by [MerryMage](https://github.com/MerryMage)
When a 3DS game needs some sort of audio processing, they can access the 3DS' DSP,
or __D__igital __S__ound __P__rocessor. It's another processor, alongside the ARM9
and ARM11, that is given a firmware to run, which in turn is given a bunch of audio
samples and parameters by the game. The DSP then plays back the buffer in chunks
of about 5 milliseconds. Each one of these chunks is called an audio frame.
As of today, we don't know how the DSP exactly works, and we don't know how any
of the firmwares exactly work. (Did I forget to mention earlier there's multiple
versions of the firmware?) But we do know how to use it, and from there we can
reimplement its behaviour directly in Citra. Which is exactly what [MerryMage ](https://github.com/MerryMage )
did back in June of 2016, which in turn brought [audio support for the first time ](/entry/hle-audio-comes-to-citra/ )
in Citra.
This approach, although having the advantages of being easier to implement, easier
to understand in code, and has a higher potential of being faster, it has the
disadvantage that accuracy suffers significantly, especially when shortcuts are
taken for the sake of speed. One of these shortcuts was in the audio interpolation,
which is a way of inferring more audio samples from relatively very few existing
samples.
On a real 3DS, games are allowed to interpolate different audio frames with
different functions, even when in they're in the same buffer. On the other hand,
Citra interpolated the entire buffer with one function as soon as it was loaded.
This led to various effects and music in games to sound strange or inaccurate in
some way.
One example of this is Deku Link's footsteps in *The Legend of Zelda: Majora's Mask 3D* .
Here's the output of a real 3DS console, for reference:
{{< audio src = "/images/entry/citra-progress-report-2017-september/deku-hardware.ogg" > }}
And here's the output of Citra, before this was fixed:
{{< audio src = "/images/entry/citra-progress-report-2017-september/deku-pre2858.ogg" > }}
Now that it's been fixed, his footsteps sound a lot better:
{{< audio src = "/images/entry/citra-progress-report-2017-september/deku-post2858.ogg" > }}
Audio emulation in Citra is still somewhat inaccurate for now, though
[MerryMage ](https://github.com/MerryMage ) is gradually working on fixing and
improving it. Perhaps some day we may even be able to emulate the DSP firmware
directly, which will be much more accurate than merely emulating its behaviour.
## [Use deque instead of vector for the audio buffer](https://github.com/citra-emu/citra/pull/2958) by [Subv](https://github.com/Subv)
Whenever the DSP consumes some frames from the audio buffer, Citra deletes them
from it. This normally wouldn't pose any problems, but because the buffer was
being stored as a vector, this led to some uneccessary operations. Namely, the
C++ standard requires that all the data of a standard vector be in one contiguous
block of memory. Because deleting frames from the buffer breaks this rule, Citra
would automatically (1) allocate a new block of memory, (2) copy the entire buffer
into that new block of memory, and (3) deallocate the old block of memory, thus
deleting the old buffer.
These steps are huge waste of time, as Citra doesn't need to guarantee that the
audio buffer is in one contiguous block. So [Subv ](https://github.com/Subv ) changed
the type of the buffer from a vector to a deque, which is essentially a queue that
you can remove data from both the beginning and end of it. Because the contiguity
requirement doesn't exist in deques, Citra doesn't do the uneccessary copying,
leading to huge speed boosts in audio bound titles like Super Mario 3D Land, and
even the Home Menu. Now, both run significantly faster!
## [Add mingw64 compile support to appveyor](https://github.com/citra-emu/citra/pull/2912) by [jroweboy](https://github.com/jroweboy)
When a program is written in a high-level programming language, such as C++, Rust,
or Go, before the program can be run on a machine, it must be translated or "compiled"
to machine code. Although it is possible to do this translation by hand, it is
usually extremely difficult to do so and very time consuming. So instead, we have
a program called a compiler than can automatically do this translation for us.
This is also why a program compiled for an ARM machine cannot be run directly on
an x86 machine, even when the source can work on either machine without issues.
Instead this program must be translated, interpreted, or recompiled from source
to x86. (In fact, this translation is exactly what [dynarmic ](https://github.com/MerryMage/dynarmic )
does to run code from a 3DS.)
Every statement in a program must have an exact, unambiguous definition of what
it does (its semantics). But, in the same way that a statement that means one
thing can be written many different ways, and different compilers can translate the
same statement many different ways.
On Windows, there's two popular C++ compilers available as of today: MSVC++, which
is the compiler Microsoft has written for Windows, and MINGW GCC, which is actually
a port of the Linux `gcc` compiler to Windows. For better or worse, MINGW GCC
optimizes Citra a little better than MSVC++, and so [jroweboy ](https://github.com/jroweboy )
has changed the Citra AppVeyor build script to add support for MINGW GCC as well
as MSVC++. Do note that the MSVC++ builds are only available through GitHub, since
they're only useful for debugging, and MINGW GCC builds are faster in most, if
not all, cases, which is why the installer will only install those. This change
also has closed the gap in performance the new Nightly builds had compared to
the old Bleeding Edge builds.
## [Load different shared font depending on the region](https://github.com/citra-emu/citra/pull/2915) by [wwylele](https://github.com/wwylele)
Remember that last month [wwylele ](https://github.com/wwylele ) changed Citra so
that instead of loading the shared font from a seperate file, it would
[load it from the system archive ](https://citra-emu.org/entry/citra-progress-report-2017-august/ )?
This builds on top of that behaviour. You see, a 3DS doesn't have a shared font,
it has *four* . One contains glyphs for Latin script (for English, Spanish, Italian,
French, etc.) and Japanese scripts, another contains glyphs for Traditional Chinese,
the third font contains those for Simplified Chinese, and the last font contains
the ones for Korean.
Before this PR, Citra would simply load the first shared font regardless of game
or region. This made non-Latin or non-Japanese script games display completely
incorrect characters at best, or crash at worst. Now Citra will load the appropriate
shared font from the system archive depending on the region selected, just like
a real console! Though, this will not work on machines that only have the
`shared_font.bin` file, because it only contains the shared font for the region
of the console it was dumped from. (e.g.: If you dump a Korean console, it'll
only contain the Korean font.) If you want to use this feature, you must dump
the system archive using the latest version of [`3dsutils` ](https://github.com/citra-emu/3dsutils ).
## Et. al.
And of course, big thanks to [everyone who's contributed ](https://github.com/citra-emu/citra/graphs/contributors?from=2017-08-31&to=2017-09-30&type=c )
this September, because Citra as a whole would not be the same without everyone
involved having placed their pieces, big or small.