Home > Root > Log book > Meet the main CPU

Meet the main CPU

Saturday 4 February 2023, by Mathieu Brèthes

All the versions of this article: [Deutsch] [English] [français]

The Neo Geo Pocket uses a Toshiba processor, the TLCS900H, as main processor. For the sound, it uses a Z80 (the same processor as the Game Boy), but that’s not the question of this article.

So I decided to make a program in assembler for this console, in order to familiarize myself with the processor architecture, with the ultimate goal to port a C compiler to this architecture. To do this, I have to think of my assembler program as if it were a C program. It would be easy to tinker, but if I define now with rigor how, for example, the functions call each other, I will then be able to see more easily if my methodology works for when I want to implement it as a compiler.

In assembler, we work with what the processor architecture provides. In our case, the TLCS900H is quite original in that it has "windows" of registers, i.e. it has 4 windows of 4 32-bit registers, plus 4 permanently accessible registers (one of which is used specifically for the stack pointer). In Toshiba’s documentation they call them banks, but I’ll stick to the way GCC calls this type of architecture (for the future!). Assembler instructions can only run on one window at a time (not totally true, but I’m simplifying), plus on the 4 permanent registers. And the Neo Geo Pocket documentation indicates that window number 3 (they are numbered from 0 to 3...) is reserved for system functions.

So we have 3 windows available, and it is tempting to try to exploit them to optimize the code, and in particular the function calls. In the Neo Geo Pocket documentation, they suggest using the windows according to needs - for example, normal code has one, interrupts use another, etc. But one could also consider using these windows to limit the amount of data to be passed between the processor and the working memory.

Explanation. Let’s imagine a processor which would have 2 registers A and B. All the calculation made by the processor uses these 2 registers. When a function is called, the content of the 2 registers must be saved somewhere, since the called function also needs these 2 registers to make its calculations! The original values will then be restored when the function is finished and the execution returns to the caller. Usually, we use what is called a stack to store the values. But the transfer of information between the processor and the memory is very slow.

And this is where our windows can be interesting. If the main function uses window 0, it could call a sub-function, and when called, we switch to window 1. This means the registers of window 0 do not need to be saved to the main memory! And if the sub-function itself calls a sub-sub-function, we switch to window 2.

You can quickly see the problem: what happens if you have 3 levels of sub-functions? Or 4, or 5... ?

We can imagine three types of answers:

  1. oh well, it’s a pain in the ass with these windows, let’s ignore them and just use window 0 or maybe use the other windows for special purposes
  2. when we are on window 2, the following function calls switch to a "stack" mode like on a standard processor
  3. when we are on window 2, the next function call saves the registers of window 0; and this one will be restored only when you come back to the function that required them.

Each answer has its advantage and disadvantage:

  1. it is the simplest, but it is not optimal
  2. it is the most elegant, but it is not optimal
  3. it seems the most optimal, but it requires to store the values somewhere else than in a stack (indeed we don’t know when we’re going to go back down, the sub-functions can also store values in the stack, etc.)

I want to implement solution 2, but also to find a way to make it optimal. What is the problem with solution 2? The problem is that it is the functions deepest in the tree that call the stack mode, i.e. the functions that are the smallest and are called most often. We want those functions to use the optimized mode, and use stack mode only for functions that get called less - since the cost depends on how many times a function is called.

A solution 2b would be that the "stack" mode is used for function calls up to level N-3, and that the last 3 levels of function calls use the "window" mode.

This solution imposes a constraint that perhaps the compiler will be able to handle: one must have evaluated the whole tree of functions before deciding which mode to use for a function call. A "leaf" function (which does not call any other function) will always be called with the "window" mode, ditto for a function which just calls leaf functions, and ditto for functions which just call leaf functions as well as leaf functions.

Of course, what about the case where functions call themselves? Recursive functions? One can rarely determine in advance how many nested calls there will be in the case where a function calls itself. In this case, the compiler, I suppose, must be able to detect that there is a loop of calls (and the call tree is then no longer a tree but a graph with a cycle). And the solution is to say that for this case, the "stack" mode will be used for all calls.

In my assembly program, I will write different types of macros to standardize the function calls and make it "look" like the above.

For the rest, the 3 permanently accessible registers will be used to pass the first 3 parameters to the called function, the other parameters will go on the stack. And the called function will put its returns in these same three registers before giving the hand back to the caller. It is the caller who will be in charge of saving (possibly) the registers before the call (case of the "stack" mode) and restoring them afterwards, plus cleaning the return stack if necessary. The called party will be in charge of cleaning the input stack (the parameters if > 3) before handing over to the caller.

Translated with www.DeepL.com/Translator (free version)