I knew, and had been warned, that once I started working on this there would be no definitive end. I guess after some time you need to call a stop, just so that you can move on to the next phase. This post marks the end of the X.Org EVOC program and the beginning of my journey as a Nouveau contributor.
The second phase of the program has been rather complex and filled with unexpected hurdles.
Many changes had to be made to the command submission algorithm that we had thought was ready to be implemented. After testing, the new implementation proved to work almost completely bug-free.
Probably the most glaring difference is the omission of the 'memcpy' and 'wrap_around' functions.
The 'memcpy' function performed the simple task of copying a fixed length of data from a given location to a specified destination. It took three arguments: source, destination and length. This simple function, however, had a very basic problem: it did not account for wrapping around the ring buffer and hence was not compatible with it. To fix this, a completely new function named 'memcpy_ring' has been introduced. It takes five arguments. Three of them are the same as for 'memcpy', namely destination, source and length. The two new arguments, ring_base and ring_size, account for wrapping around the ring buffer: ring_base specifies the starting memory location of the ring buffer and ring_size specifies the total number of memory locations on the ring.
'wrap_around' is more or less the same and performs the same task as before. The main difference is that it is now implemented as a macro rather than a function.
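To make the wrapping behaviour concrete, here is a minimal C sketch of what a ring-aware copy could look like. It is not the actual pdaemon code: the argument order follows the description above, but the assumption that the source lives inside the ring (and the destination is a plain linear buffer), the byte-wise loop, and the modulo-based WRAP_AROUND macro are all mine, for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 'wrap_around' macro: fold an offset
 * back into the range [0, ring_size). */
#define WRAP_AROUND(off, ring_size) ((off) % (ring_size))

/* Copy len bytes into dst, starting at src, where src points somewhere
 * inside the ring that begins at ring_base and holds ring_size bytes.
 * Reads that run past the end of the ring continue at ring_base.
 * Assumption: only the source is a ring; dst is an ordinary buffer. */
static void memcpy_ring(uint8_t *dst, const uint8_t *src, size_t len,
                        const uint8_t *ring_base, size_t ring_size)
{
    size_t off = (size_t)(src - ring_base);

    for (size_t i = 0; i < len; i++)
        dst[i] = ring_base[WRAP_AROUND(off + i, ring_size)];
}
```

With an 8-byte ring, copying 4 bytes starting at offset 6 reads offsets 6, 7, 0 and 1, which is exactly the case a plain 'memcpy' gets wrong.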
The next step was to design a brand new ISA. Its primary purpose is to execute scripts on pdaemon, and these scripts are meant to provide an easy way to achieve memory re-clocking.
As the first step, the existing HWSQ was studied carefully and an encode_decode implementation was written in C. This not only helped in achieving a similar implementation for FSE [Fermi Scripting Engine] at a faster pace but also made HWSQ easier to understand.
To design the ISA [FSE], the following functions were targeted:
- Delay
- MMIO Write
- MMIO Mask
- MMIO Wait
- Send_msg / Pdaemon -> Host
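To give a feel for what these five operations do, here is a small C model that runs them against a simulated register file. This is only a sketch under my own assumptions: the struct layout, the enum names, the mask semantics (clear the masked bits, then OR in the new ones) and the single-shot wait are hypothetical and are not the FSE encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes, one per targeted function. */
enum fse_op { FSE_DELAY, FSE_MMIO_WR, FSE_MMIO_MASK, FSE_MMIO_WAIT, FSE_SEND_MSG };

/* Hypothetical decoded instruction; not the real FSE layout. */
struct fse_insn {
    enum fse_op op;
    uint32_t reg;   /* index into the simulated register file */
    uint32_t mask;  /* bits affected (MASK) or compared (WAIT) */
    uint32_t data;  /* value, delay in ns, or message payload */
};

#define SIM_REGS 16

struct fse_sim {
    uint32_t regs[SIM_REGS]; /* stands in for MMIO space */
    uint64_t elapsed_ns;     /* total simulated delay */
    uint32_t last_msg;       /* last payload "sent" to the host */
    unsigned msgs_sent;
};

/* Run n instructions. Returns 0 on success, -1 on a bad register index,
 * an unknown opcode, or a wait whose condition does not hold. */
static int fse_run(struct fse_sim *s, const struct fse_insn *prog, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const struct fse_insn *in = &prog[i];

        if (in->op != FSE_DELAY && in->op != FSE_SEND_MSG && in->reg >= SIM_REGS)
            return -1;

        switch (in->op) {
        case FSE_DELAY:
            s->elapsed_ns += in->data;
            break;
        case FSE_MMIO_WR:
            s->regs[in->reg] = in->data;
            break;
        case FSE_MMIO_MASK:
            s->regs[in->reg] = (s->regs[in->reg] & ~in->mask) | (in->data & in->mask);
            break;
        case FSE_MMIO_WAIT:
            /* Hardware would poll; simulated registers never change on
             * their own, so the condition is checked exactly once. */
            if ((s->regs[in->reg] & in->mask) != in->data)
                return -1;
            break;
        case FSE_SEND_MSG:
            s->last_msg = in->data;
            s->msgs_sent++;
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

A re-clocking script in this model would be a short array of such instructions: write or mask a few registers, delay, wait for a status bit, then notify the host.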
Even though I tried to design the ISA myself and spent a lot of time struggling with it, I was unable to. I believe I was not up to the challenge. Luckily, mupuf had foreseen this and already had a basic layout. He took out some time, and together we arrived at a complete version of the ISA. The ISA reads as follows:
https://github.com/Supreetpal/evoc-scratch/blob/master/FSE.txt
The implementation was a three-step process, which basically meant producing three files:
- FSE.h
- FSE_encode_decode.c
- FSE.fuc
The final FµC implementation would then be merged into pdaemon. The first two files involved work similar to what was done for HWSQ. Once FSE.h and FSE_encode_decode.c were complete, the logical implementation was in place and working, which left porting the code to FµC as the next major step before it could become part of pdaemon. These two files are the best reference for understanding how FSE works and can be found as follows:
FSE.h -
FSE_encode_decode.c -
The FµC implementation seemed rather straightforward at first, but after the initial commit and testing I ran into errors. It turned out that I was trying to access unaligned memory locations. Since the existing load command 'ld' was not sufficient for this, a group of three ld_XX functions was implemented in FµC: ld_32, ld_16 and ld_08, for 32-bit, 16-bit and 8-bit loads respectively.
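The idea behind such unaligned-safe loads can be modelled in C by assembling the value one byte at a time, so that no access is ever wider than a byte. This is only a sketch of the concept and not a translation of the FµC code; the function signatures and the little-endian byte order are assumptions on my part.

```c
#include <stdint.h>

/* 8-bit load: a single byte is always naturally aligned. */
static uint8_t ld_08(const uint8_t *p)
{
    return p[0];
}

/* 16-bit load from any address, assuming little-endian byte order. */
static uint16_t ld_16(const uint8_t *p)
{
    return (uint16_t)((uint16_t)p[0] | ((uint16_t)p[1] << 8));
}

/* 32-bit load from any address, assuming little-endian byte order. */
static uint32_t ld_32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

The byte-wise approach costs a few extra instructions per load, but it works regardless of where an operand happens to fall in the buffer.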
The current implementation of FSE in FµC looks as follows. This implementation is still under testing and has not yet been shown to work completely. I should be able to find enough time to submit a final patch to pdaemon with a working implementation while at XDC, or as soon as I return.