Andrew B. Wright, S. M. ’88, Ph. D.
The archive was built in the /root/code/talk subdirectory. Remember to change the source root in the Makefile if you use a different directory.
Place the archive in the /root/code directory and unpack with tar -xzf talk*.gz.
Change to the talk directory and compile the code with “make talk” and “make install.”
NOTE: the pru_comm.c library is much different in this example (and subsequent examples) than earlier as it implements the full set of communication codes. This archive will be updated as the code is improved.
The goal for the command interface is to make the code as similar as possible between the pru-side and the arm-side. So, there is a header file, arm_comm.h, to go along with arm_comm.c (arm-side communication functions) and pru_comm.h to go with pru_comm.c (pru-side communication functions).
An embedded processor can be used to collect data and send it through the ARM. The ARM can then send this data to the world, through bluetooth or ethernet. This blog will build on the remoteproc communication protocol to perform this streaming.
In order to have two processors talk to each other, you need a communication protocol. In “Hello, world!” the PRU echoed the exact data that it received. The host sent the data and then waited (perhaps forever) for a response. The PRU received data and then put the response onto the interface and terminated.
In this program, specific codes are going to be sent back and forth. Based on the received code, the device will undertake some action. The same state machine concept will be used on the ARM and on the PRU.
The code sequence will consist of “af” followed by a command byte (‘S’ for start, ‘P’ for stop, ‘D’ for send data, ‘s’ for start ack, ‘p’ for stop ack, ‘d’ for data). The next byte will be an unsigned character (N) telling how many bytes will follow. Then, the next N bytes will be the data.
The goal of using “human readable characters” in this protocol is to allow the “cat” and “echo” linux functions to be used to talk to the PRU. Upper case characters will be used as the command and the corresponding lower case character will be the acknowledgement. In the table, the escape sequence will be used to denote numerical values.
So, if you type echo “afS\0” the interface will send the ascii codes for ‘a’, ‘f’, ‘S’, and the numerical value 0 to the PRU. This will convey to the PRU the start command followed by 0 bytes.
For the send error command, the string would be “afE\0” and the response would be “afe\1x” where the ‘\1’ character tells the interface that 1 byte is coming and x would be that byte.
For the send data command, the string would be “afD\1\x” where \1 tells the interface that 1 byte is coming and \x would be the byte (x = 0…255).
For example, “afD\1\c” would start the transfer of 12 bytes (0xc in hexadecimal) of data and “afd\12xxxxxxxxxxxx” would represent 12 bytes of data. This data could be in the format of bytes (12), integers (6), or floating point numbers (3). It’s up to the programmer to decide what the data means.
|Command||Response||Description|
|S||s||Start the PRU|
|P||p||Stop the PRU|
|E||e\1\x||Error, where x is a one byte error code|
|D\1\N||d\Nxx…x||Send N bytes of data (xx…x)|
|F\Nxx…x||f\Nxx…x||ARM sends an N byte file (xx…x) which the PRU will echo back|
|C\5\aaaa\n||c\Nxx…x||ARM asks PRU to read \n bytes from register with 32 bit address \aaaa. PRU responds with the \n bytes starting at \aaaa.|
|M||m||Transmit, not yet implemented. This feature might be useful if it is desired to hold data transmission until data is ready. Right now, D conveys how much data is desired and also initiates the data transfer.|
|N||n||Do nothing, not yet implemented|
|I||i||Initialize, not yet implemented|
|R||r||Run experiment, not yet implemented. In the original CASSY, a specific experiment would be set up and this command would run the experiment (for instance, collect T seconds of data and then stop).|
|T||t||Experiment time in seconds, not yet implemented|
|X||x||Data variable index, not yet implemented. There are many control variables (for instance, error signal, accelerometer, left wheel rotation). This command sets the nth transmission variable to be the xth control variable. This is useful for communicating details of a desired data stream.|
afC – reading system registers
The command afC is a particularly important command. This allows the PRU to look at registers and reflect the values back to the ARM for user display. When you’re attempting to debug the bits in these registers, it is a good idea to start by looking at them. The first task in designing a new module is to look at the registers as Debian has initialized them and to compare against the TRM’s default values. If there is an ARM-side routine that uses the device, then you can run that routine and inspect the registers after initialization. This gives another powerful set of information on how to set up the device.
The example in “talk” looks at register 0x44EC00AC, which is GPIO1’s CLKCTRL module. It asks for the 4 bytes starting at that location. The command would look like “afC\5\0xAC\0x00\0xEC\0x44\4” if you were to type it on the command line. Note that the register address is sent low byte first and high byte last (little-endian byte order); the bits within each byte are not reversed. There are 5 bytes sent in the command (\5), which requests 4 bytes returned (\4).
The state machine will process a buffer of data each instruction cycle until the specific code has been received, at which point it will undertake an action. As each byte is compared against the code, the state variable is updated (either to the next state or to the initial state).
insert graphic of state-machine
Errors and Limits
I did something a little different in this protocol. Normally, I would use a ringbuffer to accumulate the data. However, in this case, I allowed the vrings to function as the ringbuffer. So, I use the variable buffer_state to decide whether to read more data or not.
PRU-side: In the current rpmsg implementation, the maximum size of the vring buffer is RPMSG_BUF_SIZE (512 bytes, defined in /usr/lib/ti/pru-software-support-package/include/pru_rpmsg.h). If the header + data exceeds this value, the error message PRU_RPMSG_BUF_TOO_SMALL is returned by pru_rpmsg_send.
PRU-side: In the current rpmsg implementation, the maximum number of messages that can be sent before a message is read is RPMSG_NAME_SIZE (32, defined in /usr/lib/ti/pru-software-support-package/include/pru_rpmsg.h). If this number of messages is queued up, another call to pru_rpmsg_send will return PRU_RPMSG_NO_BUF_AVAILABLE.
These error codes can be used to diagnose communication issues; however, communicating them from the PRU would require that the rpmsg system is working. In other words, telling the ARM that a PRU-side communication error has occurred requires that the communication error not occur. So, the best way to get the PRU to tell the world that it cannot talk to the world is to use digital outputs.
One way to get errors out of the system is through LEDs. There are four system LEDs; however, the system does not want to release them, and it is inconvenient to get access to them from the PRU. You have to bypass the universal cape definitions. This is bad practice.
Another way to get LEDs is to breadboard an external circuit and connect it to the digital outputs. This is how TI has done the job in its training. Maybe I’ll redo this blog in the future and include something like that. The difficulty here is that LEDs are current hogs and the digital outputs from the PRU are wimpy. It is a terrible idea to directly connect an LED to a digital output. This means wiring a transistor between 5v and ground and connecting a digital output to the base of the transistor.
For now, we’re going to use the trusty voltmeter or oscilloscope to read the digital outputs. The two pins, P8.11 (R30-15) and P8.12 (R30-14), are available for PRU digital outputs. One will be set high when PRU_RPMSG_BUF_TOO_SMALL occurs, the other will be set high when PRU_RPMSG_NO_BUF_AVAILABLE occurs. Both will be cleared when a subsequent call to the function returns PRU_RPMSG_SUCCESS.
NOTE: a future method of dealing with this would be to use one digital output and toggle it. The toggle rate can indicate the state of the PRU. For example, the output could blink fast when the PRU is idle and slow down as it approaches running. Different errors can have different blink rates. Rather than connecting an LED, a simple speaker system could be designed so that the pitch of the tone indicates the error status.
There is no error checking in this process, just to simplify the code; however, the ARM could look at responses and resend commands if the appropriate response had not been received.
I would like to use a 20 msec sample rate on the beaglebone. I doubt that I need this fast a sample rate; however, it’s the rate that I used when I was involved in gas turbine engine control. If I can design a system that allows that level of performance, I can deploy it in more stringent applications.
One way to maintain a high update rate despite occasional throughput jams is to buffer the data. As long as the buffer is long enough that it doesn’t fill during the window where data is being continuously transmitted, then an update rate that is faster than the average transmission speed can be managed. There will be an initial delay, and the data presented will not be live; however, the time delay only has to be short enough that it does not cut into reaction time of a human.
To continue this example, let’s say that the beaglebone is producing a sample every 20 msec and that transmitting a sample takes 50 msec on average. How long would the buffer have to be to allow 1 minute worth of data to pass without a glitch?
In this case, for every 5 samples collected (100 msec), 2 are transmitted. The number of samples to be transmitted is 60 seconds/0.02 seconds, for a total of 3000 samples.
Draining 2 samples out of this buffer every 100 msec reduces the needed size by 20 samples per second. Therefore, the buffer can be 3000 – 20*60 = 1800 samples. At 10 bytes per sample, this would be a buffer of about 18 kilobytes. PRU DRAM is 8 kilobytes per PRU. There is a 12 kilobyte shared buffer. This gives a total possible buffer of 28 kilobytes, which would be sufficient for this application; however, it would require some engineering work.
I want to use this with the audio system and fft. But, that is too intense for this tutorial. I will probably update those blogs and refer back to this one for how to do data acquisition. Also, the sample rate for audio processing is going to be in the microseconds. So, storing minutes of data is probably outside the capabilities of the PRU.
I’m not sure at this point how fast data can be pushed through the remoteproc interface. I see nothing that would lead me to believe that it would be significantly slower than the uio interface or the edma interface. All access to the ARM will suffer from the same nondeterministic problems when the system gets busy regardless of the interface. So, I would assume that transfers will have up to 100 msec of maximum latency at the ARM’s side with a more average latency of 50 msec and build an interface that accommodates these issues.
Why does speed matter? One application of the beaglebone is to take sensor data from a robot and stream it to a host computer for live display. If you could update the display at 30 frames per second, the delay between updates would not be visually noticeable. That’s one update every 33 msec.
The TI examples (/usr/lib/ti/pru-software-support-package/examples) have many methods of sharing data between ARM, PRU0, and PRU1: PRU_PRUtoARM_Interrupt, PRU_ARMtoPRU_Interrupt, PRU_Direct_Connect0, PRU_Direct_Connect1. NOTE: PRU_PRUtoARM_Interrupt, PRU_ARMtoPRU_Interrupt appear to use the uio driver. Examples described on TI’s web page are obsolete and come from the early days of PRU development.
There is a good description of the linker command file at http://processors.wiki.ti.com/index.php/Linker_Command_File_Primer.
The PRU Optimizing C/C++ Compiler v2.1 User’s Guide describes the syntax needed to access the PRU’s shared memory.
The line in the Linker Command file
PRU_SHAREDMEM: o=0x00010000 l=0x00003000 CREGISTER = 28
provides access to the shared memory from a pointer, declared in the main.c file:
#define PRU_SRAM __far __attribute__((cregister("PRU_SHAREDMEM", near)))
PRU_SRAM volatile uint32_t shared_freq_1;
PRU_SRAM volatile uint32_t shared_freq_2;
PRU_SRAM volatile uint32_t shared_freq_3;
The line in the Linker Command file
DDR : org = 0x80000000 len = 0x00000100 CREGISTER=31
provides access to the ddr memory from a pointer, declared in the main.c file:
volatile far uint32_t CT_DDR __attribute__((cregister("DDR", near), peripheral));
The line in the Linker Command file
L3OCMC : org = 0x40000000 len = 0x00010000 CREGISTER=30
provides access to the L3 OCMC memory from a pointer, declared in the main.c file:
volatile far uint32_t CT_L3 __attribute__((cregister("L3OCMC", near), peripheral));
The example PRU_Direct_Connect0 shows how to use a resource table with fw_rsc_custom_ints to set up the INTC interrupt and channel mapping. This is also seen in lab_4 of the TI PRU Training.
Here’s a useful intrinsic that needs further investigation:
void __xout ( unsigned int device_id , XFR unsigned int base_register , unsigned int use_remapping , void& object );
Digging through the remoteproc source code seems to indicate that the only mode and function of the resource table is to tell the remoteproc driver how to set up the INTC and the vrings. There may be some memory allocation built in, although that’s not clear.
Some thoughts regarding /dev/uio and /dev/rproc:
Getting data across the pru/arm divide uses INTC. I’m trying to see how to deal with INTC from the ARM side, but that appears to be a kernel module thing. There are two kernel modules: uio and rproc. Both of these modules use INTC. Presumably, since they may want to configure the same resources, you can only use one at a time.
The rproc wants to configure things using the resource_table. The uio wants to set things up directly in INTC registers.
Using the uio, the pruss_drv allowed you to access a pointer into PRU memory from the ARM side. The ARM could look directly at this memory. Is this wise? I cannot say. But, I’m thinking not. Also, it’s difficult to determine if ARM <-> uio <-> pru is faster or slower than ARM <->rproc<->pru.
I would like to be able to look at the INTC registers to see how everything is configured. I’m pretty sure this can be done from the PRU.
The memory map can be seen at /proc/iomem.
There was an interface to use /dev/mem and mmap. I’m pretty sure that this would need a driver (eg. uio) to talk to pru mem directly. It also appears to be a bad idea.
There is a reference to intc at /sys/devices/platform/ocp/4a300000.pruss/4a320000.intc
Another method of transfer which perhaps gets around all of this is the Enhanced Direct Memory Access (edma) module. Can the PRU set up a back-and-forth transfer between PRU and ARM? The big ticket problem in all of this seems to be getting an address in ARM (user) memory space where a write could be made.
NOTE: a useful linux command “find dir -name foo” allows you to find any file named foo under directory dir.
Some of the PRU examples clearly use remoteproc. Some appear to be tied to the uio driver. I suspect there will be a future code clean-up that either fixes or deletes the uio examples.
When I poked through the remoteproc source code much earlier, it appeared that the only thing a resource table does is INTC mapping.
Use the following for the PRU’s c-code:
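volatile pruCfg C4 __attribute__((cregister("PRU_ICSS_CFG", near), peripheral));
volatile register unsigned int __R31; // connected to PRU's input pins and INTC controller
volatile register unsigned int __R30; // connected to PRU's output pins
// 64 unsigned words (0x100 bytes) placed at the start of PRU_SHAREDMEM
far volatile unsigned DATA_XFER_SPACE[64] __attribute__((location(0x10000)));
C4.SYSCFG_bit.STANDBY_INIT = 0; // take the OCP master port out of standby
DATA_XFER_SPACE[0] = ((unsigned)DATA_XFER_SPACE) & 0xffff; // low 16 bits of the buffer address (index 0 assumed)
for (k = 2; k < 64; k++) DATA_XFER_SPACE[k] = k; // fill the rest of the buffer with a known pattern
__R31 = 0x24; // raise a system event to tell the ARM the data is ready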
“Two weeks for that?” you say. The address location 0x10000 can’t be stuffed into a 16 bit value, so the far keyword is used in defining DATA_XFER_SPACE. The number of unsigned’s in the buffer was chosen as 64, so that the length of DATA_XFER_SPACE is 0x100 bytes. Nice round number in hex.
The location attribute puts DATA_XFER_SPACE right at 0x10000 which is the start of PRU_SHAREDMEM in the linker command file.
This address will be used by the host side code. If the pru side address and the host side address don’t match, then one will be reading/writing to the wrong block of memory.
This could be accomplished indirectly in the linker command file by assigning a SECTION to the desired memory location (e.g. DATA_XFER: > PRU_SHAREDMEM) and then allocating a variable that is exactly the size of the section. That may lead to potential hinkiness if you accidentally want to put something else in that SECTION. It also requires the use of the #pragma DATA_SECTION(DATA_XFER_SPACE,DATA_XFER) in the pru c-code.
The ARM program is
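#include <stdio.h>
#include <prussdrv.h>
#include <pruss_intc_mapping.h>
static volatile unsigned int *pSRAM;
int main (void)
{
unsigned int ret, k;
void *p;
tpruss_intc_initdata pruss_intc_initdata = PRUSS_INTC_INITDATA;
prussdrv_init(); // assumed: the driver must be initialized before it is opened
if ((ret = prussdrv_open(PRU_EVTOUT_1))) return (ret);
prussdrv_pruintc_init(&pruss_intc_initdata); // assumed: this is where the init data above is used
if ((ret = prussdrv_load_datafile(PRU0, "./data.bin"))) return (ret);
if ((ret = prussdrv_exec_program(PRU0, "./text.bin"))) return (ret);
prussdrv_pru_wait_event(PRU_EVTOUT_1); // assumed: wait for the PRU to signal that the buffer is filled
prussdrv_map_prumem(PRUSS0_SHARED_DATARAM, &p); // get pointer to PRU_SHAREDMEM
pSRAM = (volatile unsigned int *)p;
printf("The physical memory address is %x\n", prussdrv_get_phys_addr(p)); // printf the physical address for kicks
for (k = 0; k < 64; k++) // 64 words, matching the PRU-side buffer length
printf("%d, %x\n", k, *(pSRAM + k)); // read and print the word
prussdrv_pru_clear_event(PRU_EVTOUT_1, PRU0_ARM_INTERRUPT);
prussdrv_exit();
return 0;
}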