




Idea - New Instructions for Audio Processing

Cyprian K

Posts 26
01 Aug 2021 22:54


As the 68080 has nice graphics instructions (AMMX), would it be possible to add new instructions for audio processing (DSP), such as MAC, ring buffers with modulo addressing, or saturation?



Gunnar von Boehn
(Apollo Team Member)
Posts 6207
02 Aug 2021 06:51


Cyprian K wrote:

  As the 68080 has nice graphics instructions (AMMX), would it be possible to add new instructions for audio processing (DSP), such as MAC, ring buffers with modulo addressing, or saturation?
 

 
Hi Cyprian,
 
Yes, you are right, the AMMX instruction set is excellent.
AMMX can be used for many things:
video, graphics, and also audio.

The 68080 AMMX instructions support average, min, max, multiply, add and subtract, including saturating variants, and they work very well for audio processing too.
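To illustrate why saturation matters for audio, here is a minimal behavioural sketch in Python (not 68080 code). It assumes signed 16-bit samples in four lanes; the function names are made up for this example and do not correspond to AMMX mnemonics.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def wrap16(x):
    """Wrap an integer into signed 16-bit range, as a plain add would."""
    return (x + 32768) % 65536 - 32768

def sat16(x):
    """Clamp an integer into signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, x))

def add_wrapping(a, b):
    """Lane-wise add that overflows by wrapping around."""
    return [wrap16(x + y) for x, y in zip(a, b)]

def add_saturating(a, b):
    """Lane-wise add that clips at the 16-bit limits instead of wrapping."""
    return [sat16(x + y) for x, y in zip(a, b)]

# Two loud signals mixed together (hypothetical sample values).
loud = [30000, -30000, 1000, -5]
more = [10000, -10000, 2000, 5]

wrapped = add_wrapping(loud, more)      # sign flips on overflow: audible click
clipped = add_saturating(loud, more)    # clips at full scale instead
```

With wrapping, the first two lanes flip sign when they overflow, which is heard as a loud click; with saturation they simply clip at full scale.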

As you might already expect, the 68080 CPU is much faster and much more powerful at this than the Motorola DSP chips.
 

Please note that in addition to the power of AMMX, the Super-AGA chipset has very capable hardware audio channels. These channels can play several stereo (dual 16-bit) streams with panning in parallel, entirely on their own via DMA, without any CPU involvement.


Cyprian K

Posts 26
02 Aug 2021 22:50


That's cool.
How precise is AMMX? I know about 8-bit. What about 16-bit or 32-bit per sample?

Is there any direct replacement for the MAC instruction (A = A + B * C) with more than 16-bit precision?
And what about circular buffers? Is there a similar addressing mode?

Here is a high-pass IIR filter routine that takes 6 CPU cycles. It uses MAC and three circular/ring buffers (registers R0/R4/R5, with the corresponding N0/N4/N5 and M0/M4/M5).
It is taken from page 47 of http://galaxy.uci.agh.edu.pl/~rumian/LabDSP/APR7.pdf


                                        ;Y1=x(n) (Input)
                                        ;X0=α
MPY X0,Y1,A  X:(R0)+,X0  Y:(R4)+,Y0     ;A=αx(n)
MAC X0,Y0,A  X:(R0)+,X0  Y:(R4),Y0      ;A=A-2αx(n-1)
MAC X0,Y0,A  X:(R0)+,X0  Y:(R5)+,Y0     ;A=A+αx(n-2)
MAC X0,Y0,A  X:(R0)+,X0  Y:(R5),Y0      ;A=A+γy(n-1)
MAC X0,Y0,A  X:(R0)+,X0  Y1,Y:(R4)      ;A=A-βy(n-2)
MOVE    A,Y:(R5)                        ;y(n)=2A (assumes scaling mode is set)
                                        ;X1 is Output


It can easily be replaced with regular 68k instructions, but without dedicated sound instructions it would take many more than 6 instructions and 6 cycles.
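For readers who do not know DSP56000 assembly, the difference equation spelled out in the comments above can be sketched in Python. This is only a floating-point model of the arithmetic, not a translation of the register-level code: the `RingBuffer` class and `make_highpass` function are names invented for this sketch, and the modulo on the index stands in for what the M registers do in hardware.

```python
class RingBuffer:
    """Fixed-size circular buffer; the write index wraps with a modulo."""

    def __init__(self, size):
        self.data = [0.0] * size
        self.pos = 0

    def push(self, value):
        self.pos = (self.pos + 1) % len(self.data)
        self.data[self.pos] = value

    def ago(self, k):
        """Return the value pushed k steps ago (k=0 is the newest)."""
        return self.data[(self.pos - k) % len(self.data)]


def make_highpass(alpha, beta, gamma):
    """Build a per-sample filter step from the three coefficients."""
    x_hist = RingBuffer(2)  # holds x(n-1), x(n-2)
    y_hist = RingBuffer(2)  # holds y(n-1), y(n-2)

    def step(x):
        acc = alpha * x                      # A = alpha*x(n)
        acc -= 2.0 * alpha * x_hist.ago(0)   # A = A - 2*alpha*x(n-1)
        acc += alpha * x_hist.ago(1)         # A = A + alpha*x(n-2)
        acc += gamma * y_hist.ago(0)         # A = A + gamma*y(n-1)
        acc -= beta * y_hist.ago(1)          # A = A - beta*y(n-2)
        y = 2.0 * acc                        # y(n) = 2A
        x_hist.push(x)
        y_hist.push(y)
        return y

    return step


# Hypothetical coefficients chosen only to make the numbers easy to follow.
hp = make_highpass(alpha=0.25, beta=0.0, gamma=0.0)
dc_response = [hp(1.0) for _ in range(4)]
```

With these example coefficients a constant (DC) input dies out after two samples, which is the expected high-pass behaviour.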


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
03 Aug 2021 08:30


Cyprian K wrote:

That's cool.
How precise is AMMX? I know about 8-bit. What about 16-bit or 32-bit per sample?

 
For detailed information and examples, our online documentation is also worth reading. The AMMX instruction set is defined and explained in our "CODING" documentation.
Please follow the "CODING" link in the site navigation, then click "68080 CPU", then "AMMX".
 
Let me try to answer:
AMMX operations are generally 64 bits wide.
Some AMMX operations generate TWO 64-bit vectors per cycle.
Most AMMX operations generate ONE 64-bit vector per cycle.
Each 64-bit vector is processed as SIMD.
This means many smaller operations are done in it in parallel.
Typically you do either 8 byte operations or 4 word operations per instruction.
For example: BFLY (Butterfly) does 4 WORD additions and 4 WORD subtractions per cycle.
AMMX has a MAC instruction supporting 8 byte multiplications and 8 byte additions per cycle. There are also instructions doing 16-bit multiplications with 16-bit results, and 16-bit multiplications producing 32-bit results.
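To make the "4 word additions and 4 word subtractions" idea concrete, here is a small Python model of a 64-bit vector treated as four signed 16-bit lanes. It is a sketch under assumptions: the lane order (high word first), the wrapping on overflow, and the function names are choices made for this example, and the exact operand and result layout of the real BFLY instruction should be taken from the AMMX documentation.

```python
def unpack_words(v):
    """Split a 64-bit integer into four signed 16-bit lanes, high word first."""
    lanes = []
    for shift in (48, 32, 16, 0):
        w = (v >> shift) & 0xFFFF
        lanes.append(w - 0x10000 if w & 0x8000 else w)
    return lanes


def pack_words(lanes):
    """Pack four 16-bit lanes (high word first) into one 64-bit integer."""
    v = 0
    for w in lanes:
        v = (v << 16) | (w & 0xFFFF)   # masking wraps each lane to 16 bits
    return v


def butterfly(a, b):
    """Return (a+b, a-b) computed lane by lane on two 64-bit vectors."""
    la, lb = unpack_words(a), unpack_words(b)
    sums = pack_words([x + y for x, y in zip(la, lb)])
    diffs = pack_words([x - y for x, y in zip(la, lb)])
    return sums, diffs


a = pack_words([1, 2, 3, 4])
b = pack_words([10, 20, 30, 40])
s, d = butterfly(a, b)
```

One call produces eight results from two 64-bit inputs, which is the sense in which a single vector operation replaces several scalar ones.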
 
Cyprian K wrote:

What about 16-bit or 32-bit per sample?

AMMX does not support 32-bit additions.
But your DSP does not support this either. :-)
 
 
If you want much more than 16-bit precision, one interesting option to consider is floating point.
The FPU of the Apollo 68080 is fully pipelined.
 
 
This means you can do an FADD every cycle.
FMUL and FDIV also have a throughput of 1 instruction per cycle.
 
Using the FPU you can get much higher precision with your filter.
 
For example, if you compare the FPU of the 68080 with the ColdFire, the 68080 reaches over 4 times the ColdFire's performance per clock with good code. This means the 85 MHz V4 matches the FPU speed of a ColdFire running at over 340 MHz.
 
 
Cyprian K wrote:

There you can find a highpass IIR filter procedure, which takes 6 CPU cycles, where MAC and three circular/ring buffers

 
If my memory of the DSP56000 is correct, it reached about 16 MIPS. The Apollo 68080 in the V4 reaches 165 MIPS and 85 MFLOPS.
The 68080 gives you much more horsepower. Even without the FPU and without AMMX, using just normal, simple 68K instructions, the 68080 would do this faster than the DSP did.
 
In the Atari era, as in the ATARI Falcon, the DSP chip was a nice addition because the 68030 CPU used in the Falcon was slow by itself and also low clocked.
I think the relatively slow 68030 in the Atari reached on the order of 3 MIPS or so, right?
 
If your main CPU reaches only 3 MIPS, giving it a "DSP-squire" with 16 MIPS really does help you do things that the CPU alone could never do.
 
Of course, if your main CPU can do 165 MIPS, then you do not need a "DSP-squire" to help you.




David Pesce

Posts 12
03 Aug 2021 08:40


Multiply-accumulate is not a problem at 16-bit precision. You can do four MACs with two instructions.
If your filter is well designed, there is no reason to use 32-bit floats to do the work.
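As a rough illustration of four 16-bit MACs done in two steps, here is a Python model of a packed multiply followed by a packed saturating add. The thread does not say which two instructions are meant, so this is an assumption: the Q15 fixed-point format (1.0 represented as 32768), the saturation on both steps, and all function names are choices made for this sketch only.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def sat16(x):
    """Clamp an integer into signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, x))

def mul_q15(a, b):
    """Step 1: lane-wise Q15 multiply, keeping a 16-bit result per lane."""
    return [sat16((x * y) >> 15) for x, y in zip(a, b)]

def add_sat(a, b):
    """Step 2: lane-wise saturating add."""
    return [sat16(x + y) for x, y in zip(a, b)]

def mac4(acc, a, b):
    """Four multiply-accumulates: acc[i] + a[i]*b[i] for each of four lanes."""
    return add_sat(acc, mul_q15(a, b))

# Hypothetical data: 16384 is 0.5 in Q15.
acc = [0, 0, 32000, 0]
samples = [16384, -16384, 32767, -32768]
coeffs = [16384, 16384, 32767, -32768]
result = mac4(acc, samples, coeffs)
```

The first two lanes show 0.5 * 0.5 = 0.25 and -0.5 * 0.5 = -0.25 in Q15; the last two show the accumulator and the product clipping at full scale instead of wrapping.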

