AMMX - Apollo 68080 MMX for AMIGA

Gunnar von Boehn
(Apollo Team Member)
Posts 6207
08 Aug 2016 10:05


AMMX

The recent Apollo 68080 core versions include support for the AMMX instruction set.

AMMX is the 68k version of the famous MMX instruction set from Intel.

MMX was designed by Intel to significantly accelerate video, graphics, and audio encoding and decoding.
Our motivation for adding AMMX was likewise to accelerate
the very demanding, CPU-intensive decoding operations needed for smooth video playback.

AMMX implements Intel's well-known MMX instruction set.
There are many optimized codecs available for MMX already. As the MMX operations are now also available on 68k, running or porting them to AMIGA is very straightforward.

If you are experienced in coding MMX, then you can use AMMX out of the box.
If you want to learn how to code for AMMX, then I would recommend reading the excellent Intel MMX documentation. There is already a wealth of literature and how-tos on MMX coding.

Where do AMMX and MMX differ?
Both MMX and AMMX execute SIMD instructions.
Both support the same types of operations.
Both can operate directly on registers or use memory as an input operand.
AMMX enhances MMX in that it offers:
a) 3-operand operations
b) up to 32 registers, instead of limiting the programmer to 8.

While MMX code can now be translated 1:1 to AMMX,
the larger number of registers helps avoid some register spills, and the 3-operand feature makes it possible to remove some MMX MOVE instructions.

This means MMX code can now run on AMIGA, and because AMMX is more powerful, the code can be tuned to be more efficient than it was on Intel.
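
To illustrate the point about 3-operand operations, here is a minimal sketch in portable C. It is not AMMX code: the register file struct and the helpers move_reg, add2 and add3 are hypothetical names made up for this example, and the packed add only models the wraparound behaviour of the MMX PADDB instruction. It shows that computing r2 = r0 + r1 while keeping both inputs needs a MOVE plus an add in the 2-operand style, but only a single add in the 3-operand style.

```c
#include <stdint.h>

/* Model of a packed add of 8 x 8-bit lanes with wraparound
   (the behaviour of MMX PADDB). Plain C, not real AMMX code. */
static uint64_t packed_add_u8(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint8_t x = (uint8_t)(a >> (lane * 8));
        uint8_t y = (uint8_t)(b >> (lane * 8));
        uint8_t sum = (uint8_t)(x + y);          /* wraps modulo 256 */
        result |= (uint64_t)sum << (lane * 8);
    }
    return result;
}

/* Hypothetical register file with 32 64-bit registers. */
typedef struct { uint64_t r[32]; } RegFile;

/* Register-to-register MOVE. */
static void move_reg(RegFile *rf, int dst, int src)
{
    rf->r[dst] = rf->r[src];
}

/* 2-operand form: the destination is also a source and gets overwritten. */
static void add2(RegFile *rf, int dst, int src)
{
    rf->r[dst] = packed_add_u8(rf->r[dst], rf->r[src]);
}

/* 3-operand form: the destination is independent of both sources. */
static void add3(RegFile *rf, int dst, int a, int b)
{
    rf->r[dst] = packed_add_u8(rf->r[a], rf->r[b]);
}

/* r2 = r0 + r1, keeping r0 and r1 intact. Returns the instruction count. */
static int sum_two_operand_style(RegFile *rf)
{
    move_reg(rf, 2, 0);   /* extra MOVE to protect r0 */
    add2(rf, 2, 1);
    return 2;
}

static int sum_three_operand_style(RegFile *rf)
{
    add3(rf, 2, 0, 1);    /* no MOVE needed */
    return 1;
}
```

Both variants leave the same value in r2; the 3-operand variant just gets there with one instruction fewer.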


Grzegorz Wójcik (pisklak
(Apollo Team Member)
Posts 87
08 Aug 2016 11:40


For sure, the AMMX add-on is very nice and will allow us to speed up many things.
I hope that coders will love it and use its full potential in DTypes, movie players, audio players, demos etc :-)



Henryk Richter
(Apollo Team Member)
Posts 128/ 1
08 Aug 2016 13:08


Great news. I can't wait to get my hands on a Vampire. I do have some code ready for Amiga conversion (currently SSE, Altivec) that'll significantly benefit from these instructions (H.264/AVC decoder).

For me, the prospective timeline depends on when the A500 cards will be available to the public.

By the way, were you able to convince Toni Wilen to add support for the Apollo instruction set? I wouldn't mind writing some code in UAE.


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
09 Aug 2016 09:50


Henryk Richter wrote:

I can't wait to get my hands on a Vampire. I do have some code ready for Amiga conversion (currently SSE, Altivec) that'll significantly benefit from these instructions (H.264/AVC decoder).

Doubling JPEG decoding speed is a reasonable expectation with tuned AMMX datatypes.
This would also benefit use cases like web surfing a lot.

It is clear that writing optimized datatypes or optimized codecs for AMIGA would benefit all users.

If you have serious plans for doing this, then please contact me on IRC. We can discuss how to work together to get this done as quickly as possible.




Henryk Richter
(Apollo Team Member)
Posts 128/ 1
09 Aug 2016 16:49


I'm aware of the SIMD programming tradeoffs, especially in cases where the data cannot be reasonably guaranteed to match the properties of the underlying machine (sadly, often in video...).

I won't promise the moon at this point. I have followed the progress of your project(s) and am reasonably confident that realtime decoding of a certain subset of publicly available H.264 streams should be within grasp. As my personal reference, I demoed my decoder at a conference in 2003 with three simultaneous streams in CIF, SDTV and 720p - on a 667 MHz Powerbook. So yeah, I seriously plan to port that one to the Amiga, but the time frame depends on the available amount of spare time.

I'll keep your IRC invitation in mind.


Andrew Copland

Posts 113
09 Aug 2016 20:06


Does this require aligned load/stores?
Are there faster load/store operations for wider memory transfers to register/mem etc?

Also why not SSE(/2) instead of MMX since it was for 64bit data types?

Just curious :) Also wondering if there's some documentation about it somewhere.


Andrew Copland

Posts 113
09 Aug 2016 20:43


After more thought, SSE was completely different to MMX!
Ignore that part :)


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
09 Aug 2016 21:18


Andrew Copland wrote:

  Does this require aligned load/stores?
 

  Not required.
  You can LOAD and STORE from any byte address.
  Misaligned LOADS are handled without penalty.
  Misaligned STORES also take no extra CPU cycle - but can take an extra cycle on the bus.
  So an optimal routine will read misaligned and store aligned.
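
  The "read misaligned, store aligned" advice can be sketched in portable C as follows. This is only an illustration of the access pattern, not AMMX code: the function name copy_store_aligned is made up for this example, and memcpy with a fixed size of 8 is used as the standard C way to express a 64-bit access at any address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes so that all 64-bit stores in the main loop hit 8-byte
   aligned destination addresses, while the 64-bit loads may come from
   any byte address. */
static void copy_store_aligned(uint8_t *dst, const uint8_t *src, size_t n)
{
    /* Head: copy single bytes until dst sits on an 8-byte boundary. */
    while (n > 0 && ((uintptr_t)dst & 7u) != 0) {
        *dst++ = *src++;
        n--;
    }

    /* Body: one 64-bit load (possibly misaligned) and one aligned
       64-bit store per iteration. */
    while (n >= 8) {
        uint64_t chunk;
        memcpy(&chunk, src, 8);   /* load from any byte address */
        memcpy(dst, &chunk, 8);   /* dst is 8-byte aligned here */
        dst += 8;
        src += 8;
        n -= 8;
    }

    /* Tail: copy the remaining 0..7 bytes one at a time. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}
```

  The design choice is to spend the short byte-wise head on aligning the destination rather than the source, since per the answer above it is the misaligned stores that can cost an extra bus cycle.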
 
 
 
Andrew Copland wrote:

  Are there faster load/store operations for wider memory transfers to register/mem etc?
 

  64 bits (8 bytes) per cycle at 100 MHz, so a maximum of 800 MB/sec for read/write
 
 
Andrew Copland wrote:

  Just curious :) Also wondering if there's some documentation about it somewhere.
 

Yes, MMX is well documented by Intel.
There should be plenty of books.

AMMX implements many operations 100% like MMX.
For byte re-ordering AMMX uses PERM which is more flexible.
Also all instructions support 3 operands and 32 registers are available.
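
To give an idea of why a general byte permute is more flexible than fixed reordering patterns (MMX reorders bytes with fixed-pattern pack and unpack instructions), here is a small portable C model. It is an assumption-laden sketch: the exact operand format and selector encoding of PERM are not described in this thread, so the function permute_bytes and its selector layout (indices 0..7 pick a byte of the first source, 8..15 a byte of the second, index 0 being the most significant byte) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of a byte permute: builds a 64-bit result from any
   8 of the 16 bytes held in two 64-bit sources.
   sel[i] = 0..7  selects byte sel[i] of a,
   sel[i] = 8..15 selects byte (sel[i] - 8) of b,
   where byte 0 is the most significant byte (assumed layout). */
static uint64_t permute_bytes(uint64_t a, uint64_t b, const uint8_t sel[8])
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t idx = sel[i] & 15u;
        uint64_t src = (idx < 8) ? a : b;
        unsigned shift = (7u - (idx & 7u)) * 8u;
        uint8_t byte = (uint8_t)(src >> shift);
        result = (result << 8) | byte;   /* sel[0] ends up most significant */
    }
    return result;
}
```

With this model a single operation and a selector table can express a byte reversal ({7,6,5,4,3,2,1,0}), an interleave of two sources ({0,8,1,9,2,10,3,11}), or any other pattern.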
 
If you would like to write some stuff, please ping me. It would be nice to do something together.
 


Daniel Sevo

Posts 299
10 Aug 2016 22:05


Hello Gunnar,
Will you be able to use FPU and AMMX in parallel (once FPU is implemented)?



Gunnar von Boehn
(Apollo Team Member)
Posts 6207
10 Aug 2016 23:14


Daniel Sevo wrote:

Hello Gunnar,
  Will you be able to use FPU and AMMX in parallel (once FPU is implemented)?
 

Why not?


Mark Smith

Posts 30
11 Aug 2016 02:29


Gunnar von Boehn wrote:

 
Daniel Sevo wrote:

  Hello Gunnar,
    Will you be able to use FPU and AMMX in parallel (once FPU is implemented)?
   
 

  Why not?
 

 
  I guess Daniel is remembering that you couldn't/can't use the FPU at the same time as MMX (on Intel).
 
  EXTERNAL LINK 


Szyk Cech

Posts 191
11 Aug 2016 10:16


Gunnar von Boehn wrote:

Daniel Sevo wrote:

  Hello Gunnar,
  Will you be able to use FPU and AMMX in parallel (once FPU is implemented)?
 
 

  Why not?

Maybe because they share the same registers?


Andrew Copland

Posts 113
11 Aug 2016 10:51


That was x86 specific though, no need to reuse those same registers in this design.


Grzegorz Wójcik (pisklak
(Apollo Team Member)
Posts 87
11 Aug 2016 10:52


Yes, but we will have 32 regs, not 8! So your code may use, for example, 16 for AMMX and 16 for FPU in parallel, I think. Maybe I am wrong - if so, then I am sure Gunnar will correct me!


Andrew Copland

Posts 113
11 Aug 2016 11:03


Gunnar von Boehn wrote:
If you would like to write some stuff, please ping me. It would be nice to do something together.

It'd be cool but I'm busy with other projects and my day job.
I've nearly gotten my A600+V600 setup again, I finally fixed the problem I was having with it :)

No development setup yet.


Szyk Cech

Posts 191
11 Aug 2016 13:07


Anyway, I think the CPU must be very clever to handle AMMX and FPU instructions simultaneously.
I am very curious how it can be done, as FPU registers are arranged in a stack (at least this is true for Intel-compatible processors).
So can they be split into smaller stacks? E.g. 4 stacks for 8 registers?


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
11 Aug 2016 13:16


Szyk Cech wrote:

  Anyway, I think the CPU must be very clever to handle AMMX and FPU instructions simultaneously.
  I am very curious how it can be done, as FPU registers are arranged in a stack (at least this is true for Intel-compatible processors).
  So can they be split into smaller stacks? E.g. 4 stacks for 8 registers?
 

 

The 68080 includes the MMX instruction set because it's great for multimedia!
And also because there is already a huge wealth of existing codecs for MMX.

We have of course not recreated any of the artificial limitations of Intel cores. This should go without saying!

Now stop wasting time with this and get back on topic!


Roman S.

Posts 149
11 Aug 2016 19:17


As far as I know, at least some multimedia instruction sets on x86 require OS kernel support - the kernel has to take care of the additional registers during context switching.

Does this apply to AMMX also? Did you have to modify the exec to support them?

(I'm just curious)


Wawa T

Posts 695
11 Aug 2016 23:06


I think for starters it might be enough to have asm inlines in critical spots of critical libraries taking advantage of the new instructions. One could of course patch the exec or introduce the support to a compatible system like aros68k, or even establish a new compiler backend, but it remains to be seen..


Krzysztof Smiechowicz

Posts 6
12 Aug 2016 19:38


I think Roman's question was about saving the state of the XMM registers during a task switch, and yes, you need OS support for this. The CPU does not know that a task was switched - it does not know that tasks exist at all. AROS x86 has that support in its kernel.
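
A minimal sketch in C of what "OS support" means here, under stated assumptions: the types VecRegs and Task and the function switch_task are hypothetical and do not come from exec, AROS or any real kernel, and the global cpu variable merely stands in for the hardware register file. Whether and how AMMX needs this is not answered in this thread; the sketch only shows the general idea that the scheduler, not the CPU, has to save and restore the extra registers per task.

```c
#include <stdint.h>
#include <string.h>

#define NUM_VEC_REGS 32   /* register count mentioned in this thread */

/* Stand-in for the extra SIMD registers of the CPU. */
typedef struct { uint64_t v[NUM_VEC_REGS]; } VecRegs;

/* Hypothetical task control block: each task owns a private copy of
   the registers for the time it is not running. */
typedef struct { VecRegs saved; } Task;

/* The single "live" register file, shared by whichever task is running. */
static VecRegs cpu;

/* What the scheduler has to do on every task switch: store the outgoing
   task's registers and load the incoming task's registers. Without this
   step, one task would see (and clobber) the other task's values. */
static void switch_task(Task *from, Task *to)
{
    memcpy(&from->saved, &cpu, sizeof cpu);
    memcpy(&cpu, &to->saved, sizeof cpu);
}
```

A kernel that does not know about the extra registers simply skips this step, which is why tasks using them could corrupt each other on such a system.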
