

AMMX - Apollo 68080 MMX for AMIGA

Gunnar von Boehn
(Apollo Team Member)
Posts 4443
12 Aug 2016 19:47


Krzysztof Smiechowicz wrote:

I think Roman's question was about saving the XMM register state during a task switch, and yes, you need OS support for doing this. The CPU does not know that a task was switched; it does not know that tasks exist at all. AROS x86 has that support in its kernel.

AMIGA OS will also save the first 8 AMMX for you.
So as long as your code limits itself to them, you can even use an old unpatched Kickstart.

Of course, even using all 32 registers will work with an unpatched Kick, as long as only 1 application, e.g. a video player, does this...



Krzysztof Smiechowicz

Posts 6
12 Aug 2016 19:52


Gunnar von Boehn wrote:

Krzysztof Smiechowicz wrote:

  I think Roman's question was about saving the XMM register state during a task switch, and yes, you need OS support for doing this. The CPU does not know that a task was switched; it does not know that tasks exist at all. AROS x86 has that support in its kernel.
 

 
  AMIGA OS will also save the first 8 AMMX for you.
  So as long as your code limits itself to them, you can even use an old unpatched Kickstart.
 
  Of course, even using all 32 registers will work with an unpatched Kick, as long as only 1 application, e.g. a video player, does this...
 

Yes, indeed. I was not precise. The risk, however, is in running more than one application that uses all 32 registers.
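
To make that risk concrete, here is a minimal C model of a task switch that preserves only the first 8 of 32 vector registers, as described above for an unpatched Kickstart. It is a sketch, not how exec or the 68080 actually implement this: the 64-bit register width, the struct layout and all names (`RegFile`, `TaskCtx`, `task_switch`, `value_survives`) are assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

#define NUM_VREGS   32  /* vector registers visible to a program       */
#define SAVED_VREGS  8  /* registers this model's scheduler preserves  */

typedef struct { uint64_t v[NUM_VREGS]; } RegFile;        /* the one physical register file */
typedef struct { uint64_t saved[SAVED_VREGS]; } TaskCtx;  /* per-task save area             */

/* Switch from task `out` to task `in`: only the first SAVED_VREGS
   registers are saved and restored; registers 8..31 are left as-is. */
static void task_switch(RegFile *cpu, TaskCtx *out, const TaskCtx *in)
{
    memcpy(out->saved, cpu->v, sizeof out->saved);
    memcpy(cpu->v, in->saved, sizeof in->saved);
}

/* Returns 1 if task A's value in register `reg` survives being
   pre-empted by a task B that writes the same register, else 0. */
static int value_survives(int reg)
{
    RegFile cpu = {{0}};
    TaskCtx a = {{0}}, b = {{0}};

    cpu.v[reg] = 0xAAAA;         /* task A is running and loads its data */
    task_switch(&cpu, &a, &b);   /* scheduler pre-empts A, runs B        */
    cpu.v[reg] = 0xBBBB;         /* task B uses the same register        */
    task_switch(&cpu, &b, &a);   /* back to A                            */
    return cpu.v[reg] == 0xAAAA;
}
```

In this model `value_survives(7)` returns 1 and `value_survives(8)` returns 0: a single application can use the upper registers safely, but two applications that both touch them will overwrite each other's data.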


Krzysztof Smiechowicz

Posts 6
12 Aug 2016 19:53


By the way, this is a really great development. AROS has SSE-optimized memory copy functions; possibly the idea can be useful here as well.


Marcus Gerards

Posts 57
13 Aug 2016 12:20


How about putting the new MMX instructions into something like ammx.library?

That would circumvent the compiler support problem and allow for mixed Apollo/classic-CPU binaries, and the speed loss from a few JSRs/JMPs should be negligible.


Gunnar von Boehn
(Apollo Team Member)
Posts 4443
13 Aug 2016 12:52


Marcus Gerards wrote:

  Would circumvent the compiler support problem,
 

 
Let's be honest, 90% of Intel MMX code is handwritten and never relies on compiler support.
 
The same is true for ALTIVEC on PowerPC: 90% is handwritten in ASM.
This shows that high-level compiler support is not important.
 
MMX is very useful for work loops doing a lot of computation.
For example, a loop mixing audio channels would be much faster than one using normal 68k instructions, and would need only 3 instructions in the loop.
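
For reference, the kind of loop meant here can be written in plain C as below. This is a scalar sketch under assumptions: the post does not give the sample format or name the 3 instructions, so signed 16-bit samples with saturation are assumed, and `clamp16` and `mix_channel` are illustrative names. A SIMD version would presumably do the same work on several samples per iteration (for instance load, packed saturating add, store).

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate to the signed 16-bit sample range. */
static int16_t clamp16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Mix channel `src` into `dst`, one sample per iteration. */
static void mix_channel(int16_t *dst, const int16_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = clamp16((int32_t)dst[i] + (int32_t)src[i]);
}
```

The widening to 32 bits and the clamp are what a packed saturating add performs in hardware for every lane at once, which is where the speed-up over a per-sample loop would come from.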


Wawa T

Posts 690
13 Aug 2016 13:03


Krzysztof Smiechowicz wrote:

Gunnar von Boehn wrote:

 
Krzysztof Smiechowicz wrote:

  I think Roman's question was about saving the XMM register state during a task switch, and yes, you need OS support for doing this. The CPU does not know that a task was switched; it does not know that tasks exist at all. AROS x86 has that support in its kernel.
 

 
  AMIGA OS will also save the first 8 AMMX for you.
  So as long as your code limits itself to them, you can even use an old unpatched Kickstart.
 
  Of course, even using all 32 registers will work with an unpatched Kick, as long as only 1 application, e.g. a video player, does this...
 
 

 
  Yes, indeed. I was not precise. The risk, however, is in running more than one application that uses all 32 registers.

which the amiga, being considered a multimedia multitasking machine, is expected to do.


Marcus Gerards

Posts 57
13 Aug 2016 13:13


Gunnar von Boehn wrote:

Marcus Gerards wrote:

    Would circumvent the compiler support problem,
   

   
  Let's be honest, 90% of Intel MMX code is handwritten and never relies on compiler support.
   
  The same is true for ALTIVEC on PowerPC: 90% is handwritten in ASM.
  This shows that high-level compiler support is not important.

Aye, didn't know that - a few assembler macros will probably do the trick, then. :)



Andrew Copland

Posts 109
14 Aug 2016 20:51


Gunnar von Boehn wrote:

Marcus Gerards wrote:

    Would circumvent the compiler support problem,
   

   
  Let's be honest, 90% of Intel MMX code is handwritten and never relies on compiler support.
   
  The same is true for ALTIVEC on PowerPC: 90% is handwritten in ASM.
  This shows that high-level compiler support is not important.
   
  MMX is very useful for work loops doing a lot of computation.
  For example, a loop mixing audio channels would be much faster than one using normal 68k instructions, and would need only 3 instructions in the loop.

That's not true in my experience. We mostly use intrinsics, and they integrate with whichever compiler a platform uses.

You do see asm implementations, but still used within C/C++ projects.

Of course, there are also aggressive compiler implementations and optimisation passes.

These days you try to steer far away from using asm directly because whilst that asm can be quick it can break optimisations in the calling and surrounding code.


Henryk Richter
(Apollo Team Member)
Posts 108
15 Aug 2016 07:04


Andrew Copland wrote:

That's not true in my experience. We mostly use intrinsics, and they integrate with whichever compiler a platform uses.
 
  You do see asm implementations, but still used within C/C++ projects.
 
  Of course, there are also aggressive compiler implementations and optimisation passes.
 
  These days you try to steer far away from using asm directly because whilst that asm can be quick it can break optimisations in the calling and surrounding code.

Whether that is true or not depends on personal style and the algorithmic requirements. Whether intrinsics, inline ASM or separate .asm files are the right tool depends on the capabilities of the compiler, portability concerns and whether the compiler's output is deemed sufficient. I've used all three variants; AltiVec, for example, performs well with intrinsics. 32-bit SSE2 benefits from inline ASM, as you can keep more stuff in registers than the compiler output does (referring to GCC). On the other hand, especially with GCC, inline ASM is ugly, awkward and a PITA.

I disagree with the general remarks about ASM. Of course, it requires a lot of work, is inflexible, generally not portable, error-prone, hard to debug etc. When it comes to SIMD, however, I still haven't seen a compiler that outputs efficient SIMD code for the majority of applicable cases (Intel's SPEC suite tuning notwithstanding).

YMMV, of course.
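
As an illustration of the intrinsics variant discussed above, here is a small C sketch using SSE2 intrinsics from `<emmintrin.h>`, since no AMMX intrinsics are mentioned in this thread. The kernel (a rounded byte average, e.g. a 50/50 pixel blend) and the name `average_u8` are chosen for the example only. The SIMD path is compiled in only when the compiler defines `__SSE2__`; everywhere else the plain C loop does all the work.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* dst[i] = rounded average of a[i] and b[i], e.g. a 50/50 pixel blend. */
static void average_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* 16 bytes per iteration; unaligned loads and stores keep the sketch simple. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_avg_epu8(va, vb));
    }
#endif
    /* Generic path: the tail on SSE2 builds, everything on other targets. */
    for (; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}
```

With intrinsics the compiler still does register allocation and scheduling, which is the integration benefit mentioned earlier in the thread; the cost is that the result is only as good as the compiler's code generation.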


Thierry Atheist

Posts 618
15 Aug 2016 14:43


Do we care about portability, really? It's being coded (and optimised) for AMIGA 68080s.


Thierry Atheist

Posts 618
15 Aug 2016 14:52


We're in a situation where we can abandon everything from the past, as long as every generation of the Apollo Core is ALWAYS backwards compatible.

Forget about portability altogether, and let's look at completely optimised coding... We need to use EVERY advantage available, as we cannot get our hands on 14 or 20 nm 2.6+ GHz ASICs custom made for us.


Manuel Jesus

Posts 137
15 Aug 2016 16:06


68080 AMMX logo

EXTERNAL LINK


Henryk Richter
(Apollo Team Member)
Posts 108
15 Aug 2016 20:17


Thierry Atheist wrote:

  Do we care about portability, really? It's being coded (and optimised) for AMIGA 68080s.
 

  Again, a matter of personal taste. I've learned the hard way what it means to have one's code tied to a specific platform.
 
  I don't mind writing optimized code in ASM (where it counts) but these days I tend to keep a generic code path as backup route.
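
One common way to keep such a generic backup route is to select the routine once through a function pointer. The C sketch below shows the pattern with a memory copy; everything in it is illustrative. `copy_wide` is only a C stand-in for the slot where a hand-written ASM routine would go, and how the capability flag is obtained (CPU probe, user setting) is not specified in this thread and is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*copy_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Generic backup path: plain byte loop that runs on any CPU. */
static void copy_generic(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Stand-in for a tuned routine (hypothetical): moves 8 bytes per step.
   On real hardware this slot would hold the hand-written ASM version. */
static void copy_wide(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t chunk;
        memcpy(&chunk, src + i, sizeof chunk);  /* alias-safe 64-bit load  */
        memcpy(dst + i, &chunk, sizeof chunk);  /* alias-safe 64-bit store */
    }
    for (; i < n; i++)                          /* remaining 0..7 bytes    */
        dst[i] = src[i];
}

/* Pick the routine once at start-up; the flag comes from the caller. */
static copy_fn select_copy(int has_wide_path)
{
    return has_wide_path ? copy_wide : copy_generic;
}
```

Callers store the result of `select_copy` and call through it, so the same binary runs on both kinds of machine and the generic path doubles as a reference for testing the tuned one.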


Wawa T

Posts 690
15 Aug 2016 20:27


Henryk Richter wrote:

 
Thierry Atheist wrote:

    Do we care about portability, really? It's being coded (and optimised) for AMIGA 68080s.
   

    Again, a matter of personal taste. I've learned the hard way what it means to have one's code tied to a specific platform.
   
    I don't mind writing optimized code in ASM (where it counts) but these days I tend to keep a generic code path as backup route.
 

 
  sounds like aros is an option here, right? no, i'm just joking. btw, the user you are answering is by no means representative as far as i'm concerned. i'll always stick to a minimax strategy.
 


Thierry Atheist

Posts 618
15 Aug 2016 22:55


There are 4 platforms.

Linux, Apple OS, win-DOS

and AMIGA.

So, portability would be AMIGA and... x86 CPUs.

And, do they care about any software that is used on Amiga computers? Do they have ANY NEED of our software? They have more than a few of everything that they are interested in doing.

I'm sorry to say it, but the 68080 fork IS NOT backwards compatible to A2000s and such. SSE doesn't even work on the PPCs used in any Amiga NG systems either.

We might as well take advantage of what custom software has to offer us, as we just don't have the GHz to compete against their computers.


Wawa T

Posts 690
15 Aug 2016 23:06


sigh, they may happen not to need your sources, but we need theirs all the time. so portability is a question.
 
  same for 68k backward compatibility. if 68080 is not backward compatible, then why does it inherit the 68k instruction set at all? im waiting for you to make some sensible post one day..


Thierry Atheist

Posts 618
16 Aug 2016 02:09


If it's coded for MMX (SSE)/S-AGA, then it can't work on any of the older systems, 68060 and before.

If you code in C, Pascal or Java and target the SIMD registers, the code can't be compiled for anything below the 68080.


Wawa T

Posts 690
16 Aug 2016 08:55


Thierry Atheist wrote:

If it's coded for MMX (SSE)/S-AGA, then it can't work on any of the older systems, 68060 and before.

alright, as far as this extension is concerned, it may look like introducing a whole new target.

If you code in C, Pascal or Java and target the SIMD registers, the code can't be compiled for anything below the 68080.

still, so far there isn't any compiler to compile this. let's see how this will take off..



Olaf Schoenweiss

Posts 546
16 Aug 2016 09:22


why do you take him seriously? :-)

Back to the topic... I think it would make the most sense to optimise certain parts for Vampire that benefit from the new instructions. These can be parts of the OS as well as parts of applications or games. In the best case, I would say all optimisations should be in the OS and not in application software. It is a basic decision what your potential market/user base is: just Vampire; additional Amiga hardware with, say, a 68030; emulation plus Vampire; and so on. All have potentially different hardware limits. 68k will become more fragmented by that; on the other hand, it potentially offers new sub-markets for developers.


Gunnar von Boehn
(Apollo Team Member)
Posts 4443
16 Aug 2016 09:41


For me AMMX tuning makes sense in

a) JPEG Datatypes
b) MPEG/VIDEO-player
c) SAGA GFX driver
d) some optimized games (like Flypes game demo)

The above use cases are currently ASM code.
So adding AMMX ASM functions, or replacing some 68k ASM functions with them, would be the way to go.

I don't see much point in high-level C autovectorization for us.

