Welcome to the Apollo Forum

This forum is for people interested in the APOLLO CPU.
Please read the forum usage manual.



Information about the Apollo CPU and FPU.

Status of the FPU

Rollef 2000

Posts 11
04 Sep 2017 09:53


I am asking these questions again because I am very interested in the answers.
 
 
 
Gunnar von Boehn wrote:
 
  While the APOLLO FPU was successfully used in some customer project, more testing was found to be required for use in 68K AMIGA OS.
  The Apollo team asked the AMIGA coder community to help with testing 12 months ago; in that time no coders applied.
 

 
 
Gunnar von Boehn wrote:
 
  We are looking for someone interested in helping to write a new high-end FPU application that shows FPU performance can be pushed to new limits.
 

 
  Was there ever a time when the hard FPU fit into the current Vampire?

  If not, how should coders write and test programs?
 



Andrew Copland

Posts 113
04 Sep 2017 11:41


I did write a ray marcher that used almost all of the FPU functions but it was in C.

The demand that everything be in pure 68k asm is a BIG request Gunnar.


Vojin Vidanovic

Posts 770
04 Sep 2017 12:10


Andrew Copland wrote:

  The demand that everything be in pure 68k asm is a BIG request Gunnar.

It is. But it is the fastest option, and currently only asm supports the 080.
I hope that will change, but I see no software developer who will update some Amiga C etc.

But this will become more relevant when the V4 FPU is released (core 3+).


Gunnar von Boehn
(Apollo Team Member)
Posts 4797
04 Sep 2017 12:34


Vojin Vidanovic wrote:

Andrew Copland wrote:

  The demand that everything be in pure 68k asm is a BIG request Gunnar.

 
  It is. But it is the fastest option, and currently only asm supports the 080.

This is not true: you can also code for the 68080 in any high-level language. But in a high-level language you have zero control over what you are doing.
 


Gunnar von Boehn
(Apollo Team Member)
Posts 4797
04 Sep 2017 12:38


Andrew Copland wrote:

I did write a ray marcher that used almost all of the FPU functions but it was in C.
 
The demand that everything be in pure 68k asm is a BIG request Gunnar.

No, I did not demand this.

I said that the "core" calculation routines need to be written in ASM.
I'm sure you fully agree with me that only in ASM do you have full control over what you are doing.

The point of writing fast code is to write code
- that uses both pipelines (superscalar),
- that avoids instruction dependencies (is non-sequential, interleaved or unrolled).

You can control this easily in ASM.
In C you cannot, unless you rewrite the C compiler at the same time.

So doing this in ASM is certainly much less effort for the coder than doing it in C.
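
The dependency point can be illustrated with a minimal C sketch. This is not code from the Apollo team; the function names are made up for illustration, and whether a given compiler keeps the written order in the generated 68k code is exactly the open question in this discussion. The first function forms one serial chain of adds; the second keeps four independent chains, which is the "interleaved or unrolled" shape described above.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: every add needs the result of the previous add,
   so the adds form a single serial dependency chain. */
double sum_serial(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Four accumulators: the four adds in the loop body do not depend on
   each other, so a pipelined or superscalar FPU is free to overlap them. */
double sum_interleaved(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += x[i];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }
    for (; i < n; i++)      /* leftover elements */
        a0 += x[i];
    return (a0 + a1) + (a2 + a3);
}
```

Note that the interleaved version adds the elements in a different order, so with general floating-point data the two results can differ in the last bits.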




Gunnar von Boehn
(Apollo Team Member)
Posts 4797
04 Sep 2017 12:46


rollef 2000 wrote:

Was there ever a time when the hard FPU fit into the current Vampire?
If not, how should coders write and test programs?

a) We have cards where it fits.
b) You could even code for the Vampire without having either an AMIGA or a Vampire. You write the code and give it to me; I run it and tell you the GIGAFLOPS score.


Markus Horbach

Posts 35
04 Sep 2017 13:09


>The demand that everything be in pure 68k asm is a BIG request Gunnar.
I agree with this.

>I'm sure you fully agree with me that only in ASM you have full control what you are doing.
I agree with this, too.

But a good C coder is not automatically a (good) ASM coder. And 68k assembly is a niche at the moment. With the growing power of x86/x64 hardware over the last 20 years, hand-optimised ASM code has become rare for Joe Average coders. Even embedded coders use C libs with hand-optimised routines written by the compiler or CPU architecture vendor. ARM, for example, offers DSP libs for all Cortex-M microcontrollers, whether or not the Cortex-M has hardware functions for DSP. Some Cortex-M cores (the cheap ones) solve this in software; others have an FPU and DSP extensions, which result in far better performance with the libs.
I see the same pattern with V2 (femu) and V4 with hard FPU.
A good compromise for the Apollo Core could be an AMMX_Math.h which can process arrays and streams of data for the most often used math functions on a higher level. A Joe Average coder like me will not get 100% of the possible performance, but it will be better than only writing 68k programs that would also run on a stock Amiga 1000. The Vampire will accelerate those, too, but without using the extended instruction set, just through faster instruction execution, clock-wise or CPU-architecture-wise.
It all leads to the (in my opinion) urgently needed IDE/C compiler for 68080 and WB3.xx.
The vampirised Amigas will benefit from a broader offering of NEW software created by new coders attracted to the easy-to-use platform. But these are just my 2 cents ...
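
A minimal sketch of what an array-level call in such a header could look like. Everything here is hypothetical: no AMMX_Math.h exists in this thread, and the names `madd_fn`, `vmath_madd` and `vmath_set_madd_backend` are invented for illustration. The idea shown is only the split between a portable C fallback and a replaceable tuned back end, similar in spirit to the vendor DSP libs mentioned above.

```c
#include <stddef.h>

/* Signature shared by every back end: dst[i] = a[i] * b[i] + c[i]. */
typedef void (*madd_fn)(float *dst, const float *a, const float *b,
                        const float *c, size_t n);

/* Portable fallback in plain C; runs on any 68k (or anything else). */
static void madd_portable(float *dst, const float *a, const float *b,
                          const float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] * b[i] + c[i];
}

/* Currently selected back end. A library build for the 68080 could
   point this at a hand-scheduled assembly routine (not shown here). */
static madd_fn madd_backend = madd_portable;

/* Hypothetical hook for installing a tuned back end;
   NULL restores the portable one. */
void vmath_set_madd_backend(madd_fn fn)
{
    madd_backend = fn ? fn : madd_portable;
}

/* The only call an application would make. */
void vmath_madd(float *dst, const float *a, const float *b,
                const float *c, size_t n)
{
    madd_backend(dst, a, b, c, n);
}
```

With this shape the application code stays plain C, and only the library needs someone who can write and schedule the assembly.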


Mr Niding

Posts 446
04 Sep 2017 13:14


Gunnar von Boehn wrote:

Vojin Vidanovic wrote:

 
Andrew Copland wrote:

    The demand that everything be in pure 68k asm is a BIG request Gunnar.

 
  It is. But it is the fastest option, and currently only asm supports the 080.
 

 
  This is not true: you can also code for the 68080 in any high-level language. But in a high-level language you have zero control over what you are doing.
 

I guess what Vojin is saying is that current high-level compilers don't support the new features of the 68080 core, and as such won't be able to take advantage of its performance potential.
And IF high-level compilers DID support said features, it would increase their potential compared to legacy-oriented high-level compilers.

Did I understand you right, Vojin?


Gunnar von Boehn
(Apollo Team Member)
Posts 4797
04 Sep 2017 13:21


Mr Niding wrote:

I guess what Vojin is saying is that current high-level compilers don't support the new features of the 68080 core, and as such won't be able to take advantage of its performance potential.

Do you assume you need to use new 68080 instructions?
This is NOT the case.

The FPU of the 68080 is 100% compatible with the previous 68K FPUs.
You can continue to write code using the normal FPU instructions.

The point of the FPU coding will be to "hand control" the instruction scheduling,
in other words the order in which the instructions are placed in the program. And you can only control this in ASM.
Therefore this can only sensibly be coded in ASM (as of today).




Mr Niding

Posts 446
04 Sep 2017 13:41


Gunnar von Boehn wrote:

Mr Niding wrote:

  I guess what Vojin is saying is that current high-level compilers don't support the new features of the 68080 core, and as such won't be able to take advantage of its performance potential.
 

 
  Do you assume you need to use new 68080 instructions?
  This is NOT the case.
 
  The FPU of the 68080 is 100% compatible with the previous 68K FPUs.
  You can continue to write code using the normal FPU instructions.
 
  The point of the FPU coding will be to "hand control" the instruction scheduling,
  in other words the order in which the instructions are placed in the program. And you can only control this in ASM.
  Therefore this can only sensibly be coded in ASM (as of today).
 

I don't assume you need to use the new 68080 instructions; I'm just saying/assuming that a high-level compiler needs to support the new instructions to FULLY utilize the increased performance potential over legacy 68k's.

Be it FPU or other aspects of 68080 programming. If I'm wrong about high-level compilers needing an upgrade, then just ignore me :)


Gunnar von Boehn
(Apollo Team Member)
Posts 4797
04 Sep 2017 14:13


Mr Niding wrote:

  I don't assume you need to use the new 68080 instructions; I'm just saying/assuming that a high-level compiler needs to support the new instructions to FULLY utilize the increased performance potential over legacy 68k's.

Let us provide some clear information to prevent misunderstandings.

The normal CPU instructions from 68030 and 68040 to 68060 did not change.
The instructions that the 68060 supports were also supported by the 68040.

But the 68060 can execute up to 2 instructions per cycle,
while the 68040 could only execute 1 per cycle.

So at the same clock rate the 68060 could execute up to twice as many instructions as the 68040, IF the instructions were independent.

So for the 68060 it matters a lot in which order the instructions appear in the program, so that the CPU can execute them in parallel.
We call this ordering of the instructions "instruction scheduling".

So coders optimizing their programs to run fast on the 68060 did not need to use new instructions; they needed to make sure the instructions were in the perfect order in the program code.

Now regarding the FPU.
The APOLLO 68080 Core FPU provides 100% the same FPU instructions as the 68040 and 68060 FPUs.
While some people might believe the 68080 has completely new, incompatible FPU instructions, this is NOT the case.
The instructions are 100% the same.

But the FPU in the 68080 is superscalar and fully pipelined.
In terms of evolution, this is a jump like the one from 68000 to 68060 for integer code.

This means that to make real use of this FPU, programmers now also need to write FPU code that takes good care of the instruction scheduling, i.e. the instruction order.

So this is similar to the CPU code tuning for the 68060.

So what we are talking about here is tweaking the instruction order of small routines. This is nothing esoteric. CPU coders for the 68060 do this all the time for their integer code. And on all other platforms (x86, Power etc.) FPU coders also tweak the instruction order in memory to get good performance.
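
Why the order alone matters can be shown with a deliberately simplified model. The following C sketch is a toy, not a description of the real 68060 or 68080 pairing rules: it assumes an in-order core that issues at most two instructions per cycle, that every instruction takes one cycle, and that the second slot is used only when the next instruction does not touch the register written by the first. The type and function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction in 68k two-operand style (add src,dst):
   writes dst, reads src1 and src2. Fields are register numbers. */
struct insn { int dst, src1, src2; };

/* Non-zero if b reads or overwrites the register written by a. */
static int depends_on(const struct insn *b, const struct insn *a)
{
    return b->src1 == a->dst || b->src2 == a->dst || b->dst == a->dst;
}

/* Cycles needed by the toy dual-issue core for a straight-line program. */
size_t count_cycles(const struct insn *prog, size_t n)
{
    assert(prog != NULL || n == 0);
    size_t cycles = 0, i = 0;
    while (i < n) {
        if (i + 1 < n && !depends_on(&prog[i + 1], &prog[i]))
            i += 2;     /* both slots used this cycle */
        else
            i += 1;     /* second slot stays empty */
        cycles++;
    }
    return cycles;
}

/* The same four adds (two chains, on r0 and on r1) in two orders. */
static const struct insn sequential[]  = { {0,0,4}, {0,0,5}, {1,1,6}, {1,1,7} };
static const struct insn interleaved[] = { {0,0,4}, {1,1,6}, {0,0,5}, {1,1,7} };
```

In this model `count_cycles(sequential, 4)` is 3 and `count_cycles(interleaved, 4)` is 2: identical instructions, different order, different cycle count.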




Niclas A
(Apollo Team Member)
Posts 215
04 Sep 2017 15:55


But at the same time, a normal (old) C compiler that is not optimized for superscalar execution will most likely produce binaries that make little to no use of a nice modern CPU.

But a compiler that has a -060 or -080 flag can at least try to order code for superscalar execution, right?

So with luck someone will step up and port a new version of GCC and also try to improve its output for a new modern 68k (68080). :)



Michal Warzecha

Posts 209
04 Sep 2017 16:35


You're probably right, but before anyone tunes up GCC or whatever, the Apollo FPU must be checked and probably improved. Without pure ASM code that's not possible.


Vojin Vidanovic

Posts 770
04 Sep 2017 16:38


Niclas A wrote:
 
  So with luck someone will step up and port a new version of GCC and also try to improve its output for a new modern 68k (68080). :)

Since a 68k GCC is a must for everyone overall (Linux 68k improvement, 060 backporting, Vampire ...), if you know or find that "someone", I am all for a Kickstarter or any other funding solution.

It will not happen by itself.



Niclas A
(Apollo Team Member)
Posts 215
04 Sep 2017 16:38


Michal Warzecha wrote:

  You're probably right, but before anyone tunes up GCC or whatever, the Apollo FPU must be checked and probably improved. Without pure ASM code that's not possible.
 

 
  Yes you are also right :)
 


Niclas A
(Apollo Team Member)
Posts 215
04 Sep 2017 16:46


Something like this for 68k would have been cool.

EXTERNAL LINK 


M Rickan

Posts 174
04 Sep 2017 19:29


Mr Niding wrote:

  Im glad you are stepping up then.

Clever.

Perhaps you missed the point about control and propriety?



Rollef 2000

Posts 11
04 Sep 2017 19:43


Niclas A wrote:

Something like this for 68k would have been cool.
 
  EXTERNAL LINK 

EXTERNAL LINK 
browncc 6.2 with --target=m68k-elf or whatever you want.



Samuel Devulder

Posts 246
04 Sep 2017 21:56


(a quick side note about 68k optim)
   
   
rollef 2000 wrote:

        browncc 6.2
       

One can see that this compiler (actually gcc 7.2.0) doesn't produce the best 68k code. For instance, with:
         int fact(int num) {return num<=0?1:num*fact(num-1);}
with some size-related optimization options, it produces:
fact(int):
                move.l 4(%sp),%d1
                moveq #1,%d0
        .L3:
                tst.l %d1
                jle .L1
                muls.l %d1,%d0
                subq.l #1,%d1
                jra .L3
        .L1:
                rts
which is quite good, but not the best as the following one looks better to me (smaller & faster):
fact(int):
                moveq #1,%d0
                move.l 4(%sp),%d1
                jle .L1
        .L3:
                muls.l %d1,%d0
                subq.l #1,%d1
                jgt .L3
        .L1:
                rts
Maybe for 060+ the original ASM code is better; I don't know, since I'm severely biased towards 030 coding. Anyway, it's a fun way to explore compilers (and compiling options) :D
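
For readers less used to 68k assembly, the two listings above correspond roughly to the two loop shapes in this C sketch. The function names are made up for illustration, and the sketch cannot show the condition-code reuse (the hand-written listing branches on the flags already set by `move.l` and `subq.l` instead of using a separate `tst.l`). A compiler may also turn one shape into the other on its own, so this shows the structure only, not what any particular compiler will emit.

```c
/* Shape of the compiler output: the test sits at the top of the loop,
   so every iteration runs a test, a conditional exit branch and an
   unconditional branch back to the top. */
int fact_top_tested(int num)
{
    int result = 1;
    for (;;) {
        if (num <= 0)
            break;
        result *= num;
        num -= 1;
    }
    return result;
}

/* Shape of the hand-written version: one check before the loop, then
   the test at the bottom, so each iteration has a single branch. */
int fact_bottom_tested(int num)
{
    int result = 1;
    if (num > 0) {
        do {
            result *= num;
            num -= 1;
        } while (num > 0);
    }
    return result;
}
```

Both functions return the same values as the recursive `fact` for inputs whose factorial fits in an `int`.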
       


Gunnar von Boehn
(Apollo Team Member)
Posts 4797
04 Sep 2017 23:15


Yes, the 2nd code looks much better.
