
Welcome to the Apollo Forum

This forum is for people interested in the APOLLO CPU.



Information about the Apollo CPU and FPU.

Wazp3d for 080 AMMX

Vojin Vidanovic
(Needs Verification)
Posts 1916/ 1
18 Jun 2018 12:32


To all>
   
    I have moved this Wazp3d discussion to a new thread.
   
   
thellier alain wrote:

      This is because I don't have the answers either ;-)
      I don't have a Vampire now, but I will buy a Standalone as soon as possible, so I can't tell how Wazp3D performs on Vampire hardware.
   

   
    Oh, if that is the problem, why not ask the team for a V2 loaner plus an A500/A600?
   
    However, the V4 is not "too far" off, so even if we wait for your purchase, I am glad you are willing to do an 080 version! Thanks!
   
   
thellier alain wrote:

      So Wazp3D will need to be drastically enhanced to be really usable for games on the Vampire: it will perhaps need to run 20x faster or more.
     
      As Wazp3D was almost never tested/enhanced/optimized on real 68k machines, making it 2x faster by rewriting some parts is not impossible. Then using AMMX code will speed up the most crucial part (processing pixel fragments), perhaps 5-10x.
      But this is theoretical: I will need to benchmark on real hardware before starting to code.
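Alain's 20x target can be sanity-checked with Amdahl's law: speeding up one part only helps in proportion to the share of run time that part takes. The C sketch below assumes the 2x general rewrite applies to everything and the AMMX gain multiplies on top of it for fragment processing. The fragment-processing share is a hypothetical input, not a measured Wazp3D figure.

```c
#include <assert.h>

/* Amdahl's law: overall speed-up when a fraction `part` of the original
   run time is sped up by `part_speedup` and the remaining time by
   `rest_speedup`. The share `part` is an assumption the caller supplies. */
double overall_speedup(double part, double part_speedup, double rest_speedup)
{
    assert(part >= 0.0 && part <= 1.0);
    assert(part_speedup > 0.0 && rest_speedup > 0.0);
    double new_time = part / part_speedup + (1.0 - part) / rest_speedup;
    return 1.0 / new_time;
}
```

Take a hypothetical 70% of the time spent on fragments. `overall_speedup(0.7, 2.0 * 5.0, 2.0)` then comes out at about 4.5x. Reaching 20x needs two things at once: the optimistic 10x AMMX gain, and almost all of the time being spent in fragment processing (`overall_speedup(1.0, 2.0 * 10.0, 2.0)` is exactly 20). That is why benchmarking first matters.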
     

   
    There is Cow3D as a benchmark.
   
    I believe the general 68k optimizations you mentioned would benefit all Classic/UAE users, so please proceed down that path >:-)
   
    I hope AMMX and V4 Vampires, with a near-100 MHz 080 clock, more cache, faster RAM and their own FPU, will have the needed muscle. That will be your/my Standalone.
   
    First, I would be satisfied if the AmigaAmp Warp3D plugins and some W3D previewers and applications work at all.
   
    StormMESA too, as the only (or best) OpenGL we have on 68k!
    There are very few 68k W3D games, but they do somewhat work in software renderer mode.
 
 
gregthe canuck wrote:

  That YouTube demo you linked to was running on core 2.7 with FEMU.
   
    It would be interesting to see that running under core 2.9 as a new performance base. It may be significantly improved already. :)
 

 
  Yes, in LightWave the jump was from 0.14 jubimarks (75 MHz, FEMU) to 7.x jubimarks (85 MHz, core 2.9 FPU), mostly thanks to the real FPU (which will be even better in the V4)!


Thellier Alain

Posts 141
18 Jun 2018 13:39


>There is Cow3D as a benchmark.

Cow3D benchmarks the overall maximum Warp3D speed.
But here I will need to "slice" Wazp3D to discover which parts are really slowing it down. It may be fetching vertices from memory, clipping, rasterizing the polygons, processing the fragments, etc.
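One way to do that slicing is to accumulate CPU time per pipeline stage over many frames and then compare the shares. Below is a minimal C sketch that uses only the standard `clock()` function. The stage names are illustrative. On a real Amiga `clock()` may be too coarse, so a finer-grained timer may be needed.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* One entry per pipeline stage. The names used with it are illustrative,
   not taken from the Wazp3D sources. */
typedef struct {
    const char *name;
    clock_t     ticks;   /* CPU time accumulated by this stage */
} stage_t;

/* Call right before a stage runs; returns the start timestamp. */
clock_t stage_begin(void)
{
    return clock();
}

/* Call right after the stage ran; charges the elapsed CPU time to it. */
void stage_end(stage_t *st, clock_t start)
{
    st->ticks += clock() - start;
}

/* Percentage of the total measured time spent in stage i. */
double stage_share(const stage_t *st, int n, int i)
{
    assert(st != NULL && i >= 0 && i < n);
    clock_t total = 0;
    for (int k = 0; k < n; k++)
        total += st[k].ticks;
    return total ? 100.0 * (double)st[i].ticks / (double)total : 0.0;
}

/* Print one line per stage with its share of the measured time. */
void stage_report(const stage_t *st, int n)
{
    for (int i = 0; i < n; i++)
        printf("%-12s %5.1f%%\n", st[i].name, stage_share(st, n, i));
}
```

Inside the frame loop, each stage would be wrapped as `t = stage_begin(); clip_triangles(); stage_end(&stages[1], t);`. Here `clip_triangles` is a hypothetical placeholder for the real Wazp3D code.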

>their own FPU
In fact, Wazp3D should not use the FPU much, but integers.
But when running (say) Quake, the game itself should use the FPU on its side.

>AmigaAmp Warp3D plugins
already work with Wazp3D.



Samuel Devulder

Posts 248
18 Jun 2018 14:15


On the Vampire the FPU is so fast that it is usually pointless to use precalculated tables. In fact, large precalc tables usually do not fit in the cache, which means that accessing them costs some extra penalty cycles. It is better to use floating-point computation instead, and to improve the code by exploiting superscalar execution. It is a bit tough to code in the beginning, but in the end it is fun to write efficient pieces of asm.
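To illustrate the trade-off, here is a C sketch contrasting two code paths for a reciprocal. One uses a precalculated table (a common trick for the perspective divide); the other is a plain floating-point divide. The table size and the choice of 1/z are assumptions for illustration; Samuel did not name a specific table. The cache-miss cost he describes only shows up when timing on real hardware.

```c
#include <assert.h>

#define RECIP_TABLE_SIZE 4096          /* 4096 floats = 16 KiB of table */

static float recip_table[RECIP_TABLE_SIZE];

void init_recip_table(void)
{
    recip_table[0] = 0.0f;             /* 1/0 is undefined: placeholder */
    for (int i = 1; i < RECIP_TABLE_SIZE; i++)
        recip_table[i] = 1.0f / (float)i;
}

/* Table version: z is truncated to an integer index, then one load.
   A large table may miss the data cache and cost extra cycles. */
float recip_lookup(float z)
{
    int idx = (int)z;
    assert(idx > 0 && idx < RECIP_TABLE_SIZE);
    return recip_table[idx];
}

/* Direct version: a single floating-point divide, full float precision,
   no memory access at all. */
float recip_direct(float z)
{
    assert(z != 0.0f);
    return 1.0f / z;
}
```

The two paths also differ in accuracy. `recip_lookup(2.5f)` truncates the index and returns 0.5, while `recip_direct(2.5f)` returns 0.4.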


Vojin Vidanovic
(Needs Verification)
Posts 1916/ 1
18 Jun 2018 15:25


thellier alain wrote:

        >There is Cow3D as a benchmark.
       
        Cow3D benchmarks the overall maximum Warp3D speed.
        But here I will need to "slice" Wazp3D to discover which parts are really slowing it down. It may be fetching vertices from memory, clipping, rasterizing the polygons, processing the fragments, etc.
       
       

       
        How about the Sculp3d clone?
        What about Tornado3D and rave3d.library? Any good there with W3D acceleration?
       
       
thellier alain wrote:

          >AmigaAmp Warp3D plugins
          already work with Wazp3D
       

       
        Surely. That is what Wazp3D is about. Some decent speed could be reached with AMMX 080 optimization.
       
        As an end goal:
        There are 68k GLQuake, Descent, Adoom, a Sokoban clone, and the possibility to use a Warp3D renderer in a few games like Freespace and Payback, but only if it is faster than the current one.
     
     
thellier alain wrote:

      .STL is not a complicated file format: I will add this file format to my Microbe3D one of these days...
     

     
      Does that mean a new version? Thanks for another 68k Warp3D tool.
     
      Could WarpView be backported from OS4?
   
     
   
    The StormMESA examples also looked lovely.
     
   
 
Samuel Devulder wrote:

  ... superscalar. It is a bit tough to code in the beginning, but in the end it is fun to write efficient pieces of asm.
 

 
  It never hurts to mention that VASM supports all the new Vampire-specific instructions of the 080.


Louis Dias
(Needs Verification)
Posts 55/ 1
19 Jun 2018 02:37


Moving the 3D conversation here...
 
  Why am I surprised .stl files are not supported? Because LightWave 3D and Blender originated on the Amiga... did they not?
  .dxf was quite proprietary back then... Things didn't interchange well in the past.
 
  Amiga 3D games that used 3D video cards and OpenGL (StormMESA) were released in the '90s as well.
  Also, .stl has been around since 1987.
  People are buying Raspberry Pis and Arduinos to run slicing programs for 3D printing. All of this is open source. We want the Amiga to do it all. I don't think asking about .stl files was bad, nor was asking about the real-world ability to render them. For instance, OpenSCAD uses software rendering. It would be nice to see an Amiga port.
 


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
19 Jun 2018 06:15


Some general information regarding 3D applications on AMIGA / 68080

The 68060 CPU was already pretty powerful and could do some nice 3D calculations.

The Apollo 68080 core offers some major improvements over the 68060.

While executing an FPU instruction, the 68060 could execute integer instructions in parallel.
This means that while, e.g., a 5-cycle FPU instruction was running, the 68060 could execute several integer instructions in parallel.

The 68080 of course offers the same parallelism.
While FPU instructions run, integer instructions can be executed in parallel.
But the 68080 offers even more, as its FPU is fully pipelined: other new FPU instructions can also be executed in parallel. This means that while, e.g., a 5-cycle FPU instruction is running, the 68080 could start executing 4 other FPU instructions in parallel.

Obviously, code making full use of this could reach several times the FPU speed of a 68060 on the 68080.
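Gunnar's point can be put into numbers with a deliberately simplified cycle model, sketched in C below. The model assumes a fixed latency per FPU instruction and at most one new FPU instruction per cycle, and it ignores memory and integer work. The 5-cycle latency is the example from the post, not a measured 68080 figure. Pipelining only helps when the instructions are independent. A chain in which each step needs the previous result still pays the full latency every time, which is why such code is usually written with several independent accumulators.

```c
#include <assert.h>

/* Simplified cycle model for n FPU instructions of latency `lat` cycles.
   Assumes at most one new FPU instruction per cycle; ignores memory,
   integer work and issue-width limits. */

/* Unpipelined FPU: each instruction must finish before the next starts. */
long cycles_unpipelined(long n, long lat)
{
    assert(n >= 0 && lat >= 1);
    return n * lat;
}

/* Fully pipelined FPU, independent instructions: a new one can start every
   cycle, so only the latency of the last instruction is exposed. */
long cycles_pipelined(long n, long lat)
{
    assert(n >= 0 && lat >= 1);
    return n == 0 ? 0 : lat + (n - 1);
}

/* Fully pipelined FPU, but each instruction needs the previous result
   (a dependency chain): nothing can overlap. */
long cycles_dependent_chain(long n, long lat)
{
    return cycles_unpipelined(n, lat);
}
```

With a 5-cycle latency, 1000 independent FPU instructions take 1004 cycles in the pipelined model, against 5000 unpipelined. A 1000-step dependency chain stays at 5000.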



A1200 Coder

Posts 74
19 Jun 2018 07:06


Gunnar von Boehn wrote:

  Obviously, code making full use of this could reach several times the FPU speed of a 68060 on the 68080.
 
 

 
  Do you have an idea of the typical performance of the V4 FPU when running optimized code (for example inner loops and 3D calculations)? Say, at 100 MHz, would 150 MFLOPS or more be possible? With integer instructions (no FPU used) and a peak of 4 instructions per clock cycle, I remember that optimized code for the 68080 may typically run at a little less than 2 instructions per clock cycle.


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
19 Jun 2018 07:27


A1200 coder wrote:

  Do you have an idea of the typical performance of the V4 FPU when running optimized code (for example inner loops and 3D calculations)?
 

 
Let's look at FDIV as an example.
FDIV does a floating-point divide.

The V4 FPU can do 1 FDIV operation each clock cycle.
For comparison, the 68060 needs 37 clock cycles for an FDIV.
The V4 Apollo-68080 @ 100 MHz therefore equals, in peak FPU divide performance, a 68060 @ 3.7 GHz.



Let's look at FSQRT as an example.
FSQRT does a floating-point square root.

The V4 FPU can do 1 FSQRT operation each clock cycle.
For comparison, the 68060 needs 68 clock cycles for an FSQRT.
The V4 Apollo-68080 @ 100 MHz therefore equals, in peak FPU square-root performance, a 68060 @ 6.8 GHz.

Obviously, there is major speed-up potential.
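The equivalence above is plain throughput arithmetic. To retire the same number of divides (or square roots) per second, a 68060 would need a clock `cycles_060 / cycles_080` times higher. Below is a small C helper using the cycle counts quoted in this post. It describes peak throughput only, and it assumes the 68080 pipeline is kept full with independent FDIV/FSQRT instructions.

```c
#include <assert.h>

/* Clock (MHz) a 68060 would need to match the 68080's peak throughput for
   one instruction type, given cycles per instruction on each CPU.
   Peak only: assumes the 68080 issues one such instruction per
   `cycles_080` cycles back to back. */
double equivalent_060_mhz(double mhz_080, double cycles_080, double cycles_060)
{
    assert(mhz_080 > 0.0 && cycles_080 > 0.0 && cycles_060 > 0.0);
    return mhz_080 * cycles_060 / cycles_080;
}
```

`equivalent_060_mhz(100.0, 1.0, 37.0)` gives 3700 MHz for FDIV, and `equivalent_060_mhz(100.0, 1.0, 68.0)` gives 6800 MHz for FSQRT, matching the figures above.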


Thellier Alain

Posts 141
19 Jun 2018 09:18


All the StormMESA demos work with Wazp3D: they are not very big, so they should work on the Vampire too. The same goes for the Sculp3d clone (but it is not a very interesting app, in fact).

>Tornado3D and rave3d.library?
No: rave3d is not Warp3D.

The end goal is all the programs that draw thousands of triangles, like Quake 1-3, Cube, FPSE (the PlayStation emulator), etc.

WarpView can be ported to OS3... if its author wants that (it doesn't use Warp3D much, so it should not be a problem on the Vampire).




Vojin Vidanovic
(Needs Verification)
Posts 1916/ 1
19 Jun 2018 09:22


thellier alain wrote:

    >Tornado3D and rave3d.library?
    No: rave3d is not Warp3D
   
    The end goal is all the programs that draw thousands of triangles, like Quake 1-3, Cube, FPSE (the PlayStation emulator), etc.
 

 
  Thanks for the answers, and a very nice end goal!
 
  Well, the last "Warp3D breath" was more of a demo of what was possible :-)
 
  For most of the mentioned end-goal games and the PS emulator, I am not sure whether 68k, and especially 68k Warp3D, versions were ever made, but I hope this adventure might enable that.
 
  On Tornado3D I was not clear enough. It officially supports a Warp3D renderer. Does it do any good on W3D-accelerated Amigas?
 
  There was an even nicer W3D example in Microbe by Hesus: it's a Vampire logo.


A1200 Coder

Posts 74
19 Jun 2018 16:28


Gunnar von Boehn wrote:

A1200 coder wrote:

    Do you have an idea of the typical performance of the V4 FPU when running optimized code (for example inner loops and 3D calculations)?
   

   
  Let's look at FDIV as an example.
  FDIV does a floating-point divide.
 
  The V4 FPU can do 1 FDIV operation each clock cycle.
  For comparison, the 68060 needs 37 clock cycles for an FDIV.
  The V4 Apollo-68080 @ 100 MHz therefore equals, in peak FPU divide performance, a 68060 @ 3.7 GHz.
 
 
 
  Let's look at FSQRT as an example.
  FSQRT does a floating-point square root.
 
  The V4 FPU can do 1 FSQRT operation each clock cycle.
  For comparison, the 68060 needs 68 clock cycles for an FSQRT.
  The V4 Apollo-68080 @ 100 MHz therefore equals, in peak FPU square-root performance, a 68060 @ 6.8 GHz.
 
  Obviously, there is major speed-up potential.

Ok, thanks.
Wikipedia says that the 68060 FPU does ~36 MFLOPS @ 66 MHz. Maybe this is wrong. It seems like a lot given how slow some FPU instructions are, but scaled to 100 MHz you get something like 55 MFLOPS. If most FPU instructions take 1 clock on the 68080, then you get 100 MFLOPS with a 68080 @ 100 MHz. I think the 68080 FPU should on average be more than just around 2x faster than the 68060 FPU.
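The arithmetic in this post can be written out as two small C helpers. The first scales the quoted 68060 figure linearly with clock speed. The second computes peak MFLOPS for a CPU that finishes a given number of FP operations per cycle. Both assume the same code and no memory stalls. The 68080 case also rests on the "1 op per clock" premise stated above, which is an assumption, not a measurement.

```c
#include <assert.h>

/* Scale a MFLOPS figure linearly with clock speed (assumes the same code
   and no memory bottleneck, so performance tracks the clock). */
double scale_mflops(double mflops, double from_mhz, double to_mhz)
{
    assert(from_mhz > 0.0 && to_mhz > 0.0);
    return mflops * to_mhz / from_mhz;
}

/* Peak MFLOPS for a CPU completing `ops_per_cycle` FP operations per clock. */
double peak_mflops(double mhz, double ops_per_cycle)
{
    assert(mhz > 0.0 && ops_per_cycle > 0.0);
    return mhz * ops_per_cycle;
}
```

`scale_mflops(36.0, 66.0, 100.0)` is about 54.5, and `peak_mflops(100.0, 1.0)` is 100, so under these assumptions the ratio is roughly 1.8x.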



Gunnar von Boehn
(Apollo Team Member)
Posts 6207
19 Jun 2018 18:20


A1200 coder wrote:

Wikipedia says that the 68060 FPU does ~36 MFLOPS @ 66 MHz.
Maybe this is wrong.

It depends on how you count MFLOPS.
FMOVE Fp0,Fp1
This instruction copies an FPU value from one register to another and is fast.

The APOLLO 68080 can do 3-operand instructions and can do a free FMOVE.




Manuel Jesus

Posts 155
19 Jun 2018 22:30


"
  There was an even nicer W3D example in Microbe by Hesus: it's a Vampire logo.
"

This was a 400-poly Vampire logo I made for the demo, as the other objects that came with Microbe were too slow.


Vojin Vidanovic
(Needs Verification)
Posts 1916/ 1
20 Jun 2018 03:54


Manuel Jesus wrote:

  This was a 400-poly Vampire logo I made for the demo, as the other objects that came with Microbe were too slow.

Has the performance improved with 2.9?


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
20 Jun 2018 07:37


Vojin Vidanovic wrote:

Has the performance improved with 2.9?

Yes, the performance improved a bit.
We have been working on the core for 10 years; it is constantly improved and gets faster and better every day.

For example, the other day we added support for 256-byte memory bursts to the SAGA chipset.


Samuel Devulder

Posts 248
20 Jun 2018 07:39


About 3D demos, have a look at the "Monkey(sam)" demo in CoffinOS (use -h for help, IIRC). When I have time, I'll add support for STL files.


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
20 Jun 2018 08:08


Samuel Devulder wrote:

About 3D demo, have a look at the "Monkey(sam)" demo in CoffinOS

Very nice demo.
The demo renders a 3D figure of about 1000 triangles with shading on the WB at 25 FPS, right?



Samuel Devulder

Posts 248
20 Jun 2018 11:37


Yes, that's about it. In exclusive full screen it is a bit higher, of course. It is plain C plus a tiny bit of 68030/68881 asm (i.e. not quite superscalar). No AMMX optimization yet, except in the memcpy replacement routine.

For reference, the wireframe mode (-wire option: no face filling, just edge drawing with Bresenham) runs at 40 fps. This means that the simple FPU rasterizer runs pretty quickly on the Vampire indeed.

Tips: you can use the keyboard to rotate the head or zoom in/out. There is also a hidden option to display something else (-model <something_to_guess>).
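For readers unfamiliar with the wireframe path: Bresenham's algorithm draws a line using only integer additions and comparisons, with no division or FPU work in the loop. Below is the textbook all-octant form in C. It writes into a hypothetical 8-bit chunky framebuffer and clips per pixel. It only illustrates the algorithm and is not the code used in Samuel's demo.

```c
#include <assert.h>
#include <stdlib.h>

/* Textbook all-octant Bresenham line into a hypothetical 8-bit chunky
   framebuffer of width*height bytes. Pixels outside the buffer are
   skipped (simple per-pixel clipping). */
void draw_line(unsigned char *fb, int width, int height,
               int x0, int y0, int x1, int y1, unsigned char color)
{
    assert(fb != NULL && width > 0 && height > 0);
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;                 /* combined error term for x and y */

    for (;;) {
        if (x0 >= 0 && x0 < width && y0 >= 0 && y0 < height)
            fb[y0 * width + x0] = color;
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }   /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }   /* step in y */
    }
}
```

A diagonal from (0,0) to (15,15) in a 16x16 buffer sets exactly 16 pixels. Every step is an integer add and a compare, which is why the wireframe mode can run faster than the filled one.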


Mallagan Bellator

Posts 393
21 Jun 2018 01:46


A1200 coder wrote:

  Wikipedia says that the 68060 FPU does ~36 MFLOPS @ 66 MHz. Maybe this is wrong. It seems like a lot given how slow some FPU instructions are, but scaled to 100 MHz you get something like 55 MFLOPS. If most FPU instructions take 1 clock on the 68080, then you get 100 MFLOPS with a 68080 @ 100 MHz.

MFLOPS and MIPS are quite misleading. Your name suggests that you’re actually a coder; sorry if I’m unfamiliar with what you’ve made, if anything. Or are you new to coding?
I know this much: operations and instructions of both the INT and FP “types” differ from each other. One FP instruction can take more time to execute than another FP instruction. The same holds true between INT instructions: they have different “lengths”, if you will.
Therefore, using specific FP or INT instructions may result in a different MFLOPS value than using other specific instructions would. They are not necessarily the same.

I wouldn’t stare myself blind at MIPS and FLOPS values; you could use them for an estimate, just keep this in mind.

If you want to know what something can really do, look at how well a given piece of hardware performs in various software, games, etc.


Mallagan Bellator

Posts 393
21 Jun 2018 01:57


Gunnar von Boehn wrote:

Some general information regarding 3D applications on AMIGA / 68080
 
  The 68060 CPU was already pretty powerful and could do some nice 3D calculations.
 
  The Apollo 68080 core offers some major improvements over the 68060.
 
  While executing an FPU instruction, the 68060 could execute integer instructions in parallel.
  This means that while, e.g., a 5-cycle FPU instruction was running, the 68060 could execute several integer instructions in parallel.
 
  The 68080 of course offers the same parallelism.
  While FPU instructions run, integer instructions can be executed in parallel.
  But the 68080 offers even more, as its FPU is fully pipelined: other new FPU instructions can also be executed in parallel. This means that while, e.g., a 5-cycle FPU instruction is running, the 68080 could start executing 4 other FPU instructions in parallel.
 
  Obviously, code making full use of this could reach several times the FPU speed of a 68060 on the 68080.
 

Hi Gunnar
As you’re aware (I’m sure), when the FPU is used as a “peripheral” outside the CPU itself (at least with the 68882, I know this), it has a core of its own and, in older setups, may even run at a different frequency than the CPU.

I thought of this when you said the 060 can run FP instructions in parallel with INT instructions.
So my question is this.
Is the 080's FPU structurally parallel to the CPU, or are all the FP instructions added to the 080 CPU core? Do they run in parallel or serially? And does it build on the approach of a separate FPU as a peripheral (only internally), or is it part of the CPU itself?
I do understand that in this setting they would of course need to run at the same speed.
