APOLLO CPU Knowledge Forum

Performance and Benchmark Results!

300MHz Target

Vojin Vidanovic
(Needs Verification)
Posts 1916/ 1
30 Aug 2018 11:29

gregthe canuck wrote:

I don't think it is realistic to expect that much of a clock speed jump. I believe a 14x core was demoed which is a nice bump up from the 12x on V2. But that isn't guaranteed. *Maybe* if at some point the team does a "black edition" with a higher speed grade of the chip used in the V4 they could maybe go one speed level higher? But as Majsta has noted earlier the speed grading is a bit of a crapshoot.

It's NOT LE-scalable (more space, more speed). I expect x14 as the standard bulletproof core and x15 as a kind of overclock. A bit more than that is about the end of its life, plus a lot of new features, which are as important, if not more.

Gregthe Canuck

Posts 274
30 Aug 2018 13:30

Vojin Vidanovic wrote:

It's NOT LE-scalable (more space, more speed).

Hi Vojin!

You missed my point. It is scalable when you can take advantage of the extra LEs (logic elements) for bigger caches, more comprehensive optimizations, extra pipelines/units, more branch prediction logic, etc.

Vojin Vidanovic
(Needs Verification)
Posts 1916/ 1
30 Aug 2018 13:54

gregthe canuck wrote:

You missed my point. It is scalable when you can take advantage of the extra LEs for bigger caches, more comprehensive optimizations, extra pipelines/units, more branch prediction logic, etc.

That is exactly the difference between V4 and V2 (more cache, a slightly higher clock, faster RAM, full ApolloFPU). But that is NOT an infinite source of speed on the way to the so-called 300MHz target (and whose 300MHz? Pentium Pro/II/III 300MHz levels?). Feature-wise it's really a modern CPU, and our current FPGA is a blessing and a curse: a curse because you cannot increase speed as easily as with higher-clocked CPUs.


Gregthe Canuck

Posts 274
30 Aug 2018 14:23

I never said it was an infinite source of speed. My point was, and still is, that having more LEs (with the same clock speed and the same RAM speed) allows for a faster core. That is all.

Mr Niding

Posts 459
30 Aug 2018 14:31

Greg,

If I read you right:

We shouldn't focus on MHz, as there is a lot of performance to be gained through smart design and code.

I was watching Hardware Unboxed the other day, and they ran Ryzen vs Intel tests on Linux and Windows.
Linux blew Windows out of the water on most tests.

Vojin Vidanovic
(Needs Verification)
Posts 1916/ 1
30 Aug 2018 17:16

gregthe canuck wrote:

I never said it was an infinite source of speed. My point was, and still is, that having more LEs (with the same clock speed and the same RAM speed) allows for a faster core. That is all.

Don't take it personally; people have been crying for MHz all the time. The speed increase of the Vampires with each core is nice and steady, so one should just be patient. And buy a V4.


Jm 68k

Posts 2
30 Aug 2018 23:48

32 cycles @ 8MHz = 4 µs
0.5 cycle @ 85 MHz = 5.88 ns
4 µs / 5.88 ns = 680
680 x 8 MHz = 5440 MHz = 68000 @ 5.4GHz!!!

Correct, Gunnar?

Sean Sk

Posts 488
31 Aug 2018 00:54

An easier way to work it out is:

68000 = 32 cycles per instruction
68080 = 0.5 cycles per instruction

32 / 0.5 = 64

That means the 68080 could perform the same instruction 64 times in the time it takes for the 68000 to do it just once!

85 MHz x 64 = 5440 MHz <--- the clock at which the 68000 would have to run to perform the instruction in the same amount of time.
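Both calculations above boil down to one formula: the 68000 clock needed to match the 68080 on one instruction equals the 68080 clock times the ratio of their cycle counts. A minimal Python sketch of that arithmetic; the function names are illustrative, and the 32-cycle, 0.5-cycle and 85 MHz inputs are the posters' own assumptions, not measurements.

```python
def instruction_time_ns(cycles: float, clock_mhz: float) -> float:
    """Wall-clock time of one instruction in nanoseconds (1 MHz = 1 cycle per microsecond)."""
    return cycles / clock_mhz * 1000.0


def equivalent_clock_mhz(cycles_old: float, cycles_new: float, clock_new_mhz: float) -> float:
    """Clock the old CPU would need to finish the instruction as fast as the new CPU."""
    return clock_new_mhz * cycles_old / cycles_new


if __name__ == "__main__":
    # Route 1 (time-based): compare wall-clock times, then scale the 8 MHz clock.
    t_68000 = instruction_time_ns(32, 8)     # 32 cycles @ 8 MHz  -> 4000 ns
    t_68080 = instruction_time_ns(0.5, 85)   # 0.5 cycle @ 85 MHz -> ~5.88 ns
    print(round(t_68000 / t_68080) * 8)      # 680 x 8 MHz

    # Route 2 (cycle ratio): 32 / 0.5 = 64, then 64 x 85 MHz.
    print(equivalent_clock_mhz(32, 0.5, 85))
```

Both routes give 5440 MHz, because they are the same formula rearranged.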


Peter Slegg

Posts 22
31 Aug 2018 13:21

How would that compare with a 68060 at 50 MHz?

Don Adan

Posts 38
31 Aug 2018 14:12

jm 68k wrote:

32 cycles @ 8MHz = 4 µs
0.5 cycle @ 85 MHz = 5.88 ns
4 µs / 5.88 ns = 680
680 x 8 MHz = 5440 MHz = 68000 @ 5.4GHz!!!

Correct Gunnar?

No. The fastest 68000 instruction needs 4 cycles; the fastest 68080 instruction needs 0.5 cycles. Comparing a single instruction makes little sense. For example, mulu.w needs up to 70 cycles on the 68000 and perhaps 2 (?) cycles on the 68080; divu.w needs up to 140 cycles on the 68000 and perhaps 19 (?) or 27 (?) cycles on the 68080.
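To see why a single instruction says little, the same equivalent-clock formula can be applied to each instruction mentioned in this post. A sketch only: the cycle counts are the ones quoted above (the 68080 values for mulu.w and divu.w carry the poster's own question marks), and the 85 MHz 68080 clock is the figure assumed earlier in the thread.

```python
CLOCK_68080_MHZ = 85  # assumption: the 85 MHz figure used earlier in the thread

# (instruction, 68000 cycles, 68080 cycles); 68080 mulu.w/divu.w values are guesses
SAMPLES = [
    ("fastest instruction", 4, 0.5),
    ("mulu.w", 70, 2),
    ("divu.w", 140, 19),
    ("divu.w (alt. guess)", 140, 27),
]


def equivalent_clock_mhz(cycles_68000: float, cycles_68080: float) -> float:
    """68000 clock needed to match the 68080 on this one instruction."""
    return CLOCK_68080_MHZ * cycles_68000 / cycles_68080


if __name__ == "__main__":
    for name, c000, c080 in SAMPLES:
        print(f"{name:22s} -> 68000 @ {equivalent_clock_mhz(c000, c080):7.0f} MHz")
```

The "equivalent 68000 clock" swings from roughly 440 MHz to almost 3 GHz depending on which instruction is picked, so a meaningful comparison needs a realistic instruction mix.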

Don Adan

Posts 38
31 Aug 2018 14:16

Peter Slegg wrote:

How would that compare with a 68060 at 50 MHz?

For most code, the speed is the same at the same CPU clock, but the 68080 can execute more instructions in 0.5 cycles, so it is a bit faster even when only the original 68k instruction set is used. Of course, a big difference exists for all 68k instructions that are trapped on the 68060, e.g. movep.


A1200 Coder

Posts 74
31 Aug 2018 16:51

Well, if you look at the MC68060 manual, an addi.l #data,(d16,An) costs 2 clock cycles on a 68060. So the same instruction would be 4 times faster on the Vampire, and of course you get an additional speedup from the Vampire's higher clock speed.
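The two speedups in this post multiply. A small sketch, assuming the 2-cycle 68060 figure quoted above, a 0.5-cycle 68080 figure (which is what "4 times faster" implies), and example clocks of 50 MHz for the 68060 and 85 MHz for the Vampire, both of which appear elsewhere in this thread.

```python
def time_ns(cycles: float, clock_mhz: float) -> float:
    """Wall-clock time for the given cycle count at the given clock."""
    return cycles * 1000.0 / clock_mhz


CYCLES_060, CLOCK_060 = 2.0, 50.0   # addi.l #data,(d16,An) as quoted in the post
CYCLES_080, CLOCK_080 = 0.5, 85.0   # assumption: 4x fewer cycles, 85 MHz core

per_clock_speedup = CYCLES_060 / CYCLES_080   # speedup from cycle count alone
clock_speedup = CLOCK_080 / CLOCK_060         # speedup from the higher clock
total_speedup = time_ns(CYCLES_060, CLOCK_060) / time_ns(CYCLES_080, CLOCK_080)

if __name__ == "__main__":
    print(f"per clock {per_clock_speedup:.1f}x * clock {clock_speedup:.1f}x = {total_speedup:.1f}x")
```

The total is simply the product of the per-clock and clock-rate factors, which is why both parts of the argument matter.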

Gunnar von Boehn
(Apollo Team Member)
Posts 6254
01 Sep 2018 07:33

Peter Slegg wrote:

How would that compare with a 68060 at 50 MHz?

The 68K family offers a large number of instructions.
The 68K family also supports a powerful wealth of addressing modes.
This makes the 68K so versatile and powerful.
The instruction length on the 68000 can be 2, 4, 6, 8, or 10 bytes. Since the 68020, even longer instructions are supported.

The big number of instructions makes comparing CPUs complex.

The second-fastest 68K CPU is the 68060.
The 68060 can sustain a peak of up to 2 instructions per cycle.
This is very good.
But the 68060 can do this only if both instructions are very short: only 2 bytes long each.

The fastest 68K CPU is the 68080.
The 68080 is designed very similarly to the 68060, but it adds a huge number of improvements to the 68060 design.
For some code, the 68080 can reach a peak of up to 4 instructions per cycle.
The 68080 can execute several instructions per cycle with instructions 2, 4, or 6 bytes long, and even with two 8-byte instructions.
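One way to read the 2-byte restriction is as a fetch-bandwidth limit. The toy model below is an assumption, not a description of either pipeline: it caps instructions per cycle by an issue width and by a per-cycle instruction-fetch budget. The 4-byte and 16-byte budgets are only inferred from the figures in this post (two 2-byte instructions, two 8-byte instructions); real cores add pairing rules, dependencies and cache misses on top.

```python
def peak_ipc(instr_bytes: int, max_issue: int, fetch_bytes: int) -> float:
    """Sustained instructions per cycle for a stream of equally long instructions,
    capped by the issue width and by the instruction bytes fetched per cycle."""
    return min(float(max_issue), fetch_bytes / instr_bytes)


# Illustrative parameters inferred from the post, not taken from datasheets.
CPU_68060 = {"max_issue": 2, "fetch_bytes": 4}    # two 2-byte instructions per cycle
CPU_68080 = {"max_issue": 4, "fetch_bytes": 16}   # peak 4 IPC; two 8-byte instructions

if __name__ == "__main__":
    for length in (2, 4, 6, 8):
        print(f"{length}-byte: 68060 {peak_ipc(length, **CPU_68060):.2f}  "
              f"68080 {peak_ipc(length, **CPU_68080):.2f}")
```

Under this toy model the 68060 drops to one instruction per cycle as soon as instructions grow to 4 bytes, while a wider fetch keeps several instructions per cycle flowing for longer encodings.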

The 68080 supports 3 simultaneous data-cache operations per cycle (READ/WRITE/REFILL), and it supports misaligned cache access at no extra cycle cost.
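An access is misaligned when it straddles an aligned boundary, so a simple cache has to touch two blocks instead of one and may need extra cycles for it; the claim here is that the 68080 hides that cost. The boundary arithmetic is sketched below; the 4-byte and 16-byte block sizes are illustrative parameters, not documented 68080 data-path or cache-line widths.

```python
def aligned_blocks_touched(addr: int, size: int, block: int = 4) -> int:
    """Number of naturally aligned `block`-byte units covered by an access of `size` bytes."""
    first = addr // block
    last = (addr + size - 1) // block
    return last - first + 1


if __name__ == "__main__":
    print(aligned_blocks_touched(0x1000, 4))            # aligned longword
    print(aligned_blocks_touched(0x1002, 4))            # longword straddling a 4-byte boundary
    print(aligned_blocks_touched(0x100E, 4, block=16))  # longword crossing a 16-byte line
```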

Typically, the 68080 is about 50% faster than a 68060 at the same clock rate. If you use new features like AMMX, it of course becomes much faster.


John Heritage

Posts 112
02 Sep 2018 16:07

Gunnar, where do I find a list of the byte lengths of each 68K instruction?

Philippe Flype
(Apollo Team Member)
Posts 299
02 Sep 2018 22:06

@John

It depends not only on the instruction, but also on the effective address mode and the length of the immediate operands.

D5 is much shorter than ([$1234,A2,D5.l*2],$5678),D7

#XXX.W is shorter than #XXX.L; the same applies to .L/.S/.D/.X FPU immediates.
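For the 68000 (not the 68020+ extended modes such as the memory-indirect example above), the length follows from simple rules: a 2-byte operation word plus the extension words each effective address needs. A minimal sketch of those rules; the mode strings are ad-hoc labels, and instruction-specific extra words (for example a MOVEM register mask or a branch displacement word) are deliberately not modelled.

```python
# Extension bytes per 68000 addressing mode (immediates handled separately).
EXT_BYTES = {
    "Dn": 0, "An": 0, "(An)": 0, "(An)+": 0, "-(An)": 0,
    "(d16,An)": 2, "(d8,An,Xn)": 2,
    "(xxx).W": 2, "(xxx).L": 4,
    "(d16,PC)": 2, "(d8,PC,Xn)": 2,
}


def ea_bytes(mode: str, size: str = "W") -> int:
    """Extension bytes for one effective address; size is 'B', 'W' or 'L'."""
    if mode == "#imm":
        return 4 if size == "L" else 2  # .B and .W immediates use one word, .L uses two
    return EXT_BYTES[mode]


def instruction_bytes(*modes: str, size: str = "W") -> int:
    """2-byte operation word plus the extension words of every operand."""
    return 2 + sum(ea_bytes(m, size) for m in modes)


if __name__ == "__main__":
    print(instruction_bytes("Dn", "Dn", size="L"))            # move.l d0,d1 -> 2 bytes
    print(instruction_bytes("#imm", "(d16,An)", size="L"))    # addi.l #data,(d16,An) -> 8 bytes
    print(instruction_bytes("(xxx).L", "(xxx).L", size="L"))  # move.l abs.l,abs.l -> 10 bytes
```

The last case reaches the 10-byte maximum for the 68000 mentioned earlier in the thread.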

For the instructions themselves, there are tables at the end of the official Programmer's Reference Manual (starting at page 561 of 646):

EXTERNAL LINK
;)

Philippe Flype
(Apollo Team Member)
Posts 299
02 Sep 2018 22:15

It had been a long time since I last put my hands on MiniBench.

Since the core gained an FPU and features such as OoO (out-of-order execution), the tool needed some refurbishment.

Below is a MIPS, MFLOPS, MB/SEC battle between

- MC68060 @ 50MHz,
- MC68060 @ 80MHz,
- AC68080 @ 78MHz.

EXTERNAL LINK

Nixus Minimax

Posts 416
03 Sep 2018 08:21

Philippe Flype wrote:

Below is a MIPS, MFLOPS, MB/SEC battle between

- MC68060 @ 50MHz,
- MC68060 @ 80MHz,
- AC68080 @ 78MHz.

These numbers are truly impressive! The only surprise was the relatively weak MULU value. How come the 080 is markedly slower than the 060 in this discipline?

Gunnar von Boehn
(Apollo Team Member)
Posts 6254
03 Sep 2018 11:36

OK, let's look at the summary results:

Amiga 4000 + Cyberstorm060MK-I @ 50 MHz
----------------------------------
| CPU SCORE : 54 MIPS
| FPU SCORE : 19 MFLOPS
| MEM SCORE : 43 MB/Sec
----------------------------------
| ALL SCORE : 116 Points

Amiga 1200 + Apollo 1260 @ 80 MHz
----------------------------------
| CPU SCORE : 86 MIPS
| FPU SCORE : 27 MFLOPS
| MEM SCORE : 58 MB/Sec
----------------------------------
| ALL SCORE : 171 Points

APOLLO AC 68080 @ 78 MHz
----------------------------------
| CPU SCORE : 114 MIPS
| FPU SCORE : 73 MFLOPS
| MEM SCORE : 224 MB/Sec
----------------------------------
| ALL SCORE : 411 Points
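The summary scores are easy to re-derive: in all three runs, ALL SCORE equals the plain sum of the CPU, FPU and MEM scores. Dividing a sub-score by the clock gives a rough per-MHz figure that separates per-clock efficiency from clock speed; that normalisation is only an illustration, not something MiniBench reports.

```python
# Summary figures copied from the MiniBench results above.
RESULTS = {
    # name: (clock MHz, CPU MIPS, FPU MFLOPS, MEM MB/s, reported ALL SCORE)
    "A4000 + Cyberstorm060 MK-I @ 50 MHz": (50, 54, 19, 43, 116),
    "A1200 + Apollo 1260 @ 80 MHz": (80, 86, 27, 58, 171),
    "APOLLO AC 68080 @ 78 MHz": (78, 114, 73, 224, 411),
}


def all_score(cpu: int, fpu: int, mem: int) -> int:
    """ALL SCORE as the plain sum of the three sub-scores (matches all three runs)."""
    return cpu + fpu + mem


if __name__ == "__main__":
    for name, (mhz, cpu, fpu, mem, reported) in RESULTS.items():
        assert all_score(cpu, fpu, mem) == reported
        print(f"{name:38s} MIPS/MHz {cpu / mhz:.2f}  MFLOPS/MHz {fpu / mhz:.2f}")
```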

Henryk Richter
(Apollo Team Member)
Posts 128/ 1
03 Sep 2018 13:20

Nixus Minimax wrote:

These numbers are truly impressive! The only surprise was the relatively weak MULU value. How come the 080 is markedly slower than the 060 in this discipline?

Probably because the test didn't involve mulu.l d0,d1:d2 :D Seriously though, it might warrant a look.

Philippe Flype
(Apollo Team Member)
Posts 299
03 Sep 2018 17:13

About MULU: the test is a plain "MULU.L Dx,Dy" loop:


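REPT 8              ; assemble the group below 8 times (32 MULUs)
  MULU.l d1,d5      ; 32*32 -> 32: d5 = low 32 bits of d5*d1
  MULU.l d1,d4      ; four different destinations, so
  MULU.l d1,d2      ; neighbouring MULUs do not wait
  MULU.l d1,d3      ; on each other's results
ENDR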

On the 060, the 32-bit MULU.L is very fast because Motorola removed the 64-bit support from the hardware.
They moved the 64-bit operation into software: the ISP (Integer Support Package) of the 68060 software package.

So this is very slow on the 060:


  MULU.L <ea>,Dh:Dl 32*32 -> 64
  MULS.L <ea>,Dh:Dl 32*32 -> 64

On the 080, both the 32-bit and 64-bit forms are in hardware, with no software emulation involved.


  MULU.L <ea>,Dl 32*32 -> 32
  MULS.L <ea>,Dl 32*32 -> 32
  MULU.L <ea>,Dh:Dl 32*32 -> 64
  MULS.L <ea>,Dh:Dl 32*32 -> 64
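The difference between the two forms lies only in how much of the 64-bit product is kept. A minimal sketch of unsigned MULU.L semantics on 32-bit values, following Motorola's notation (MULU.L <ea>,Dl keeps the low 32 bits and sets V if the high half was non-zero; MULU.L <ea>,Dh:Dl stores the high half in Dh and the low half in Dl). It models results only, not timing or the N/Z/C flags.

```python
MASK32 = 0xFFFF_FFFF


def mulu_l_32(src: int, dl: int) -> tuple[int, bool]:
    """MULU.L <ea>,Dl (32*32 -> 32): returns (new Dl, V flag)."""
    product = (src & MASK32) * (dl & MASK32)
    return product & MASK32, (product >> 32) != 0


def mulu_l_64(src: int, dl: int) -> tuple[int, int]:
    """MULU.L <ea>,Dh:Dl (32*32 -> 64): returns (new Dh, new Dl)."""
    product = (src & MASK32) * (dl & MASK32)
    return product >> 32, product & MASK32


if __name__ == "__main__":
    dh, dl = mulu_l_64(0xFFFF_FFFF, 0xFFFF_FFFF)
    print(hex(dh), hex(dl))                     # 0xfffffffe 0x1
    print(mulu_l_32(0xFFFF_FFFF, 0xFFFF_FFFF))  # (1, True): high half lost, V set
    print(mulu_l_32(3, 5))                      # (15, False)
```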

Side notes:
If MiniBench used a 64-bit MULU, the 060 would be very slow in comparison (hence, unfair).
If the 32-bit MULU test took its operand from memory, the Vampire's faster Fast RAM would probably put it ahead.

posts 68