APOLLO CPU Knowledge Forum

Information about the Apollo CPU and FPU.

Comparing Apollo to 68060

Gunnar von Boehn
(Apollo Team Member)
Posts 6254
04 Dec 2015 19:25

Several people have asked how APOLLO compares to the 68060.

Every benchmark is different, so you can NOT answer this with a single value.
This is also the reason why AIBB does not calculate one number but offers many different benchmarks.

For those interested, here are some results to compare.


AIBB MATRIX
68060 @ 80 MHz =  60.44
APOLLO         = 178.03
APOLLO equivalent to 68060 @ 235 MHz

AIBB IMATH
68060 @ 80 MHz =  98.88
APOLLO         = 357.14
APOLLO equivalent to 68060 @ 290 MHz

These are just 2 examples; in all AIBB tests
and all other benchmarks that we have run so far,
Apollo scores better than the 68060 @ 80 MHz.
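The "equivalent to 68060 @ X MHz" figures appear to come from scaling the 80 MHz result linearly with clock speed. A minimal sketch, assuming exactly that linear scaling (the function name `equivalent_mhz` is just for illustration):

```python
def equivalent_mhz(ref_score: float, ref_mhz: float, other_score: float) -> float:
    """Clock a reference CPU would need to reach other_score,
    assuming its score grows linearly with clock frequency."""
    return ref_mhz * other_score / ref_score


# AIBB scores from the post; the 68060 reference runs at 80 MHz.
results = {"MATRIX": (60.44, 178.03), "IMATH": (98.88, 357.14)}

for test, (score_060, score_apollo) in results.items():
    mhz = equivalent_mhz(score_060, 80.0, score_apollo)
    print(f"AIBB {test}: APOLLO ~ 68060 @ {mhz:.0f} MHz")
```

This gives about 236 and 289 MHz, matching the quoted 235 and 290 MHz within rounding. Linear scaling is a simplification: it is optimistic for memory-bound tests, because clocking a 68060 higher does not make its memory faster.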

Gregthe Canuck

Posts 274
04 Dec 2015 19:48

Thanks for those numbers.

One other valuable metric would be memory latency/bandwidth compared to a Cyberstorm 060.

Cheers!

Gunnar von Boehn
(Apollo Team Member)
Posts 6254
05 Dec 2015 07:59

Comparing CPUs is complex, much more complex than comparing cars, for example.

It's easy to understand why.
A CPU can execute many hundreds of different instructions,
and its performance also depends on many other factors
like cache width, memory speed, how many instructions can execute in parallel, etc.

Beating a 68030 is pretty easy, as the 68030 is a relatively bad CPU.

Now why is the 68030 bad?
The 68030 can only execute 1 instruction at a time.
Instructions on the 68030 do not take just 1 cycle but a minimum of 2, and EA calculation also costs extra cycles.
This means the 68030 has many instructions taking 4, 6, or even 10 or more clocks.

Depending on the instruction mix, a 50 MHz 68030 can execute at most 25 million instructions per second, but as many instructions take more than 2 clocks, it realistically executes about 10 million instructions per second on average at 50 MHz.
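The arithmetic behind these two figures is simply clock rate divided by the average number of clocks per instruction. A small sketch (the 5-clock average is inferred from the "about 10 million" estimate, not a measured value):

```python
def mips(clock_mhz: float, avg_clocks_per_instr: float) -> float:
    """Millions of instructions per second: clock rate divided by the
    average number of clocks each instruction takes."""
    return clock_mhz / avg_clocks_per_instr


# Best case: every instruction takes the 68030's 2-clock minimum.
print(mips(50, 2))  # 25.0
# About 5 clocks per instruction on average is what the
# "about 10 million" estimate implies; it is not a measured value.
print(mips(50, 5))  # 10.0
```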

The 68030 has very small caches, which are often nearly useless.
The 68030 has no real branch prediction.
The 68030 has no subroutine call acceleration.
The 68030 cannot detect memory streams and cannot prefetch memory effectively.

Now the 68060 is a MUCH MUCH better CPU than the 68030.
The 68060 is superscalar and can execute up to 2 instructions per clock.
Most instructions take only 1 clock on the 68060.
The caches of the 68060 are much better than the caches of the 68030.
The 68060 has a very good branch prediction.

The 68060 is clearly the best 68K ever produced by Motorola.

But the 68060 also has some areas which can be improved.
a) The 68060 ICache can only deliver 4 bytes per cycle.
While the 68060 is superscalar and could process 2 instructions per clock, this ICache bottleneck very often limits it, as 4 bytes are not enough to feed both pipes.

b) The ICache only delivers 2 bytes per clock for a jump or subroutine target with an unfortunate alignment. This means unluckily aligned subroutines can be slower.

c) The ICache cannot prefetch very effectively; this means the performance of programs bigger than the ICache is much lower.

d) The DCache cannot handle misaligned operations for free.
This means misaligned data in memory or on the stack will slow the core down.

e) The DCache cannot detect memory streams and cannot prefetch effectively. This means the performance of highly memory-intensive tasks, e.g. image manipulation, is low.

f) The 68060 cannot accelerate subroutine returns.

g) The 68060 left out some useful instructions, like 64-bit MUL and DIV. These need to be emulated in software.
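Point a) can be made concrete with a simple upper bound: instructions per clock can exceed neither the number of pipes nor the fetched bytes per clock divided by the instruction length. A minimal sketch, assuming every instruction in the stream has the same length and ignoring all other stalls (68k instructions are a 2-byte opcode plus any extension words, so 2, 4 and 6 bytes are used as example lengths):

```python
def fetch_bound_ipc(fetch_bytes: int, instr_bytes: int, pipes: int) -> float:
    """Upper bound on instructions per clock for a stream of equal-length
    instructions: limited by the pipe count and by the instruction bytes
    the cache delivers per clock divided by the instruction length."""
    return min(pipes, fetch_bytes / instr_bytes)


# 68060 values from the text: 4 bytes fetched per clock, 2 pipes.
for length in (2, 4, 6):
    print(f"{length}-byte instructions: at most {fetch_bound_ipc(4, length, 2):.2f} per clock")
```

Only a stream of 2-byte instructions can keep both 68060 pipes busy under this model; with 4-byte instructions the fetch limit already drops the bound to 1 instruction per clock.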

APOLLO is very similar to the 68060, but it addresses and improves all the areas that were not optimal on the 68060.

1) Apollo is superscalar and can execute several instructions per clock. Apollo's superscalar design is stronger than the 68060's, and it can execute more instruction combinations in parallel.

2) The execution time of all 68k instructions is very low,
even lower than on the 68060. Most instructions need only 1 clock.

3) The ICache is very strong: it delivers 16 bytes per clock cycle. This is 4 times more than the 68060 ICache.

4) The ICache also delivers 16 bytes from any address, so aligning subroutine labels is not needed for optimal speed.

The ICache can also prefetch very effectively.
Apollo can therefore execute huge programs from main memory even faster than the 68060 would execute small programs from its ICache.

5) The DCache is very strong: it can read 8 bytes, in parallel also write 8 bytes, and in parallel even prefetch 8 bytes per cycle.
The DCache detects memory streams and will automatically prefetch. This means the performance of memory-intensive games or programs is best in class. Here Apollo even beats gigahertz-clocked PowerPCs.

6) The DCache supports misaligned reads and writes for free.
This means Apollo has optimal performance even with misaligned data or stack.

7) Apollo accelerates subroutine returns.

8) Apollo supports in hardware the useful instructions which were dropped from the 68060. The 64-bit MUL, for example, takes only 2 clock cycles on Apollo, while a 68060 needs to emulate it in software, taking many, many cycles.

9) Many bitfield operations take only a single cycle on Apollo; the 68060 often needed 10 or more. This makes using bitfields really sensible now.

10) Apollo can FUSE frequently used 68k patterns of 2 instructions into 1. This improves the instructions per cycle. This way Apollo has fewer bubbles in code execution than the 68060 and can sometimes execute 4 instructions per clock.
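To illustrate why a missing 64-bit multiply is expensive (items g and 8), here is a sketch of the general technique a software fallback can use: building an unsigned 32×32→64-bit product from four 16×16→32-bit partial products plus carry handling. This is a generic illustration written in Python, assuming only narrow multiplies are available; it is not the code of Motorola's actual emulation package.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mulu_64(a: int, b: int) -> tuple[int, int]:
    """Unsigned 32x32 -> 64-bit product built only from 16x16 -> 32-bit
    partial products; returns (high 32 bits, low 32 bits)."""
    a &= MASK32
    b &= MASK32
    a_hi, a_lo = a >> 16, a & MASK16
    b_hi, b_lo = b >> 16, b & MASK16

    lo_lo = a_lo * b_lo  # weight 2**0
    hi_lo = a_hi * b_lo  # weight 2**16
    lo_hi = a_lo * b_hi  # weight 2**16
    hi_hi = a_hi * b_hi  # weight 2**32

    # Add the middle terms to the upper half of the low product;
    # anything above 16 bits here is a carry into the high word.
    middle = (lo_lo >> 16) + (hi_lo & MASK16) + (lo_hi & MASK16)
    low = ((middle & MASK16) << 16) | (lo_lo & MASK16)
    high = hi_hi + (hi_lo >> 16) + (lo_hi >> 16) + (middle >> 16)
    return high & MASK32, low & MASK32


high, low = mulu_64(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(high), hex(low))  # 0xfffffffe 0x1
```

Four multiplies plus the shifts, masks and carry additions, together with the trap into the emulation handler, are why a software 64-bit multiply costs many cycles compared with a single hardware instruction.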

The 68060 was a pretty good CPU.
We are happy that we could improve it in so many aspects.
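For readers unfamiliar with the bitfield instructions mentioned in item 9, this sketch shows what an unsigned bitfield extract (BFEXTU) computes on a data register, following the documented 68020+ conventions as summarized here: the offset counts from the most significant bit (offset 0 is bit 31), a width of 0 means 32, and in register form the field wraps from bit 0 around to bit 31. The memory-operand form is not modeled, and the sketch says nothing about how Apollo performs this in a single cycle.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    x &= MASK32
    n %= 32
    if n == 0:
        return x
    return ((x << n) | (x >> (32 - n))) & MASK32


def bfextu_reg(value: int, offset: int, width: int) -> int:
    """Unsigned bitfield extract from a 32-bit register, 68k-style:
    offset is counted from the MSB, width 0 means 32, and the field
    wraps around the register."""
    width = width % 32 or 32
    rotated = rotl32(value, offset)  # move the field's first bit to bit 31
    return rotated >> (32 - width)   # keep the top `width` bits


print(hex(bfextu_reg(0x12345678, 8, 8)))   # 0x34
print(hex(bfextu_reg(0x8000000F, 28, 8)))  # 0xf8 (field wraps around)
```

In plain 68000 code the same extraction takes separate shift and mask instructions, which is one reason fast bitfield instructions are attractive.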

Gunnar von Boehn
(Apollo Team Member)
Posts 6254
05 Dec 2015 08:21

A good performance example is the dreamscape video on our bringup page.

In this video you see the same demo on a 68060 @ 80 MHz in an A1200 as well as on APOLLO in an A600.

The video shows that both the A1200 and the A600 run the demo at nearly the same speed. The A1200 is only minimally faster.

Now, as everyone knows, AGA chipmem has much higher bandwidth than ECS chipmem. AGA has 32-bit chip memory, while the A500 and A600 only had 16-bit chip memory.

With 32-bit memory, copying the frame on the A1200 takes half the time of copying it on the A600.
That Apollo nevertheless reaches the same speed as the 68060 on AGA shows that APOLLO is faster, as it has less time available.
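The "half the time" argument follows from counting bus accesses. A minimal sketch, assuming a hypothetical 320×256 frame with 5 bitplanes and assuming each chip RAM access takes the same time on both machines (a simplification; the frame size of the actual demo is not stated):

```python
def bus_transfers(frame_bytes: int, bus_bits: int) -> int:
    """Number of bus accesses needed to move frame_bytes over a bus
    bus_bits wide (ceiling division)."""
    bytes_per_access = bus_bits // 8
    return -(-frame_bytes // bytes_per_access)


# Hypothetical frame: 320 x 256 pixels, 5 bitplanes (32 colours).
FRAME_BYTES = 320 * 256 * 5 // 8  # 51200 bytes

ecs = bus_transfers(FRAME_BYTES, 16)  # A500/A600 chip RAM width
aga = bus_transfers(FRAME_BYTES, 32)  # A1200 chip RAM width
print(ecs, aga, ecs / aga)  # 25600 12800 2.0
```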

Gunnar von Boehn
(Apollo Team Member)
Posts 6254
06 Dec 2015 23:33

gregthe canuck wrote:

Thanks for those numbers.

One other valuable metric would be memory latency/bandwidth compared to a Cyberstorm 060.

Cheers!

I fully agree with you; memory performance is very important for all bigger apps.
What would be your favorite measurement tool: Bustest? AIBB? Minibench?
Do you have values for a Cyberstorm 060 to compare?


Claudio Guglielmotti
(Apollo Team Member)
Posts 185
07 Dec 2015 12:17

I have done the dnetc benchmark... I got a result of 267,000. I don't know what it means, but it is above the 68060 @ 80 MHz (233,000). (Updated with new result.)


Wawa T

Posts 695
07 Dec 2015 14:07

Here's a comparison: EXTERNAL LINK
I must admit I expected a bit better compared to the 060/50, but on the other hand, being on the level of a Pentium-233 is not that bad for starters ;)


Claudio Guglielmotti
(Apollo Team Member)
Posts 185
07 Dec 2015 14:31

We can increase the MHz!

Gregthe Canuck

Posts 274
07 Dec 2015 15:19

Bustest results are the most common ones I can find. I found two good samples; both are very similar.

The final result is from a Quikpak 060 running BusSpeedTest.

Summary:
- Cyberstorm 060/50: About 55MB/s maximum on reads, 40MB/s on writes
- Quikpak 060/57: About 57MB/s maximum on reads, 44MB/s on writes

The links below have other benchmark info as well, but I snipped out just the memory tests.

---------------------------------------------------------------------------------------------

EXTERNAL LINK
Bustest 0.07 Memory Performance Test
Data Transfer rate into FAST RAM (MBytes/sec)

TEST             Amiga 4000/040   Amiga 4000 with      Amiga 4000 with
                 25 MHz           CYBERSTORM 040/40    CYBERSTORM 060/50
Read Word        12.7             43.9                 42.6
Read Long        13.0             52.9                 55.0
Read Multiple    13.0             48.5                 55.2
Write Word       7.0              31.6                 37.6
Write Long       6.9              31.7                 40.2
Write Multiple   7.0              32.0                 38.4

---------------------------------------------------------------------------------------------

https://translate.google.ca/translate?hl=en&sl=cs&u EXTERNAL LINK
Transfer speeds of RAM (Bustest 0.07), all FAST RAM values in megabytes/s:

                  A4000     CyberStorm
                  040/25    060/50
Read Word:        12.7      42.6
Read Long:        13.0      55.0
Read Multiple:    13.0      55.2
Write Word:       7.0       37.0
Write Long:       6.9       40.2
Write Multiple:   7.0       38.4

---------------------------------------------------------------------------------------------

http://webcache.googleusercontent.com/search?q=cache:D36B77r0woIJ:www.hd-zone.com/2011/05/quikpak-68060-information/&hl=en&gl=ca&strip=1&vwsrc=0

Amiga Technologies A4000T
Quikpak 68060 card installed
32 meg EDO 60ns local 060 ram
18 meg of motherboard ram (60ns Fast Page Mode)
Picasso 4 Graphics Card with Picasso 96 1.13
1.06 gig seagate medalist hard disk

BusSpeedTest 0.19 (mlelstv) Buffer: 262144 Bytes, Alignment: 32768 (bustest fast rom)

Memtype        Cycle     Bandwidth
fast readw     44.8 ns   44.6 meg/sec
fast readl     71.1 ns   56.6 meg/sec
fast readm     73.1 ns   54.7 meg/sec
fast writew    44.8 ns   44.7 meg/sec
fast writel    89.8 ns   44.6 meg/sec
fast writem    89.0 ns   45.0 meg/sec
rom readw      44.8 ns   44.7 meg/sec
rom readl      69.9 ns   57.2 meg/sec
rom readm      73.0 ns   54.8 meg/sec
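The Cycle and Bandwidth columns are related by bandwidth = bytes per access / time per access. A small check, assuming (as an inference from the numbers, not from BusSpeedTest documentation) that the w rows move 2 bytes per access, the l and m rows 4 bytes, and that "meg" means 10^6 bytes:

```python
def bandwidth_mb_s(bytes_per_access: int, cycle_ns: float) -> float:
    """Bandwidth in MB/s (10**6 bytes per second) for one access of
    bytes_per_access every cycle_ns nanoseconds."""
    return bytes_per_access / cycle_ns * 1000.0


# (row, assumed bytes per access, listed cycle in ns, listed MB/s)
rows = [
    ("fast readw", 2, 44.8, 44.6),
    ("fast readl", 4, 71.1, 56.6),
    ("fast readm", 4, 73.1, 54.7),
    ("fast writew", 2, 44.8, 44.7),
    ("fast writel", 4, 89.8, 44.6),
    ("fast writem", 4, 89.0, 45.0),
    ("rom readw", 2, 44.8, 44.7),
    ("rom readl", 4, 69.9, 57.2),
    ("rom readm", 4, 73.0, 54.8),
]

for name, size, cycle, listed in rows:
    print(f"{name:12} computed {bandwidth_mb_s(size, cycle):5.1f}  listed {listed:5.1f}")
```

Every row agrees to within about 0.4 MB/s (fast readl is the loosest match), which supports reading the Cycle column as time per access.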

Ram type: 60ns Fast Page Mode (FPM) vs. 60ns EDO Ram (EDO)

Type        FPM Read     EDO Read     % Change
Chip Ram    4617 kB/s    4617 kB/s    0.0%
Fast Ram    46492 kB/s   54545 kB/s   +17.3%
Rom         46586 kB/s   54545 kB/s   +17.1%

Type        FPM Write    EDO Write    % Change
Chip Ram    6969 kB/s    6968 kB/s    -0.01%
Fast Ram    38927 kB/s   44055 kB/s   +13.2%

Type        FPM Copy     EDO Copy     % Change
Chip Ram    2777 kB/s    2777 kB/s    0.0%
Fast Ram    22470 kB/s   26716 kB/s   +18.9%
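The % Change column is consistent with (EDO - FPM) / FPM × 100. A short check against the listed rows:

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (row, FPM kB/s, EDO kB/s) from the tables above.
rows = [
    ("Fast Ram read", 46492, 54545),
    ("Rom read", 46586, 54545),
    ("Chip Ram write", 6969, 6968),
    ("Fast Ram write", 38927, 44055),
    ("Fast Ram copy", 22470, 26716),
]

for name, fpm, edo in rows:
    print(f"{name:15} {pct_change(fpm, edo):+.2f}%")
```

Rounded to one decimal place, these reproduce the +17.3%, +17.1%, +13.2% and +18.9% entries, and the chip RAM write change of one kB/s is about -0.01%.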

---------------------------------------------------------------------------------------------


Gunnar von Boehn
(Apollo Team Member)
Posts 6254
07 Dec 2015 16:34

Well, as a start we could compare with:


Gregthe Canuck

Posts 274
07 Dec 2015 18:10

Thanks for the AIBB/MemTest result. 170+ MB/s is pretty good. I just noticed that on the Vampire 2 bringup page, at the very bottom, you have already done a Bustest run! Love those 300MB+ writes. Why are the readl/readm flavours not as fast?

Gunnar von Boehn
(Apollo Team Member)
Posts 6254
09 Dec 2015 22:22

gregthe canuck wrote:

Love those 300MB+ writes. Why are the readl/readm flavours not as fast?

Because we have not enabled all features of our memory controller yet. We will enable them all before the final release; then the read speed will be doubled too.

You can expect it to reach over 300 MB/s in both reading and writing.


Gregthe Canuck

Posts 274
10 Dec 2015 14:06

Thanks for the info. The updated read speeds should result in a nice performance boost. The result would be roughly 6x the memory I/O performance of an '060. Nice!
