
Welcome to the Apollo Forum

This forum is for people interested in the APOLLO CPU.



Performance and Benchmark Results!

Sinnlose Performance (Pointless Performance)

Markus B

Posts 209
27 Oct 2019 06:28


Hi Bernd,

would it help if you provided some accurate test code?


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
27 Oct 2019 20:19


We did it again. :-D

Based on the ideas discussed, we improved the APOLLO 68080 even more.
APOLLO 68080 got faster on certain memory operations.

Can you spot it?

Yes, some MOVEM operations (writem) were greatly accelerated (they are now twice as fast).
This not only benefits this memory speed test.
This speedup also benefits every AMIGA program
as MOVEM is used in all AMIGA programs.

"ReadM" will be accelerated soon too.

This result clearly proves that CPU improvements on APOLLO
are the reason for its superior speed, not bugs in BUSTEST.  :-D

Bernie, if you have any questions about how APOLLO works internally, to better understand this, please feel free to ask.
I will happily answer them.


Samuel Devulder

Posts 248
27 Oct 2019 21:28


Congrats BigGun!
 
Optimizations that benefit all Amiga programs without requiring recompilation are really good news.
 
Can't wait to test ReadM and WriteM in decreasing order as well.


Mo Retro

Posts 241
27 Oct 2019 22:27


Wow that's a great leap Gunnar :-)
 
  I have a question regarding the V4 X12 vs. your V4 X16 in your first post.
  When I compared the gain from X12 to X16 yesterday, I expected an overall gain of 33.33% (16/12). But instead it fluctuates between 29% and 55%.
  Is there an explanation for this variation in gain?
 
  
  68080 X12
  ------------------
  Op      calibe    cycle    Bandwidth (MB/s)
  Readw.  12.0      normal   166.7
  Readl.  09.7      normal   412.7
  Readm.  12.8      normal   312.0
  Writew. 16.1      normal   124.6
  Writel. 07.7      normal   521.2
  Writem. 15.4      normal   260.2
 
 
  68080 X16 Gunnar
  -------------------------------
  Op      calibe    cycle    Bandwidth (MB/s)  Gain vs X12 (expected 33.33%)
  Readw.  09.7      normal   216.0             29.56%
  Readl.  07.7      normal   544.9             32.03%
  Readm.  09.9      normal   403.2             29.23%
  Writew. 10.4      normal   193.1             54.97%
  Writel. 05.2      normal   769.7             47.67%
  Writem. 10.4      normal   385.3             48.70%
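Mo Retro's expectation follows from the clock ratio alone: assuming X12 and X16 are the two core clock multipliers, the X16 core runs 16/12 times as fast, i.e. 33.33% faster. The short Python sketch below (variable names are illustrative) recomputes each gain from the bandwidth columns of the two tables. It reproduces the listed percentages to within rounding, except for Writem, where 385.3 vs 260.2 MB/s works out to about 48.1% rather than 48.70%.

```python
# Bandwidth figures in MB/s, copied from the X12 and X16 BUSTEST tables above.
X12 = {"Readw": 166.7, "Readl": 412.7, "Readm": 312.0,
       "Writew": 124.6, "Writel": 521.2, "Writem": 260.2}
X16 = {"Readw": 216.0, "Readl": 544.9, "Readm": 403.2,
       "Writew": 193.1, "Writel": 769.7, "Writem": 385.3}

# Clock-only expectation (assumption: X12/X16 are clock multipliers): 16/12 -> +33.33 %.
expected_gain = (16 / 12 - 1) * 100


def gain_percent(old: float, new: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1) * 100


gains = {op: gain_percent(X12[op], X16[op]) for op in X12}

if __name__ == "__main__":
    print(f"clock-only expectation: {expected_gain:.2f}%")
    for op, gain in gains.items():
        print(f"{op:<7} {gain:6.2f}%")
```

The reads land at or just below the clock-only 33.33%, while the writes gain far more, which is exactly the spread the question is about.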



Gunnar von Boehn
(Apollo Team Member)
Posts 6207
28 Oct 2019 08:32


Mo Retro wrote:

Is there an explanation for this variation in gain?

There is some logic behind this.

a) A CPU has a theoretical maximum speed,
which depends on the CPU clock rate, the CPU bus width, and the number of cycles a transaction takes for the CPU.

An example:
Bus  = 32bit (4 Byte)
Clock = 50 MHz
Cycle = 2
Max-Result = 100 MB/sec

Example APOLLO 68080 (x16)
Bus  = 64bit (8 Byte)
Clock = 114 MHz
Cycle = 1
Max-Result = 912 MB/sec
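Both examples apply the same rule of thumb: peak bandwidth = bytes per access × clock / cycles per access. With the clock given in MHz, the result comes out directly in MB/s (decimal megabytes). A minimal Python sketch; the function name is illustrative:

```python
def peak_bandwidth_mb_s(bus_bytes: int, clock_mhz: float, cycles_per_access: float) -> float:
    """Theoretical ceiling: bytes per access times accesses per microsecond (= MB/s)."""
    return bus_bytes * clock_mhz / cycles_per_access


# 32-bit bus, 50 MHz, 2 cycles per transfer -> 100 MB/s
example_cpu = peak_bandwidth_mb_s(bus_bytes=4, clock_mhz=50, cycles_per_access=2)
# APOLLO 68080 (x16): 64-bit bus, 114 MHz, 1 cycle per transfer -> 912 MB/s
apollo_x16 = peak_bandwidth_mb_s(bus_bytes=8, clock_mhz=114, cycles_per_access=1)

if __name__ == "__main__":
    print(example_cpu, apollo_x16)
```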

In the BUSTEST results above you can clearly see the effect of BUS WIDTH:
WORD access = 2 Byte = results around 200 MB/sec
LONG access = 4 Byte = results around 400 MB/sec
64BIT BUS access = results around 800 MB/sec

The benefit and advantage of the 64bit bus is clearly visible.

So on the one hand you have the maximum a CPU could do.
On the other hand you have the maximum the memory controller and memory chips can do.

And then you have the memory access latency.
READs always have a read latency.
Let's say this latency is 10 cycles.
Let's make an example:

Bus  = 32bit (4 Byte)
Clock = 50 MHz
Cycle = 2+10 Latency
Max-Result = 16 MB/sec

APOLLO 68080 (x16)
Bus  = 64bit (8 Byte)
Clock = 114 MHz
Cycle = 1+10 Latency
Max-Result = 82 MB/sec
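For reads, the latency is added to the cycle count of every access, because without any overlap each read must wait for its data before the next one can start. Extending the same rule of thumb with a latency term reproduces both figures (the post rounds 16.7 and 82.9 MB/s down). The function name is illustrative:

```python
def latency_bound_read_mb_s(bus_bytes: int, clock_mhz: float,
                            cycles_per_access: float, latency_cycles: float) -> float:
    """Read bandwidth when every access pays the full latency (no overlap)."""
    return bus_bytes * clock_mhz / (cycles_per_access + latency_cycles)


example_read = latency_bound_read_mb_s(4, 50, 2, 10)   # about 16.7 MB/s
apollo_read = latency_bound_read_mb_s(8, 114, 1, 10)   # about 82.9 MB/s

if __name__ == "__main__":
    print(f"{example_read:.1f} {apollo_read:.1f}")
```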

In the BUSTEST results above you can clearly see that WRITEs are pretty fast, as writes are posted and are not affected by latency.
READs, on the other hand, are affected by latency.
This is the reason why READ speed is lower than WRITE speed.

But you can see that READ performance is very, very good,
much better than we assumed it would be.
Why is APOLLO's read speed so good?

The reason is that APOLLO automatically compensates for the READ latency by prefetching on its own.
APOLLO is the only 68K CPU able to do this.
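Why prefetching helps can be shown with a generic queueing argument (Little's law): if several reads are in flight at once, their latencies overlap, and throughput rises until the bus itself becomes the limit. The sketch below is a textbook toy model, not a description of APOLLO's actual prefetcher; the `depth` parameter (number of reads in flight) is a hypothetical knob, and the other numbers reuse the x16 example with 10 cycles of latency.

```python
def overlapped_read_mb_s(bus_bytes: int, clock_mhz: float, cycles_per_access: float,
                         latency_cycles: float, depth: int) -> float:
    """Toy model: up to `depth` reads are outstanding at the same time.

    Accesses per cycle are capped both by the bus issue rate (1 / cycles_per_access)
    and by how many requests fit into one full round trip (Little's law).
    """
    round_trip = cycles_per_access + latency_cycles
    accesses_per_cycle = min(1 / cycles_per_access, depth / round_trip)
    return bus_bytes * clock_mhz * accesses_per_cycle


if __name__ == "__main__":
    for depth in (1, 2, 4, 8, 11):
        print(depth, round(overlapped_read_mb_s(8, 114, 1, 10, depth), 1))
```

With one read in flight the model falls back to the 82 MB/sec case; with 11 reads in flight it reaches the 912 MB/sec bus ceiling.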



Markus B

Posts 209
28 Oct 2019 09:48


Good work, congrats, Gunnar!

Your explanation makes sense and the measurement seems reasonable.

Just curious: Is this approach on par with current AMD64 CPUs? If I interpret the information on a Ryzen CPU correctly, the maximum throughput is 22 GB/s at a 2933 memory clock. So scaled down to 100 MHz, it hits something like 730 MB/s, if I get it right.
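The scaling step, written out, assumes that bandwidth scales linearly with the memory clock, which is only a first-order estimate. Taking 22 GB/s as 22,000 MB/s and 2933 as the memory data rate, the proportional result is about 750 MB/s, in the same ballpark as the estimate above. The function name is illustrative:

```python
def scale_bandwidth(bandwidth_mb_s: float, from_clock: float, to_clock: float) -> float:
    """Linear scaling of bandwidth with clock (a rough, first-order assumption)."""
    return bandwidth_mb_s * to_clock / from_clock


ryzen_scaled = scale_bandwidth(22_000, from_clock=2933, to_clock=100)

if __name__ == "__main__":
    print(f"{ryzen_scaled:.0f} MB/s")
```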

Would something be possible with dual-channel access to the DDR3 memory?


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
28 Oct 2019 12:17


Markus B wrote:

Good work, congrats, Gunnar!

Thank you!
 

Markus B wrote:

Would something be possible with dual-channel access to the DDR3 memory?

You mean working on improving the performance even more?

I think in comparison with other systems we are very good.

Compared to other 68K systems, the Vamp is stellar.
Other 68K systems are very good when they reach 20 MB/sec and excellent when they reach 30 MB/sec.

The Vampire reaches 300-700 MB/sec, so it is a whole order of magnitude faster.

Even compared to PPC systems, the Vampire holds its ground excellently.
The V4 beats the Pegasos, AmigaOne and SAM in these tests.
So even gigahertz PPCs score lower than the Vamp here.

Of course you are right, it does not hurt to think about ideas to make it even faster.


Nixus Minimax

Posts 416
28 Oct 2019 13:54


Gunnar von Boehn wrote:

We did it again. :-D

Wow, that was quick! When I wrote about how merging MOVEM-accesses would be more difficult, I somehow sensed that you were going to prove it isn't too difficult for you! Congratulations! I really think that is a major optimisation as it speeds up all subroutine calls.
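The idea of merging MOVEM accesses can be illustrated with a hypothetical model: if consecutive 32-bit register stores that fall into the same aligned 64-bit bus word are combined, an aligned MOVEM.L needs half as many bus writes, which would fit the "twice as fast" writem result. This only illustrates the idea; the function name and the 8-byte bus word are assumptions, not APOLLO's actual implementation.

```python
def bus_writes(start_addr: int, num_regs: int, merge: bool) -> int:
    """Count bus transactions for a MOVEM.L store of `num_regs` longwords (hypothetical model).

    Without merging, each 4-byte register is its own write. With merging,
    longwords that fall into the same aligned 8-byte (64-bit) bus word
    are combined into one write.
    """
    addrs = [start_addr + 4 * i for i in range(num_regs)]
    if not merge:
        return len(addrs)
    return len({addr // 8 for addr in addrs})


if __name__ == "__main__":
    for start in (0x1000, 0x1004):
        print(hex(start), bus_writes(start, 8, merge=False), bus_writes(start, 8, merge=True))
```

A block that starts on an odd longword needs one extra transaction. In this model the store order (for example predecrement mode, which writes towards lower addresses) does not change the count, since the same contiguous block is written.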


Markus B

Posts 209
28 Oct 2019 16:52


Is it that "easy" to think of a dual-channel controller in a way that it accesses two independent chips at the same time (like a RAID-0)?
But as there is only one DDR3 chip on the V4 ...
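The RAID-0 comparison maps onto address interleaving, a common way dual-channel controllers split traffic: consecutive blocks of the address space alternate between channels, so a sequential stream keeps both channels busy. The sketch below is generic; the 64-byte interleave granularity and the function name are illustrative assumptions, not anything the single-chip V4 implements.

```python
from collections import Counter


def channel_for_address(addr: int, num_channels: int = 2, interleave_bytes: int = 64) -> int:
    """Select a memory channel by striping fixed-size blocks across channels."""
    return (addr // interleave_bytes) % num_channels


# A sequential 4 KiB copy, issued as 64-byte bursts.
bursts = range(0, 4096, 64)
per_channel = Counter(channel_for_address(a) for a in bursts)

if __name__ == "__main__":
    print(dict(per_channel))
```

A sequential copy lands evenly on both channels, so each channel only has to deliver half of the total bandwidth.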


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
28 Oct 2019 17:32


Markus B wrote:

  Is it that "easy" to think of a dual-channel controller in a way that it accesses two independent chips at the same time (like a RAID-0)?
  But as there is only one DDR3 chip on the V4 ...
 

 
Yes, in the future we could make Vampire cards with wider memory buses on the board.
This is basically what you want, as it would improve system memory throughput.
But making new cards is not planned today.
We have the V4 now and are pretty happy with it.
 
BTW, how do we compare to other systems of that "era"?
E.g. does anyone know the AMD K6's maximum memory read speed?
Or the best speed of the AMD K6-2?
Or other chips of the good old times?

Here is an interesting paper with results from other parts of the world.
EXTERNAL LINK


Markus B

Posts 209
28 Oct 2019 18:19


With my primitive calculation on current DDR4 systems, scaled down to 100 MHz, it already became clear that the AC68080 is on par with these systems.

In the AC68080, is the memory always driven at the same clock speed as the CPU itself? How do other memory controllers handle different clock speeds between the CPU and the memory? I can't quite imagine how the data is transferred via

Memory -> Controller -> CPU


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
28 Oct 2019 18:33


Markus B wrote:

In the AC68080, is the memory always driven by the same clock speed as the CPU itself?

No, they use different speeds.

The memory data-transfer rate on the V2 is twice the CPU clock.
The bus width on the V2 is 32 bits.
This means that, from the CPU's perspective, the memory is 64 bits wide.

The memory data-transfer rate on the V4 is roughly eight times the CPU clock. The bus width on the V4 is 16 bits.
This means that, from the CPU's perspective, the memory is 64 bits wide, plus a 64-bit margin for chipset access.
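The width arithmetic here is DRAM width × transfers per CPU clock: 32 bits × 2 = 64 bits on the V2, and 16 bits × 8 = 128 bits on the V4, i.e. 64 bits for the CPU plus 64 bits of margin for chipset access. The second function sketches what a controller conceptually does between the two clock domains: it collects several narrow DRAM beats into one wide word per CPU clock. The function names, the exact factor of 8 and the beat order (first beat in the lowest bits) are illustrative assumptions.

```python
def cpu_side_width_bits(dram_width_bits: int, transfers_per_cpu_clock: int) -> int:
    """Bits delivered per CPU clock when the DRAM transfers several times per CPU clock."""
    return dram_width_bits * transfers_per_cpu_clock


def gather_beats(beats: list[int], beat_bits: int) -> int:
    """Pack successive DRAM beats into one wide word, first beat in the lowest bits."""
    mask = (1 << beat_bits) - 1
    word = 0
    for i, beat in enumerate(beats):
        word |= (beat & mask) << (i * beat_bits)
    return word


v2_bits = cpu_side_width_bits(32, 2)   # 64
v4_bits = cpu_side_width_bits(16, 8)   # 128 = 64 (CPU) + 64 (chipset margin)

if __name__ == "__main__":
    # Eight 16-bit beats on the V4 fill one 128-bit word per CPU clock.
    beats = [0x1111 * (i + 1) for i in range(8)]
    print(hex(gather_beats(beats, 16)))
```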



Mo Retro

Posts 241
28 Oct 2019 22:52


Thanks for the explanation Gunnar.
I got the idea now, how it works.
Great work :-)


Vojin Vidanovic
(Needs Verification)
Posts 1916/ 1
29 Oct 2019 06:53


Gunnar von Boehn wrote:
 
  Of course you are right, it does not hurt to think about ideas to make it even faster.

Next gen V6 Vamp could have 2x 256MB DDR4 and dual channel :)

Kudos on the really great memory performance. In the past it was already shown to
beat the 1.8 GHz PA6T, which had a nice DDR2 memory controller, I believe faster than the rest of the NG gang.

However, PA Semi taught me that fast memory access is great, but no compensation for a slow CPU.

Thus, I hope to see the x16 core for the V4, with the 080 reaching almost 115 MHz, too :)




Gunnar von Boehn
(Apollo Team Member)
Posts 6207
29 Oct 2019 10:24


Let's look at the STANFORD University memory speed comparison
with machines that were just "awesome" in the years after Commodore died.

 
The ALPHA 533 MHz was an unbelievable giant in its era.
Finally it's time for AMIGA to rule them all.
AMIGA with 68080 and Vampire smokes it now :-)
 
 
 


Amiga 4Life

Posts 101
29 Oct 2019 23:20


Gunnar von Boehn wrote:

The Memory/Data-transfer rate on the V4 is roughly eight times the CPU clock.
 
  Thank you...


Cyprian K

Posts 26
30 Oct 2019 20:56


Gunnar von Boehn wrote:
Yes, some MOVEM operations (writem) were greatly accelerated (they are now twice as fast).
This not only benefits this memory speed test.
This speedup also benefits every AMIGA program
as MOVEM is used in all AMIGA programs.

"ReadM" will be accelerated soon too.

Great job!



Teemu Kärkkäinen

Posts 3
31 Oct 2019 05:31


Has anyone tried the 68080's performance with the Distributed.net client yet? I'd be happy to know how it compares there against other 68k systems.

EXTERNAL LINK 
thanks :)

posts 38