Performance and Benchmark Results!

X86 Power !

Vojin Vidanovic

Posts 1371
02 Nov 2019 08:32


Jim Drew wrote:

  BTW, I did an AMMX-optimized version of PCx.
 
  I might be able to make the Amiga video driver faster for chunky-2-planar support. I have no interest in hires modes for the Amiga, I still use 640x200 16 color for all of my Amiga setups.  The PC has MODE13/X which is 320x200x8.  That mode I already support promoted to a video card.
 

 
  Please do so. I would pay a small fee to have a more native PC-Task than slow DOSBox. Vamp can do better than a slow 486. Video optimizations are crucial, if we could use RTG :)
 


Gunnar von Boehn
(Apollo Team Member)
Posts 4126
02 Nov 2019 09:45


In theory the x86 emulation could be much faster.

PCx today does "interpret" each instruction.
PCx does some tricks to speed this interpreting up -
but it's not on the level of a real JIT compiler.

So there is a lot of room for improvement, and a JIT-compiled x86 execution could be several times faster.




Jim Drew
Learn who I am!
Posts 63
02 Nov 2019 20:17


PCx today does interpret each instruction, but it also has CPU Transcription (which is what JIT is).  With CPU transcription enabled blocks of x86 code are converted to 68K code.  The first time the x86 code is encountered the speed is slower of course, but repetitive calls to the same code don't require interpretation at that point.  This can make the code many times faster than the interpretive method.
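
The "slow the first time, fast on repeat calls" behaviour described above can be illustrated with a small Python sketch. This is only a toy model under stated assumptions: the names `BlockCache`, `HANDLERS` and the accumulator-style instructions are hypothetical and are not taken from PCx, which generates 68K code rather than Python closures.

```python
# Toy model of block caching. All names are hypothetical; nothing here is PCx code.
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, int]  # (mnemonic, immediate operand)

# Stand-ins for the per-instruction interpreter routines.
HANDLERS: Dict[str, Callable[[int, int], int]] = {
    "ADD": lambda acc, n: acc + n,
    "SUB": lambda acc, n: acc - n,
}


class BlockCache:
    def __init__(self, program: Dict[int, List[Instr]]) -> None:
        self.program = program  # guest entry address -> guest block
        self.cache: Dict[int, Callable[[int], int]] = {}
        self.translations = 0  # number of times the slow path ran

    def _translate(self, block: List[Instr]) -> Callable[[int], int]:
        # Slow path: decode every instruction once and resolve its handler.
        self.translations += 1
        steps = [(HANDLERS[op], n) for op, n in block]

        def run(acc: int) -> int:
            # Fast path: no decoding and no table lookups, only the handlers.
            for fn, n in steps:
                acc = fn(acc, n)
            return acc

        return run

    def execute(self, entry: int, acc: int) -> int:
        run = self.cache.get(entry)
        if run is None:  # first encounter: translate, then remember
            run = self._translate(self.program[entry])
            self.cache[entry] = run
        return run(acc)


cache = BlockCache({0x100: [("ADD", 5), ("ADD", 3), ("SUB", 2)]})
first = cache.execute(0x100, 0)    # translated on this call
second = cache.execute(0x100, 10)  # served from the cache
```

The second call reuses the translated block, so `cache.translations` stays at 1 no matter how often the same entry address is executed.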



Jim Drew
Learn who I am!
Posts 63
03 Nov 2019 04:52


Here is a good video showing the CPU transcription mode in action - compared against the Dynamic version of PC-Task.  PCx takes about 6 seconds to load Windows 3.1.  PC-Task takes 25 seconds.
 
  There is also a comparison of FUSION vs. Shapeshifter.  You can easily determine who has the faster products.  :)
 
  EXTERNAL LINK   


Gunnar von Boehn
(Apollo Team Member)
Posts 4126
03 Nov 2019 08:12


Jim Drew wrote:

  PCx today does interpret each instruction, but it also has CPU Transcription (which is what JIT is).  With CPU transcription enabled blocks of x86 code are converted to 68K code.  The first time the x86 code is encountered the speed is slower of course, but repetitive calls to the same code don't require interpretation at that point.  This can make the code many times faster than the interpretive method.
 

 
OK maybe you can help me understand this again.
 
I think ideally, if you have this on the x86 side

  ADD
  ADD
  SUB

 
This should optimally translate to

  ADD
  ADD
  SUB

 
I remember that PCx did roughly something like

  move (pc)++,D0
  jsr  (routines,D0)
  move (pc)++,D0
  jsr  (routines,D0)
  move (pc)++,D0
  jsr  (routines,D0)
 
  --
routines:
ADD:
    some code to load regs
    add
    some code to update regs and flags
    rts
SUB:
    some code to load regs
    sub
    some code to update regs and flags
    rts
 

 
I recall here that there is a lot of overhead per instruction.
Did I recall this correctly?

The indirect JSR is very costly - it alone costs about as much as 10 instructions.
 
 
If I remember this correctly, then would this not mean that PCx could be like 5-10 times faster in theory with a different tuning here?

I think there is a lot of potential to speed this up,
so much that it could be several times faster.
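
The difference between the two shapes discussed above (one dispatch per opcode versus straight-line translated code) can be sketched in Python. This is a minimal illustration, not a model of PCx: the `ROUTINES` table, the register names and both functions are assumptions made for the example, and flags and operand widths are ignored.

```python
# Toy comparison; register names and the ROUTINES table are hypothetical.

def op_add(regs, dst, src):
    regs[dst] += regs[src]


def op_sub(regs, dst, src):
    regs[dst] -= regs[src]


ROUTINES = {"ADD": op_add, "SUB": op_sub}


def run_dispatched(code, regs):
    """Interpreter style: one table lookup and one indirect call per opcode."""
    dispatches = 0
    for op, dst, src in code:
        ROUTINES[op](regs, dst, src)
        dispatches += 1
    return dispatches


def compile_straight_line(code):
    """Translator style: emit one host statement per guest instruction."""
    symbols = {"ADD": "+=", "SUB": "-="}
    lines = ["def block(regs):"]
    for op, dst, src in code:
        lines.append(f"    regs[{dst!r}] {symbols[op]} regs[{src!r}]")
    lines.append("    return regs")
    namespace = {}
    exec("\n".join(lines), namespace)  # build the host function once
    return namespace["block"]


code = [("ADD", "ax", "bx"), ("ADD", "ax", "cx"), ("SUB", "ax", "bx")]
regs_a = {"ax": 1, "bx": 2, "cx": 3}
dispatches = run_dispatched(code, regs_a)
regs_b = compile_straight_line(code)({"ax": 1, "bx": 2, "cx": 3})
```

Both paths compute the same register values; the dispatched version pays one lookup and call per guest instruction, while the compiled block pays none once it has been generated.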


Jim Drew
Learn who I am!
Posts 63
04 Nov 2019 00:42


The CPU transcription routine copies the code that normally gets executed for each interpreted instruction into a block of memory sequentially.  Once a block of code is transcribed, the entry point of the block of x86 code is replaced with an illegal instruction (trigger).  That trigger is followed by the address in the memory block.  The CPU transcription stops when an x86 return (RET) instruction is executed, so entire sections of x86 code are transcribed all at once (while running).  This works well with small blocks of code, and since the x86 generally has things in 64K segments it's almost automatic that blocks will be small.  There are some cases though where code is longer than the transcription buffer.
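
A rough Python sketch of the trigger mechanism described above follows. It is a toy model only: the `Transcriber` class, the tuple-based memory layout and the `("TRIGGER", ...)` / `("ADDR", ...)` slots are hypothetical stand-ins for the illegal instruction and the buffer address, and the sketch ignores buffer limits, self-modifying code and jumps into the middle of a block.

```python
# Toy model of the trigger patching; names and memory layout are hypothetical.
HANDLERS = {
    "ADD": lambda acc, n: acc + n,
    "SUB": lambda acc, n: acc - n,
}


class Transcriber:
    def __init__(self, memory):
        self.memory = list(memory)  # guest code as (mnemonic, operand) slots
        self.buffer = []            # transcription buffer: one entry per block
        self.interpreted = 0        # instructions that took the slow path

    def call(self, entry, acc):
        if self.memory[entry][0] == "TRIGGER":
            # Fast path: the slot after the trigger holds the buffer index.
            index = self.memory[entry + 1][1]
            for fn, n in self.buffer[index]:
                acc = fn(acc, n)
            return acc

        # Slow path: interpret each instruction and record its routine.
        recorded = []
        pc = entry
        while self.memory[pc][0] != "RET":  # transcription stops at the return
            op, n = self.memory[pc]
            fn = HANDLERS[op]               # the lookup that transcription removes
            acc = fn(acc, n)
            recorded.append((fn, n))
            self.interpreted += 1
            pc += 1

        if recorded:  # patch only blocks with at least one instruction
            self.buffer.append(recorded)
            self.memory[entry] = ("TRIGGER", 0)
            self.memory[entry + 1] = ("ADDR", len(self.buffer) - 1)
        return acc


t = Transcriber([("ADD", 5), ("ADD", 3), ("SUB", 2), ("RET", 0)])
first = t.call(0, 0)    # interpreted and transcribed
second = t.call(0, 10)  # enters through the trigger
```

After the first call the entry point has been overwritten, so the second call never touches the handler table and `t.interpreted` stays at 3.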

CPU Transcribed code sections are dramatically faster - like 10x the speed of plain interpreted code.

The CPU Transcription is a good trade-off between performance and keeping the code/buffer as small as possible.  Amigas back in the day were limited in memory compared to today.  I could improve things quite a bit by unrolling the code that gets generated.



Gunnar von Boehn
(Apollo Team Member)
Posts 4126
04 Nov 2019 07:42


Jim Drew wrote:

I could improve things quite a bit by unrolling the code that gets generated.

 
I see that you acknowledge the possibility of making PCx a lot faster.
I think this is good!
 
I wonder what you think about PCx yourself.
Is PCx software that is worth supporting, improving, and developing further, or not?

If you say YES, would it then not make sense to develop some updates?


Jim Drew
Learn who I am!
Posts 63
04 Nov 2019 17:29


It's already converting x86 code into 68K code so it won't get that much faster by unrolling it, but it would improve the speed a bit (not 2x or more though I don't think).

I have already released a beta version to your team a few years ago.

I will be releasing updated versions of FUSION and PCx as a package, with new video drivers, ISO support, and other features.




Gunnar von Boehn
(Apollo Team Member)
Posts 4126
04 Nov 2019 18:40


Jim Drew wrote:

It's already converting x86 code into 68K code so it won't get that much faster by unrolling it, but it would improve the speed a bit (not 2x or more though I don't think).
 
I have already released a beta version to your team a few years ago.

Yes, I recall this very well.
I've reviewed your PCx Beta version with you. :-D

What I recall about this PCx Beta
is exactly why I talk to you about making it much faster.

I recall your "transcription" as you call it,
had the overhead of a calculated "JSR" per opcode.
 
I believe that a significant speedup, like 5 times,
with a "real JIT" could be a reasonable goal.

A real JIT could avoid all these calculated JMPs - they are slow.
A real JIT could also remove other overhead like register loading/saving and also avoid unneeded flag generation.

Is this needed?

I think with tuned x86 emulation combined with native PC GFX modes,
a lot more could run smoothly on the AMIGA - even PC games like "Tomb Raider"
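
The "avoid unneeded flag generation" point above can be illustrated with a backward liveness pass over a translated block. The sketch below is a simplification under stated assumptions: it treats the x86 flags as a single unit, the function name `mark_needed_flags` and the tuple layout are hypothetical, and flags are conservatively assumed to be read after the block ends.

```python
def mark_needed_flags(block):
    """Return one bool per instruction: must its flag update be generated?

    block is a list of (mnemonic, writes_flags, reads_flags) tuples.
    """
    needed = [False] * len(block)
    live = True  # conservative: assume code after the block reads the flags
    for i in range(len(block) - 1, -1, -1):
        _, writes, reads = block[i]
        if writes:
            needed[i] = live  # only needed if someone reads before the next write
            live = False      # earlier flag results are overwritten here
        if reads:
            live = True       # this instruction consumes the incoming flags
    return needed


block = [
    ("ADD", True, False),
    ("ADD", True, False),
    ("SUB", True, False),
    ("JZ", False, True),  # conditional jump reads the flags set by SUB
]
needed = mark_needed_flags(block)
```

In this example only the SUB has to produce flags, because the two ADDs have their flag results overwritten before anything reads them.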


Vojin Vidanovic

Posts 1371
04 Nov 2019 21:02


Gunnar von Boehn wrote:

    Yes, I recall this very well.
    I've reviewed your PCx Beta version with you. :-D
   
    a lot more could run smoothly on the AMIGA - even PC games like "Tomb Raider"
   

   
    Hope 4.5 Beta will go public, shareware, paid ... whatever :)
   
    And I am glad development continues.
    Add SB16 or GUS support please.
   
    Hope to see that Tomb Raider one day ;)
 
 
Jim Drew wrote:

    I will be releasing updated versions of FUSION and PCx as a package, with new video drivers, ISO support, and other features.

 
  When can we expect this and what will be the price?
 
  I see it has been prepared since 2013
  EXTERNAL LINK 
  Your website mentions EMPLANT? Is it this emulator
  PDF manual
  EXTERNAL LINK  Emplant 421 adf
EXTERNAL LINK


Michael R

Posts 272
05 Nov 2019 00:17


With some optimization and Sound Blaster support you could play such classics as the DOS version of Sid Meier's Pirates!


Andy Hearn

Posts 254
05 Nov 2019 09:50


Jim Drew wrote:

  I will be releasing updated versions of FUSION and PCx as a package, with new video drivers, ISO support, and other features.
 

awesome! can't wait! :)


Mr Niding

Posts 381
05 Nov 2019 10:04


Jim Drew wrote:

It's already converting x86 code into 68K code so it won't get that much faster by unrolling it, but it would improve the speed a bit (not 2x or more though I don't think).
 
  I have already released a beta version to your team a few years ago.
 
  I will be releasing updated versions of FUSION and PCx as a package, with new video drivers, ISO support, and other features.
 
 

Great to see you here, and I cheer on your developments with regards to PCx and Fusion.
I will purchase the Standalone once it's released, and I'm steadily reducing my Wintel exposure in my real-life workflow (increasingly using Linux on AMD chips).
I realise that for most things the Amiga environment is a hobby, but for many things that don't require intense processing power you can get by using AOS 3.x, or hopefully AROS down the road.

A well-configured Fusion that can reach into the pool of 68k applications could increase the usability of spreadsheets and word processing.

Thanks again!


Jim Drew
Learn who I am!
Posts 63
05 Nov 2019 21:58


Gunnar von Boehn wrote:
 
  What I recall about this PCx Beta
  is exactly why I talk to you about making it much faster.
 
  I recall your "transcription" as you call it,
  had the overhead of a calculated "JSR" per opcode.

I am not sure what you are talking about here.  Entire blocks of code are converted sequentially as instructions are decoded and run.  There are no JSRs per opcode.  Maybe you are looking at PC-Task's Dynamic version?

The CPU Transcription literally copies the exact code that does the interpretive instruction decoding for each instruction into the buffer.  It makes no attempt to optimize anything, it just eliminates the instruction and EA table lookups so the code runs without those penalties, which as you note are severe.  It's the reason why the CPU Transcription is significantly faster, especially with 32-bit operations.  I have no interest in making a real "JIT" compiler type of setup because that would require a huge amount of memory.  We have PCs as commonplace items today for those that want to run PC programs.  PCx is still good for old programs that require DOS 5/6, CGA/VGA type graphics and basic Sound Blaster support in order to even work.




Jim Drew
Learn who I am!
Posts 63
05 Nov 2019 22:01


Vojin Vidanovic wrote:

    When can we expect this and what will be the price?

Before the end of the year. I am thinking the price for the package will be around $20.

Vojin Vidanovic wrote:

  Your website mentions EMPLANT? Is it this emulator
    PDF manual
  EXTERNAL LINK  Emplant 421 adf
  EXTERNAL LINK 

Yes, I created the EMPLANT board and emulations for it.




Gunnar von Boehn
(Apollo Team Member)
Posts 4126
06 Nov 2019 11:12


Jim Drew wrote:

The CPU Transcription literally copies the exact code that does the interpretive instruction decoding for each instruction into the buffer.
It makes no attempt to optimize anything, it just eliminates the instruction and EA table lookups so the code runs without those penalties, which as you note are severe.  It's the reason why the CPU Transcription is significantly faster, especially with 32 bit operations.  I have no interest in making a real "JIT" compiler type of setup because that would require a huge amount of memory. 

Maybe we need real OPCODE examples here to prevent misunderstanding?

You explained:
Jim Drew wrote:

The CPU Transcription literally copies the exact code that does the interpretive instruction decoding for each instruction into the buffer.

I can understand this.

What you describe adds an extra overhead step, needs extra buffer,
and also gets a tuning benefit.
So I see the point - but it's only halfway to what could be done.

If you look at goals like "DIABLO", "WINDOWS 95", "AGE OF EMPIRES", or "COMMAND and CONQUER", then more performance will be beneficial.

So I see a clear benefit for the users if the performance could be doubled, tripled, or quadrupled.



Jim Drew
Learn who I am!
Posts 63
08 Nov 2019 07:14


Yes, the speed could be increased with a huge amount of work, but what is the point?  Modern PC hardware is cheap and runs circles around any emulation.  Even my Mac emulation for the PC (FUSION-PC) running on my 4930K i7 CPU is >14 times faster than the GOLD Apollo core running FUSION.  So, if I need a 68K Mac emulation that is the route I go.  It's actually quite funny to watch the Mac boot in under 5 seconds.
 
I guess there are those that will always want to run the Amiga, which is great.  I still love the Amiga.  But it seems a bit silly to run a PC emulator in these modern times.  Even things like ao486 are far better/faster, and run on even tiny/cheap FPGA boards.  You would be better off (I think) making a version of ao486 available for your boards like you do with the Atari ST emulator.

For the purist I will still be releasing new versions of my software.



Vojin Vidanovic

Posts 1371
08 Nov 2019 08:18


Jim Drew wrote:

  Yes, the speed could be increased with a huge amount of work, but what is the point?  Modern PC hardware is cheap and runs circles around any emulation.  Even
 

 
  I have my AMD x64, but I do want to run Alchemy, Frontier FE and some nice DOS games on the V4.
 
  If you coded as well as you sweet-talk, Win2000 might run Libre 3 or Word 2000.
 
  The sweet-talk here has cost you a customer for PCx, but interest remains in an optimized FUSION.
 
  FreeMiNT runs with no core change thanks to EmuTOS 080 support.


Gunnar von Boehn
(Apollo Team Member)
Posts 4126
08 Nov 2019 08:28


Jim Drew wrote:

Yes, the speed could be increased with a huge amount of work, but what is the point?
    ...
But it seems a bit silly to run a PC emulator in these modern times.

   
 
Thanks for the honest answer.
   
Let me repeat what you said in my words:
Your real goal is to sell your old product under a new name.
   
You have to make a new version name,
as you sold the license of the original version to someone else.
Therefore the version needs to be a "new version" to allow you to sell it.
   
   
But your real goal is _NOT_ to produce a real "new version" to really improve something for the users.
Your goal is just to get around the license deal that you made.
   
In other words, you want to give the 20-year-old software a new name, to sell the old product to new customers.
This way you cheat your old license partner
and your new customers will get old software under a new name.
 
Did I get this right?


Richard Gatineau

Posts 54
08 Nov 2019 08:36


Jim Drew wrote:
Modern PC hardware is cheap and runs circles around any emulation.  Even my Mac emulation for the PC (FUSION-PC) running on my 4930K i7 CPU is >14 times faster than the GOLD Apollo core running FUSION.

       
14 times only? So, it is a nice idea to improve the 68080 to be a great 68x86 CPU !!!
(yeah, it can be read in both directions)
