
"Fast Blitter Operations"

Manfred Bergmann

Posts 226
19 Apr 2020 14:09


Olle Haerstedt wrote:

    The only programming language that easily scales to any number of processors is Erlang, because it uses message passing as a main abstraction, and this language is considered "esoteric".
 
 

 
  Actually it has gotten a new life today, not only because it deals nicely with multi-core and threads, but also because of the Elixir language, which is built on the Erlang VM.
  That language is a lot more attractive than Erlang itself.
 
  The important thing about Erlang, which is already over 30 years old, is message passing.
  The key point of message passing is that only one thread at a time can change state.
  The other key point is immutable data structures and the fact that in Erlang you can only assign a variable once.
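
  To illustrate the "only one thread changes state" idea, here is a minimal sketch in Python rather than Erlang. The names (counter_actor, run_demo) are made up for this example, and Python's queue module only approximates an Erlang mailbox. Many sender threads post messages, but only the actor thread ever touches the counter, so no lock around the counter is needed.

```python
import queue
import threading


def counter_actor(inbox, replies):
    """Owns the counter; this is the only thread that ever changes it."""
    count = 0
    while True:
        msg = inbox.get()
        if msg == "stop":
            replies.put(count)  # report the final state as a message
            return
        if msg == "inc":
            count += 1


def run_demo(senders=4, per_sender=1000):
    """Start one actor and several senders; return the actor's final count."""
    inbox = queue.Queue()
    replies = queue.Queue()
    actor = threading.Thread(target=counter_actor, args=(inbox, replies))
    actor.start()

    def send_many():
        # Senders never see the counter, they only send messages.
        for _ in range(per_sender):
            inbox.put("inc")

    workers = [threading.Thread(target=send_many) for _ in range(senders)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    inbox.put("stop")  # queued after every "inc", so nothing is lost
    total = replies.get()
    actor.join()
    return total


if __name__ == "__main__":
    print(run_demo())
```

  The senders scale to any number of threads without changing the actor, which is the property the message-passing style is meant to give.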
 
 
 
Gunnar wrote:

  For the software developer, a CPU which is high performance and able to execute many instructions fast is an order of magnitude easier to handle than a bunch of simple CPUs which are each slow but together have the same speed.
 

 
  Indeed. But sometimes a software developer wants to have access to threads, for example to avoid locking up the UI while some operation that the user triggered is still running.
 
  The other problem is that many frameworks and libraries were designed and developed in the single-CPU era and hence are not thread-safe, etc.
  Like all the old Java collections (yes, yes, there are now thread-safe collections, but still based on locking).
 


Gunnar von Boehn
(Apollo Team Member)
Posts 6197
19 Apr 2020 14:15


Samuel Crow wrote:

  @Gunnar
    Wasn't the reason you wanted a software blitter on the 68080 core in the first place that you wanted it to have AMMX acceleration?  What happened?  Did that plan fall through?

I'm not sure I understand your question.
Please repost it more clearly.

Let me try to answer: 
Yes, Apollo 68080 has AMMX acceleration.
 
AMMX provides excellent performance and also supports multi-source 64-bit Blitter operations in a single cycle.
  The AMMX instructions can reach a hundred times the speed of the AMIGA Blitter.
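
As a rough illustration of what such a multi-source operation computes, here is a minimal Python sketch of the classic three-source blit logic applied to 64-bit words. It is not AMMX code and says nothing about cycle counts; the function names are made up for this example. The 8-bit function code follows the Amiga Blitter's minterm convention, where 0xCA selects the cookie-cut D = (A AND B) OR (NOT A AND C).

```python
MASK64 = (1 << 64) - 1  # keep results within one 64-bit word


def cookie_cut(mask, source, background):
    """Where the mask is 1 take the source, elsewhere keep the background."""
    return ((mask & source) | (~mask & background)) & MASK64


def apply_minterm(lf, a, b, c):
    """Evaluate an 8-bit logic function code on three 64-bit sources.

    Bit n of lf enables the minterm whose A, B, C inputs are the
    binary digits of n (A = bit 2, B = bit 1, C = bit 0).
    """
    d = 0
    for term in range(8):
        if lf & (1 << term):
            ta = a if term & 4 else ~a
            tb = b if term & 2 else ~b
            tc = c if term & 1 else ~c
            d |= ta & tb & tc
    return d & MASK64


def blit_row(mask_row, source_row, background_row):
    """Cookie-cut a whole row, one 64-bit word at a time."""
    return [cookie_cut(a, b, c)
            for a, b, c in zip(mask_row, source_row, background_row)]


if __name__ == "__main__":
    m, s, bg = 0xFFFF0000FFFF0000, 0x1111222233334444, 0xAAAABBBBCCCCDDDD
    print(hex(cookie_cut(m, s, bg)))
```

On a CPU the logic function is just ordinary code, so changing it needs no special hardware mode, which is the flexibility argument being made here.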
 
 
 
Samuel Crow wrote:

  Also, if the memory bus can be saturated by the CPU, that means you need faster memory and a controller to go with it

Yes, you are right: the key to performance is fast memory.
I agree, and this is why the memory controllers used on the Vampire are by far the fastest memory controllers ever available on Amiga.
Today AMMX can saturate the memory.
If AMMX could not, wouldn't the simplest fix then be to use faster AMMX code?
 
Let us remember that:
  - Coding in AMMX is very flexible (which a Blitter is not)
  - Coding in AMMX allows use of the cache (which a Blitter does not)
  - Coding in AMMX is multitasking and OS friendly (which the Blitter is not)
  - Coding in AMMX is also SMP friendly (which the Blitter is not)
 
AMMX is high performance, and very easy and flexible to code.
Why would you want something more complex that causes problems in use?

I think whoever has coded for a multi-core system or used the Amiga Blitter knows what I mean.
 
Samuel Crow wrote:

    It just means the v2 boards with 128 bit SDRAM are reaching the peak of their useful life and that the v4 boards can shine brighter with a slave processor.
 

You make it sound like V2 memory is slow.
This is not the case; V2 memory is very fast.

And yes AMMX can saturate both V2 and V4 memory.
 
 
 
Of course one can make an argument and say:
The Blitter in the AMIGA is slow because it uses a limited amount of logic and has no caches, and it is inflexible because it is not programmable.
And all this is true.
 
One could now argue that by inventing a very complicated and big Blitter with caches and a programming interface, one could increase speed.
And this is of course true.
But FPGA space costs $$$.
This means a hyper complex Blitter like this would require a much more expensive FPGA.
 
So one can now argue:
In theory there would be the option to increase the Vampire price by $100 to have plenty of "FPGA-space" for a "Super-Blitter", which could then be even faster than today's high speed AMMX code.
But this would of course also make coding much more complicated, with all the problems regarding multitasking as mentioned.
 
I personally doubt that this is the most sensible option,
as today AMMX is super fast and really gives you an AMIGA faster than ever, while at the same time being easy to code and multitasking friendly, without making the AMIGA/Vampire more costly.

What do you think?
 
 
 


Vojin Vidanovic
(Needs Verification)
Posts 1916/ 1
19 Apr 2020 15:42


In short:
The Blitter is for legacy 2D blitting operations only.
It has the same capacity as the AGA one, "and not on par with 080".

This makes CPU blitting, sprite handling and AMMX the recommended approach for newer software. We can achieve more sprites using those.

Well, having an improved Blitter would be nice, but my personal favourite for the remaining v4 space would be Warp3D compatibility in hardware. Amiga needs to move from fast 2D to at least early 3D, retaining nice 2D abilities.


Samuel Crow

Posts 424
19 Apr 2020 20:28


@Gunnar
 
  Back when I was on the team you were running out of space on the FPGA so you thought of using a software blitter so it could take advantage of AMMX.  I suggested making the 68080 dual-threaded to exploit the parallelism of the blitter so the blitter would be almost as fast as the main thread of the CPU when the timing wasn't being emulated.  That's what I was talking about in the first paragraph of my post.
 
  Secondly, there were two memory busses in an Amiga to double the memory bandwidth.  If one is saturated, add real fast RAM and the bandwidth doubles.  It's not so simple in the Vampire because there aren't enough pins on the FPGA to do that.  Also all memory on the Vampire is essentially chip RAM whether the OS sees it that way or not.
 
  What I'm trying to do is keep both busses saturated at once for better overall capability if possible.  Current models don't support that.  I'm trying to think of tomorrow.


Gunnar von Boehn
(Apollo Team Member)
Posts 6197
19 Apr 2020 23:44


Hi Sam
 
  Maybe we misunderstand each other here.
 
 
Samuel Crow wrote:

  so you thought of using a software blitter so it could take advantage of AMMX.
 

Yes it uses AMMX.
   
 
 
Samuel Crow wrote:

  I'm trying to think of tomorrow.
 

This is nice. But maybe we should live in today?
 
Today the Vampire is the fastest Amiga ever.
The Vampire reaches memory speeds 10 times faster than 68060 accelerators.
 
Why waste time on options that a Vampire 5 in 100 years might have?
Use the Vampire and code for it today.
You can do with it things that no Amiga could ever do before.
 
 


Ray Couzens

Posts 93
20 Apr 2020 10:55


Gunnar von Boehn wrote:

   
Samuel Crow wrote:

    I'm trying to think of tomorrow.
   

  This is nice. But maybe we should live in today?
   
 

 
  I like that people are excited enough to look to the future of the Vampire.  That is good and keeps the Amiga spirit alive.
 
  However, now that we have a very good Vampire Amiga, I think it would be better at the moment to improve AROS and develop software that takes advantage of the V4's capabilities.


Dma Con

Posts 16
20 Apr 2020 21:03


It's the old question: what is the Amiga platform, and how can it be revived while maintaining its "identity"?
 
  I suppose everyone here has his/her own definition of what is the essence of this identity. This is often tied to each one's personal memories & experiences.
 
  Let's play the devil's advocate, and argue a little bit for the design decisions which the original team made. ;-)
 
 
Gunnar von Boehn wrote:

  A major part of the "beauty" of the Amiga was using the very coder friendly 68K CPU.
  Because of the elegant to code 68K CPU the Amiga had such a big coder scene.
 

 
If it is a major part, then there actually was never a big difference between an Atari ST and an Amiga to begin with.

It also runs contrary to the fact that the biggest coder scene at the time was still highly active on a computer with one of the most basic CPUs and operating system one could imagine (C64).
 
I would argue that the main part of the Amiga's beautiful concept is the clean functional separation of dedicated tasks onto dedicated hardware acceleration units.
 
  The 68000 is actually the only off-the-shelf "generic" part in the Amiga, already used by other platforms.
 
 
Gunnar von Boehn wrote:

  This means the Amiga Blitter is a compromise.
  It improves bit-operation speed to be comparable to 68020 level, but it also introduces drawbacks for multitasking and deadlock problems.
 

 
  The original developers sort of realized this issue, which is why the Copper can be used as a command processor front-end for the blitter.
  You could easily create a more complex blitter "drawing list" and bookend it with a write to INTREQ to signal, e.g., the end of blits for ALL content of the current screen.
  Demos like "Hardwired" make great use of this feature, drawing lots of flat-shaded polygons at butter-smooth speed.
 
  It has only one flaw though... The Copper still has the responsibility to control the free-running bitplane DMA pointers. Therefore, this method only works if all your blits have finished within a frame.
 
  So just as the step from "ANTIC" to "AGNUS" was to introduce programmability to the display front-end, the next logical step would have been to extend this concept to the blitter, eventually expanding its capabilities.
 
  The only "compromise" came when the Chipset architecture did not evolve uniformly across all functions.
  16-bit access, no page burst mode, long access cycles...
 
 
Gunnar von Boehn wrote:

  Also you need to keep in mind that the Blitter is slower than a 68030 CPU for many tasks - this means many GFX operations like printing TEXT
  will run slower if you use the Blitter instead of the CPU.

 
  When the Amiga designers made the original concept, they were not thinking about how to efficiently draw 8x8 fonts.
  They were thinking about how to draw large, animated cartoon characters in front of drawn backgrounds.
  How to do blazingly fast 3D graphics with the line and XOR fill mode.
 
  I remember back in the day, when one of my PC friends was arguing against the Amiga: "but it does not have a text mode..."
 
 
Gunnar von Boehn wrote:

  Also when doing a Blitjob you have choices as a coder.
  For example when doing a typical 5-plane (32-color) Bob blit.
  This means "normally" 5 Blitjobs per BOB.
  The amount of overhead is very high here.
 

 
  This is one reason why the designers kept the Sprites for smaller objects.
 
 
Gunnar von Boehn wrote:

  You can lower the overhead to 1 job, if you create a 5 times bigger MASK for the BOB in memory - but this wastes precious chipmem.
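
  The trade-off in this quote can be put into numbers with a small Python sketch. The function name and the 32x32 example size are assumptions made here for illustration, and the calculation ignores the extra word per row that a shifted blit would need. It only counts blit jobs and mask bytes for a BOB stored plane by plane versus one stored with a mask repeated for every plane.

```python
def bob_blit_cost(width_px, height_px, planes=5, single_job=False):
    """Return (blit_jobs, mask_bytes) for one BOB.

    single_job=False: one blit per bitplane, one shared 1-plane mask.
    single_job=True:  one blit for all planes, mask repeated per plane.
    """
    words_per_row = (width_px + 15) // 16      # Blitter works on 16-bit words
    plane_bytes = words_per_row * 2 * height_px
    if single_job:
        return 1, plane_bytes * planes
    return planes, plane_bytes


if __name__ == "__main__":
    for single in (False, True):
        jobs, mask = bob_blit_cost(32, 32, planes=5, single_job=single)
        print(f"single_job={single}: {jobs} job(s), {mask} mask bytes")
```

  For the assumed 32x32 BOB in 5 planes this gives 5 jobs with a 128-byte mask, or 1 job with a 640-byte mask, which is the "5 times bigger MASK" mentioned above.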
 

 
  And yet, the most popular Amiga games (Turrican 1,2 & 3) still run on 512K machines. Using it for code, sound and graphics.
 
 
Gunnar von Boehn wrote:

  A lot more flexible than the Blitter is the CPU.
  If you want to help the CPU in the future, then the most flexible, the most powerful, and multitasking friendly solution is to add a 2nd 68080.
 
  But you need to keep one thing in mind:
  The more components you add to the coding, the more complex it gets.
  A game "just" using the CPU is a lot easier to code than a game using 2 CPUs, and a lot easier than one using the blitter.
 

 
  Yes, but is "easier" a generally valid argument against dedicated HW acceleration?
 
  Why would I want to closely tie the speed of individual pixel drawing operations to the speed / architecture of the CPU core?
 
  The choice you had to make is the choice which works best in a constrained FPGA environment.
  You can't have a sophisticated CPU and a sophisticated graphics HW accelerator, with dedicated memory buses, in an FPGA of this price range.
  But I think the Amiga "spirit" is all about decoupling those functions.
 
  If you did not believe this, then why did you choose to add 4 more audio channels (software audio mixing should be a pretty basic task for a CPU of the Apollo core range) or enhance the HW sprite functionality in your chipset implementation, despite the CPU drawing speed? ;-)
 
  Of course you have a different perspective as the CPU architect. Your CPU is the special part. And your CPU is the reason why I have 2 Vampire cards running in my Amigas. ;-)


Gunnar von Boehn
(Apollo Team Member)
Posts 6197
20 Apr 2020 21:35


DMA CON wrote:

But I think the Amiga "spirit" is all about decoupling those functions.

I fully agree here.

To me the Amiga architecture has a few very nice features:

1) Four DMA driven audio channels, with non-fixed sample rates.
SAGA increases those to 8, adding 16-bit sample support and free L-R panning.  (These features are actually also what Commodore planned to do in the AAA chipset.)

2) The Copper is IMHO excellent for controlling the display.

3) Another nice Amiga feature is the dual playfields.
Dual playfield is nice for 2D games.
Of course one can wonder what would have been possible if the playfields had been even stronger.
The AAA chipset planned a multibit mode per plane.
I think with such a design one could create more playfields with more colors. This would enable nice stuff.

Regarding the Blitter, yes it can surely be useful in many ways.
On the other hand it's a trade-off.
The Blitter never works independently, it always needs the CPU to set it up. And while you can use the Copper to set it up, this is pretty complex and, as you correctly explained, conflicts with the main task of the Copper. And it cannot be denied that Blitter coding has many risks. As we all know, many games/demos miss WAITBLIT and make other mistakes.
The Blitter clearly is a two-sided coin.

DMA CON wrote:

The choice you had to make is the choice which works best in a constrained FPGA environment.

The same argument is also valid for any ASIC.
As you might know, INTEL and other companies today argue very similarly to me.


Stefan "Bebbo" Franke

Posts 139
20 Apr 2020 23:42



The Blitter never works independently, it always needs the CPU to set it up.

heh! the darn cpu also needs the cpu to set itself up...
... get rid of the cpu too^^ :-)

SCNR


Vladimir Repcak

Posts 359
21 Apr 2020 05:02


Olle Haerstedt wrote:

Gunnar von Boehn wrote:

  A lot more flexible than the Blitter is the CPU.
  If you want to help the CPU in the future, then the most flexible, the most powerful, and multitasking friendly solution is to add a 2nd 68080.
 

  Good point. Or a second 68000! Just kidding. ;)

Actually, for 3D games, a second 68000 totally makes sense.

The performance difference between 3D rendering and rest of the game is, like, 100:1 ( I have detailed benchmarks from my Jaguar somewhere around).

So, while the 68080 would be busy in a tight loop, rendering frame after frame after frame, the second CPU, a 68000, could be managing:
input
audio
AI
game state
syncing
HUD
Physics
Collision Detection
Camera
World Culling
etc.

You see, on Jaguar, I was refactoring a lot and found out the hard way the following:
1. RISC GPU running a tight 3D transform/render loop with Blitter
2. 13.3 MHz 68000 only needed 10% of a frame time to process EVERYTHING else (like 20,000 pages of ASM code)

10% of frame time at 60 fps. This means you still had 90% of 68000 performance available. At 60 fps. Of course, it was hand-optimized ASM, but still...
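
To put the "10% of frame time at 60 fps" figure into absolute terms, here is a small Python sketch of the arithmetic. The function name is made up for this example; the inputs are the 13.3 MHz clock, 60 fps and 10% share quoted above.

```python
def frame_budget(clock_hz, fps, share):
    """Return (frame_ms, cycles_per_frame, cycles_used) for a CPU share."""
    frame_ms = 1000.0 / fps              # length of one frame in milliseconds
    cycles_per_frame = clock_hz / fps    # clock cycles available per frame
    return frame_ms, cycles_per_frame, cycles_per_frame * share


if __name__ == "__main__":
    frame_ms, cycles, used = frame_budget(13.3e6, 60, 0.10)
    print(f"{frame_ms:.2f} ms per frame, {cycles:.0f} cycles, {used:.0f} used")
```

With these inputs a frame lasts about 16.67 ms and offers roughly 221,667 clock cycles, of which about 22,167 go to the non-rendering work.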

There's a reason why all arcade boards in the '90s had a few fast-clocked RISC chips (for rendering) and a few slower CPUs (like a 68000 or even a 6502 for game logic or audio).

So, if there are enough gates available on FPGA (and it was feasible from implementation standpoint), I would totally appreciate even if the second CPU was just low-clocked 68000...



Gunnar von Boehn
(Apollo Team Member)
Posts 6197
21 Apr 2020 07:18


Vladimir Repcak wrote:

  2. 13.3 MHz 68000 only needed 10% of a frame time to process EVERYTHING else (like 20,000 pages of ASM code)
 
So, if there are enough gates available on FPGA (and it was feasible from implementation standpoint), I would totally appreciate even if the second CPU was just low-clocked 68000...

 

68080 is ~ 300 times faster than 68000@7MHz
 
So let's do the math:
 
If you say that you need 10% of the 68000@13MHz
Then this means you need 0.06% of the 68080 CPU.
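
The estimate can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, with a made-up function name, and it assumes that 68000 performance scales linearly with clock speed, which the post does not state explicitly.

```python
def share_of_68080(share_of_68000, clock_68000_mhz,
                   speedup_vs_base=300.0, base_clock_mhz=7.0):
    """Convert a CPU share on a 68000 into the equivalent share on a 68080.

    speedup_vs_base is the quoted "~300 times faster than 68000@7MHz".
    Assumption: a 68000 at a higher clock is faster in proportion to its clock.
    """
    relative_68000 = clock_68000_mhz / base_clock_mhz
    return share_of_68000 * relative_68000 / speedup_vs_base


if __name__ == "__main__":
    share = share_of_68080(0.10, 13.3)
    print(f"{share * 100:.3f}% of the 68080")
```

With the Jaguar's 13.3 MHz this comes to roughly 0.063%, and with a rounded 13 MHz to roughly 0.062%, so either way about 0.06%.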
 
 
Vladimir Repcak wrote:

So, if there are enough gates available on FPGA (and it was feasible from implementation standpoint), I would totally appreciate even if the second CPU was just low-clocked 68000...

Adding an extra low clocked CPU just to save 0.06% CPU time does not sound like a good deal to me.

We also have to keep in mind that the 68000 lacks a lot that the 68080 supports.
The 68000 can crash on misaligned memory access.
The 68000 lacks the 020 EA modes.
And of course it lacks the new instructions and AMMX of the 080.

This means the coder can NOT simply take a routine from the 080 and run it on the 000, and will have to develop code twice.
I'm not sure this will make coders very happy.

A lot more sensible would be to exploit HYPERTHREADING on the 68080.


Vladimir Repcak

Posts 359
21 Apr 2020 07:46


Yeah, given that I keep all the look up tables from the Jaguar (easy as I have 128 MB RAM and not just 2 MB like on Jag), I don't expect the non-rendering code to take more than 1% of frame time on 68080.

Not yet sure about the audio impact on available bandwidth.

Hypothetically speaking, if you were to add a second 68080 (I would love that obviously), how much more expensive an Altera board would have to be bought? I mean at current prices of FPGA boards.

And how much effort would it take on the team's side to get a dual core working? Is there potential for this next year?


Gunnar von Boehn
(Apollo Team Member)
Posts 6197
21 Apr 2020 08:22


Vladimir Repcak wrote:

Hypothetically speaking, if you were to add a second 68080 (I would love that obviously), how much more expensive Altera board would have to be bought? I mean current prices of fpga boards.
 
  And how much effort on the team side to have a dual core working? Is there a potential for this next year?

APOLLO 68080 is designed for HYPERTHREAD support.
Hyperthreading means running 2 programs in parallel on 1 CPU core.
This is a very cost-effective way to let another program use, in parallel, the parts of the CPU that one program leaves idle.

HYPERTHREADING will NOT double your CPU power, but it is relatively low cost in HW and will give certain applications a great boost.



Gunnar von Boehn
(Apollo Team Member)
Posts 6197
21 Apr 2020 08:43


Stefan "Bebbo" Franke wrote:


  The Blitter never works independently, it always needs the CPU to set it up.
 

 
  heh! the darn cpu also needs the cpu to set itself up...
  ... get rid of the cpu too^^ :-)
 
  SCNR

People who have actually used the Blitter on the AMIGA will know that the Blitter needs quite some extra stuff.
For example WAITBLIT() or ALLOCATING/OWNING it, none of which you need for a CPU solution.



Vladimir Repcak

Posts 359
21 Apr 2020 08:43


So, which physical resources of 080 would be duplicated for the second thread here?

Execution Units? Meaning four execution Units? Two for thread 1 and two for thread 2?




Gunnar von Boehn
(Apollo Team Member)
Posts 6197
21 Apr 2020 09:11


What I LOVE so much about AMIGA is that it makes great sense.
If I look at the chipset and the design decisions inside the Amiga, then I can see that it was designed by very smart people making lots of rational choices.

For sure, if memory had been cheaper and the 68030 had cost less, then the AMIGA 1000 would have been designed with
8MB Chipmem, 64 MB Fastmem and a 68030 CPU.

But they were not IBM building computers for the US military, where it does not matter if the computer costs a million USD more.

They made rational design decisions which allowed all people to afford one.



Manfred Bergmann

Posts 226
21 Apr 2020 10:18


Gunnar von Boehn wrote:

HYPERTHREADING will NOT double your CPU power, but it is relatively low cost in HW and will give certain applications a great boost.

How is that used? Is there library support yet?


Kamelito Loveless

Posts 259
21 Apr 2020 12:45


Even if FBlit replaces some Blitter functions, programs still use OwnBlitter()/DisownBlitter(), so have these two functions been replaced by an RTS?


Stefan "Bebbo" Franke

Posts 139
25 Apr 2020 20:38


Manfred Bergmann wrote:

Gunnar von Boehn wrote:

  HYPERTHREADING will NOT double your CPU power, but it is relatively low cost in HW and will give certain applications a great boost.
 

 
  How is that used? Is there library support yet?

The "only change" to support HT is a modified exec scheduler, and a Forbid/Permit which stops/starts all cpus.

... and then you'll get hit by all the stuff which bypassed exec :-)



Kamelito Loveless

Posts 259
01 Aug 2020 14:57


A faster blitter would have solved the problem, no?
Why did you choose not to improve the blitter? It was the path that CBM wanted to take.
About AAA
« The Blitter had been improved using pixel addressing rather than the familiar masks. This may be meaningless but makes it easier to program. Several tweaks of the Blitter design also allowed it to move data around faster. The Dave Haynie archive indicates performance increases by 6x when scrolling a 640x200x2 screen »

Source EXTERNAL LINK 

