
Welcome to the Apollo Forum

This forum is for people interested in the APOLLO CPU.
Please read the forum usage manual.



Performance and Benchmark Results!

Can Vampire Do 640x240x32bit ?

Vladimir Repcak

Posts 351
07 May 2020 11:49


So, can SAGA merge two bitmaps of two different bit depths at run time?

Yes ?
No ?



Gunnar von Boehn
(Apollo Team Member)
Posts 4856
07 May 2020 13:23


Vladimir Repcak wrote:

So, can SAGA merge two bitmaps of two different bit depths at run time?
 
Yes ?
No ?
 

 
Your question needs a much longer answer.

First, SAGA (Super-AGA) is the name used for both the V4 and the V2 chipset.
The V4 (SAGA), also known as the "GOLD-3 chipset", has some more features than the default V2 GOLD-2 release.
 
The complete "Super-AGA" offers the features of AGA plus RTG plus special features.
 
These features include:
* 8 DMA-based audio channels, each with panning support
* 16 sprite DMA channels.
  Each channel can show multiple 16-color sprites.
* 8 bitplanes
* 1 multiformat chunky plane
* 1 multiformat PIP plane
* 1024 CLUT color registers
* HW collision detection
* Copper
 
 
The PIP plane can be shown "above" the normal chunky plane.
So this is maybe similar to what you want.
The PIP plane can be set "opaque" over the normal plane,
or it can be color-keyed (appear only over a special color).
I assume that for your situation a "transparent" mode in PIP is also what you need. We do not have this today, but it could be added.
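
A minimal software sketch of the two PIP modes described above (opaque and color-keyed). This only models the visible result for one scanline. The function name, the 32-bit pixel layout and the exact-match key rule are assumptions made for illustration, not SAGA register behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Software model only: composes one output scanline from the normal
 * plane (base) and the PIP plane (pip). Hypothetical helper, not a
 * description of how the hardware is programmed. */
void pip_compose_line(const uint32_t *base, const uint32_t *pip,
                      uint32_t *out, size_t width,
                      int use_key, uint32_t key_color)
{
    for (size_t x = 0; x < width; x++) {
        if (!use_key) {
            out[x] = pip[x];      /* opaque: the PIP pixel always wins */
        } else if (base[x] == key_color) {
            out[x] = pip[x];      /* keyed: PIP appears only over the key color */
        } else {
            out[x] = base[x];     /* elsewhere the normal plane stays visible */
        }
    }
}
```

A "transparent" mode would add a third branch that blends `pip[x]` with `base[x]`. It is left out here because the thread says it does not exist today.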
 


Vladimir Repcak

Posts 351
07 May 2020 14:30


If it isn't supported today, then it's off the table.

How about 640x360 ? Is it directly supported without any configuration ?

Will it start up without any user intervention, or would this mode have to be manually configured somehow in the system ?


Gunnar von Boehn
(Apollo Team Member)
Posts 4856
07 May 2020 14:39


Vladimir Repcak wrote:

If it isn't supported today, then it's off the table.

When do you plan to release the finished game?
Will it be before the next core release?
 
 
Vladimir Repcak wrote:

How about 640x360 ? Is it directly supported without any configuration ?

IMHO this mode ships with Coffin.
The same rule applies to it as to 640x480:

It is included in the default modes, but a user can "Delete" it.


Gunnar von Boehn
(Apollo Team Member)
Posts 4856
07 May 2020 15:48


You could put an animation like this on one layer
and animate it simply by changing the PTR each frame.
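
A minimal sketch of the pointer-swap idea, assuming the animation frames are stored back to back in memory and all have the same size. The function and parameter names are hypothetical. Writing the returned address into the layer's pointer register is left out because the thread does not name that register.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the start address of the animation frame to display for a
 * given frame counter. Assumes frame_count > 0 and that the frames are
 * stored contiguously, frame_bytes apart (illustrative layout). */
const uint8_t *anim_frame_ptr(const uint8_t *frames, size_t frame_bytes,
                              unsigned frame_count, unsigned long tick)
{
    return frames + (size_t)(tick % frame_count) * frame_bytes;
}
```

No pixel data is copied per frame. Only the address handed to the display layer changes.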


Vladimir Repcak

Posts 351
08 May 2020 01:06


Gunnar von Boehn wrote:

Vladimir Repcak wrote:

  If it isn't supported today, then it's off the table.
 

  When do you plan to release the finished game?
  Will it be before the next core release?
 
 
 
Vladimir Repcak wrote:

  How about 640x360 ? Is it directly supported without any configuration ?
 

  IMHO this mode ships with Coffin.
  The same rule applies to it as to 640x480:
 
  It is included in the default modes, but a user can "Delete" it.

Well, when is the next core release?

I already burned a week on 640*480.

And am planning to burn another on optimizations that I started already. Could be more, hard to say...


Vladimir Repcak

Posts 351
08 May 2020 01:11


Regarding the 24-bit.

I just checked cybergraphics docs and they indeed directly support 24-bit.

There's a performance disadvantage though. Instead of writing a pixel via a single move.l, it will now take three move.b.

Copying should still work via the 64-bit fuse.

Theoretically, the gain from 25% less data could be erased by 3x more ops for pixel fill.
Very curious about the benchmark.
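
The "25% less data" figure follows directly from the bytes per pixel. A minimal sketch of that arithmetic, assuming a tightly packed framebuffer with no padding per row (the function name is illustrative):

```c
/* Bytes in one full frame, assuming rows are tightly packed
 * (no per-row padding). */
unsigned long frame_bytes(unsigned width, unsigned height,
                          unsigned bytes_per_pixel)
{
    return (unsigned long)width * height * bytes_per_pixel;
}
```

For 640x480 this gives 1,228,800 bytes at 4 bytes per pixel and 921,600 bytes at 3 bytes per pixel, so a 24-bit frame holds three quarters of the data of a 32-bit one.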


Gunnar von Boehn
(Apollo Team Member)
Posts 4856
08 May 2020 10:10


Vladimir Repcak wrote:

There's a performance disadvantage though. Instead of writing a pixel via single move.l, now it will be three move.b

 
I could envision several options:
 
1) MOVE.W + MOVE.B
 
2) Use AMMX, which has multibyte stores (including 3-byte moves)
 
3) Use 2 layers: one background layer with a 256-color CLUT or with YUV,
and one 256-color CLUT game layer.
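
Option 1 can be sketched in C as a 16-bit part followed by an 8-bit part. This is only a portable model: the R, G, B byte order is an assumption, and the bytes are written one at a time so the sketch does not depend on host endianness or alignment rules. On the real CPU the first two bytes would be a single MOVE.W.

```c
#include <stdint.h>

/* Stores one 24-bit pixel (0xRRGGBB, assumed byte order R, G, B) and
 * returns the address of the next pixel. Hypothetical helper. */
uint8_t *store_rgb24(uint8_t *dst, uint32_t rgb)
{
    uint16_t hi = (uint16_t)(rgb >> 8);  /* R and G: the MOVE.W part */
    uint8_t  lo = (uint8_t)rgb;          /* B: the MOVE.B part */

    dst[0] = (uint8_t)(hi >> 8);         /* big-endian word, high byte first */
    dst[1] = (uint8_t)hi;
    dst[2] = lo;
    return dst + 3;
}
```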
 
 


Gunnar von Boehn
(Apollo Team Member)
Posts 4856
08 May 2020 10:15


Vladimir Repcak wrote:

I already burned a week on 640*480.

I see. I feel sorry for you.

At the same time, I have the feeling this form of communication is very inefficient. Here in the forum you post something now and then, and you might get some feedback now and then.
This is slow and neither efficient nor effective.

I could imagine that much closer contact could give more ideas and a faster implementation.
 


Vladimir Repcak

Posts 351
08 May 2020 11:39


Gunnar von Boehn wrote:

  2) Use AMMX which has multibyte Store (including 3 Byte moves)

AMMX sounds intriguing, but without direct deployment from PC I can't experiment with it now.

Gunnar von Boehn wrote:
 
  1) MOVE.W + MOVE.B

It's a bit more complex than that. There are many scenarios, depending on whether the pixel is aligned on a Hi or Lo byte:

1-pixel scanline:
(HLH)
(LHL)

2-pixel scanline:
(HLH)(LHL)
(LHL)(HLH)

3/5/7/...-pixel scanline: (odd pixel count between edges)
(HLH)(LHL)(HLH)
(LHL)(HLH)(LHL)

4/6/8/...-pixel scanline: (even pixel count between edges)
(HLH)(LHL)(HLH)(LHL)
(LHL)(HLH)(LHL)(HLH)

That is a lot of conditions to handle.
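
The alternation in the table above comes from 3-byte pixels: pixel x starts at byte offset 3*x, so the start address flips between even and odd from one pixel to the next. A minimal sketch, reading "HLH" as a pixel that starts on an even address (the high byte of a big-endian 16-bit word) and "LHL" as one that starts on an odd address. That reading and the case numbering are assumptions.

```c
/* Parity of the first byte of pixel x on a 24-bit scanline.
 * 0 = even address (read here as "HLH"), 1 = odd address ("LHL"). */
int rgb24_start_parity(unsigned long line_base, unsigned long x)
{
    return (int)((line_base + 3UL * x) & 1UL);
}

/* Hypothetical case index 0..7: start parity combined with the span
 * length class (1 pixel, 2 pixels, odd >= 3, even >= 4).
 * Assumes len >= 1. */
int rgb24_span_case(unsigned long line_base, unsigned long x,
                    unsigned long len)
{
    int len_class = (len == 1) ? 0 : (len == 2) ? 1 : (len & 1UL) ? 2 : 3;
    return rgb24_start_parity(line_base, x) * 4 + len_class;
}
```

The eight indices correspond to the eight rows listed above: two start alignments times four length classes.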

Now multiply all those conditions by ~15,000 (e.g. per scanline).

Poof, there goes the 85 MHz :)

An extremely small number of scenarios get away with a per-scanline condition - meaning you burn through the cycles even for a negative scenario, yet the grand total cycle count is still lower.

There's a reason why I have 14 different codepaths in my Quad rasterizer. I do not allow a single condition per scanline - all conditions happen only per polygon, never per scanline.

The function is barely 50 pages. Very maintainable :)
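
The "conditions per polygon, never per scanline" rule can be sketched in C as follows. The two span variants and all names are hypothetical, and a function pointer stands in for the separately inlined codepaths described above. The point is only that the branch is taken once, outside the scanline loop.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*span_fn)(uint32_t *dst, size_t len, uint32_t color);

/* Two illustrative span variants (not the rasterizer's real codepaths). */
static void span_copy(uint32_t *dst, size_t len, uint32_t color)
{
    while (len--) *dst++ = color;
}

static void span_xor(uint32_t *dst, size_t len, uint32_t color)
{
    while (len--) *dst++ ^= color;
}

/* Fills `rows` scanlines of `len` pixels each; pitch is in pixels. */
void fill_rows(uint32_t *fb, size_t pitch, size_t rows, size_t len,
               uint32_t color, int use_xor)
{
    span_fn fn = use_xor ? span_xor : span_copy;  /* decided once per polygon */

    for (size_t y = 0; y < rows; y++)
        fn(fb + y * pitch, len, color);           /* no condition per scanline */
}
```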



Gunnar von Boehn
(Apollo Team Member)
Posts 4856
08 May 2020 12:11


Vladimir Repcak wrote:

  1-pixel scanline:
  (HLH)
  (LHL)
 
  2-pixel scanline:
  (HLH)(LHL)
  (LHL)(HLH)
 
  3/5/7/...-pixel scanline: (odd pixel count between edges)
  (HLH)(LHL)(HLH)
  (LHL)(HLH)(LHL)
 
  4/6/8/...-pixel scanline: (even pixel count between edges)
  (HLH)(LHL)(HLH)(LHL)
  (LHL)(HLH)(LHL)(HLH)

I'm not sure you need to make your code so complicated.
I can offer you again to look at your code if you want.



Markus B

Posts 196
08 May 2020 15:30


Very interesting to follow your discussion here.

Some remarks:
- Amiga people are very used to handling RTG settings. If you really need to stick to some idiot-proof approach, you would need to go with AGA, which is too limited I guess.
- You can check upfront if the needed screenmode is configured. Not sure about the options (pure speculation), but it may be possible to create the needed screenmode on the fly. Don't know if P96 could be scripted that way.
- If Vampire systems could boot from a CD/ISO, you'd have control over the OS during runtime. Then the needed screenmode can be made available for sure.


Niclas A
(Apollo Team Member)
Posts 216
08 May 2020 16:16


What would normally happen is you allow res equal or bigger and get large black areas if bigger. Remember creating lots of special resolutions back in the day when I got my voodoo 3 for my A1200. This to get full-screen in emulators. Otherwise you would get a stamp size picture in top left corner and lots of black 🙂


Vladimir Repcak

Posts 351
09 May 2020 03:59


Gunnar von Boehn wrote:

Vladimir Repcak wrote:

  1-pixel scanline:
  (HLH)
  (LHL)
 
  2-pixel scanline:
  (HLH)(LHL)
  (LHL)(HLH)
 
  3/5/7/...-pixel scanline: (odd pixel count between edges)
  (HLH)(LHL)(HLH)
  (LHL)(HLH)(LHL)
 
  4/6/8/...-pixel scanline: (even pixel count between edges)
  (HLH)(LHL)(HLH)(LHL)
  (LHL)(HLH)(LHL)(HLH)
 

 
  I'm not sure you need to make your code so complicated.
  I can offer you again to look at your code if you want.
 

Thanks, I ran the numbers in excel in more detail and realized the following:

59.59% of FrameTime : 32-bit CopyBackground
32.55% of FrameTime : 32-bit PixelFill
-------------------
92.14% of FrameTime

Interpolated Estimates for 24-bit:

44.69% of FrameTime : 24-bit CopyBackground
65.10% of FrameTime : 24-bit PixelFill
-------------------
109.79% of FrameTime

So, while we can copy the background bitmap using fused 64-bit writes, the PixelFill will execute 2x the number of ops (3x move.b + dbra) instead of (move.l + dbra).

So, I believe it's reasonable to assume that the execution time will double from 32.55% to 65.10%.

Even if pipelining were ideal and there weren't too many bubbles, and 2x more ops took just 50% more time (very doubtful), the total would still be 93.52 (44.69 + 1.5 x 32.55) - which is still more than leaving it as it is at 32-bit at 92.14.

So, no performance point in doing 24-bit, really. It's just more code for no framerate benefit.
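
The estimate above can be written as a small model, which makes it easy to try other slowdown factors. It assumes the copy cost scales with the data size (3/4 for 24-bit) and the fill cost scales with a chosen factor. The struct and function names are illustrative.

```c
/* Measured 32-bit costs, in percent of frame time. */
typedef struct {
    double copy_pct;   /* CopyBackground */
    double fill_pct;   /* PixelFill */
} frame_cost;

/* Estimated 24-bit total, in percent of frame time.
 * fill_factor is the assumed slowdown of the pixel fill
 * (2.0 = twice the ops take twice the time). */
double estimate_24bit_total(frame_cost c32, double fill_factor)
{
    double copy24 = c32.copy_pct * 3.0 / 4.0;   /* 25% less data to copy */
    double fill24 = c32.fill_pct * fill_factor;
    return copy24 + fill24;
}
```

With the measured 59.59 and 32.55 and a factor of 2.0 this reproduces the 109.79% total. Under this model the 24-bit total only drops below 92.14% when the fill factor is under roughly 1.46.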




Vladimir Repcak

Posts 351
09 May 2020 04:06


Niclas A wrote:

What would normally happen is you allow res equal or bigger and get large black areas if bigger. Remember creating lots of special resolutions back in the day when I got my voodoo 3 for my A1200. This to get full-screen in emulators. Otherwise you would get a stamp size picture in top left corner and lots of black 🙂

Is that how it was ? That sounds pretty weird.

So, in our case of 640x360, the resolution wouldn't fill whole screen ? Kinda like some modern LCDs do (they center the resolution and put black bars around) ?

On an actual CRT monitor back in the day? Wow.


Vladimir Repcak

Posts 351
09 May 2020 04:12


Markus B wrote:

Very interesting to follow your discussion here.
 
  Some remarks:
  - Amiga people are very used to handling RTG settings. If you really need to stick to some idiot-proof approach, you would need to go with AGA, which is too limited I guess.
  - You can check upfront if the needed screenmode is configured. Not sure about the options (pure speculation), but it may be possible to create the needed screenmode on the fly. Don't know if P96 could be scripted that way.
  - If Vampire systems could boot from a CD/ISO, you'd have control over the OS during runtime. Then the needed screenmode can be made available for sure.

Well, I spent last week thinking about all the scenarios and I don't want to lose the sharpness of 640x480.
But I don't want to lose the fluidity of 320x240 either.

Plugging all scene complexity numbers from my benchmarks into excel it's obvious that it would run only around 30 fps @640x480 anyway, with frequent drops to 20.

Might as well straight lock it to 20.

I just implemented a variable FPS lock - I can choose 60,30 or 20 fps.

So, for 320x240 I will choose 30 and for 640x480 I will choose 20.

There are still some changes to be done to the physics so they are fully framerate independent, but those should take only a couple of days...

And it will be up to the user to choose. It's the best solution anyway, even if it's shitload of additional work...
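
A minimal sketch of the two pieces mentioned above: the frame lock and a framerate-independent physics step. It assumes a 60 Hz display, so 60, 30 and 20 fps map to 1, 2 and 3 vertical blanks per frame. That refresh rate and all names are assumptions, not details from the game.

```c
/* Number of vertical blanks to wait per rendered frame.
 * Assumes target_fps divides refresh_hz evenly (60/30/20 at 60 Hz). */
unsigned vblanks_per_frame(unsigned refresh_hz, unsigned target_fps)
{
    return refresh_hz / target_fps;
}

/* One physics step scaled by the frame duration, so the distance
 * covered per second is the same at every locked framerate. */
double advance(double pos, double vel_per_sec, unsigned target_fps)
{
    return pos + vel_per_sec / (double)target_fps;
}
```

At 20 fps each step moves an object 1.5 times as far as at 30 fps, which keeps the speed per second identical in both modes.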


Gunnar von Boehn
(Apollo Team Member)
Posts 4856
09 May 2020 06:29


Vladimir Repcak wrote:

So, I believe it's reasonable to assume that the execution time will double from 32.55% to 65.10%

What code is in the fill loop?
Is it only the MOVE.L ?

If you post the inner, outer work loop, then we can discuss this a lot better.


Gerardo González-Trejo

Posts 41
09 May 2020 07:47


I remember years ago I had a colleague who was always searching for info and support, but he never shared the piece of code that would have allowed us to give him the good answer he wanted. He is out of the business now, more than ten years later, but maybe he thought he had a treasure then... Who knows! ;)


Niclas A
(Apollo Team Member)
Posts 216
09 May 2020 10:38


Vladimir Repcak wrote:

Niclas A wrote:

  What would normally happen is you allow res equal or bigger and get large black areas if bigger. Remember creating lots of special resolutions back in the day when I got my voodoo 3 for my A1200. This to get full-screen in emulators. Otherwise you would get a stamp size picture in top left corner and lots of black 🙂
 

  Is that how it was ? That sounds pretty weird.
 
  So, in our case of 640x360, the resolution wouldn't fill whole screen ? Kinda like some modern LCDs do (they center the resolution and put black bars around) ?
 
  On an actual CRT monitor back in the day? Wow.

If you did not have 640x360 when the screen requester showed up, you would maybe select 640x480 and get 120 lines of black at the bottom. If you chose 800x600, it would have 160 columns of black on the right and 240 lines of black at the bottom.

Of course, if you chose a 640x360 res (that you created), it would be full screen and look great with no black borders.

That's at least how a lot of emus back in the day handled it.
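
The border sizes follow from subtracting the image size from the selected screen mode, with the image placed in the top left corner. A small sketch (the struct and function names are illustrative):

```c
/* Unused (black) area when an image is shown in the top left corner
 * of a screen mode that is equal or bigger. */
typedef struct {
    unsigned right_cols;
    unsigned bottom_rows;
} black_border;

black_border unused_area(unsigned mode_w, unsigned mode_h,
                         unsigned img_w, unsigned img_h)
{
    black_border b;
    b.right_cols  = (mode_w > img_w) ? mode_w - img_w : 0;
    b.bottom_rows = (mode_h > img_h) ? mode_h - img_h : 0;
    return b;
}
```

For a 640x360 image this gives 0 columns and 120 lines in 640x480, and 160 columns and 240 lines in 800x600.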



Vladimir Repcak

Posts 351
09 May 2020 11:53


Gunnar von Boehn wrote:

Vladimir Repcak wrote:

  So, I believe it's reasonable to assume that the execution time will double from 32.55% to 65.10%
 

 
  What code is in the fill loop?
  Is it only the MOVE.L ?
 
  If you post the inner, outer work loop, then we can discuss this a lot better.

Sure.

So, this is 32-bit pixel fill loop (with 2 ops per pixel):


  ; d0: length of scanline - 1 (dbra count)
  ; d7: RGBA color of scanline
  .dsLoop:
    move.l d7,(a0)+     ; store one 32-bit pixel, advance a0 by 4
    dbra d0,.dsLoop     ; repeat d0+1 times

And, this is 24-bit pixel fill loop (with 4 ops per pixel):


  ; d0: length of scanline - 1 (dbra count)
  ; d7: R of scanline color
  ; d6: G of scanline color
  ; d5: B of scanline color
  .dsLoop:
    move.b d7,(a0)+     ; R byte
    move.b d6,(a0)+     ; G byte
    move.b d5,(a0)+     ; B byte
    dbra d0,.dsLoop     ; repeat d0+1 times

My test scene (10 chunks - each 100 triangles - into distance) draws 207,522 pixels in 640x480. Pixel Fill stage takes 32.55% of Frame Time.

The code above is inlined 14x across whole method (there's no jsr/rts into this code, it's straight in the middle of scanline processing loop)

What do you think ? That loop executes 207,522 times.
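
For readers who do not read 68k assembly, the two loops correspond roughly to the following C. This is a model for comparison, not the rasterizer's code. Unlike the dbra version, the count here is the number of pixels rather than length - 1, and the function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit fill: one store per pixel (the move.l loop). */
uint32_t *fill_span32(uint32_t *dst, size_t pixels, uint32_t rgba)
{
    while (pixels--)
        *dst++ = rgba;
    return dst;
}

/* 24-bit fill: three byte stores per pixel (the move.b loop). */
uint8_t *fill_span24(uint8_t *dst, size_t pixels,
                     uint8_t r, uint8_t g, uint8_t b)
{
    while (pixels--) {
        *dst++ = r;
        *dst++ = g;
        *dst++ = b;
    }
    return dst;
}
```

The 24-bit version writes fewer bytes per pixel but issues three stores instead of one, which is the trade-off discussed in the posts above.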

