Performance and Benchmark Results!

Can Vampire Do 640x240x32bit?

Vladimir Repcak

Posts 359
30 Apr 2020 23:17


I finally got detailed profiling/benchmark data from my 3D engine on the V2. The test rendered 1,000,000 polygons, so the data is nicely averaged over a long period of time (and is thus pretty stable).

I broke down the benchmarks into 4 stages:
1. 3D Transform
2. Polygon Set-Up (Sorting points, computing edges)
3. Scanline Traversal (break down each polygon into horizontal scanlines)
4. Pixel Fill (of each scanline)

One thing became really clear: the Pixel Fill cost is negligible. Doubling it won't result in any framedrops.

On Atari Jaguar, 640x240 was my favorite resolution for the same reason. It may not seem so on paper, but in motion it's significantly less pixelated than 320x240.

So, can CyberGraphX on Vampire directly make 640x240 resolution ?

In case you're wondering, 640x480 would double the third stage (scanline traversal) and quadruple the 4th stage, so for my first game I won't entertain 640x480, as I would like to keep it fast-paced and that wouldn't be possible in 640x480 given Vampire's performance and my desired 3D scene complexity.
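
As a rough illustration of the scaling argument above, the following C sketch computes how the two resolution-dependent stages grow relative to a 320x240 baseline. It assumes, as the post describes, that scanline traversal scales with the number of rows and pixel fill with the number of pixels. The `Mode` struct and both function names are made up for this example and are not taken from the engine.

```c
#include <stdint.h>

/* A screen mode in pixels (illustrative type, not from the engine). */
typedef struct {
    uint32_t width;
    uint32_t height;
} Mode;

/* Scanline traversal factor: rows in `m` divided by rows in `base`.
   Assumption: traversal cost grows with the number of scanlines. */
static double traversal_factor(Mode base, Mode m)
{
    return (double)m.height / (double)base.height;
}

/* Pixel fill factor: pixels in `m` divided by pixels in `base`.
   Assumption: fill cost grows with the number of pixels. */
static double fill_factor(Mode base, Mode m)
{
    return ((double)m.width * (double)m.height) /
           ((double)base.width * (double)base.height);
}
```

With a 320x240 baseline, 640x240 gives factors of 1 (traversal) and 2 (fill), while 640x480 gives 2 and 4, which matches the figures in the post.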



Gunnar von Boehn
(Apollo Team Member)
Posts 6207
01 May 2020 02:36


Vladimir Repcak wrote:

    I broke down the benchmarks into 4 stages:
    1. 3D Transform
    2. Polygon Set-Up (Sorting points, computing edges)
    3. Scanline Traversal (break down each polygon into horizontal scanlines)
    4. Pixel Fill (of each scanline)

   
Would you mind sharing your results for these parts?
BTW How do you do the screen update, and the swap of the frame?
   
   
Vladimir Repcak wrote:

  So, can CyberGraphX on Vampire directly make 640x240 resolution ?

This depends on your definition of the word.
Technically yes, as 640x240 is just DoubleY-mode of the Standard resolution 640x480.
But please mind that on Amiga the RTG prefs system allows every user to define a new resolution and also to delete an existing one.
This is true for AmigaOS RTG, but not for AROS.
AROS has some fixed modes that the user cannot delete.
 
640x240 is of course not a 16:9 resolution.
Do you plan for a game in 4:3? With black bars on Left / Right for 16:9 users?


Vladimir Repcak

Posts 359
01 May 2020 08:36


I can update my original 3D engine thread with some benchmark stats.
Right now, since I'm still primarily working against an emulator, I do the screen update the old way, but I will soon just flip the SAGA pointer, so that I only have to clear the framebuffer, not copy it (like now).
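
The pointer-flip idea can be sketched in C as a two-buffer chain. This is only a generic double-buffering sketch: `FlipChain`, `flip_draw_buffer`, `flip_present`, and the `set_display` hook are hypothetical names, and the hook stands in for whatever mechanism actually points the display hardware at a buffer (the SAGA register access itself is not shown or assumed here).

```c
#include <stdint.h>

/* Two framebuffers: one is shown while the other is drawn into. */
typedef struct {
    uint32_t *buffers[2];
    int       draw_index;               /* index of the buffer being drawn  */
    void    (*set_display)(uint32_t *); /* hypothetical hook: show a buffer */
} FlipChain;

/* Stand-in for the real display hook: it only records the last buffer. */
static uint32_t *last_shown;
static void record_display(uint32_t *buffer)
{
    last_shown = buffer;
}

/* Returns the buffer the renderer should draw into this frame. */
static uint32_t *flip_draw_buffer(const FlipChain *fc)
{
    return fc->buffers[fc->draw_index];
}

/* Shows the finished buffer and makes the other one the draw target.
   No pixel data is copied between the two buffers. */
static void flip_present(FlipChain *fc)
{
    fc->set_display(fc->buffers[fc->draw_index]);
    fc->draw_index ^= 1;
}
```

A frame would then draw into `flip_draw_buffer(&fc)` and call `flip_present(&fc)` once rendering is done, so the only per-frame full-screen pass left is the clear.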

I was hoping the resolution would be native, as messing with the prefs won't guarantee that the resolution will work for everyone.

So, 320x240 it is, then.

The beauty of flatshading is that it doesn't care for 16:9 or 4:3, it still looks the same.

I doubt there are many working 4:3 CRT monitors out there. The last few will die soon anyway.

  I think I will default to 16:9, it's 2020 after all...


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
01 May 2020 09:54


Vladimir Repcak wrote:

  I was hoping the resolution would be native, as messing with the prefs won't guarantee that the resolution will work for everyone.
 
  So, 320x240 it is, then.
 

Please mind that with 320x240 you have the same situation.
Every user can delete any resolution.
This is what I tried to explain to you.

 
Vladimir Repcak wrote:

  I think I will default to 16:9, it's 2020 after all...

16:9 makes some sense.
But 320x240 and 640x240 are not real 16:9 modes, are they?
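
One quick way to check which ratio a mode has is to reduce width:height by the greatest common divisor. The C sketch below assumes square pixels and looks only at the stored pixel dimensions, so it says nothing about how a DoubleY mode or a stretching LCD ends up on screen. `Ratio`, `gcd_u32`, and `aspect_ratio` are illustrative names.

```c
#include <stdint.h>

/* A reduced width:height ratio (illustrative type). */
typedef struct {
    uint32_t w;
    uint32_t h;
} Ratio;

/* Greatest common divisor by Euclid's algorithm. */
static uint32_t gcd_u32(uint32_t a, uint32_t b)
{
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Reduces width:height to lowest terms; returns 0:0 for an empty mode. */
static Ratio aspect_ratio(uint32_t width, uint32_t height)
{
    Ratio r = {0, 0};
    uint32_t g = gcd_u32(width, height);
    if (g != 0) {
        r.w = width / g;
        r.h = height / g;
    }
    return r;
}
```

Under that square-pixel assumption, 320x240 reduces to 4:3 and 640x240 to 8:3, while a mode such as 640x360 would reduce to 16:9.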



Vladimir Repcak

Posts 359
01 May 2020 10:21


Well, if somebody deletes a standard resolution, then that's not my problem.

Stretching 320*240 by LCD into 16:9 is not a problem because it's flatshaded, so there's no distortion.

But, some LCD displays might just center the image and then it would be distorted.

I'm going to think about it some more...


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
01 May 2020 14:14


Vladimir Repcak wrote:

  Stretching 320*240

I wonder if you can't do a lot more...
Stunt Car Racer was 320x240 and had a 300 times slower CPU :)
I think your good game should be able to do more.


Stefano Briccolani

Posts 586
01 May 2020 16:03


I'm sure 640x480 can be handled by your engine at a very fast framerate.
I would even suggest you consider rethinking your game as a S.T.U.N. Runner/F-Zero clone (as in your initial plan) rather than a shooter. A racer with fast-paced, smooth 3D environments should be a good showcase for your engine.



Mark Watson

Posts 5
01 May 2020 16:24


For my own stuff I was thinking a fixed width is possible, say 640, with a fallback memcpy like you are doing now if a user deletes all 640xX modes; top and bottom borders can be added by just changing the frame pointer passed to the draw code.
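
The border trick can be sketched in C as a pointer offset. This is a minimal sketch, assuming a 32-bit framebuffer with a known row pitch in pixels; `letterbox_origin` and its parameters are hypothetical and not taken from any poster's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the address of the first visible pixel when an image of
   `draw_h` rows is centered vertically in a framebuffer of `screen_h`
   rows. `pitch_px` is the number of pixels per framebuffer row.
   Returns NULL if the image is taller than the screen. */
static uint32_t *letterbox_origin(uint32_t *frame, size_t pitch_px,
                                  size_t screen_h, size_t draw_h)
{
    if (draw_h > screen_h) {
        return NULL;
    }
    size_t top_border = (screen_h - draw_h) / 2; /* rows left blank on top */
    return frame + top_border * pitch_px;
}
```

The draw code receives the returned pointer instead of the framebuffer base, so it needs no knowledge of the borders. For example, a 360-row image in a 640x480 buffer starts 60 rows down.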


Olaf Schoenweiss

Posts 690
01 May 2020 16:49


sorry for being OT

could you answer there please?

CLICK HERE


Vladimir Repcak

Posts 359
01 May 2020 19:54


Gunnar von Boehn wrote:

Vladimir Repcak wrote:

  Stretching 320*240
 

 
  I wonder if you can't do a lot more...
  Stunt Car Racer was 320x240 and had a 300 times slower CPU :)
  I think your good game should be able to do more.

That 300x ratio goes real fast:
FrameRate:
3x : 6 VBL / Frame (10 fps) vs 2 VBL / Frame (30 fps)
Now, we're down to a difference of 100x

Scene Complexity (Polygons):
200x : 5 quads / frame vs 1,000 quads / frame

Oops, I'm already 2x more efficient than Stunt Car Racer :)

And we didn't even get to 4-bit rendering vs 32-bit rendering.

Which, in itself, is pushing 8x more data around.

Not to mention that when, 90% of the time [in StuntCar], you're rendering just a road (those 5 quads), you don't even have to do any looping, just handle the polygons straight.

With 1,000 quads (2,000 triangles), you gotta loop everything everywhere...
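
The arithmetic in this post can be written down as a small C sketch. It uses only the figures quoted above (10 fps and 5 quads per frame for Stunt Car Racer, 30 fps and 1,000 quads per frame for the engine, and the claimed 300x CPU ratio); the function names are hypothetical.

```c
/* How many times more quads per second setup B draws than setup A. */
static double workload_ratio(double fps_a, double quads_a,
                             double fps_b, double quads_b)
{
    return (fps_b * quads_b) / (fps_a * quads_a);
}

/* Work done per unit of CPU speed, relative to the older machine. */
static double efficiency_gain(double workload, double cpu_ratio)
{
    return workload / cpu_ratio;
}

/* How many times more bits per pixel are written. */
static double pixel_data_ratio(double bits_new, double bits_old)
{
    return bits_new / bits_old;
}
```

With the post's numbers, the workload ratio is 3 x 200 = 600, which against a 300x faster CPU gives the quoted factor of 2, and 32-bit versus 4-bit pixels gives the quoted factor of 8.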



Vladimir Repcak

Posts 359
01 May 2020 20:00


Gunnar von Boehn wrote:

  I wonder if you can't do a lot more...

According to my benchmarks, if I was willing to go down to the framerate of StuntCarRacer (10 fps), I could render a scene of :

11,000 triangles

So, yeah. I can certainly do a lot more.

But, I certainly don't want to push the visual boundaries at the cost of a brutally unplayable framerate.

But, yes - StuntCarRacer with 11,000 triangles per frame could look really, really nice :)



Vladimir Repcak

Posts 359
01 May 2020 20:06


Stefano Briccolani wrote:

I'm sure 640x480 can be handled by your engine at a very fast framerate.

We will run the benchmark for 640x480 this weekend.

Copying the background bitmap at 640x480x32bit will eat a lot of bandwidth, and it's going to take 4x longer than at 320x240.

Plus, we will double the scanline traversal stage and quadruple the Pixel Fill stage.

Not a lot of performance buffer at 2 frames / scene, as I don't want any frame drops, so I must be able to finish under 1.5 frames so that any CPU spikes are still within the 0.5 frame boundary (thus always staying at 30 fps).
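
To put numbers on the copy size and the time budget, here is a small C sketch. The 60 Hz refresh rate is an assumption inferred from "2 VBL / Frame (30 fps)" earlier in the thread, and `frame_bytes` and `budget_ms` are illustrative names.

```c
#include <stdint.h>

/* Size in bytes of one full frame. */
static uint32_t frame_bytes(uint32_t width, uint32_t height,
                            uint32_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Time in milliseconds covered by `vbls` vertical blank periods.
   Assumption: `refresh_hz` is the display refresh rate (60 here). */
static double budget_ms(double refresh_hz, double vbls)
{
    return 1000.0 * vbls / refresh_hz;
}
```

At 4 bytes per pixel, 320x240 is 307,200 bytes and 640x480 is 1,228,800 bytes, which is the 4x factor mentioned above. Under the 60 Hz assumption, a 1.5-frame target corresponds to 25 ms out of the roughly 33.3 ms available in 2 frames.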


Gunnar von Boehn
(Apollo Team Member)
Posts 6207
01 May 2020 20:56


Vladimir Repcak wrote:

Copying the background bitmap at 640x480x32bit will eat a lot of bandwidth, and it's going to take 4x longer than at 320x240.

Can you share some recent pictures / screenshots?
I'm curious about how this 32bit background looks



Vladimir Repcak

Posts 359
05 May 2020 01:37


I need to finish the planet renderer first, then I will have some new screenshots, but right now I am working on a different section.

Sharing new screenshots will happen once I close the preorders, which should hopefully happen in a few weeks.

Doing it now wouldn't be fair to people who already participated with the information that was available at the time.



Gunnar von Boehn
(Apollo Team Member)
Posts 6207
05 May 2020 13:37


Vladimir Repcak wrote:

I need to finish the planet renderer first,

It's all good.
We only wanted to help with ideas, in case we could help you make it render faster.


Vladimir Repcak

Posts 359
06 May 2020 09:39


So, I have it finally running stable at 640x480 (@32bit).

It does, indeed, look significantly sharper. My previous low-poly scenes couldn't possibly show it that well (I mean - last time I ran at 640x480, I had that simple base segment with 10 triangles), but when you got ~1,000 triangles in the scene, going all the way to the distance, it's quite another thing...

Few options:
1. Reduce the framerate lock from 30 fps to 20 fps
2. Reduce the scene complexity to retain stable 30 fps lock
3. Keep 30 fps lock, but accept frequent framedrops to 20 fps

Next benchmark on the target HW will give me the data.

If I had to hazard a guess, I'm leaning towards option 2.


Vladimir Repcak

Posts 359
06 May 2020 09:45


Gunnar von Boehn wrote:

  It's all good.
  We only wanted to help with ideas, in case we could help you make it render faster.

OK, performance question :)

- I am clearing the framebuffer by copying the background bitmap.
- I have unrolled a full scanline's worth (320x) of move.l (a0)+,(a1)+
- This means that at 320x240, the loop overhead op (dbra) runs only 240 times

How will the code above fill both CPU pipes with regards to pipeline stalls ?

Wouldn't it be more effective to have 2 sets of pointers (say, one scanline apart) and just keep copying two rows at the same time ?
Something like:
move.l (a0)+,(a1)+    ; Row X
move.l (a2)+,(a3)+    ; Row X+1 (e.g. a2 is 320px further than a0)

The above would be unrolled 320x to minimize loop overhead.

Especially at 640x480, that's 4x more instructions than at 320x240.



Gunnar von Boehn
(Apollo Team Member)
Posts 6207
06 May 2020 09:48


Vladimir Repcak wrote:

Few options:
  1. Reduce the framerate lock from 30 fps to 20 fps
  2. Reduce the scene complexity to retain stable 30 fps lock
  3. Keep 30 fps lock, but accept frequent framedrops to 20 fps

You speak of "framerate" lock.
How did you implement such a "lock"?




Gunnar von Boehn
(Apollo Team Member)
Posts 6207
06 May 2020 09:59


Vladimir Repcak wrote:

How will the code above fill both CPU pipes with regards to pipeline stalls ?

DBRA can be executed in 2nd Pipe (for free)

Vladimir Repcak wrote:

  Wouldn't it be more effective to have 2 sets of pointers (say, one scanline apart) and just keep copying two rows at the same time ?
  Something like:
  move.l (a0)+,(a1)+    ; Row X
  move.l (a2)+,(a3)+    ; Row X+1 (e.g. a2 is 320px further than a0)

No, this is not good.
This code would read from 2 different memory regions and write to 2 different regions, and the writes would be 32-bit.

The memory stream prefetching will work better if you continuously read from 1 memory region. Also, if you write continuously to one region, you can create 64-bit writes, which will be faster.

A loop like this should give you good performance.


LOOP:
  move.l (A0)+,(A1)+    ; 4 longwords = 16 bytes per pass
  move.l (A0)+,(A1)+
  move.l (A0)+,(A1)+
  move.l (A0)+,(A1)+
  dbra D0,LOOP          ; D0.w = (number of 16-byte blocks) - 1

Just make sure your "screen buffer" is 64-bit aligned.
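
For readers working in C rather than assembly, the same two points (one linear stream and a 64-bit aligned buffer) can be sketched as below. This is only an illustration of the access pattern: which instructions a compiler emits for it, and how they behave on the Apollo core, is not something this sketch can promise. `is_aligned8` and `copy_linear64` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* True if `p` sits on an 8-byte (64-bit) boundary. */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Copies `bytes` from `src` to `dst` front to back, one 64-bit word at
   a time, four words per loop pass. Both pointers must be 8-byte
   aligned and `bytes` must be a multiple of 32. The caller is expected
   to pass buffers that may legally be accessed as 64-bit words.
   Returns 0 on success, -1 if a precondition is not met. */
static int copy_linear64(void *dst, const void *src, size_t bytes)
{
    if (!is_aligned8(dst) || !is_aligned8(src) || (bytes % 32) != 0) {
        return -1;
    }
    uint64_t *d = dst;
    const uint64_t *s = src;
    for (size_t n = bytes / 32; n != 0; --n) {
        d[0] = s[0];
        d[1] = s[1];
        d[2] = s[2];
        d[3] = s[3];
        d += 4;
        s += 4;
    }
    return 0;
}
```

A 320x240x32bit frame is 307,200 bytes, which is a multiple of 32, so a whole frame can go through this function in one call as long as both buffers are aligned.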




Vladimir Repcak

Posts 359
06 May 2020 10:00


By waiting till vblank, after scene has finished rendering.

Each frame of the scene executes a different number of instructions, so there's some variation in how long each frame takes.

Let's say I target 1.49 frames on average and allow 50% of a frame time as a buffer (the exact number still remains to be benchmarked). Then even in the worst case I still only take 1.99 frames.

And I can still keep a stable framerate, which is paramount for a great gameplay experience.

Occasional dropping from 60 to 30 fps in a high-speed environment is quite horrendous (from direct experience). It's much better to keep a 30 fps lock, but without framedrops.
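
The effect of such a lock can be modeled with a little integer arithmetic in C. This is a sketch of the idea as described in the thread, not the engine's code: it assumes a frame is shown at the first vblank after rendering finishes, but never earlier than `min_vbls` vblanks after the previous one. `frame_vbls`, `render_ticks`, and `ticks_per_vbl` are hypothetical names, and time is measured in arbitrary ticks.

```c
#include <stdint.h>

/* Number of vblank periods one frame occupies on screen.
   render_ticks  : time the frame took to render, in ticks
   ticks_per_vbl : length of one vblank period, in ticks (must be > 0)
   min_vbls      : the lock, e.g. 2 for 30 fps on a 60 Hz display */
static uint32_t frame_vbls(uint32_t render_ticks, uint32_t ticks_per_vbl,
                           uint32_t min_vbls)
{
    /* Round the render time up to whole vblank periods. */
    uint32_t needed = (render_ticks + ticks_per_vbl - 1) / ticks_per_vbl;
    return needed > min_vbls ? needed : min_vbls;
}
```

With 100 ticks per vblank and a lock of 2, render times of 149 and 199 ticks both occupy 2 vblanks, so the framerate stays steady, while 230 ticks spills into a third vblank, which on a 60 Hz display would be the drop to 20 fps mentioned earlier.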
