Polygon Pushing Performance of the 080
21 Sep 2018 10:09
Andy Hearn wrote:
I have a CV64/3D in my A3k if anyone wants some benchmarks run
Thellier Alain wrote:
Yes, thanks: you can try aminet/cow3D
OK, I'll try to piece that together over the next few days:
Virge CV64/3D powered by 060
Permedia2 BVPPC powered by 040 and PPC
Gunnar von Boehn
(Apollo Team Member)
25 Sep 2018 11:33
The topic of this thread is very good.
I removed some off-topic ideas which are impossible to do on our FPGAs,
so we can focus better on the topic.
Thanks for your understanding.
02 Oct 2018 21:26
OK, first round of testing done. Initially I was sure something was wrong with my setup, but results as follows.
P96 latest, W3D4.2a, latest coffinOS
A3k 060, 128+16+256meg ram, DCE CV64/3D
W3D Engine demo @320x240 15bit
(looking straight ahead without moving)
27fps without bilinear filtering
25fps with bilinear filtering
W3D Engine Demo @640x480 15bit
Same scene as before
12fps without filtering
9-10fps with filtering
That made me happy that everything was installed correctly and working, after the initial scare that I'd messed something up. Stupidly, I tried to run before walking and went for Cow3D first, with these results:
Cow3D - 0fps. Pressing "b" made no discernible difference;
it was rendering at just under 1fps.
Workbench @800x600 15bit. Cow3D fails to open
Workbench @640x480 15bit. Cow3D opens, and "runs"
Workbench at 16bit: colours are all over the place (my first mistake), but it ran - again at "0"fps.
Workbench at 8bit. fails - not unexpected.
So, not the highest bar in the world to jump over. Now on to the Permedia2, and the woes/wonders of CyberGraphX.
In the meantime, I'm gonna try me some GLQuake :D
02 Oct 2018 22:09
GLQuake update: downloaded GLquake_blitz ver1.50 from Aminet.
dropped workbench to 320x240@8bit to minimise gfx ram use.
config'd the glquake load script to setup a 320x240@15bit screen instead of 640x480@16bit.
Ran it. All looking good: text rendering is basic, transparency effects are just solid objects, some texture smearing on some objects, and occasional triangle-setup hiccups. But it runs.
timedemo demo2 = 6fps.
03 Oct 2018 12:25
Arti published this video of Quake2 running on a V4:
Still unoptimized, but looks almost playable
03 Oct 2018 15:19
Thanks for the FPS feedback: it gives a baseline for what performance the Vampire will need to reach.
The cow needs to turn a few times before computing an fps value: I don't remember if it needs 10 or 20 "turns".
There is also aminet/starshipw3d, which is almost the same program but has a simpler 3D object, so it runs faster.
03 Oct 2018 21:50
OK. Workbench set to 640x480@15bit. (At 800x600 the Z-buffer setup fails.)
So. Cow3D has been running for about an hour now. Pressing "B" makes no change that I can see.
I calculate about 87.5 frames per *MINUTE* - about one rotation. And indeed, after about 20 minutes, Cow3D gave me "1fps". I suspect the large amount of memory you're talking about is framebuffer RAM? The CV64/3D only has 4meg, so we're back down to the Z3-to-fastram interface for any buffered geometry calls that don't fit?
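For reference, the frames-per-minute figure above converts to fps as a simple division (a trivial sketch of the stopwatch method used in this thread; the function name is mine, not from any tool mentioned here):

```python
# Hypothetical sketch of the manual "count frames against a stopwatch"
# timing method: average fps is frames divided by elapsed seconds.
def fps_from_frame_count(frames: float, elapsed_seconds: float) -> float:
    """Average frames per second over the measured interval."""
    return frames / elapsed_seconds

# 87.5 frames counted in one minute, as observed on the CV64/3D:
fps = fps_from_frame_count(87.5, 60.0)
print(round(fps, 2))  # ~1.46 fps, which Cow3D's counter rounds to "1fps"
```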
I can (visually guess that) at least double the fps with the "c" option to remove texture calls and use colour only?
Starship runs at 24fps
For giggles I dropped Workbench to 320x240, but both Cow and Starship still open their windows at the same size. Starship got 31fps; Cow3D … I guess a bit more, but not enough to make an appreciable difference.
03 Oct 2018 23:19
Preliminary Permedia2 testing.
A1200,BlizzPPC040@25,PPC@200, matched pair of 64meg sims, BVisionPPC
Base OS3.1 install, base CGX4 install off CD, CGX4r1 install, warp3D4.2a install.
workbench at 640x480@15bit
Starship: 30fps, but it looks faster than that. I guess it may be locked to the vertical refresh?
Cow3D: 2.66fps with my BPM-counter timing method. (I didn't have long to play; more testing tomorrow.)
320x240@15bit 59fps - filtering doesn't make a difference to fps
640x480@15bit 30fps - again, no filtering difference to fps.
I'll do some GLQuake stuff tomorrow.
04 Oct 2018 20:02
Gunnar von Boehn wrote:
If one aims for "realistic" goals like something between PS1 and PS2 games, then an interesting discussion topic could be
what the most important key features for HW acceleration of the 3D core will be.
What do you think?
I like the Dreamcast very much; its graphics power was ahead of its time, simply arcade feeling. The PowerVR chip removed invisible polygons before rendering to save fillrate. Does SAGA already use this?
Source: Wikipedia (translated from German):
PowerVR uses its own technique, so-called tile-based deferred rendering. Deferred rendering has the same effect as the Hidden Surface Removal (HSR) of modern immediate renderers like the Nvidia GeForce series or AMD's Radeon series: polygons that are covered by others, and therefore not visible, are discarded before rendering, saving bandwidth and fillrate. To optimize this effect, PowerVR divides a 3D scene into multiple tiles, and this HSR-like process is carried out within each tile.
This technique makes it possible to outperform competing graphics chips that actually have higher raw computing power.
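The tile-based HSR idea described above can be sketched in a few lines (my own illustration, not PowerVR's actual hardware pipeline; the function name is an assumption, though 32x32 is the tile size commonly cited for the Dreamcast's PowerVR chip):

```python
# Minimal sketch of per-tile hidden-surface removal: within one tile,
# visibility is resolved per pixel first, so occluded fragments never
# reach the expensive texturing/shading step and cost no fillrate.

TILE = 32  # tile edge in pixels (assumed; 32x32 is cited for Dreamcast)

def shade_tile(fragments):
    """fragments: list of (x, y, depth, color) tuples inside one tile.
    Pass 1 keeps only the nearest fragment per pixel; pass 2 'shades'
    the surviving winners."""
    nearest = {}  # (x, y) -> (depth, color)
    for x, y, depth, color in fragments:          # pass 1: depth only
        if (x, y) not in nearest or depth < nearest[(x, y)][0]:
            nearest[(x, y)] = (depth, color)
    # pass 2: texturing/shading happens once per *visible* pixel only
    return {pix: color for pix, (depth, color) in nearest.items()}

# Two fragments overlap on pixel (1, 1): only the nearer one is shaded.
result = shade_tile([(1, 1, 0.9, "far"), (1, 1, 0.2, "near")])
print(result[(1, 1)])  # "near"
```

An immediate renderer would have textured both fragments and let the Z-buffer discard one afterwards; the deferred scheme spends fillrate only on the winner.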
04 Oct 2018 21:22
All transparency effects and texture warping fine, text all fine, no geometry glitches.
Text rendering problems, texturing problems, light effects rendered as solid objects, transparency effects flat-shaded, occasional vertex-setup problems.
05 Oct 2018 14:51
I already clearly stated why the 23MHz-CPU-equipped consoles of the '90s could do 100k-200k polygons/second and have real Quake ports running at almost 30fps without the compromises listed above...
My posts were deleted...as I'm sure this one will be...
05 Oct 2018 23:34
Hi Louis -
If you follow the video you linked to you will hear they rewrote the engine that renders the game. The engine was custom-written for the machine and its custom hardware. That was the only way to get semi-decent performance (often down to 10fps). Other tricks involved adding extra walls in the levels to reduce the rendering load.
I suggest you are comparing apples and oranges.
To my understanding this thread (and related threads) are about how much the existing core and AMMX instructions can accomplish. This will help in various ways with future plans... helping to identify the areas most suited for attention/optimization.
We are all on a journey... let's get there together without harsh words.
08 Oct 2018 23:30
Most such games suffer frame drops... "down to" 10fps from two 23MHz CPUs at 320x200, thanks to a dedicated 3D chip, is still much better than the video above. So like I said before, AMMX needs to be added to an existing or new "custom chip" to offload 3D from the CPU.
That's not to say the CPU can't assist with it; it's just that with game logic and input response, the CPU tends to be plenty busy already...
...and it's not like I'm saying "remove AMMX from the CPU" - I'm saying add it to a custom chip as well.
I thought that 3D demo from a few months ago written for the 060, which used integer math to do the odd frames and FPU math to do the even ones, was extremely clever... Imagine if you could do that with a few cores in a custom chip and the CPU...
09 Oct 2018 16:22
I agree with Louis here, although I'm not sure how you imagine a solution with a custom chip.
But in general I like the idea of implementing multiple AMMX units which can be used in parallel. It depends pretty much on the available space in the FPGA.
It would avoid the need to think about SMP within AmigaOS, but would make those AMMX units available as a sort of co-processor.
Maybe Gunnar can answer how much space an AMMX unit would need and whether they could be operated in parallel for video stuff.
09 Oct 2018 17:47
Personally, I think the main issue is having two memory controllers... one for "fast RAM" to keep the CPU fed, and then the traditional chip/GPU RAM for the custom chips, so that those co-processors can process lists of vertices.
By the way, the blitter, copper, gary, etc are 'custom chips' to me. ;)
09 Oct 2018 19:54
Louis Dias wrote:
By the way, the blitter, copper, gary, etc are 'custom chips' to me. ;)
You're wrong, Gary is the pet from SpongeBob.
But I think something has to be cleared up first that is not very clear to me.
For the 'Akiko' mentioned earlier in other topics, it was explained that performance would be poor if the function were outsourced, due to communication latency. Is there a 'golden rule' / calculation / recipe by which you can know when outsourcing will be beneficial? Or does every new idea require an entire overhaul of the design to be fully efficient?
Maybe someone knows an interesting article/book so I can do my homework before the next time I ask something that may be obvious? :)
Anyhow, keep up the good work, Apollo team and all programmers!!
09 Oct 2018 20:53
Deleted by myself: was a bit off topic.
09 Oct 2018 21:25
I mentioned AKIKO earlier since that's where the last "Commodore" Amiga put augmented video-processing functions (aka C2P). Agnus/Alice is the memory controller and splits access to chip memory between the CPU and the chipset, essentially giving each only half the bandwidth.
FastRAM connects directly to the CPU (only) at full bus speed and can be accessed even while the chipset is accessing chip RAM.
So having said that - that's why I questioned the unified memory format...and if a custom chip could execute 3D instructions in parallel, the cpu could be running other code from FastRAM while a 'custom chip' was transforming a stream of polygons in ChipRAM.
I'm not sure what's happening inside a Vampire as far as fast vs chip ram so I'm asking questions and making suggestions...
As for the performance of the AKIKO in the CD32: it helped the EC020, which had no fastram, with C2P. Add fastram and the advantage is not so great. However, one that could process 1024 bytes instead of 4 bytes at a time would be much more efficient (as in 256 times more efficient)... Such enhancements will not fit on the V2...
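Since C2P keeps coming up in this thread, here is a minimal software sketch of what the conversion actually does (my own naive Python illustration; AKIKO did this in hardware, a group of pixels per register access, and the function name and interface here are assumptions):

```python
# Minimal sketch of chunky-to-planar (C2P) conversion. A chunky frame
# stores one whole value per pixel; Amiga planar graphics store the
# same image as `depth` separate bitplanes, one bit per pixel each.

def chunky_to_planar(pixels, depth=8):
    """Convert a list of chunky pixel values into `depth` bitplanes,
    each a list of bytes. Pixel count must be a multiple of 8."""
    planes = [[] for _ in range(depth)]
    for i in range(0, len(pixels), 8):
        group = pixels[i:i + 8]          # 8 pixels -> 1 byte per plane
        for plane in range(depth):
            byte = 0
            for pix in group:            # MSB = leftmost pixel
                byte = (byte << 1) | ((pix >> plane) & 1)
            planes[plane].append(byte)
    return planes

# Eight pixels of colour 1: bitplane 0 gets 0xFF, all other planes 0x00.
planes = chunky_to_planar([1] * 8)
print(planes[0], planes[1])  # [255] [0]
```

The bit-shuffling inner loops are exactly the part that is cheap in dedicated hardware but expensive on a plain 68020, which is why AKIKO helped the fastram-less CD32 in particular.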
09 Oct 2018 21:36
Please stop confusing the AKIKO chip with anything 3D in nature. It was simply a glue logic chip for the CD32 and could also do chunky to planar bitmap conversions. It had nothing to do with 3D geometry or polygons. As Gunnar has stated on more than one occasion, the current Apollo Core can already perform these functions much more efficiently than an AKIKO chip and it would be redundant to add an FPGA AKIKO to the existing Apollo core.
And why advocate the design and inclusion of a virtual, dedicated 3D GPU co-processor in the core? The glue logic alone to connect this virtual GPU to the rest of the system would make it impractical and much slower than just adding the same 3D functionality to the existing core.
09 Oct 2018 22:16
I think he is referring to the Akiko as an example of how parallel processing could speed things up.