Polygon Pushing Performance of the 080
| | Andy Hearn
Posts 374 21 Sep 2018 10:09
| Andy Hearn wrote: I have a CV64/3D in my A3k if anyone wants some bench
| Thellier Alain wrote: Yes, thanks: you can try aminet/cow3D
|
ok I'll try to piece that together over the next few days:-
Virge: CV64/3D powered by 060
Permedia2: BVPPC powered by 040 and PPC
| |
| | Gunnar von Boehn (Apollo Team Member) Posts 6254 25 Sep 2018 11:33
| Dear users, the topic of this thread is very good. I removed some off-topic ideas which are impossible to do on our FPGAs, to allow us to focus better on the topic. Thanks for your understanding
| |
| | Andy Hearn
Posts 374 02 Oct 2018 21:26
| ok, first round testing done. initially I was sure something was wrong with my setup, but as follows.
Setup: P96 latest, W3D4.2a, latest coffinOS, A3k 060, 128+16+256meg ram, DCE CV64/3D

W3D Engine demo @320x240 15bit (looking straight ahead without moving)
27fps without bilinear filtering
25fps with bilinear filtering

W3D Engine Demo @640x480 15bit, same scene as before
12fps without filtering
9-10fps with filtering

that made me happy that everything was installed correctly and working after this initial scare that i'd messed something up. stupidly, I tried to run before walking and went for Cow3D first, with these results:-

Cow3D - 0FPS. pressing "b" made no discernible difference. it was rendering just less than 1fps.
Workbench @800x600 15bit: Cow3D fails to open
Workbench @640x480 15bit: Cow3D opens, and "runs"
Workbench at 16bit: colours are all over the place, my first mistake, but it ran - again at "0"fps
Workbench at 8bit: fails - not unexpected.

EXTERNAL LINK EXTERNAL LINK

so, not the highest bar in the world to jump over. now on to the permedia2, and the woes/wonders of cybergraphx. in the meantime, i'm gonna try me some GLquake :D
| |
| | Andy Hearn
Posts 374 02 Oct 2018 22:09
| GLquake update. downloaded GLquake_blitz ver1.50 from aminet. dropped workbench to 320x240@8bit to minimise gfx ram use. configured the glquake load script to set up a 320x240@15bit screen instead of 640x480@16bit. ran. all looking good. text rendering is basic, transparency effects are just solid objects. some texture smearing on some objects, and occasional triangle setup hiccups. but it runs.
timedemo demo2 = 6fps.
| |
| | Saladriel Amrael
Posts 166 03 Oct 2018 12:25
| Arti published this video of Quake2 running on a V4: EXTERNAL LINK Still unoptimized, but looks almost playable
| |
| | Thellier Alain
Posts 143 03 Oct 2018 15:19
| @Andy Hearn Thanks for the FPS feedback: it gives a baseline for the performance the Vampire will need to reach. The cow needs to turn a number of times before an fps value is computed: I don't remember if it needs 10 or 20 "turns". There is also aminet/starshipw3d, which is almost the same program but has a simpler 3D object, so it runs faster.
| |
| | Andy Hearn
Posts 374 03 Oct 2018 21:50
| ok. workbench set to 640x480@15bit. (800x600 and the Z-buffer setup fails.)
so. cow3D has been running for about an hour now. pressing "B" makes no change that I can see. I calc about 87.5 frames per *MINUTE* - about one rotation. and indeed, after about 20 minutes, cow3D gave me "1fps".
I suspect the large amount of memory you're talking about is framebuffer ram? the CV64/3D only has 4meg. so we're back down to the Z3 to fastram interface for any buffered geometry calls that don't fit? I can (visually guess that) at least double the fps with the "c" option to remove texture calls and use colour only?
Starship runs at 24fps.
for giggles I dropped workbench to 320x240, but both cow and starship still open their windows the same size. starship got 31fps, cow3d … I guess a bit more but not enough to make an appreciable difference.
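As a rough check on the figures in this post, the hand-timed 87.5 frames per minute can be converted to frames per second. This is only a sketch of the arithmetic; the assumption that Cow3D reports a whole-number fps value by truncation is a guess made here to match the reported "1fps", not something the thread confirms.

```python
def frames_per_minute_to_fps(frames_per_minute: float) -> float:
    """Convert a hand-counted frames-per-minute figure to frames per second."""
    return frames_per_minute / 60.0


# 87.5 frames per minute is the figure quoted in the post.
fps = frames_per_minute_to_fps(87.5)
print(f"{fps:.2f} fps")

# Assumption: an integer fps readout would truncate the fractional part.
print(int(fps))
```

The result is a little under 1.5 fps, which is consistent with a readout that alternates between "0" and "1" depending on where the measurement window falls.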
| |
| | Andy Hearn
Posts 374 03 Oct 2018 23:19
| preliminary permedia2 testing
A1200, BlizzPPC040@25, PPC@200, matched pair of 64meg SIMMs, BVisionPPC
Base OS3.1 install, base CGX4 install off CD, CGX4r1 install, warp3D4.2a install.
workbench at 640x480@15bit
starship: 30fps, but it looks faster than that. I guess it may be vertical refresh locked?
cow3D: 2.66fps with my BPM counter timing method. (I didn't have long to play, more testing tomorrow)
W3D EngineDemo
320x240@15bit: 59fps - filtering doesn't make a difference to fps
640x480@15bit: 30fps - again, no filtering difference to fps.
i'll do some GLquake stuff tomorrow.
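To compare the W3D Engine demo numbers across resolutions, one option is to multiply resolution by frame rate to get screen pixels delivered per second. The sketch below uses only the unfiltered figures posted in this thread for the two cards. It is not a true fill-rate measurement: it ignores overdraw, and the Permedia2 figures may be capped by vertical refresh, as suspected above for starship.

```python
def screen_pixels_per_second(width: int, height: int, fps: float) -> float:
    """Screen pixels delivered per second (ignores overdraw and hidden pixels)."""
    return width * height * fps


# (card, width, height, fps) taken from the posted W3D Engine demo results,
# without bilinear filtering.
results = [
    ("CV64/3D Virge", 320, 240, 27),
    ("CV64/3D Virge", 640, 480, 12),
    ("BVPPC Permedia2", 320, 240, 59),
    ("BVPPC Permedia2", 640, 480, 30),
]

for card, width, height, fps in results:
    mpix = screen_pixels_per_second(width, height, fps) / 1e6
    print(f"{card:16s} {width}x{height} @ {fps:>2} fps = {mpix:.2f} Mpixel/s")
```

If throughput rises with resolution, as it does for both cards here, that hints that per-frame overhead (geometry, setup, or a refresh cap) rather than pure pixel fill is limiting the low-resolution case.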
| |
| | Johannes Schäfer
Posts 47 04 Oct 2018 20:02
| Gunnar von Boehn wrote:
| If one aims for "realistic" goals like something between PS1-PS2 games then an interesting discussion topic could be what the most important key features for HW acceleration of the 3D core will be. What do you think?
|
I like Dreamcast very much, the graphics power was ahead of its time, simply an arcade feeling. The PowerVR chip removed invisible polygons before rendering to save fill rate. Does SAGA already use this?
Source: Wikipedia: PowerVR uses its own technique, so-called tile-based deferred rendering, i.e. deferred image computation based on tiles. Deferred rendering has the same effect as the hidden surface removal (HSR) of modern immediate renderers such as the Nvidia GeForce series or AMD's Radeon series, namely that polygons which are covered by others and are therefore not visible are discarded before rendering, so that bandwidth and fill rate are saved. To optimize this effect, PowerVR divides a 3D scene into several tiles, and this HSR-like process is carried out in each tile. This technique makes it possible to outperform competitors' graphics chips that actually have higher computing power.
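The fill-rate saving described in the quote can be illustrated with a toy model. The sketch below is not how PowerVR hardware or SAGA works internally; it assumes axis-aligned rectangles with constant depth as stand-ins for polygons, and a hypothetical 64x64 screen with 16x16 tiles. It only counts how many pixels get shaded by a Z-buffered immediate renderer versus a renderer that resolves visibility per tile before shading.

```python
from dataclasses import dataclass

WIDTH, HEIGHT, TILE = 64, 64, 16  # hypothetical screen and tile size


@dataclass(frozen=True)
class Rect:
    """Axis-aligned stand-in for a polygon; smaller z means nearer."""
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive
    z: float


def immediate_shaded_pixels(rects):
    """Z-buffered immediate rendering: shade whenever the depth test passes."""
    depth = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
    shaded = 0
    for r in rects:  # submission order matters for this renderer
        for y in range(r.y0, r.y1):
            for x in range(r.x0, r.x1):
                if r.z < depth[y][x]:
                    depth[y][x] = r.z
                    shaded += 1  # shading cost is paid immediately
    return shaded


def tiled_deferred_shaded_pixels(rects):
    """Per tile: bin the rects, resolve visibility, then shade each pixel once."""
    shaded = 0
    for ty in range(0, HEIGHT, TILE):
        for tx in range(0, WIDTH, TILE):
            binned = [r for r in rects
                      if r.x0 < tx + TILE and r.x1 > tx
                      and r.y0 < ty + TILE and r.y1 > ty]
            for y in range(ty, ty + TILE):
                for x in range(tx, tx + TILE):
                    if any(r.x0 <= x < r.x1 and r.y0 <= y < r.y1 for r in binned):
                        shaded += 1  # only the nearest surface is shaded
    return shaded


# Worst case for the immediate renderer: three layers drawn back to front.
scene = [Rect(0, 0, 64, 64, 3.0), Rect(8, 8, 56, 56, 2.0), Rect(16, 16, 48, 48, 1.0)]
print("immediate:", immediate_shaded_pixels(scene))
print("tiled deferred:", tiled_deferred_shaded_pixels(scene))
```

With this back-to-front scene the immediate renderer shades every layer, while the tiled version shades each covered screen pixel exactly once. A front-to-back submission order would narrow the gap, which is why the benefit depends on the content.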
| |
| | Andy Hearn
Posts 374 04 Oct 2018 21:22
| GLquake68k
BVPPC permedia2:-
320x240@15bit 3.7fps
640x480@16bit 3.4fps
all transparency effects and texture warping all fine, text all fine, no geometry glitches.
CV64/3D Virge:-
320x240@15bit 4.9fps
640x480@15bit 1.5fps
text rendering problems, texturing problems, light effects rendered as solid objects, transparency effects flat shaded, occasional vertex setup problems.
Cow3D
BVPPC permedia2:- 640x480@15bit 3fps
CV64/3D Virge:- 640x480@15bit 1fps
| |
| | Louis Dias (Needs Verification) Posts 55/ 1 05 Oct 2018 14:51
| 3fps? I already clearly stated why 23 MHz CPU-equipped consoles of the '90s could do 100k-200k polygons/second and have real Quake ports running at almost 30fps without the compromises listed above... Sega Saturn: EXTERNAL LINK My posts were deleted...as I'm sure this one will be...
| |
| | Gregthe Canuck
Posts 274 05 Oct 2018 23:34
| Hi Louis - If you follow the video you linked to you will hear they rewrote the engine that renders the game. The engine was custom-written for the machine and its custom hardware. That was the only way to get semi-decent performance (often down to 10fps). Other tricks involved adding extra walls in the levels to reduce the rendering load. I suggest you are comparing apples and oranges.
To my understanding this thread (and related threads) is about how much the existing core and AMMX instructions can accomplish. This will help in various ways with future plans... helping to identify the areas most suited for attention/optimization.
We are all on a journey... let's get there together without harsh words. Cheers!
| |
| | Louis Dias (Needs Verification) Posts 55/ 1 08 Oct 2018 23:30
| Most such games suffer frame drops... "down to" 10 fps from two 23 MHz CPUs at 320x200, thanks to a dedicated 3D chip, is still much better than the video above. So like I said before, AMMX needs to be added to an existing or new "custom chip" to offload 3D from the CPU. That's not to say the CPU can't assist with it, just that with game logic and input response, the CPU tends to be plenty busy already... and it's not like I'm saying "remove AMMX from the cpu" - I'm saying add it to a custom chip as well. I thought that 3D demo from a few months ago written for the 060, which used integer math to do the odd frames and FPU math to do the even ones, was extremely clever... Imagine if you could do that with a few cores in a custom chip and the CPU...
| |
| | Markus B
Posts 209 09 Oct 2018 16:22
| I agree with Louis here. Although I'm not sure how you imagine a solution with a custom chip. But in general I like the idea of implementing multiple AMMX units which can be used in parallel. It depends pretty much on the available space in the FPGA. It would avoid the need to think about SMP within AmigaOS, but make those AMMX units available as some sort of co-processors. Maybe Gunnar can answer how much space an AMMX unit would need and if they could be operated in parallel for video stuff.
| |
| | Louis Dias (Needs Verification) Posts 55/ 1 09 Oct 2018 17:47
| Personally, I think the main issue is having 2 memory controllers... One for "fast ram" to keep the cpu fed, and then the traditional chip/gpu ram for the custom chips so that those co-processors can process lists of vertices. By the way, the blitter, copper, gary, etc are 'custom chips' to me. ;)
| |
| | Stephan Hamers
Posts 22 09 Oct 2018 19:54
| Louis Dias wrote:
| By the way, the blitter, copper, gary, etc are 'custom chips' to me. ;)
|
You're wrong, gary is the pet from spongebob. But I think something has to be cleared up first that is not very clear to me. For the 'Akiko' mentioned earlier in other topics, it was explained that performance would be poor or worse if it were outsourced, due to communication latency. Is there a 'golden rule' / calculation / recipe by which you can know when outsourcing will be beneficial? Or does every new idea require an entire overhaul of the design to be fully efficient? Maybe someone knows an interesting article/book so I can do my homework next time I ask something that may be obvious? :) Anyhow, keep up the good work Apollo team and all programmers!!
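One common back-of-the-envelope way to approach the "golden rule" question is a break-even estimate: offloading pays off only if the fixed latency, plus the time to move the data, plus the co-processor's compute time is less than the time the CPU would need on its own. The sketch below is a generic model, not anything stated by the Apollo team, and every number in it is hypothetical and chosen only for illustration.

```python
def offload_time(latency_s: float, bytes_moved: int,
                 bandwidth_bytes_per_s: float, accel_compute_s: float) -> float:
    """Total time for an offloaded job: fixed latency + transfer + remote compute."""
    return latency_s + bytes_moved / bandwidth_bytes_per_s + accel_compute_s


def offload_is_beneficial(cpu_compute_s: float, latency_s: float, bytes_moved: int,
                          bandwidth_bytes_per_s: float, accel_compute_s: float) -> bool:
    """True if the offloaded path finishes sooner than doing the work on the CPU."""
    return offload_time(latency_s, bytes_moved,
                        bandwidth_bytes_per_s, accel_compute_s) < cpu_compute_s


# Hypothetical small job: the CPU needs 100 us, but moving 4 KiB over a
# 20 MB/s link already costs about 205 us, so offloading loses.
small = offload_is_beneficial(cpu_compute_s=100e-6, latency_s=50e-6,
                              bytes_moved=4096, bandwidth_bytes_per_s=20e6,
                              accel_compute_s=10e-6)

# Hypothetical large job: the CPU needs 10 ms; transfer plus remote compute
# comes to about 4.3 ms, so offloading wins.
large = offload_is_beneficial(cpu_compute_s=10e-3, latency_s=50e-6,
                              bytes_moved=65536, bandwidth_bytes_per_s=20e6,
                              accel_compute_s=1e-3)

print("small job worth offloading:", small)
print("large job worth offloading:", large)
```

The model ignores any overlap between CPU work and the offloaded job; if the CPU can do useful work in the meantime, the break-even point moves in favour of offloading.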
| |
| | Andrew Miller
Posts 352 09 Oct 2018 20:53
| Deleted by myself: was a bit off topic
| |
| | Louis Dias (Needs Verification) Posts 55/ 1 09 Oct 2018 21:25
| @Stephan Hamers EXTERNAL LINK I mentioned AKIKO earlier since that's where the last "Commodore" Amiga put augmented video processing functions (aka C2P). Agnus/Alice is the memory controller and splits access to chip memory between the CPU and the chipset, essentially giving each only half the bandwidth. FastRAM goes directly to the CPU (only) at full bus speed and can be accessed even when the chipset is accessing chip RAM.
So having said that - that's why I questioned the unified memory format... and if a custom chip could execute 3D instructions in parallel, the CPU could be running other code from FastRAM while a 'custom chip' was transforming a stream of polygons in ChipRAM. I'm not sure what's happening inside a Vampire as far as fast vs chip RAM goes, so I'm asking questions and making suggestions...
As for performance of the AKIKO in the CD32, it helped the EC020 with no FastRAM to do C2P. Add FastRAM and the advantage is not so great. However, one that could process 1024 bytes instead of 4 bytes at a time would be much more efficient (as in 256 times more efficient)... Such enhancements will not fit on the V2...
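For readers unfamiliar with C2P (chunky to planar), the conversion takes pixels stored as one byte each and redistributes their bits across separate bitplanes, which is the layout the classic Amiga display hardware reads. The sketch below shows the idea for a group of 8 pixels with 8 bitplanes. It is a plain software illustration and does not model the AKIKO register interface or its actual block size.

```python
def chunky_to_planar_8(pixels):
    """Convert 8 chunky 8-bit pixels into 8 bitplane bytes.

    Bit n of each pixel goes to plane n; the leftmost pixel lands in
    bit 7 of each plane byte.
    """
    if len(pixels) != 8:
        raise ValueError("expected exactly 8 pixels")
    planes = [0] * 8
    for i, pixel in enumerate(pixels):
        for plane in range(8):
            bit = (pixel >> plane) & 1
            planes[plane] |= bit << (7 - i)
    return planes


# Leftmost pixel has colour index 1, so only plane 0 gets its top bit set.
print([hex(b) for b in chunky_to_planar_8([1, 0, 0, 0, 0, 0, 0, 0])])
```

The "256 times" figure in the post is the ratio of block sizes (1024 / 4). Whether a larger block would translate into a proportional speedup depends on memory bandwidth, which the thread does not quantify.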
| |
| | Steve Ferrell
Posts 424 09 Oct 2018 21:36
| @Louis Dias Please stop confusing the AKIKO chip with anything 3D in nature. It was simply a glue logic chip for the CD32 and could also do chunky to planar bitmap conversions. It had nothing to do with 3D geometry or polygons. As Gunnar has stated on more than one occasion, the current Apollo Core can already perform these functions much more efficiently than an AKIKO chip and it would be redundant to add an FPGA AKIKO to the existing Apollo core.
And why advocate the design and inclusion of a virtual and dedicated 3D GPU co-processor into the core? The glue logic alone to connect this virtual GPU to the rest of the system would make it impractical and much slower than just adding the same 3D functionality to the existing core.
| |
| | Markus B
Posts 209 09 Oct 2018 22:16
| I think he is referring to the Akiko as an example of how parallel processing could speed things up.
| |
|