Information about the Apollo CPU and FPU. |
GCC and Other Stories From the Dark Side
|
---|
|
---|
| | Gunnar von Boehn (Apollo Team Member) Posts 6254 12 Jan 2024 08:26
| We have discussed the benefit of 64bit memory operation. Let us also discuss the question about MOVEQ for 64bit Register.
Don Adan wrote:
| From my point of view, it would be much better if the "moveq" implementation for the 68080 worked as a 64-bit command, not only as a 32-bit command.
|
I see what you want. While it's always nice to look forward and wonder what can be made faster, there is also the aspect of compatibility. A very important feature of the Apollo 68080 CPU is that it is fully compatible with old 68K software. This means old software can be run as is and will just work normally.
This backward compatibility includes "stack" behavior. Old programs using the normal 68K instructions will never trash the high part of the 64-bit registers, and therefore they never need to save the full 64-bit registers. Old programs continue to save the 32-bit parts of the registers and nothing more, just as they used to do. Being compatible in this way also keeps the stack frames of all software unchanged. I'm sure you agree that keeping compatibility even with stack frames is a very important feature.
100% compatibility of the Apollo 68080 CPU is most important for us. Of course, improving coder friendliness, performance and code density is also important for us. If you would like to discuss such topics, please feel invited to join our developer Discord channel. There you can find a nice group of experienced Amiga and Atari developers who brainstorm with us about such topics. The code density of the 68K family is excellent, and of all 68K family members the 68080 has the best code density. Hope to see you soon in the chat. :-)
| |
| | Don Adan
Posts 38 12 Jan 2024 16:15
| I'm too old and too lazy. Anyway, I don't see incompatibility problems between a 64-bit "moveq" command and already existing 32-bit code. "movem.l" stores registers as 32 bits, "movem.w" as 16 bits; "movem.q" can store them as 64 bits. It can be a problem only with new 64-bit programs. That is why I said it is perhaps too late for this change.
| |
| | Gunnar von Boehn (Apollo Team Member) Posts 6254 12 Jan 2024 20:51
| Don Adan wrote:
| I'm too old and too lazy. Anyway, I don't see incompatibility problems between a 64-bit "moveq" command and already existing 32-bit code.
|
No problem, I can explain this to you. Amiga is a multitasking OS. This means you can have several programs running at the same time. In addition, you can have drivers or other code reacting to interrupts. The registers of the CPU are used by all programs.
Let's make an example. Let's say you have code triggered by an IRQ. This handler saves some registers with movem.l D0-D7. This means the 32-bit portions of the registers are saved and later restored. All very normal. When this code now executes MOVEQ, the instruction should update the lower 32 bits of the register and is not allowed to touch the high bits, as the code did not save them.
This means the behavior of the CPU is not allowed to change, and register parts outside of those saved are not allowed to be trashed. You see what I mean? I think it's simple to understand.
| |
| | Gunnar von Boehn (Apollo Team Member) Posts 6254 15 Jan 2024 11:18
| The 68K instruction set is famous for its code density. The 68080 has the densest code of all 68K members. Let us show why with some example instructions. Setting a 32-bit memory location to a certain value:
    Value    Instruction used        Instruction length
    0        CLR.l  (A0)             2 bytes
    -1       MOV3.l #-1,(A0)         2 bytes
    +1       MOV3.l #1,(A0)          2 bytes
    4000     MOVI.L #4000,(A0)       4 bytes
    -4000    MOVI.L #-4000,(A0)      4 bytes
and of course the "normal" instruction:
    any      MOVE.L #imm,(A0)        6 bytes
What do you think? Do you know how many instruction bytes PowerPC, for example, needs to do the same?
| |
| | Gunnar von Boehn (Apollo Team Member) Posts 6254 15 Jan 2024 13:46
| Let's compare setting a 32-bit value in memory pointed to by a register:
    Value            PowerPC                     Apollo 68080
    -1               2 instructions,  8 bytes    1 instruction,  2 bytes
    1                2 instructions,  8 bytes    1 instruction,  2 bytes
    4000             2 instructions,  8 bytes    1 instruction,  4 bytes
    1,000,000,000    3 instructions, 12 bytes    1 instruction,  6 bytes
Setting a 32-bit value in memory NOT pointed to by a register:
    Value            PowerPC                     Apollo 68080
    -1               3 instructions, 12 bytes    1 instruction,  6 bytes
    1                3 instructions, 12 bytes    1 instruction,  6 bytes
    4000             3 instructions, 12 bytes    1 instruction,  8 bytes
    1,000,000,000    4 instructions, 16 bytes    1 instruction, 10 bytes
What do you think?
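To put the two tables above into single numbers, the following sketch simply sums the byte counts quoted in the post for the four example values. The array names and the helper `total_bytes` are made up for this illustration; the figures are copied from the tables, not measured.

```c
#include <assert.h>
#include <stddef.h>

/* Byte counts copied from the post, for the values -1, 1, 4000 and 1,000,000,000. */
const int ppc_indirect[4]    = {  8,  8,  8, 12 };  /* PowerPC, address in a register     */
const int m68080_indirect[4] = {  2,  2,  4,  6 };  /* 68080,   address in a register     */
const int ppc_absolute[4]    = { 12, 12, 12, 16 };  /* PowerPC, address not in a register */
const int m68080_absolute[4] = {  6,  6,  8, 10 };  /* 68080,   address not in a register */

/* Hypothetical helper: add up n byte counts. */
int total_bytes(const int *bytes, size_t n)
{
    assert(bytes != NULL);
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += bytes[i];
    return sum;
}
```

Over these four cases the quoted figures add up to 36 versus 14 bytes for the register-indirect stores and 52 versus 30 bytes for the absolute ones.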
| |
| | Gunnar von Boehn (Apollo Team Member) Posts 6254 16 Jan 2024 08:20
| Meanwhile GCC is producing really nice code. Look at this:
    void memclr (int length, int * ptr)
    {
      for(;length--;){
        *ptr++= 0;
        *ptr++= 0;
        *ptr++= 0;
        *ptr++= 0;
      }
    }
Compiled with -m68080 -O2 -fomit-frame-pointer:
    _memclr:
            move.l (8,sp),a0
            move.l (4,sp),d0
            jra .L2
    .L3:
            clr.q (a0)+
            clr.q (a0)+
    .L2:
            dbral d0,.L3
            rts
I think this is very nice now. What do you think?
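For readers who want to try the C function on a host machine, here is a small self-contained sketch. `memclr` is the function from the post; `count_nonzero` is a hypothetical helper added only for checking the result. Note that `length` counts groups of four ints, not bytes. Assuming a 4-byte `int`, that is 16 bytes per iteration, which matches the two `clr.q (a0)+` instructions in the listing.

```c
#include <assert.h>
#include <stddef.h>

/* The function from the post: every loop iteration clears four ints. */
void memclr(int length, int *ptr)
{
    for (; length--;) {
        *ptr++ = 0;
        *ptr++ = 0;
        *ptr++ = 0;
        *ptr++ = 0;
    }
}

/* Hypothetical helper (not in the post): count entries that are still nonzero. */
size_t count_nonzero(const int *buf, size_t n)
{
    assert(buf != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] != 0)
            count++;
    return count;
}
```

Calling `memclr(2, buf)` on a buffer of 12 nonzero ints clears the first 8 entries and leaves the last 4 untouched.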
| |
| | Kamelito Loveless
Posts 261 16 Jan 2024 10:15
| Very nice, but the parameters could be passed using registers instead of the stack.
| |
| | Gunnar von Boehn (Apollo Team Member) Posts 6254 16 Jan 2024 10:24
| Kamelito Loveless wrote:
| Very nice, but the parameters could be passed using registers instead of the stack. |
Yes, you are right. Passing parameters in registers is so much nicer, and this is the way (tm) we do it on Amiga. GCC also supports this. If you use these compile parameters: -mregparm=2 -m68080 -O2
then you get this code:
    memclr:
            bra .L2
    .L3:
            clr.q (a0)+
            clr.q (a0)+
    .L2:
            dbral d0,.L3
            rts
What do you think?
| |
| | Gunnar von Boehn (Apollo Team Member) Posts 6254 16 Jan 2024 12:17
| Gunnar von Boehn wrote:
| Meanwhile GCC is producing really nice code. Look at this:
    void memclr (int length, int * ptr)
    {
      for(;length--;){
        *ptr++= 0;
        *ptr++= 0;
        *ptr++= 0;
        *ptr++= 0;
      }
    }
Compiled with -m68080 -O2 -fomit-frame-pointer:
    _memclr:
            move.l (8,sp),a0
            move.l (4,sp),d0
            jra .L2
    .L3:
            clr.q (a0)+
            clr.q (a0)+
    .L2:
            dbral d0,.L3
            rts
I think this is very nice now. What do you think?
|
I see one more possible improvement:
    _memclr:
            move2.l (4,sp),D0:A0
            bra .L2
    .L3:
            clr.q (a0)+
            clr.q (a0)+
    .L2:
            dbral d0,.L3
            rts
I think it would be nice to teach this to GCC.
| |
| | Kamelito Loveless
Posts 261 16 Jan 2024 19:52
| It is a good compromise, but passing parameters directly via the registers is still better.
| |
| | Gunnar von Boehn (Apollo Team Member) Posts 6254 16 Jan 2024 19:59
| Kamelito Loveless wrote:
| It is a good compromise, but passing parameters directly via the registers is still better.
|
Yes, true. For calling functions, using regparm makes a lot of sense in my opinion. Of course, there are also cases where programs work with structures and have to parse or prepare them. This often involves putting a number of values held in registers into a structure, or reading a structure into a number of registers. The MOVE2 instruction is great for making this faster. Having GCC support it will be a win. :-)
| |
| | Kamelito Loveless
Posts 261 17 Jan 2024 12:42
| Yep, improving GCC is crucial. Thanks for the work put into it.
| |
| | Don Adan
Posts 38 17 Jan 2024 14:32
| Gunnar von Boehn wrote:
|
Don Adan wrote:
| I'm too old and too lazy. Anyway, I don't see incompatibility problems between a 64-bit "moveq" command and already existing 32-bit code. |
No problem, I can explain this to you. Amiga is a multitasking OS. This means you can have several programs running at the same time. In addition, you can have drivers or other code reacting to interrupts. The registers of the CPU are used by all programs.
Let's make an example. Let's say you have code triggered by an IRQ. This handler saves some registers with movem.l D0-D7. This means the 32-bit portions of the registers are saved and later restored. All very normal. When this code now executes MOVEQ, the instruction should update the lower 32 bits of the register and is not allowed to touch the high bits, as the code did not save them.
This means the behavior of the CPU is not allowed to change, and register parts outside of those saved are not allowed to be trashed. You see what I mean? I think it's simple to understand.
|
Yes, but for 32-bit coding the higher longword (high 32 bits) is totally unused, so it is not important whether moveq sets it to 0 or -1. I don't see or know of any 32-bit 68k instruction for which this could be problematic. For me, unused means it can be ignored. But for 64-bit coding, a 64-bit moveq can be really powerful. For me it is as powerful as clr.q, for example:
    moveq #0,D0          ; 64 bits
    moveq #-1,D1         ; 64 bits
    movem.q D0-D1,(A0)+
or
    moveq #-1,D0
    clr.l D0
Here we have the 64-bit value $FFFFFFFF00000000 in only 4 bytes.
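The second sequence can be checked with a small C model. This is only a sketch of the proposal: `moveq_64` models the 64-bit MOVEQ suggested in this post, which is not how the 68080 actually behaves according to the thread, and `clr_l` assumes that CLR.L clears the low longword and leaves the high one alone. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit MOVEQ as proposed: sign-extend the 8-bit immediate to 64 bits. */
uint64_t moveq_64(int8_t imm)
{
    return (uint64_t)(int64_t)imm;
}

/* CLR.L modeled as clearing only the low 32 bits; the high 32 bits are kept (assumption). */
uint64_t clr_l(uint64_t reg)
{
    return reg & 0xFFFFFFFF00000000ULL;
}

/* The two-instruction sequence from the post: moveq #-1,D0 followed by clr.l D0. */
uint64_t build_high_ones(void)
{
    uint64_t d0 = moveq_64(-1);  /* all 64 bits set            */
    d0 = clr_l(d0);              /* low longword cleared       */
    assert((uint32_t)d0 == 0);   /* sanity check on the model  */
    return d0;
}
```

Under these assumptions `build_high_ones()` yields `0xFFFFFFFF00000000`, built from two 2-byte instructions.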
| |
| | Gunnar von Boehn (Apollo Team Member) Posts 6254 17 Jan 2024 15:17
| Don Adan wrote:
| Yes, but for 32-bit coding the higher longword (high 32 bits) is totally unused, so it is not important whether moveq sets it to 0 or -1.
|
Actually this is very important, and I can explain why. The CPU in an Amiga is used by the operating system, and the Amiga can run multiple programs at the same time. We call this multitasking. How can programs run at the same time on one CPU? It works like this:
The CPU executes the instructions of the current program, one instruction after the other, and the program uses the registers of the CPU. If the running program is a 32-bit program, it might use only the lower 32-bit part of the registers. If the program is doing 64-bit operations, it will use the full 64 bits of the registers.
Let's say you run a new program using 64 bits, for example RIVA, the video player. RIVA uses a lot of 64-bit operations. Let's say that while you play video, the Super-AGA network chip receives a new packet. The Super-AGA chip will DMA this packet (for free) to memory, and when done it will raise an IRQ to signal the network software driver that a new packet was received. The software driver will be called to process it. This is called an "interrupt".
The software driver might be 32-bit code. It will use MOVEM.L to save the contents of the registers, do some work, maybe send a signal to a task (e.g. tell the web browser that a packet for it was received), and then restore the registers again. "RTE" will then return from the exception, so that the CPU continues executing where it was before. In this case it will continue to run RIVA, the video player. All this is done very fast; the user will not notice this quick interrupt switch. I hope my example was easy to follow.
What is important to understand is that the 32-bit interrupt code has saved only the 32-bit part of the registers that it uses, so with the instructions that it uses it is also only allowed to touch the lower 32-bit part. This is very important.
Otherwise the 32-bit code would accidentally trash the high part of the registers, which would crash RIVA, as the high parts were never saved and never restored. So yes, it makes a lot of sense that MOVEQ behaves for 32-bit programs exactly as it always has: touching only 32 bits.
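The interrupt scenario above can be sketched as a tiny C simulation of one 64-bit data register. Everything here is a simplified model, not Apollo code: the `Cpu` struct, `save_low32`/`restore_low32` (standing in for a MOVEM.L save and restore), and the two MOVEQ variants are hypothetical names, and `moveq_wide` models the proposed 64-bit behavior that the 68080 deliberately does not use for classic code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HIGH_MASK 0xFFFFFFFF00000000ULL

typedef struct { uint64_t d0; } Cpu;  /* model: a single 64-bit data register */

/* Like MOVEM.L to the stack: only the low 32 bits are saved. */
uint32_t save_low32(const Cpu *c) { return (uint32_t)c->d0; }

/* Like MOVEM.L from the stack: only the low 32 bits are written back. */
void restore_low32(Cpu *c, uint32_t saved) { c->d0 = (c->d0 & HIGH_MASK) | saved; }

/* Compatible MOVEQ: sign-extend to 32 bits, leave the high 32 bits untouched. */
void moveq_compat(Cpu *c, int8_t imm)
{
    c->d0 = (c->d0 & HIGH_MASK) | (uint32_t)(int32_t)imm;
}

/* Hypothetical 64-bit MOVEQ: overwrites all 64 bits. */
void moveq_wide(Cpu *c, int8_t imm) { c->d0 = (uint64_t)(int64_t)imm; }

typedef void (*MoveqFn)(Cpu *, int8_t);

/* A 32-bit handler interrupts a 64-bit task: save, MOVEQ #0, restore, return. */
uint64_t run_interrupt(uint64_t task_value, MoveqFn moveq)
{
    assert(moveq != NULL);
    Cpu cpu = { task_value };           /* register contents of the 64-bit task */
    uint32_t saved = save_low32(&cpu);  /* handler entry                        */
    moveq(&cpu, 0);                     /* handler uses the register            */
    restore_low32(&cpu, saved);         /* handler exit, then RTE               */
    return cpu.d0;                      /* what the task sees afterwards        */
}
```

With `moveq_compat` the task gets its full 64-bit value back. With `moveq_wide` the high 32 bits come back as zero, because nothing ever saved them.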
| |
| | Gunnar von Boehn (Apollo Team Member) Posts 6254 19 Jan 2024 15:55
| Good news! GCC now supports MOVE2.L!
| |
| | Kamelito Loveless
Posts 261 20 Jan 2024 19:24
| Great, 2024 will be the year of GCC 080, so faster programs and OS ahead!
| |