
GCC and Other Stories From the Dark Side

Gunnar von Boehn
(Apollo Team Member)
Posts 6223
12 Jan 2024 08:26


We have discussed the benefit of 64-bit memory operations.
Let us also discuss the question of MOVEQ for 64-bit registers.
   
Don Adan wrote:

From my point of view, it would be much better if the "moveq" implementation for the 68080 worked as a 64-bit command, not only as a 32-bit command.

 
I see what you want.
While it's always nice to look forward and wonder what can be made faster, there is also the aspect of compatibility.
   
A very important feature of the Apollo 68080 CPU is that it's fully compatible with old 68K software.
This means old software can be run as is and will just work normally. This backward compatibility includes "stack" behavior: old programs using the normal 68k instructions will never trash the high part of the 64-bit registers, and therefore they never need to save the full 64-bit registers. Old programs continue to save the 32-bit parts of the registers and nothing more, just as they used to do.
Being this compatible also keeps the stack frames of all software unchanged.
   
I'm sure you agree that keeping compatibility even with stackframes is a very important feature.
 
 
100% compatibility of the Apollo 68080 CPU is most important for us.
Of course, improving coder friendliness, performance and code density is also important for us.
 
If you would like to discuss such topics, please feel invited to join our developer Discord channel. There you can find a nice group of experienced Amiga and Atari developers who brainstorm with us about such topics.
 
The code density of the 68K family is excellent, and the 68080 has the best code density of all 68K family members.
Hope to see you soon in the chat. :-)


Don Adan

Posts 38
12 Jan 2024 16:15


I'm too old and I'm too lazy. Anyway, I don't see incompatibility problems between a 64-bit "moveq" command and already existing 32-bit code. "movem.l" stores registers as 32 bits, "movem.w" as 16 bits, and "movem.q" can store them as 64 bits. It can be a problem only with new 64-bit programs. That is why I said it is perhaps too late for this change.


Gunnar von Boehn
(Apollo Team Member)
Posts 6223
12 Jan 2024 20:51


Don Adan wrote:

I'm too old and I'm too lazy. Anyway, I don't see incompatibility problems between a 64-bit "moveq" command and already existing 32-bit code.

No problem, I can explain this to you.
Amiga is a multitasking OS.
This means you can have several programs running at the same time.
And in addition to this you can have drivers or other code reacting to interrupts.

The registers of the CPU are used by all programs.

Let's take an example.
Let's say you have code triggered by an IRQ.
This handler saves some registers with movem.l D0-D7..
This means the 32-bit portions of the registers are saved and later restored. All very normal.

When the program now executes MOVEQ, this instruction should update the lower 32 bits of the register and is not allowed to touch the high bits, because the program did not save them.

This means the behavior of the CPU is not allowed to change.
And register parts outside of those saved are not allowed to be trashed.

You see what I mean? I think it's simple to understand.




Gunnar von Boehn
(Apollo Team Member)
Posts 6223
15 Jan 2024 11:18


The 68K instruction set is famous for its code density.

The 68080 has the densest code of all 68K members.

Let me show why.

Here are examples of instructions for setting a 32-bit memory location to a certain value:
 

  VALUE    = Instruction used       = Instruction length in bytes
  0        = CLR.l  (A0)            = 2 bytes
  -1       = MOV3.l #-1,(A0)        = 2 bytes
  +1       = MOV3.l #1,(A0)         = 2 bytes
  4000     = MOVI.L #4000,(A0)      = 4 bytes
  -4000    = MOVI.L #-4000,(A0)     = 4 bytes
  and of course the "normal" instruction
             MOVE.L #imm,(A0)       = 6 bytes

What do you think?

Do you know how many instruction bytes PowerPC, for example, needs to do the same?
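
The table above can be read as a small selection rule: pick the shortest instruction whose immediate can hold the value. The following is only a sketch of that rule in C, not a description of the real encoder. The function name `store_len_bytes` is made up for illustration, MOV3 is modeled only for the two values listed in the table (-1 and +1), and the assumption that MOVI accepts any signed 16-bit immediate is a guess based on its 4-byte length.

```c
#include <stdint.h>

/* Instruction length in bytes for storing a 32-bit constant to (A0),
 * following the table in the post above.
 * ASSUMPTIONS (not stated in the post): MOV3 is modeled only for the
 * two listed values -1 and +1, and MOVI is assumed to accept any value
 * that fits a signed 16-bit immediate. */
int store_len_bytes(int32_t value)
{
    if (value == 0)
        return 2;                       /* CLR.l  (A0)       */
    if (value == -1 || value == 1)
        return 2;                       /* MOV3.l #imm,(A0)  */
    if (value >= INT16_MIN && value <= INT16_MAX)
        return 4;                       /* MOVI.L #imm,(A0)  */
    return 6;                           /* MOVE.L #imm,(A0)  */
}
```

Under these assumptions, `store_len_bytes(4000)` gives 4 and `store_len_bytes(1000000000)` gives 6, matching the rows of the table.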


Gunnar von Boehn
(Apollo Team Member)
Posts 6223
15 Jan 2024 13:46


Let's compare.

Setting a 32-bit value in memory pointed to by a register:

  VALUE            POWERPC                     APOLLO 68080
  -1               2 instructions,  8 bytes    1 instruction,  2 bytes
  1                2 instructions,  8 bytes    1 instruction,  2 bytes
  4000             2 instructions,  8 bytes    1 instruction,  4 bytes
  1,000,000,000    3 instructions, 12 bytes    1 instruction,  6 bytes

Setting a 32-bit value in memory NOT pointed to by a register (that is, at an absolute address):

  VALUE            POWERPC                     APOLLO 68080
  -1               3 instructions, 12 bytes    1 instruction,  6 bytes
  1                3 instructions, 12 bytes    1 instruction,  6 bytes
  4000             3 instructions, 12 bytes    1 instruction,  8 bytes
  1,000,000,000    4 instructions, 16 bytes    1 instruction, 10 bytes
 

 
 
  What do you think?
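
To put the two tables into totals, here is a small C sketch that only sums the byte counts quoted in this post. The struct and function names (`size_row`, `total_bytes`) are made up for illustration; no figures beyond those in the tables are used.

```c
#include <stddef.h>

/* Byte counts copied from the two tables in the post above. */
struct size_row {
    long value;        /* constant being stored            */
    int  ppc_bytes;    /* PowerPC total instruction bytes  */
    int  m68080_bytes; /* Apollo 68080 instruction bytes   */
};

const struct size_row via_register[] = {
    { -1, 8, 2 }, { 1, 8, 2 }, { 4000, 8, 4 }, { 1000000000L, 12, 6 },
};

const struct size_row absolute_addr[] = {
    { -1, 12, 6 }, { 1, 12, 6 }, { 4000, 12, 8 }, { 1000000000L, 16, 10 },
};

/* Sum one column of a table: the PowerPC column if want_ppc is
 * nonzero, otherwise the 68080 column. */
int total_bytes(const struct size_row *rows, size_t n, int want_ppc)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += want_ppc ? rows[i].ppc_bytes : rows[i].m68080_bytes;
    return sum;
}
```

For the four register-addressed cases the totals are 36 bytes (PowerPC) against 14 bytes (68080); for the four absolute-address cases they are 52 bytes against 30 bytes.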


Gunnar von Boehn
(Apollo Team Member)
Posts 6223
16 Jan 2024 08:20


Meanwhile, GCC is producing really nice code.
Look at this:


void memclr (int length, int * ptr)
{
  for(;length--;){
    *ptr++= 0;
    *ptr++= 0;
    *ptr++= 0;
    *ptr++= 0;
  }
}

Compiled with -m68080 -O2 -fomit-frame-pointer


_memclr:
        move.l (8,sp),a0
        move.l (4,sp),d0
        jra .L2
.L3:
        clr.q (a0)+
        clr.q (a0)+
.L2:
        dbral d0,.L3
        rts

I think this is very nice now.
What do you think?
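
Why two clr.q instructions are enough: each clears 8 bytes, so together they cover the four 32-bit stores of one loop iteration. The following portable C sketch checks that equivalence on a host machine. It is a model only: `memclr_q` is a hypothetical name, `int32_t` stands in for the 32-bit `int` of the 68k target, and `memset` of 8 bytes stands in for clr.q.

```c
#include <stdint.h>
#include <string.h>

/* The loop from the post, with int32_t standing in for the 32-bit int
 * of the 68k target. */
void memclr(int length, int32_t *ptr)
{
    for (; length--; ) {
        *ptr++ = 0;
        *ptr++ = 0;
        *ptr++ = 0;
        *ptr++ = 0;
    }
}

/* Hypothetical model of the generated code: two 8-byte clears per
 * iteration, as the two "clr.q (a0)+" instructions do. */
void memclr_q(int length, int32_t *ptr)
{
    unsigned char *p = (unsigned char *)ptr;
    for (; length--; ) {
        memset(p, 0, 8); p += 8;   /* clr.q (a0)+ */
        memset(p, 0, 8); p += 8;   /* clr.q (a0)+ */
    }
}
```

Calling both functions with the same length on identical buffers leaves the buffers identical: each iteration clears 16 bytes either way.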


Kamelito Loveless

Posts 260
16 Jan 2024 10:15


Very nice, but the parameters could be passed using registers instead of the stack.


Gunnar von Boehn
(Apollo Team Member)
Posts 6223
16 Jan 2024 10:24


Kamelito Loveless wrote:

  Very nice, but the parameters could be passed using registers instead of the stack.
 

 
Yes, you are right.
Passing parameters in registers is so much nicer.
And this is the way (tm) we do it on Amiga.

GCC also supports this.
If you use the compile parameters -mregparm=2 -m68080 -O2, you get this code:

memclr:
          bra .L2
.L3:
          clr.q (a0)+
          clr.q (a0)+
.L2:
          dbral d0,.L3
          rts

What do you think?


Gunnar von Boehn
(Apollo Team Member)
Posts 6223
16 Jan 2024 12:17


Gunnar von Boehn wrote:

  Compiled with -m68080 -O2 -fomit-frame-pointer

  _memclr:
          move.l (8,sp),a0
          move.l (4,sp),d0
          jra .L2
  .L3:
          clr.q (a0)+
          clr.q (a0)+
  .L2:
          dbral d0,.L3
          rts

I see one more possible improvement:

 


  _memclr:
          move2.l (4,sp),D0:A0
          bra .L2
  .L3:
          clr.q (a0)+
          clr.q (a0)+
  .L2:
          dbral d0,.L3
          rts
 

 
I think it would be nice to teach this to GCC.
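
As a rough mental model of what the listing does with `move2.l (4,sp),D0:A0`: two consecutive longwords are loaded into two registers by one instruction. The C sketch below mirrors only that reading of the listing. The exact semantics of the real instruction are not given in this thread, and the names `load_be32` and `move2_l` are made up for illustration.

```c
#include <stdint.h>

/* Read one big-endian 32-bit longword (68k byte order) from memory. */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical model of "move2.l (ea),Rx:Ry": fill two registers from
 * two consecutive longwords. This only mirrors the listing above, where
 * it replaces "move.l (4,sp),d0" and "move.l (8,sp),a0". */
void move2_l(const uint8_t *ea, uint32_t *first, uint32_t *second)
{
    *first  = load_be32(ea);      /* e.g. D0 <- (4,sp) */
    *second = load_be32(ea + 4);  /* e.g. A0 <- (8,sp) */
}
```

With a stack frame holding the length followed by the pointer, one call fills both values, which is the saving over two separate move.l loads.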


Kamelito Loveless

Posts 260
16 Jan 2024 19:52


It is a good compromise, but passing parameters directly via the registers is still better.


Gunnar von Boehn
(Apollo Team Member)
Posts 6223
16 Jan 2024 19:59


Kamelito Loveless wrote:

It is a good compromise, but passing parameters directly via the registers is still better.

 
Yes, true.
For calling functions, using regparm makes a lot of sense in my opinion.

Of course there are also cases where programs work with structures and have to parse or prepare them. This often includes putting a number of values held in registers into structures, or reading a structure into a number of registers.

The MOVE2 instruction is great for making this faster.
Having GCC support it will be a win. :-)


Kamelito Loveless

Posts 260
17 Jan 2024 12:42


Yep, improving GCC is crucial. Thanks for the work put into it.


Don Adan

Posts 38
17 Jan 2024 14:32


Gunnar von Boehn wrote:

  When the program now executes MOVEQ, this instruction should update the lower 32 bits of the register and is not allowed to touch the high bits, because the program did not save them.


Yes, but for 32-bit coding the higher longword (the high 32 bits) is totally unused. So it is not important whether moveq sets it to 0 or -1. I don't see or know of a 32-bit 68k instruction for which this could be problematic. For me, unused means it can be ignored.

But for 64-bit coding, a 64-bit moveq can be really powerful. For me it is much more powerful than clr.q, for example:

 

moveq #0,D0 ; 64 bits
moveq #-1,D1 ; 64 bits
movem.q D0-D1,(A0)+

or
  moveq #-1,D0
  clr.l D0
we have the 64-bit value $FFFFFFFF00000000 in only 4 bytes.
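
The arithmetic of this proposal can be checked with a short C sketch. It models the proposed behavior only, not what the 68080 does according to this thread. The helper names `moveq64` and `clr_l` are made up, and CLR.L is assumed to clear the low 32 bits while leaving the high 32 bits alone.

```c
#include <stdint.h>

/* Hypothetical 64-bit MOVEQ as proposed in this post: sign-extend the
 * 8-bit immediate to all 64 bits. This is NOT how the 68080 behaves
 * according to the thread; it models the proposal only. */
uint64_t moveq64(int8_t imm)
{
    return (uint64_t)(int64_t)imm;
}

/* CLR.L modeled as clearing the low 32 bits and keeping the high 32
 * bits (an assumption, consistent with the result quoted in the post). */
uint64_t clr_l(uint64_t reg)
{
    return reg & UINT64_C(0xFFFFFFFF00000000);
}
```

Under these assumptions, `clr_l(moveq64(-1))` yields `0xFFFFFFFF00000000`, the value named in the post.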




Gunnar von Boehn
(Apollo Team Member)
Posts 6223
17 Jan 2024 15:17


Don Adan wrote:

  Yes, but for 32-bit coding the higher longword (the high 32 bits) is totally unused. So it is not important whether moveq sets it to 0 or -1.

 
Actually this is very important; I can explain why.

The CPU in an Amiga is used by the operating system,
and the Amiga can run multiple programs at the same time.
We call this multitasking.
How can programs run at the same time on one CPU?

It works like this:
The CPU executes the instructions of the current program,
one instruction after the other, and the program uses the registers of the CPU.

If a running program is a 32-bit program, it might only use the lower 32 bits of the registers. If the program is doing 64-bit operations, it will use the full 64 bits of the registers.
 

Let's say you run a new program using 64-bit operations,
for example RIVA, the video player.
RIVA uses a lot of 64-bit operations.

Let's say that while you play a video, the Super-AGA network chip receives a new packet. The Super-AGA chip will DMA this packet (for free) to memory, and when done it will raise an IRQ to signal the network software driver that a new packet was received.

The software driver will be called to process it.
This is called an "interrupt".
The software driver might be 32-bit code.
It will use MOVEM.L to save the contents of the registers, do some work, maybe send a signal to a task (e.g. tell the web browser that a packet for it was received), then restore the registers and return from the exception with "RTE",
so that the CPU continues executing where it was before.

In this case it will continue to run RIVA, the video player.

All this is done very fast; the user will not notice this quick interrupt switch. I hope my example was easy to follow?
 
It is important to understand that the 32-bit interrupt code has saved only the 32-bit parts of the registers that it uses.
And a 32-bit task, with the instructions it uses, is likewise only allowed to touch the lower 32 bits.

This is very important.
Otherwise the 32-bit code would accidentally trash the high parts of the registers, which would crash RIVA, because the high parts were never saved and never restored.

So yes, it makes a lot of sense that MOVEQ behaves for 32-bit programs exactly as it always has: touching only 32 bits.
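
The scenario above can be played through in a few lines of C. This is a simplified model on a host machine, not 68080 code: one 64-bit variable stands in for D0, the function names are made up, and MOVEM.L is assumed to save and restore only the low 32 bits, as described in this thread.

```c
#include <stdint.h>

/* MOVEQ as the thread describes it for existing code: only the low
 * 32 bits change (sign-extended 8-bit immediate); high bits are kept. */
uint64_t moveq_compat(uint64_t reg, int8_t imm)
{
    return (reg & UINT64_C(0xFFFFFFFF00000000)) | (uint32_t)(int32_t)imm;
}

/* Hypothetical 64-bit MOVEQ: overwrites the whole register. */
uint64_t moveq_wide(uint64_t reg, int8_t imm)
{
    (void)reg;
    return (uint64_t)(int64_t)imm;
}

/* A 32-bit interrupt handler: MOVEM.L saves the low half of D0, the
 * handler uses MOVEQ, then MOVEM.L restores the low half before RTE.
 * Returns D0 as the interrupted 64-bit task sees it afterwards. */
uint64_t run_handler(uint64_t d0, uint64_t (*moveq)(uint64_t, int8_t))
{
    uint32_t saved_low = (uint32_t)d0;                     /* movem.l d0,-(sp) */
    d0 = moveq(d0, -1);                                    /* moveq #-1,d0     */
    d0 = (d0 & UINT64_C(0xFFFFFFFF00000000)) | saved_low;  /* movem.l (sp)+,d0 */
    return d0;
}
```

If the interrupted task holds `0x1122334455667788` in D0, `run_handler` with `moveq_compat` returns the same value, while with `moveq_wide` it returns `0xFFFFFFFF55667788`: the low half is restored, but the high half of the 64-bit task's register is lost.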




Gunnar von Boehn
(Apollo Team Member)
Posts 6223
19 Jan 2024 15:55


Good news!

GCC now supports MOVE2.L!!



Kamelito Loveless

Posts 260
20 Jan 2024 19:24


Great, 2024 will be the year of GCC 080, so faster programs and OS ahead!
