Information about the Apollo CPU and FPU.

Optimal 68K Memcopy Routine

Gunnar von Boehn
(Apollo Team Member)
Posts 4182
27 Jun 2017 06:00


Marcus Sackrow wrote:

How are these 128 64-bit registers managed inside AmigaOS? I guess not at all: on a task switch only the 16 32-bit registers are saved and restored. So currently you can't use them in a normal, current AmigaOS? Am I right?

The 128 registers are in fact two cores/2 hyperthreads with 64 architectural registers each.
Of the 64 architectural registers, 48 are user accessible.

APOLLO's register window looks like this:

16 ADDRESS Registers
 8 Data Registers
24 MMX Registers

By default, AMIGA OS will save and restore, per task, the 16 old user registers, and only the lower 32-bit portion of them.
This means the stack frame of old programs stays the same.

New programs using the new 64-bit operations will automatically switch to enhanced 64-bit mode, and their task will save the user-accessible registers. AMIGA OS, being a 32-bit OS, always uses only the 32-bit part of the ADDRESS pointers and will always save/restore only this part.

Marcus Sackrow wrote:

Here an optimized, Vampire-enabled AROS could give a big performance boost by using the huge number of registers ;)

Yes, you are absolutely correct here.
All programs benefit from having more registers.
But data decoding algorithms in particular, for which AMMX is well suited, benefit greatly from having more registers.
 
The AMMX instruction set is very strong.
* it can operate on memory
* it supports #immediates
* it has 3 operands in 1 instruction
* it can use 32 registers
* it supports misaligned memory operands for free

If you compare this with "others" then you see:

INTEL MMX
- Intel MMX only has 8 registers
- Intel MMX has only 2 operands

POWERPC ALTIVEC
- does not support memory operands
- does not support immediates
- does not support misaligned memory addresses


Marcus Sackrow

Posts 37
27 Jun 2017 07:31


Gunnar von Boehn wrote:
 
  No, the memcopy routine does not need to save a register on the stack. The 68K programming ABI defines D0/D1/A0/A1 as scratch registers for routines. Using these 4 registers in a subroutine is defined by MOTOROLA as the way of doing it.
  Therefore this memcopy does NOT need to SAVE/RESTORE any registers on the stack.

I know that, but my routine may be using D1 ;) for something else before and after the memcopy routine. Registers are scarce and sacred :-D

Gunnar von Boehn wrote:
 
  Marcus, if you want to understand what AMMX is, then you should not focus on this micro memcopy example.
...
  AMMX is not limited to simply doing memcopy.
  AMMX is designed to process a lot of data.
  Processing means doing many multiplications or many additions in a short time. In other words, doing real workloads.

Believe me, I'm fully aware of what vectorized operations are capable of. It's part of my work to write scientific calculation routines in SSE2/SSE3 (and lately AVX/AVX2 as well) in assembler, sometimes even writing opcodes directly to memory for highly dynamic subroutines (not self-modifying code, that does not work well on today's CPUs ;-), just runtime compiling/assembling). (But I do not have much use for integers :-P I always need single/double: MOVAPS, ADDPS, SUBPS, MULPS, DIVPS ftw. :D )

Gunnar von Boehn wrote:
 
By default, AMIGA OS will save and restore, per task, the 16 old user registers, and only the lower 32-bit portion of them.

I guessed so. I don't know the source of RIVA with the AMMX extension, but isn't that a "hot case" then? Won't starting a second program that also uses AMMX registers destroy each other's registers (? :-O) because they are not saved/restored properly? Or does that just operate on the 32-bit part of the well-known (as you said, "old") registers?

Sorry for the questions, but the documentation layer about that stuff is very thin, at least what I found.


Henryk Richter
(Apollo Team Member)
Posts 105
27 Jun 2017 07:44


Marcus Sackrow wrote:

  I guessed so. I don't know the source of RIVA with the AMMX extension, but isn't that a "hot case" then? Won't starting a second program that also uses AMMX registers destroy each other's registers (? :-O) because they are not saved/restored properly? Or does that just operate on the 32-bit part of the well-known (as you said, "old") registers?
 
  Sorry for the questions, but the documentation layer about that stuff is very thin, at least what I found.

You're spot-on. That's why we've got some exec patches in place which will properly save/restore the extended registers upon context switches. In order to maintain backwards compatible stack frames for existing applications, the extended stack frames are written conditionally.

Bit 11 in SR triggers the extended stack frame logic. At the moment, AMMX applications set this bit manually. We've got a library in the works which is supposed to provide the respective functionality in a nicer fashion.
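
As a rough illustration of the "conditional" part, the decision can be modelled in C as a single bit test on the saved SR. This is only a sketch: the names SR_BIT11, sr_set_bit11, frame_kind and frame_for_sr are hypothetical, and the actual frame layouts used by the exec patches are not described in this thread.

```c
#include <stdint.h>

/* Bit 11 of the 68K status register, as named in the post above. */
#define SR_BIT11 ((uint16_t)(1u << 11))   /* 0x0800, decimal 2048 */

/* Hypothetical labels for the two stack frame variants; the real
 * layouts are not given in this thread. */
enum frame_kind { FRAME_LEGACY, FRAME_EXTENDED };

/* Return the SR value with bit 11 set, leaving all other bits unchanged. */
uint16_t sr_set_bit11(uint16_t sr)
{
    return (uint16_t)(sr | SR_BIT11);
}

/* Choose the frame variant for a task from its saved SR (illustrative only). */
enum frame_kind frame_for_sr(uint16_t sr)
{
    return (sr & SR_BIT11) ? FRAME_EXTENDED : FRAME_LEGACY;
}
```

For instance, sr_set_bit11(0x2000) yields 0x2800, and frame_for_sr reports FRAME_EXTENDED for that value while an SR without bit 11 keeps the backwards compatible frame.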



Gunnar von Boehn
(Apollo Team Member)
Posts 4182
27 Jun 2017 07:48


Marcus Sackrow wrote:

I guessed so. I don't know the source of RIVA with the AMMX extension, but isn't that a "hot case" then?

This works fine, don't worry.
Btw doing "optimized" register saving like this is today "state of the art". IBM and others do it the same way.


Marcus Sackrow

Posts 37
01 Jul 2017 09:59


I tried to activate the AMMX feature and test this copy routine, but it seems the AMMX activation did not work. At least the copy routine does nothing and the mouse pointer gets stuck while the loop is running (so I guess the LOAD/STOREC are producing traps because AMMX is not enabled).

The code I use:


; VampireRoutines.s
; compile with: (vasm 1.8+)
; vasmm68k_std -Fhunk -m68080 -o VampireRoutines.o VampireRoutines.s
;
; Routines to use the Vampire accelerator card in FreePascal
;
; procedure CopyMemVampire(Src, Dest, Len);
.globl CopyMemVampire
CopyMemVampire:
LOOP:
      LOAD  (A0)+,D1      ; load 8 bytes
      STOREC D0,D1,(A1)+    ; store 8 bytes but never more than count in D0
      SUBQ.L  #8,D0
      BHI.B  LOOP
      RTS
;
; called by Initialization to activate AMMX extension
; procedure EnableAMMX();
.globl EnableAMMX
EnableAMMX:
    move.l  4.w,a6
    jsr -120(a6)  ; Disable
    jsr -150(a6)  ; SuperState
    move  sr,d1    ; Get SR (D0 holds the old system stack from SuperState, needed by UserState)
    or.w  #2048,d1 ; set Bit 11 ($0800) -> AMMX enabled
    move  d1,sr    ; write SR back
    jsr -156(a6)  ; UserState
    jsr -126(a6)  ; Enable
    rts

I already checked: the SR after the EnableAMMX call is $2800, so it seems the SR bit is set as it should be.
I also checked that FreePascal puts SRC, DEST and LEN into the right registers for the copy call (and if I replace the loop body with move.l (a0)+,(a1)+ and subq.l #4,D0 it works, of course only with a Len that is divisible by 4).
Any idea what else could be the problem?
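
For reference, the intended behaviour of the LOAD/STOREC loop in the listing above can be modelled in plain C. This is a sketch only: copy_mem_model is a hypothetical name, and it assumes, based on the comment in the listing, that STOREC writes at most 8 bytes and never more than the remaining count.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative model of the loop; not the Apollo team's implementation.
 * Assumption: STOREC stores min(count, 8) bytes per iteration. */
void copy_mem_model(const unsigned char *src, unsigned char *dst, size_t len)
{
    size_t count = len;                    /* plays the role of D0 */

    for (;;) {
        /* STOREC clamps the store to the remaining count. */
        size_t n = count < 8 ? count : 8;

        /* The real LOAD always reads 8 bytes; the model reads only n. */
        memcpy(dst, src, n);
        src += n;
        dst += n;

        /* SUBQ.L #8,D0 / BHI.B LOOP: continue only while more than
         * 8 bytes were left before the subtraction. */
        if (count <= 8)
            break;
        count -= 8;
    }
}
```

The model makes the point of the clamped store visible: a length such as 13 is handled in two passes (8 bytes, then 5) without a separate tail loop.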




Marcus Sackrow

Posts 37
01 Jul 2017 10:18


Ahh, forget it, I found it: it needs Gold3 to work... that explains it


Gunnar von Boehn
(Apollo Team Member)
Posts 4182
01 Jul 2017 12:13


Marcus Sackrow wrote:

I tried to activate the AMMX feature and test this copy routine 

Hi Marcus,

if you just want to memcopy, then the normal AMIGA routine is ok.
There is no real need to exchange it.
APOLLO will saturate all the memory bandwidth with it.

The AROS code was an example of over-bloated code, with 250 lines for a memcopy. So here it made sense to show that it can be done smaller.



Stefan "Bebbo" Franke

Posts 126
24 Jun 2019 10:33


off topic:
Gunnar von Boehn wrote:

  Patching GNU ASM is easy.
  We added support for APOLLO instructions to it already.

where can I get this patch?




Gunnar von Boehn
(Apollo Team Member)
Posts 4182
24 Jun 2019 11:24


Stefan "Bebbo" Franke wrote:

  off topic:
  Gunnar von Boehn wrote:

    Patching GNU ASM is easy.
    We added support for APOLLO instructions to it already.

  where can I get this patch?

Hello Bebbo,

Regarding improving GCC, how about we open a new topic for that?

