Performance and Benchmark Results!

ASM 080 Tricks

Philippe Flype

Posts 220
04 Sep 2018 23:20


We were playing with some tricks that are easy to apply on the 080.

One that comes to mind is using FDIV, which is very fast, instead of DIV. This way we obtain a VERY fast division.

First, have a look at the DIV & FDIV scores in MiniBench.

EXTERNAL LINK 

Then let's try something that DEMOMAKERS would like to use, I guess, with the fast FPU in mind:


   
    MACHINE MC68040

N   EQU 100000

MAIN:

    dc.w    $4e7a,$7809     ; MOVEC SPR_CLK,d7

    move.l  #N,d6
.loop
    REPT 8
    ;
    ; DIV - GOOD OLD INTEGER WAY
    ;
;   move.l  #12345,d0
;   move.l  #3,d1
;   divu.l  d1,d0           ; result in D0 ($00001013)
    ;
    ; DIV - USING FLOAT TRICK
    ;
    move.l  #12345,d0
    move.l  #3,d1
    fmove.l d0,fp0
    fdiv.l  d1,fp0
    fmove.l fp0,d0          ; result in D0 ($00001013)
    ;
    ENDR
    subq.l  #1,d6
    bne.w   .loop

    dc.w    $4e7a,$0809     ; MOVEC SPR_CLK,d0
    sub.l   d7,d0           ; CLOCK2 - CLOCK1
    divu.l  #(N*8),d0

    rts

The INTEGER move/move/div takes 36 cycles.
The FLOAT move/move/fmove/fdiv/fmove takes 16 cycles.

This "naive" test is already about 2.25 times faster (36 / 16 cycles).
It can be much faster still, since the FPU is fully pipelined and out-of-order (OoO).
One can compute a matrix with some operations running in parallel for free.
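
For readers who want to check the arithmetic away from the hardware, here is a minimal C sketch of why the float route can reproduce the integer quotient. It assumes the FPU works with at least IEEE double precision (a 53-bit mantissa), so every 32-bit integer converts to floating point without loss. The function names are made up for this illustration and are not part of any 080 toolchain.

```c
#include <stdint.h>

/* Models DIVS.L: integer division, truncating toward zero.
   The caller must keep b != 0 and avoid INT32_MIN / -1. */
int32_t div_integer(int32_t a, int32_t b)
{
    return a / b;
}

/* Models the float route: convert both operands, divide, convert back.
   The C cast truncates toward zero, so it only stands in for the final
   FMOVE.L when the quotient is exact or the fraction may be dropped. */
int32_t div_via_double(int32_t a, int32_t b)
{
    double q = (double)a / (double)b;
    return (int32_t)q;
}

/* int32 -> double -> int32 is lossless, because 32 bits fit in 53. */
int32_t round_trip(int32_t x)
{
    return (int32_t)(double)x;
}
```

With the values from the benchmark, both `div_integer(12345, 3)` and `div_via_double(12345, 3)` return 4115, which is the `$00001013` shown in the listing.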




Philippe Flype

Posts 220
04 Sep 2018 23:54


As a MACRO,

Not always optimal, and yet still much faster than DIV.

  MACHINE MC68040

IDIVSL MACRO
  fmove.l  \2,fp0
  fdiv.l  \1,fp0
  fmove.l  fp0,\2
  ENDM

TEST:
 
  move.l  #12345,d0
  move.l  #3,d1
  divs.l  d1,d0
  ILLEGAL  ; D0 = $00001013
 
  move.l  #12345,d0
  move.l  #3,d1
  IDIVSL  d1,d0
  ILLEGAL  ; D0 = $00001013
 
  rts
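
One caveat worth keeping in mind: on a standard 68k FPU, FMOVE.L FPn,Dn rounds according to the FPCR rounding mode, which is round-to-nearest by default, while DIVS.L truncates toward zero. Assuming the 080 behaves the same way (this thread does not confirm it), the macro and DIVS.L agree when the quotient is exact, as in 12345 / 3, but can differ by one otherwise. The C sketch below models both behaviours in plain integer arithmetic; the function names are hypothetical.

```c
#include <stdint.h>

/* Models DIVS.L: the quotient is truncated toward zero.
   The caller must keep b != 0. */
int32_t div_truncating(int32_t a, int32_t b)
{
    return a / b;
}

/* Models divide-then-round-to-nearest (ties go to the even integer),
   which is what a float divide followed by a round-to-nearest integer
   conversion produces for 32-bit operands. The caller must keep b != 0. */
int32_t div_nearest_even(int32_t a, int32_t b)
{
    int64_t q = (int64_t)a / b;               /* truncated quotient      */
    int64_t r = (int64_t)a % b;               /* remainder, sign of a    */
    int64_t twice_r = 2 * (r < 0 ? -r : r);   /* 2 * |remainder|         */
    int64_t abs_b = b < 0 ? -(int64_t)b : b;  /* |divisor|               */

    /* Fraction above one half, or exactly one half with an odd quotient:
       step one unit away from zero, in the direction of the true result. */
    if (twice_r > abs_b || (twice_r == abs_b && (q & 1))) {
        q += ((a < 0) != (b < 0)) ? -1 : 1;
    }
    return (int32_t)q;
}
```

For example, `div_truncating(7, 2)` gives 3 while `div_nearest_even(7, 2)` gives 4. If the truncating result is required, the rounding mode would have to be set to round-toward-zero before the final conversion.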





Renee Cousins

Posts 62
05 Sep 2018 00:25


Preliminary unsigned version (needs testing).

    ; given d0 and d1 are 0 to FFFFFFFF

    bchg.l  #31,d0            ; convert unsigned to signed    -80000000 to +7FFFFFFF
    fmove.l d0,fp0            ; convert signed to float       -2147483648.0 to 2147483647.0
    fadd.d  #2147483648.0,fp0 ; convert back into unsigned    0.0 to 4294967295.0
                              ; (.d immediate: +2147483648 does not fit in a signed long)

    bchg.l  #31,d1            ; convert unsigned to signed    -80000000 to +7FFFFFFF
    fmove.l d1,fp1            ; convert signed to float       -2147483648.0 to 2147483647.0
    fadd.d  #2147483648.0,fp1 ; convert back into unsigned    0.0 to 4294967295.0

    fdiv.x  fp1,fp0           ; perform 'unsigned' division   >= 0.0

    fsub.d  #2147483648.0,fp0 ; convert unsigned to signed    -2147483648.0 to 2147483647.0
    fmove.l fp0,d0            ; convert signed to integer     -80000000 to +7FFFFFFF
    bchg.l  #31,d0            ; convert back into unsigned    0 to FFFFFFFF
    bchg.l  #31,d1            ; restore d1 to its unsigned value
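
The bias idea can be checked in C. This is only a sketch of the arithmetic, not a tested 080 routine: the function names are invented here, the signed cast assumes a two's-complement compiler, and the fraction is dropped explicitly so the result matches DIVU.L rather than a round-to-nearest conversion.

```c
#include <stdint.h>

/* Unsigned -> double through a signed conversion, mirroring
   BCHG #31 / FMOVE.L / add 2147483648.0. */
double u32_to_double_biased(uint32_t u)
{
    /* Flipping bit 31 subtracts 2^31 once the pattern is read as signed
       (two's-complement conversion assumed). */
    int32_t s = (int32_t)(u ^ 0x80000000u);
    return (double)s + 2147483648.0;
}

/* Unsigned 32-bit division through doubles. The caller must keep b != 0. */
uint32_t udiv_via_double(uint32_t a, uint32_t b)
{
    double q = u32_to_double_biased(a) / u32_to_double_biased(b);

    /* Drop the fraction first (q is never negative), so the quotient is
       truncated the way DIVU.L would truncate it. */
    double whole = (double)(int64_t)q;

    /* Back to unsigned: subtract 2^31, convert to signed, flip bit 31. */
    int32_t s = (int32_t)(whole - 2147483648.0);
    return (uint32_t)s ^ 0x80000000u;
}
```

For instance, `udiv_via_double(0xFFFFFFFF, 2)` yields `0x7FFFFFFF`, a case that a plain signed conversion of the operands could not handle.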




Samuel Devulder

Posts 135
22 Sep 2018 12:40


Renee Cousins wrote:

Preliminary unsigned version (needs testing).
 

      ; given d0 and d1 are 0 to FFFFFFFF
 
      bchg.l  #31, d0        ; convert unsigned to signed    -80000000 to +7FFFFFFF

I think it is a mistake. Bchg #31 on -$80000000 doesn't create $7FFFFFFF but $00000000. Or am I missing something?


Renee Cousins

Posts 62
22 Sep 2018 16:16


Samuel Devulder wrote:

Renee Cousins wrote:

  Preliminary unsigned version (needs testing).
 

      ; given d0 and d1 are 0 to FFFFFFFF
 
      bchg.l  #31, d0        ; convert unsigned to signed    -80000000 to +7FFFFFFF
 

  I think it is a mistake. Bchg #31 on -$80000000 doesn't create $7FFFFFFF but $00000000. Or am I missing something?

That comment describes a range, not a conversion of one value into the other.


Samuel Devulder

Posts 135
22 Sep 2018 18:25


Oh I see! You remove #2147483648 with bchg (sub.l would also be possible), and then you add 2147483648.0 on the float later. This is fine indeed. Sorry for the trouble. ;)


Renee Cousins

Posts 62
23 Sep 2018 06:19


Samuel Devulder wrote:

Oh I see! You remove #2147483648 with bchg (sub.l would also be possible), and then you add 2147483648.0 on the float later. This is fine indeed. Sorry for the trouble. ;)

Yes, a sub or add of $80000000 is equivalent to bchg on the highest bit.
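
A quick way to convince yourself, sketched in C with made-up helper names: in 32-bit modular arithmetic, toggling bit 31, adding 2^31 and subtracting 2^31 all give the same value. On the 68k the condition codes still differ (BCHG only touches Z, while ADD and SUB update X, N, Z, V and C), which matters only if the flags are used afterwards.

```c
#include <stdint.h>

/* Toggle bit 31, as BCHG #31 does. */
uint32_t flip_top_bit(uint32_t x)
{
    return x ^ 0x80000000u;
}

/* Add 2^31; unsigned arithmetic wraps modulo 2^32. */
uint32_t add_top_bit(uint32_t x)
{
    return x + 0x80000000u;
}

/* Subtract 2^31; unsigned arithmetic wraps modulo 2^32. */
uint32_t sub_top_bit(uint32_t x)
{
    return x - 0x80000000u;
}
```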
