APOLLO CPU Knowledge Forum

Performance and Benchmark Results!

ASM 080 Tricks

Philippe Flype
(Apollo Team Member)
Posts 299
04 Sep 2018 23:20

We were playing with some tricks that can easily be applied on the 080.

One that comes to mind is using FDIV, which is very fast, instead of DIV. This way we obtain a VERY fast division.

First, have a look at the DIV & FDIV scores in MiniBench.

EXTERNAL LINK

Then let's try something that DEMOMAKERS would like to use, I guess, with the fast FPU in mind:

        MACHINE MC68040

N       EQU     100000

MAIN:
        dc.w    $4e7a,$7809     ; MOVEC SPR_CLK,d7 (read start clock)

        move.l  #N,d6
.loop
        REPT    8
;
; DIV - GOOD OLD INTEGER WAY
;
*       move.l  #12345,d0
*       move.l  #3,d1
*       divu.l  d1,d0           ; result in D0 ($00001013)
;
; DIV - USING FLOAT TRICK
;
        move.l  #12345,d0
        move.l  #3,d1
        fmove.l d0,fp0
        fdiv.l  d1,fp0
        fmove.l fp0,d0          ; result in D0 ($00001013)
;
        ENDR
        subq.l  #1,d6
        bne.w   .loop

        dc.w    $4e7a,$0809     ; MOVEC SPR_CLK,d0 (read end clock)
        sub.l   d7,d0           ; CLOCK2 - CLOCK1
        divu.l  #(N*8),d0       ; average cycles per repeated sequence

        rts

The INTEGER move/move/div takes 36 cycles.
The FLOAT move/move/fmove/fdiv/fmove takes 16 cycles.

Even this "naive" test is already about 2.25 times faster (36 / 16).
It can be much faster still, since the FPU is fully pipelined and OoO.
One can compute a matrix with some operations running for free in parallel.
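
To illustrate the pipelining remark, here is a minimal sketch of two independent divisions issued back to back. It assumes the dividends are in d0 and d2 and the divisors in d1 and d3 (a register assignment chosen only for this example), and it has not been timed, so no cycle count is claimed.

```asm
        MACHINE MC68040

; Hypothetical sketch: two independent quotients, d0/d1 and d2/d3.
; Separate FP registers mean neither chain waits on the other's result,
; which is what gives a pipelined FPU the chance to overlap them.
        fmove.l d0,fp0          ; first dividend
        fmove.l d2,fp1          ; second dividend
        fdiv.l  d1,fp0          ; fp0 = d0 / d1
        fdiv.l  d3,fp1          ; fp1 = d2 / d3, independent of fp0
        fmove.l fp0,d0          ; first quotient back to integer
        fmove.l fp1,d2          ; second quotient back to integer
```

Whether the second FDIV really overlaps the first depends on the core, so measure it with the SPR_CLK loop above before relying on it.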

Philippe Flype
(Apollo Team Member)
Posts 299
04 Sep 2018 23:54

As a MACRO:

Not always optimal, yet still much faster than DIV.

        MACHINE MC68040

IDIVSL  MACRO                   ; \1 = divisor, \2 = dividend and result
        fmove.l \2,fp0
        fdiv.l  \1,fp0
        fmove.l fp0,\2
        ENDM

TEST:
        move.l  #12345,d0
        move.l  #3,d1
        divs.l  d1,d0
        ILLEGAL                 ; D0 = $00001013

        move.l  #12345,d0
        move.l  #3,d1
        IDIVSL  d1,d0
        ILLEGAL                 ; D0 = $00001013

        rts
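
One caveat with the float path: FMOVE.L to a data register rounds according to the FPCR rounding mode, which is round-to-nearest when FPCR is 0, whereas DIVS.L truncates toward zero. 12345/3 is exact, so both give $00001013, but 7/2 gives 3 with DIVS.L and 4 with the float path under round-to-nearest. Below is a minimal sketch of one way to get truncation. It assumes the 080 FPU honours the FPCR RND field the way the 68881/68040 family does, which is not verified here. The label TRUNCDIV and the scratch registers d2/d3 are made up for the example.

```asm
        MACHINE MC68040

; Hypothetical helper: d0 = d0 / d1, truncated toward zero.
; Trashes d2, d3 and fp0.
TRUNCDIV:
        fmove.l fpcr,d2         ; save caller's FPCR
        move.l  d2,d3
        and.w   #$ffcf,d3       ; clear RND field (bits 5-4)
        or.w    #$0010,d3       ; RND = %01: round toward zero
        fmove.l d3,fpcr

        fmove.l d0,fp0          ; dividend
        fdiv.l  d1,fp0          ; divide by divisor
        fmove.l fp0,d0          ; quotient, now truncated like DIVS.L

        fmove.l d2,fpcr         ; restore caller's FPCR
        rts
```

Switching FPCR on every call costs cycles of its own. In a demo it would probably make more sense to set the rounding mode once around the whole inner loop.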

Renee Cousins
(Apollo Team Member)
Posts 142
05 Sep 2018 00:25

Preliminary unsigned version (needs testing).

        ; given d0 and d1 are 0 to FFFFFFFF

        bchg.l  #31, d0                 ; convert unsigned to signed    -80000000 to +7FFFFFFF
        fmove.l d0,fp0                  ; convert signed to float       -2147483648.0 to 2147483647.0
        fadd.s  #2147483648.0,fp0       ; convert back into unsigned    0.0 to 4294967295.0
                                        ; (float immediate: 2147483648 does not fit a signed .l operand)

        bchg.l  #31,d1                  ; convert unsigned to signed    -80000000 to +7FFFFFFF
        fmove.l d1,fp1                  ; convert signed to float       -2147483648.0 to 2147483647.0
        fadd.s  #2147483648.0,fp1       ; convert back into unsigned    0.0 to 4294967295.0

        fdiv.x  fp1,fp0                 ; perform 'unsigned' division   >= 0.0

        fsub.s  #2147483648.0,fp0       ; convert unsigned to signed    -2147483648.0 to 2147483647.0
        fmove.l fp0,d0                  ; convert signed to integer     -80000000 to +7FFFFFFF
        bchg.l  #31, d0                 ; convert back into unsigned    0 to FFFFFFFF
        bchg.l  #31, d1                 ; convert back into unsigned    0 to FFFFFFFF (restores d1)

Samuel Devulder

Posts 248
22 Sep 2018 12:40

Renee Cousins wrote:

Preliminary unsigned version (needs testing).


      ; given d0 and d1 are 0 to FFFFFFFF
  
      bchg.l  #31, d0         ; convert unsigned to signed    -80000000 to +7FFFFFFF

I think it is a mistake. Bchg #31 on -$80000000 doesn't create $7FFFFFFF but $00000000. Or am I missing something?

Renee Cousins
(Apollo Team Member)
Posts 142
22 Sep 2018 16:16

Samuel Devulder wrote:

Renee Cousins wrote:

Preliminary unsigned version (needs testing).


       ; given d0 and d1 are 0 to FFFFFFFF
   
       bchg.l  #31, d0         ; convert unsigned to signed    -80000000 to +7FFFFFFF

I think it is a mistake. Bchg #31 on -$80000000 doesn't create $7FFFFFFF but $00000000. Or am I missing something?

That was a range, not this into that.


Samuel Devulder

Posts 248
22 Sep 2018 18:25

Oh I see! You remove #2147483648 with bchg (sub.l would also be possible), and then you add 2147483648.0 on the float later. This is fine indeed. Sorry for the trouble. ;)

Renee Cousins
(Apollo Team Member)
Posts 142
23 Sep 2018 06:19

Samuel Devulder wrote:

Oh I see! You remove #2147483648 with bchg (sub.l would also be possible), and then you add 2147483648.0 on the float later. This is fine indeed. Sorry for the trouble. ;)

Yes sub or add are equivalent to bchg on the highest bit.
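
The equivalence can be checked on one concrete value. This is a minimal sketch; the value $C0000000 and the extra registers d2/d3 are chosen only for the example.

```asm
; All three instructions map unsigned 3221225472 ($C0000000)
; to signed 1073741824 ($40000000), i.e. they subtract 2^31 modulo 2^32.
        move.l  #$C0000000,d0
        move.l  d0,d2
        move.l  d0,d3
        bchg    #31,d0          ; d0 = $40000000 (top bit flipped)
        sub.l   #$80000000,d2   ; d2 = $40000000
        add.l   #$80000000,d3   ; d3 = $40000000 (carry out discarded)
```

Converting $40000000 to float gives 1073741824.0, and adding 2147483648.0 restores the original unsigned value 3221225472.0. Only the register result is identical across the three forms. The condition codes differ, since BCHG only sets Z from the bit it tested while SUB and ADD update X, N, Z, V and C.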
