Performance and Benchmark Results!
| | Philippe Flype (Apollo Team Member) Posts 299 04 Sep 2018 23:20
| We were playing with some tricks that could easily apply on the 080. One that comes to mind is using FDIV, which is very fast, instead of DIV. This way we obtain a VERY fast division. First, have a look at the DIV & FDIV scores in MiniBench. Then let's try something that DEMOMAKERS would like to use, I guess, having the fast FPU in mind:

        MACHINE MC68040

    N   EQU     100000

    MAIN:
        dc.w    $4e7a,$7809     ; MOVEC SPR_CLK,d7
        move.l  #N,d6
    .loop
        REPT 8
        ;
        ; DIV - GOOD OLD INTEGER WAY
        ;
    *   move.l  #12345,d0
    *   move.l  #3,d1
    *   divu.l  d1,d0           ; result in D0 ($00001013)
        ;
        ; DIV - USING FLOAT TRICK
        ;
        move.l  #12345,d0
        move.l  #3,d1
        fmove.l d0,fp0
        fdiv.l  d1,fp0
        fmove.l fp0,d0          ; result in D0 ($00001013)
        ;
        ENDR
        subq.l  #1,d6
        bne.w   .loop
        dc.w    $4e7a,$0809     ; MOVEC SPR_CLK,d0
        sub.l   d7,d0           ; CLOCK2 - CLOCK1
        divu.l  #(N*8),d0
        rts

The INTEGER move/move/div takes 36 cycles. The FLOAT move/move/fmove/fdiv/fmove takes 16 cycles. Even this "naive" test is about 2.25 times faster (36 / 16). It can be much faster still, since the FPU is fully pipelined and out-of-order (OoO). One can compute a matrix with some of the operations running in parallel for free.
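
For anyone who wants to check the arithmetic off-target, here is a minimal C sketch of the same idea: convert both operands to floating point, divide, and convert back. The name `idiv_via_float` is hypothetical and not from the thread. The sketch assumes the final conversion truncates toward zero, which is what a C cast does; on a real FPU the result of the float-to-integer move depends on the active rounding mode, so inexact quotients may differ. It says nothing about cycle counts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper modelling fmove.l / fdiv.l / fmove.l with doubles.
 * Assumption: the final conversion truncates toward zero (C cast semantics).
 * The caller must avoid den == 0 and INT32_MIN / -1, whose quotient does
 * not fit in an int32_t. */
static int32_t idiv_via_float(int32_t num, int32_t den)
{
    double q = (double)num / (double)den;  /* both conversions are exact */
    return (int32_t)q;                     /* truncate toward zero */
}
```

Calling `idiv_via_float(12345, 3)` gives 4115, which is the $00001013 shown in the listing.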
| |
| | Philippe Flype (Apollo Team Member) Posts 299 04 Sep 2018 23:54
| As a MACRO. Not always optimal, yet still much faster than DIV.

        MACHINE MC68040

    IDIVSL  MACRO
        fmove.l \2,fp0          ; dividend -> fp0
        fdiv.l  \1,fp0          ; fp0 = dividend / divisor
        fmove.l fp0,\2          ; quotient back into the dividend register
        ENDM

    TEST:
        move.l  #12345,d0
        move.l  #3,d1
        divs.l  d1,d0
        ILLEGAL                 ; D0 = $00001013

        move.l  #12345,d0
        move.l  #3,d1
        IDIVSL  d1,d0
        ILLEGAL                 ; D0 = $00001013

        rts
| |
| | Renee Cousins (Apollo Team Member) Posts 142 05 Sep 2018 00:25
| Preliminary unsigned version (needs testing).
    ; given d0 and d1 are 0 to FFFFFFFF
    bchg.l  #31,d0              ; convert unsigned to signed (range -80000000 to +7FFFFFFF)
    fmove.l d0,fp0              ; convert signed to float (range -2147483648.0 to 2147483647.0)
    fadd.l  #2147483648,fp0     ; convert back into unsigned (range 0.0 to 4294967295.0)
    bchg.l  #31,d1              ; convert unsigned to signed (range -80000000 to +7FFFFFFF)
    fmove.l d1,fp1              ; convert signed to float (range -2147483648.0 to 2147483647.0)
    fadd.l  #2147483648,fp1     ; convert back into unsigned (range 0.0 to 4294967295.0)
    fdiv.x  fp1,fp0             ; perform 'unsigned' division, result >= 0.0
    fsub.l  #2147483648,fp0     ; convert unsigned to signed (range -2147483648.0 to 2147483647.0)
    fmove.l fp0,d0              ; convert signed to integer (range -80000000 to +7FFFFFFF)
    bchg.l  #31,d0              ; convert back into unsigned (range 0 to FFFFFFFF)
    bchg.l  #31,d1              ; restore the divisor (range 0 to FFFFFFFF)
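
The bias steps can be checked off-target with a small C model. This is only a sketch under stated assumptions, not a statement about how the 080 behaves: `as_signed32` and `udiv_via_float` are hypothetical names, the quotient is truncated before the bias is removed (a real FPU converts according to its rounding mode), and the bias is kept as a double constant because 2147483648 does not fit in a signed 32-bit long, a detail worth checking in the assembly version too.

```c
#include <assert.h>
#include <stdint.h>

static const double BIAS = 2147483648.0;  /* 2^31, exactly representable */

/* Hypothetical helper: read a 32-bit pattern as a two's-complement value. */
static int64_t as_signed32(uint32_t bits)
{
    return (bits & 0x80000000u) ? (int64_t)bits - 4294967296LL : (int64_t)bits;
}

/* Hypothetical model of the unsigned divide. The caller must avoid den == 0. */
static uint32_t udiv_via_float(uint32_t num, uint32_t den)
{
    /* bchg #31 + fmove.l + add 2^31: the operand arrives as 0.0 .. 4294967295.0 */
    double n = (double)as_signed32(num ^ 0x80000000u) + BIAS;
    double d = (double)as_signed32(den ^ 0x80000000u) + BIAS;

    /* Assumption: truncate the non-negative quotient before removing the bias. */
    double q = (double)(uint64_t)(n / d);

    /* subtract 2^31, convert to a signed integer, then flip bit 31 again */
    int64_t s = (int64_t)(q - BIAS);
    return (uint32_t)s ^ 0x80000000u;
}
```

For example, `udiv_via_float(0xFFFFFFFFu, 2)` returns 0x7FFFFFFF, matching the plain unsigned division.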
| |
| | Samuel Devulder
Posts 248 22 Sep 2018 12:40
| Renee Cousins wrote:
| Preliminary unsigned version (needs testing). ; given d0 and d1 are 0 to FFFFFFFF bchg.l #31, d0 ; convert unsigned to signed -80000000 to +7FFFFFFF
|
I think it is a mistake. Bchg #31 on -$80000000 doesn't create $7FFFFFFF but $00000000. Or am I missing something?
| |
| | Renee Cousins (Apollo Team Member) Posts 142 22 Sep 2018 16:16
| Samuel Devulder wrote:
|
Renee Cousins wrote:
| Preliminary unsigned version (needs testing). ; given d0 and d1 are 0 to FFFFFFFF bchg.l #31, d0 ; convert unsigned to signed -80000000 to +7FFFFFFF
|
I think it is a mistake. Bchg #31 on -$80000000 doesn't create $7FFFFFFF but $00000000. Or am I missing something?
|
That was a range, not this into that.
| |
| | Samuel Devulder
Posts 248 22 Sep 2018 18:25
| Oh I see! You remove #2147483648 with bchg (sub.l would also be possible), and then you add 2147483648.0 on the float later. This is fine indeed. Sorry for the trouble. ;)
| |
| | Renee Cousins (Apollo Team Member) Posts 142 23 Sep 2018 06:19
| Samuel Devulder wrote:
| Oh I see! You remove #2147483648 with bchg (sub.l would also be possible), and then you add 2147483648.0 on the float later. This is fine indeed. Sorry for the trouble. ;)
|
Yes, sub or add is equivalent to bchg on the highest bit.
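
That equivalence is easy to check with 32-bit wraparound arithmetic. A minimal C sketch follows; the function names are hypothetical, and it only models the resulting register value, not the condition codes, which differ between the instructions on real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Three ways to toggle the top bit of a 32-bit value.
 * Unsigned arithmetic in C wraps modulo 2^32, like a 32-bit register. */
static uint32_t flip_msb(uint32_t x) { return x ^ 0x80000000u; }  /* bchg #31 */
static uint32_t add_msb(uint32_t x)  { return x + 0x80000000u; }  /* add.l #$80000000 */
static uint32_t sub_msb(uint32_t x)  { return x - 0x80000000u; }  /* sub.l #$80000000 */
```

All three return the same value for any input, because adding or subtracting 2^31 modulo 2^32 changes only bit 31.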
| |