




BFINS Instruction Bug?

Adam Polkosnik

Posts 30
15 Jul 2019 22:05


I think that some of the graphical glitches are caused by a buggy implementation of BFINS in TG68. It's shared between MiST, MiSTer, and the Vampire Gold 3 Alpha. It manifests itself in the Gouraud Pulse part of the Nexus7 demo,
e.g. BFINS D3,($7000,A1){D4:D2}.




Andrew Copland

Posts 113
16 Jul 2019 11:42


It should be fairly easy to write some unit tests for that to prove the theory


Adam Polkosnik

Posts 30
21 Jul 2019 05:30


It turns out that it's triggered by access to non-aligned memory. E.g. BTST #0,$dff005 or BTST #0,($5,A6)


Philippe Flype
(Apollo Team Member)
Posts 299
21 Jul 2019 07:53


@Adam
 
I've been trying to locate this for some days.

And indeed I didn't find any Nexus7 BF issues.

I can see very well where the Gouraud routine is.

But I can't catch an error.

EXTERNAL LINK 

I also tested a bunch of unaligned BTST instructions, also without issues.

Do you have anything precise to show me, or any more hints, please?

For information, the 080 BF instruction set is our own implementation, not TG68-based.


Adam Polkosnik

Posts 30
21 Jul 2019 13:57


@Philippe,
I've tracked it down to an unaligned access to the custom chip memory. Basically, I kept freezing the demo at varying instruction counts from the last "restore point", narrowing it down to the spot where the display was getting corrupted. See the MiSTer issue. The BFINS implementation is just fine (I've even built and flashed a minimig core with slightly altered code for BFINS to test my initial theory), it's just BTST that reads from $dff005 causes mayhem.

EXTERNAL LINK

On my A500 V2+, I only had a WHDLoad version of the demo, and it was giving me the corrupted torus on the Gold 3 alpha. Check out the other AGA demo mentioned towards the end of that MiSTer issue. I didn't check that one on Gold 3, since I loaded the latest beta of 2.12. *UAE seems to have fixed the unaligned memory access a long time ago, as I pointed out in the issue.

I'm planning on building a fix for MiSTer once I figure out TG68K some more.
 


Gunnar von Boehn
(Apollo Team Member)
Posts 6197
22 Jul 2019 06:48


Adam Polkosnik wrote:

it's just BTST that reads from $dff005 causes mayhem.

There is nothing wrong with the BTST on $DFF005.
You are just wildly guessing.
 


Adam Polkosnik

Posts 30
22 Jul 2019 08:21


Well, yeah, I'm throwing stuff around and then testing if it sticks. When I step through it in the debugger, the 7th bitplane gets corrupted. I've patched the wait-for-vertical-blank piece to use an aligned read, and it still happened. At this point my next wild guess is either a chipset issue or some weird stuff happening in the vertical blank interrupt. Nobody has managed to get it fixed for quite a while, and I'm just bouncing between UAE and MiSTer. At least I'm having some fun with HRTMon.


Adam Polkosnik

Posts 30
17 Oct 2019 08:28


Ok, so my initial guess turned out to be pretty close.
I managed to fix it on MiSTer. Basically, there are some bits that are sometimes used for switching between address registers and data registers, but for some instructions these bits are not used, because those instructions only support data registers, and so on.
 
This is how I fixed it on MiSTer.
  EXTERNAL LINK
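
The kind of decode slip described above can be pictured with a small Python model. This is only a sketch and is not taken from the TG68K source: the 4-bit register index (bit 3 selecting the address bank) and the names `reg_name` and `bfins_source` are assumptions made for illustration.

```python
# Hypothetical model: a 4-bit register index where bit 3 selects the
# address bank (a0-a7) and bits 2-0 select the register number.
def reg_name(index4):
    """Map a 4-bit index to a register name: 0-7 -> d0-d7, 8-15 -> a0-a7."""
    bank = "a" if index4 & 0b1000 else "d"
    return f"{bank}{index4 & 0b0111}"


def bfins_source(ext_word, mask_bank_bit):
    """Pick the BFINS source register from bits 15-12 of the extension word.

    With mask_bank_bit=True the bank bit is forced to 0, because BFINS
    only accepts a data register as its source.
    """
    index4 = (ext_word >> 12) & 0xF
    if mask_bank_bit:
        index4 &= 0b0111
    return reg_name(index4)


if __name__ == "__main__":
    print(bfins_source(0xD922, mask_bank_bit=False))  # unmasked decode
    print(bfins_source(0xD922, mask_bank_bit=True))   # masked decode
```

In this model the only difference between the two decodes is whether the bank bit is masked before the register lookup.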


Andrew Copland

Posts 113
22 Oct 2019 11:47


Nice work Adam!


Otto PS
(Needs Verification)
Posts 26/ 1
21 Jul 2020 00:00


Adam Polkosnik wrote:

  Ok, so my initial guess turned out to be pretty close.
  I managed to fix it on MiSTer. Basically, there are some bits that are sometimes used for switching between address registers and data registers, but for some instructions these bits are not used, because those instructions only support data registers, and so on.
   
  This is how I fixed it on MiSTer.
    EXTERNAL LINK 

 
  Interesting. I really appreciate your work! Could you explain the bug in detail? I want to compile some examples and debug them on the Apollo 68080.

EDIT: I've analyzed your TG68K patch. I am going to generate and test the offending opcodes on my V1200 and WinUAE. Thanks!
 


Gunnar von Boehn
(Apollo Team Member)
Posts 6197
21 Jul 2020 05:58


Otto PS wrote:

  Interesting. I really appreciate your work! Could you explain the bug in detail. I want to compile some examples and debug them on Apollo 68080.
 

OTTO there is nothing to debug.
The demo runs fine on your Apollo-68080, Vampire4.


Otto PS
(Needs Verification)
Posts 26/ 1
21 Jul 2020 06:31


Gunnar von Boehn wrote:

Otto PS wrote:

  Interesting. I really appreciate your work! Could you explain the bug in detail. I want to compile some examples and debug them on Apollo 68080.
 

  OTTO there is nothing to debug.
  The demo runs fine on your Apollo-68080, Vampire4.

You're right! I have tested, side by side, the behavior of the bit field instructions that might be affected by the bug reported by Adam. The current Apollo 68080 (2.12 core) does not have the bug that TG68K apparently had.
 
Sorry for reopening this thread.


Gunnar von Boehn
(Apollo Team Member)
Posts 6197
21 Jul 2020 06:37


Otto PS wrote:

  You're right! I have tested, side by side, the behavior of the bit field instructions that might be affected by the bug reported by Adam. The current Apollo 68080 (2.12 core) does not have the bug that TG68K apparently had.
   
Sorry for reopening this thread.

 
Yes, the demo runs fine on Apollo.
 
 
Actually I would even say that TG68 had NO bug here.
The fact is that the demo uses an illegal opcode.
So the demo is at fault, not the CPU!
The demo should never have used this reserved opcode.

We have to keep in mind that the opcode space of the 68K is not 100% saturated. And this is for a good reason.
The reason is to be able to extend the existing instructions
and to be able to add more instructions or new features.
This means the reserved "space" should never have been used by the demo.

You can even look at this in a way that says the real bug was that the Motorola 68k CPU did not trap and raise an exception on this opcode, which uses reserved space.
 


Philippe Flype
(Apollo Team Member)
Posts 299
21 Jul 2020 08:34


Detailed explanation of the bug:
 

  ;----------------------------------------------------------
  ;
  ; Nexus7 Gouraud Pulse Bug.
  ; Apollo-Team, flype, 2019.
  ;
  ; Description of the bug:
  ;
  ; The Nexus7 demo sets an invalid D/A bit in the Brief
  ; Effective Address Word of some BFINS opcodes.
  ;
  ; dc.w $efe9,$5922,$6fc8 ; bfins d5,($6fc8,a1){d4:d2}
  ;             ^
  ;             ^ Bit15 = 0 --> valid (D/A -> Dn is valid)
  ;
  ; dc.w $efe9,$d922,$6fc8 ; bfins d5,($6fc8,a1){d4:d2}
  ;             ^
  ;             ^ Bit15 = 1 --> invalid (D/A -> An is not valid)
  ;
  ; On some M68K implementations, this wrong/buggy opcode
  ; results in register A5 being used instead of D5 !!!
  ; On real MC68020+, this bit is ignored / not wired.
  ;
  ;----------------------------------------------------------

    MACHINE MC68020

  Main:
    ;
    ; Some inputs
    ;

    lea     Data1,a1               ; Buffer
    move.l  #$ABADCAFE,a5          ; Incorrect data (A5 must not be used)
    move.l  #$CAFECAFE,d5          ; Correct data
    move.l  #$00,d4                ; Position (bit field offset)
    move.l  #$20,d2                ; Length (bit field width, 32 bits)

    ;
    ; Opcode '5922' -> valid D/A
    ;

    dc.w    $efe9,$5922,$0010      ; bfins d5,($10,a1){d4:d2}

    ;
    ; Opcode 'd922' -> invalid D/A
    ;

    dc.w    $efe9,$d922,$0010      ; bfins d5,($10,a1){d4:d2}

    ;
    ; Check result
    ;

    bfextu  ($10,a1){d4:d2},d0     ; Read value back
    cmp.l   #$CAFECAFE,d0          ; Was D5 (not A5) inserted?
    seq     d0                     ; Low byte = $FF if equal, else $00
    and.l   #1,d0                  ; Store as bool (1 = ok, 0 = bug)
    rts

    CNOP    0,4
  Data1:
    DS.B    128

    END
   

 
 
So yes, it's a bug caused by misuse of the BFINS opcode in the demo AND by the lack of trapping on the Motorola side. And maybe by the assembler used by the demo makers.
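
To make the two extension words from the listing easier to compare, here is a small Python decoder. It is a sketch that assumes the BFINS extension word layout from the Motorola programmer's reference (bit 15 defined as 0, bits 14-12 source register, bit 11 Do, bits 10-6 offset, bit 5 Dw, bits 4-0 width); the function name and the returned dictionary keys are made up for illustration.

```python
def decode_bfins_ext(word):
    """Split a BFINS extension word into its fields.

    Assumed layout: bit 15 must be 0, bits 14-12 source Dn,
    bit 11 Do, bits 10-6 offset, bit 5 Dw, bits 4-0 width.
    """
    do = (word >> 11) & 1
    dw = (word >> 5) & 1
    offset = (word >> 6) & 0x1F
    width = word & 0x1F
    return {
        "bit15": (word >> 15) & 1,
        "source": f"d{(word >> 12) & 7}",
        # With Do/Dw set, the field names a data register instead of
        # holding an immediate; an immediate width of 0 means 32.
        "offset": f"d{offset & 7}" if do else str(offset),
        "width": f"d{width & 7}" if dw else str(width or 32),
    }


if __name__ == "__main__":
    for ext in (0x5922, 0xD922):
        f = decode_bfins_ext(ext)
        print(f"${ext:04x}: bfins {f['source']},<ea>"
              f"{{{f['offset']}:{f['width']}}}  bit15={f['bit15']}")
```

Decoded this way, the two words differ only in the `bit15` field, which is the bit the listing flags as invalid.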


Gunnar von Boehn
(Apollo Team Member)
Posts 6197
21 Jul 2020 09:39


Maybe it makes sense to clarify this again.
 
A CPU instruction set defines which instruction is which opcode.
E.g. $7000 = MOVEQ #0,D0
 
Many people say that the Motorola instruction set is very beautiful and very cleanly defined.
The general opinion is that Motorola did a really good job here.

The 68K family is a range of CPUs supporting the same instruction set.
And the 68k instruction set was defined from the beginning with expansion in mind.

The 68000 had many instructions, and the rest of the range was reserved for "expansion".
The 68010 then used this expansion space to add new instructions.
The 68020 again used this reserved space to add new instructions,
and also used the reserved space to add more addressing modes.
And this continued with every CPU generation.
Reserved space was there to add new features.
So all this was cleanly defined and cleanly used.

This demo is violating this Motorola rule.
The demo illegally uses reserved instruction space, which Motorola reserved in order to be able to add more instructions and more features in the future.

If you asked Motorola, it's clear what they would say:
the "correct" and clean solution here would be to keep this reserved space open and to fix the demo.
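
As a tiny illustration of "which instruction is which opcode" and of a bit that has to be zero, here is a Python sketch of a MOVEQ decoder. It assumes the documented MOVEQ format (bits 15-12 = 0111, bits 11-9 register, bit 8 = 0, bits 7-0 signed data); the function name is hypothetical, and what a given CPU actually does with a word that fails the check is not modelled here.

```python
def decode_moveq(opcode):
    """Return 'moveq #imm,dN', or None if the word is not a valid MOVEQ.

    Assumed format: 0111 rrr 0 dddddddd, where bit 8 must be 0.
    """
    if (opcode & 0xF100) != 0x7000:
        return None
    reg = (opcode >> 9) & 7
    imm = opcode & 0xFF
    if imm >= 0x80:          # sign-extend the 8-bit immediate
        imm -= 0x100
    return f"moveq #{imm},d{reg}"


if __name__ == "__main__":
    for word in (0x7000, 0x7AFF, 0x7100):
        print(f"${word:04x} -> {decode_moveq(word)}")
```

A strict decoder like this one rejects `$7100` because bit 8 is set, which is the kind of check the posts above say the original CPUs did not always perform.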


Philippe Flype
(Apollo Team Member)
Posts 299
21 Jul 2020 09:48


Well, yes, but what really matters for compatibility purposes is that any 68k implementation has to behave the same way as the legacy one, including the on-the-edge behaviours. Here, it means the implementation has to ignore the unused bit(s) in the EA decoder so that an An encoding is still decoded as a Dn opcode. In other words, the EA decoder spec is contextual to the instruction.

However, a next-gen 68k may want to extend that EA decoding to support new modes. Doing that here would break programs that do undocumented stuff. Dilemma. But oh well, the thing is, it was fun to precisely locate that bug.


Gunnar von Boehn
(Apollo Team Member)
Posts 6197
21 Jul 2020 09:56


Philippe Flype wrote:

  Well yes but what really matter, for compatibility purpose, is that any 68k implementation have to behave same way than the legacy one, including the on-the-edge behaviours. Here, it means the implementation must have to ignore the unused bit(s) in the EA decoder so that a An still is decoded as a Dn opcode.
 

 
But then Motorola would NEVER have been able to develop the 68020.

As you know, the 68000 had some reserved bits in the EA mode.
The rule was that every program had to write them as "0" - this was clearly the rule Motorola gave.
And please mind that the 68000 did NOT check this rule.
This meant you could in theory write whatever trash there - the 68000 simply ignored all these bits.

As you know, the 68020 started to use some of these extra bits.
This means programs writing trash in the reserved bits would run on the 68000 -- it would not care.
But those programs ALL break and crash on the 68020 and later CPUs.

What we describe here is exactly what happened in this demo.
The demo illegally ignores and violates the Motorola coding rule.
This demo is bad.
It will crash as soon as any CPU uses this reserved bit.

The only correct solution is fixing the demo.

-

Mind that there is no "edge case" here.
The Motorola coding rule is very clear - this bit needs to be "0".
If you do not set this bit to "0", anything might happen.
Your program might run, might crash, or might execute a different instruction. This bit is reserved for expansion - never write anything other than "0" to it. Future CPUs might use it.

This coding rule was crystal clear.
No CPU bug here at all.
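
The 68000-versus-68020 situation described above can be sketched in Python. This assumes the brief extension word layout from the Motorola programmer's reference (bit 15 D/A, bits 14-12 index register, bit 11 W/L, bits 10-9 scale, bit 8 brief/full selector, bits 7-0 signed displacement) and takes the post's word for it that the 68000 ignores bits 10-8. The function name and the `cpu` parameter are made up for illustration.

```python
def decode_brief_ext(word, cpu):
    """Decode an indexed-mode extension word as a given CPU would see it.

    Returns (index_register, scale, displacement), or None when a
    68020-style decoder sees bit 8 set (full format, not handled here).
    """
    bank = "a" if word & 0x8000 else "d"
    size = "l" if word & 0x0800 else "w"
    index = f"{bank}{(word >> 12) & 7}.{size}"
    disp = word & 0xFF
    if disp >= 0x80:                 # sign-extend the 8-bit displacement
        disp -= 0x100
    if cpu == "68000":
        scale = 1                    # bits 10-8 are reserved and ignored
    else:
        if word & 0x0100:            # 68020+: full extension word format
            return None
        scale = 1 << ((word >> 9) & 3)
    return (index, scale, disp)


if __name__ == "__main__":
    for cpu in ("68000", "68020"):
        print(cpu, decode_brief_ext(0x5C10, cpu))
```

The same word yields different results under the two decoders once the formerly reserved bits are non-zero, which is the breakage the post describes.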


Philippe Flype
(Apollo Team Member)
Posts 299
21 Jul 2020 10:06


To me it looks like a broken assembler. Either way, it's easy to hex-edit the demo (as long as the data is not compressed / unpacked in real time) or to fix it through a WHDLoad slave patch.
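
A rough idea of what such a binary patch could look like, as a Python sketch. It assumes the code is available uncompressed as big-endian 16-bit words and simply clears bit 15 of the word following any BFINS opcode word (`$EFC0`-`$EFFF`). The function name is hypothetical, and a naive scan like this can also hit data that merely looks like a BFINS, so each reported offset would need checking by hand.

```python
import struct


def clear_bfins_bit15(code):
    """Clear bit 15 in the extension word of every BFINS found.

    Scans on 16-bit boundaries and returns (patched_bytes, offsets),
    where offsets lists the byte positions of the patched opcodes.
    """
    out = bytearray(code)
    patched = []
    for off in range(0, len(out) - 3, 2):
        opcode, ext = struct.unpack_from(">HH", out, off)
        if (opcode & 0xFFC0) == 0xEFC0 and ext & 0x8000:
            struct.pack_into(">H", out, off + 2, ext & 0x7FFF)
            patched.append(off)
    return bytes(out), patched


if __name__ == "__main__":
    demo_bytes = bytes.fromhex("efe9d9226fc8")   # opcode from the listing
    fixed, offsets = clear_bfins_bit15(demo_bytes)
    print(fixed.hex(), offsets)
```

Running it twice on the same data is harmless, since already-valid extension words are left untouched.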


Otto PS
(Needs Verification)
Posts 26/ 1
21 Jul 2020 19:54


Clearly I made a mistake: I was not setting the correct bit (15) for BFINS, so the opcodes were always correct :(. I will run the corrected tests again, but we already know the result.

But I have two conceptual questions:

1) If the TG68K (as an example) is declared "68020 compatible" but in practice does not behave like a 68020 with respect to the BFINS instruction, is it fully compatible or not?

2) Sometimes CPUs have bugs, and those bugs are kept in future revisions or evolutions of the same CPU (68020, 68030, 68040 as an example). When a new CPU appears that does not maintain the behavior of certain instructions (BFINS, for example), is that CPU fully backward compatible? Partially backward compatible? Backward incompatible?

EDIT: Anyway, I think it is good to identify these behaviors in order to be able to make binary patches and WHDLoad slaves that are fully compatible with the AC 68080.
 


Philippe Flype
(Apollo Team Member)
Posts 299
21 Jul 2020 21:37


The 080 already integrates that specific behaviour. The Gouraud effect has been working on the Vampire for a year. That Bit15 (Dn/An selector) is ignored, like on the real 020/030.
