Forum (#33837) - Plus/4 World



Posted by JamesD on 2017-02-18 07:36:57

Re: A mathematical demo

What is appropriate about discussing the 6809 is that the thread is about benchmarking the 6502 against other processors. Your 65816 code is certainly faster than that 6809 code... but that 6809 code misses simple optimizations that are commonly discussed.

Your notes about the Y register are not correct. It is always used in a context where it is as fast as X in the main loop between labels 'loop2' and 'l4'.

The Y instructions cost one additional clock cycle for the extra opcode byte, look it up.
Here are the 2 byte opcodes extracted from the table at the following address.
http://techheap.packetizer.com/processors/6809/6809Instructions.html

Notice that X is not mentioned in the 2 byte opcodes.
What really sucks is that CMPD is also a 2 byte opcode.
I would have traded 3 of those with SWI, SYNC, and TST since those aren't used as often.

 +-----------------------------------------------------------------+
 |                       Page 1 Instructions^                      |
 +------------+-------------+--------------+---------------+-------+
 | Opcode     |             | Addressing   |               |       |
 | Hex   Dec  | Instruction | Mode         | Cycles  Bytes | HNZVC |
 +------------+-------------+--------------+-------+-------+-------+
...
 | 1083  4227 | CMPD        | IMMEDIATE    |   5   |   4   | -aaaa |
 | 108C  4236 | CMPY        | IMMEDIATE    |   5   |   4   | -aaaa |
 | 108E  4238 | LDY         | IMMEDIATE    |   4   |   4   | -aa0- |
 | 1093  4243 | CMPD        | DIRECT       |   7   |   3   | -aaaa |
 | 109C  4252 | CMPY        | DIRECT       |   7   |   3   | -aaaa |
 | 109E  4254 | LDY         | DIRECT       |   6   |   3   | -aa0- |
 | 109F  4255 | STY         | DIRECT       |   6   |   3   | -aa0- |
 | 10A3  4259 | CMPD        | INDEXED      |   7   |   3   | -aaaa |
 | 10AC  4268 | CMPY        | INDEXED      |   7   |   3   | -aaaa |
 | 10AE  4270 | LDY         | INDEXED      |   6   |   3   | -aa0- |
 | 10AF  4271 | STY         | INDEXED      |   6   |   3   | -aa0- |
 | 10B3  4275 | CMPD        | EXTENDED     |   8   |   4   | -aaaa |
 | 10BC  4284 | CMPY        | EXTENDED     |   8   |   4   | -aaaa |
 | 10BE  4286 | LDY         | EXTENDED     |   7   |   4   | -aa0- |
 | 10BF  4287 | STY         | EXTENDED     |   7   |   4   | -aa0- |
...
 +------------+-------------+--------------+-------+-------+-------+
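
To put numbers on the extra cycle, here is a rough Python sketch that compares the Y rows from the table above with the matching single byte opcode X forms. The X timings are not in the excerpt; they are filled in from the standard 6809 instruction timings, so treat them as an assumption and check them against the linked table. Indexed mode is left out because its cycle count depends on the postbyte.

```python
# (cycles, bytes) for the Y forms, copied from the table in this post.
Y_FORMS = {
    ("LD", "IMMEDIATE"): (4, 4),
    ("LD", "DIRECT"): (6, 3),
    ("LD", "EXTENDED"): (7, 4),
    ("ST", "DIRECT"): (6, 3),
    ("ST", "EXTENDED"): (7, 4),
    ("CMP", "IMMEDIATE"): (5, 4),
    ("CMP", "DIRECT"): (7, 3),
    ("CMP", "EXTENDED"): (8, 4),
}

# (cycles, bytes) for the X forms. ASSUMPTION: not part of the excerpt,
# taken from the standard 6809 timings for LDX/STX/CMPX.
X_FORMS = {
    ("LD", "IMMEDIATE"): (3, 3),
    ("LD", "DIRECT"): (5, 2),
    ("LD", "EXTENDED"): (6, 3),
    ("ST", "DIRECT"): (5, 2),
    ("ST", "EXTENDED"): (6, 3),
    ("CMP", "IMMEDIATE"): (4, 3),
    ("CMP", "DIRECT"): (6, 2),
    ("CMP", "EXTENDED"): (7, 3),
}


def y_penalty(op, mode):
    """Return (extra cycles, extra bytes) of the Y form over the X form."""
    y_cycles, y_bytes = Y_FORMS[(op, mode)]
    x_cycles, x_bytes = X_FORMS[(op, mode)]
    return y_cycles - x_cycles, y_bytes - x_bytes


for op, mode in Y_FORMS:
    cycles, size = y_penalty(op, mode)
    print(f"{op}Y {mode:<9} +{cycles} cycle(s), +{size} byte(s) vs {op}X")
```

With those figures every Y form comes out one cycle and one byte heavier than its X counterpart, which is the whole point about the prefix byte.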

The whole self refreshing thing is a bit quirky. As long as you access a block of RAM often enough, it gets refreshed. Any block you don't access often enough goes bye bye. What size that block is depends on the DRAM chips used. So, the benchmark might keep running, but the long interval between screen writes as primes get further apart might cause the screen RAM to take a dump.
Now, you might refresh the screen RAM by writing to a byte at regular intervals... but then that would be reflected in your benchmark.
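
A toy model of what I mean, in Python. This is only a sketch: the retention window and the access times are made up numbers for illustration, not measurements from a Dragon or from any DRAM datasheet, and `longest_gap` and `survives` are hypothetical helper names.

```python
def longest_gap(access_times):
    """Largest interval between consecutive accesses to one block of RAM."""
    return max(b - a for a, b in zip(access_times, access_times[1:]))


def survives(access_times, retention):
    """True if no gap between accesses exceeds the retention window."""
    return longest_gap(access_times) <= retention


# HYPOTHETICAL values, same arbitrary time unit throughout.
RETENTION = 4
steady_writes = [0, 2, 4, 6, 8]    # screen touched at a regular pace
sparse_writes = [0, 2, 5, 11, 20]  # writes drift apart as primes thin out

print(survives(steady_writes, RETENTION))  # True
print(survives(sparse_writes, RETENTION))  # False
```

The second case is the worry: nothing crashes, but one long gap between screen writes is enough for that block to lose its contents.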

Instead of arguing with me, just try running it on a real Dragon. If it works, great, if it doesn't... bummer, it's not the end of the world.

The thing about the 4510, 6509, 65816, etc... is that they don't speed up existing code like the 6309 and 64180. The 6309 and 64180 give you a 20-30% speed bump on *existing* code, which is what I meant but I didn't say it, so point taken.

I finally found the rest of the 6509 info btw. It was missing something I thought it was going to have, so... not quite as nice as I expected.
If you extract the best additions from the 4510, 6509, and 65816, you would have a pretty decent chip for sure. Relocatable code, movable direct page, multiply, stack relative addressing, 16 bit support, access to more than 64K (even though I'm not thrilled with how the 65816 does it), memory move instructions...
It would certainly support high level languages better. That and CPU addressable memory are probably the biggest advantage 16 bit machines had over the 8 bit machines. If you design a 6502 upgrade where you can just compile your program and it works with more than 64K with no effort, that is what the 6502 needed. The 65816 can certainly do that... but all the paging and mode switching makes it a bit uglier than it should have been.

BTW, the 65816 memory move instructions suck! 7 clock cycles per byte vs 3 for the 6309 equivalent. Even with the 24 bit addressing it should have only been 4 clock cycles at most. I think even 5 would have matched the unrolled loop I use to scroll the screen in my code. That is definitely something that could have been sped up.
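
For scale, a quick Python sketch of what those per byte costs add up to over a whole screen. The 6144 byte screen size is just an assumed example value, and any fixed setup cost of the instructions is ignored, so this only compares the per byte figures quoted above.

```python
def move_cycles(n_bytes, cycles_per_byte, setup=0):
    """Total cycles for a block move: optional fixed setup plus a per-byte cost."""
    return setup + n_bytes * cycles_per_byte


SCREEN_BYTES = 6144  # ASSUMPTION: example screen size, not from this thread

# Per-byte costs as quoted in the post; the 4 and 5 rows are the "should
# have been" figures, not real hardware.
cases = [
    ("65816 block move", 7),
    ("6309 equivalent", 3),
    ("what it should have been", 4),
    ("still acceptable", 5),
]

for name, per_byte in cases:
    print(f"{name:<26} {move_cycles(SCREEN_BYTES, per_byte):>6} cycles")
```

At 7 versus 3 cycles per byte the 65816 needs a bit more than twice the cycles for the same move, whatever the block size.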