
Posted By

Litwr
on 2016-01-01
09:57:01
 A mathematical demo

http://www.lemon64.com/forum/viewtopic.php?p=709114#709114
Time taken to calculate the first 100 digits of π:
Commodore 64/PAL - 4.03 secs
Amstrad CPC6128 - 2.65 secs
Commodore 128/PAL - 2.2 secs
Commodore +4/PAL - 1.92 secs
Plus/4 is the best! happy I wish a very happy and 8-bit-tastic New Year 2016 to everybody! happy

Posted By

SVS
on 2016-01-01
12:16:49
 Re: A mathematical demo

Long live Plus/4!
\\//
'| |

Posted By

JamesC
on 2016-01-01
12:41:35
 Re: A mathematical demo

YAPE 1.0.6 in NTSC mode (100% speed, 60 fps, high chip compatibility) - 2.41 seconds.
But if I edit line 120 to IF NTSC=2 - 1.611 seconds.

In theory, these should be approximately the same result, since YAPE thinks it's in NTSC mode either way. The only difference I see is line 140, a poke to 4735, but I have not looked at the ML to see what is being altered.

Setting YAPE to Fast Mode, still NTSC, and keeping the line 120 edit - 0.8 seconds.



Posted By

MMS
on 2016-01-01
13:54:51
 Re: A mathematical demo

Nice result! It may give a -certain- answer on the Z80 VS CBM speeds.
It surprises me a little, but the problem of bad lines and RAM/ROM switching seems to be ruled out in this one (these really do cause a major slowdown on the Plus/4 and C128). I do not know what a real solution for that could be in an 8-bit machine. Even the SAM Coupé (a very, very advanced 8-bit Z80 machine introduced in 1989!) suffered a major slowdown due to memory shared with the video hardware and bad lines (wait states): https://en.wikipedia.org/wiki/SAM_Coupé#ZX_Spectrum_compatibility

Question: to be fair, could the same improvements be made on the CPC too?

Posted By

Litwr
on 2016-01-01
14:27:22
 Re: A mathematical demo

+4/NTSC is a bit faster than +4/PAL with the screen off. I can't see the point of editing line 120: it just prevents NTSC mode from being set on NTSC hardware. happy There is also no reason to set the slower PAL mode on NTSC hardware. Changing the mode also requires a matching change in the timer calculation. So the only reliable result is 2.41 s (+4/NTSC with the screen blanked). This result is a bit surprising: +4/PAL with the screen blanked and without the NTSC setting gives 2.4 s, which is a bit faster. IMHO this is caused by the more frequent NTSC video interrupts, which consume more CPU cycles than on PAL.
The Amstrad CPC is slowed down by its video system to an effective 3.2 MHz instead of the 4 MHz of its system clock. Unlike the Commodores, the Amstrad can't disable its video system. BTW, my z80 code is not the fastest; I am testing a program that is at least 20% faster. So it looks like the 6502 is only 60-80% faster than the z80 for 16-bit arithmetic.

Posted By

JamesC
on 2016-01-01
15:38:24
 Re: A mathematical demo

@Litwr - I changed line 120 because I could not verify 1.92 seconds using YAPE in NTSC. The best I could do was 2.41, considerably slower for a supposedly faster machine!

I did not change YAPE to PAL mode. I only changed line 120 so that I would be prompted for the NTSC speedup, so that I could see if it made a difference. Is there no reliable way to use a system clock or timer when the screen is blanked, so that you don't have to code your own timer?

Posted By

Litwr
on 2016-01-08
11:36:07
 Re: A mathematical demo

Sorry, I can't understand your point. Don't you believe the 1.92 s result for Plus4/PAL? We could ask Luca to check it. I can only repeat that the NTSC setting on a Plus4/NTSC is pointless, and so is editing line 120. A Plus4/NTSC can't reach 1.92 s; its hardware limit is 2.41 s, maybe 1-2% better with no interrupts. The Plus4/NTSC is not faster, it is slower, unlike the C64/NTSC, which is faster than the C64/PAL. The Plus4/NTSC CPU is 1% faster than the Plus4/PAL only in screen-blank mode, but the more frequent raster interrupts make the Plus4/NTSC slower even in this case. BTW, the Plus4/NTSC is 10% slower than the Plus4/PAL with the screen on.
[The news]
I upgraded this demo and gave it version number 1. The results for 1000 digits of π:
Commodore 64/PAL - 331 s,
Commodore 64/NTSC - 319 s,
Commodore 128/PAL - 175.5 s,
Commodore +4/PAL - 152.3 s,
Amstrad CPC6128 - 179.3 s,
IBM PC 8150 (1981, CGA) - 76.6 s.
The Commodore +4 version can calculate up to 7680 digits, Commodore 64: 6320, Commodore 128: 5008, Amstrad CPC6128: 5536, IBM PC 8150: 9000. These numbers could be increased for the Commodore 64/128, but that requires somewhat tricky memory management. The Amstrad CPC version for CP/M can reach up to 8600 digits.
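The benchmark page's name (pi-spigot-benchmark) points to a spigot algorithm. As a rough illustration only, here is a Python transcription of the widely circulated short C spigot program that emits four digits of π per outer step; it is an assumption that the demo works exactly this way. It shows why the work is dominated by division (one integer division and one remainder per array element per step) and why RAM caps the digit count: the array needs about 3.5 entries per digit.

```python
def pi_digits(n: int) -> str:
    """Return the first n digits of pi (no decimal point) with a base-10000 spigot.

    Transcription of the well-known short C spigot program; n must be a
    multiple of 4 because each outer step emits one 4-digit group. Like the
    C original, it does not handle a carry into a "9999" group.
    """
    if n <= 0 or n % 4:
        raise ValueError("n must be a positive multiple of 4")
    a = 10000                 # base: four decimal digits per output group
    c = n * 7 // 2            # number of series terms, about 3.5 per digit
    f = [a // 5] * c + [0]    # remainders start at 2000; the extra last slot starts at 0
    e = 0                     # low part carried over from the previous group
    groups = []
    while c > 0:
        d = 0
        g = c * 2
        b = c
        while True:
            d += f[b] * a
            g -= 1
            f[b] = d % g      # one remainder ...
            d //= g           # ... and one division per element, every step
            g -= 1
            b -= 1
            if b == 0:
                break
            d *= b
        groups.append("%04d" % (e + d // a))
        e = d % a
        c -= 14               # later groups need 14 fewer terms each
    return "".join(groups)


if __name__ == "__main__":
    print(pi_digits(100))
```

On an 8-bit CPU the `d % g` / `d // g` pair is exactly the kind of 32-bit by 16-bit division that dominates the run time.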

Posted By

MMS
on 2016-01-08
21:18:12
 Re: A mathematical demo

The IBM PC (if I remember well, at 4.77 MHz) was only 2 times faster than the Plus/4? Shocking...
(but I suppose the PC model is 5150, not 8150)

Posted By

Litwr
on 2016-01-09
13:45:06
 Re: A mathematical demo

Sorry for the typo. This is 5150.
It looks like plussiers like to underestimate the power of the Plus/4. See http://www.cpcwiki.eu/forum/demos/a-mathematical-demo/msg117035/#msg117035 ; Amstrad people, on the contrary, seem to overestimate their computer a bit. happy
This mathematical demo shows a somewhat surprising result for the C128: its effective frequency is about 1.9 MHz, not 2 MHz as stated in Wikipedia.
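One way to arrive at an effective-frequency figure like this is to scale a reference machine's clock by the ratio of run times for the same benchmark. A minimal sketch, assuming the screen-blanked C64/PAL runs at its full CPU clock (PAL crystal 17734475 Hz divided by 18, about 985249 Hz) and using the 1000-digit times listed earlier in the thread (C64/PAL 331 s, C128/PAL 175.5 s). This illustrates the arithmetic; it is not necessarily how the figure above was computed.

```python
# Assumption: the reference machine runs at its full nominal clock during the
# test (screen blanked, no cycles stolen) and both machines run equivalent code.
C64_PAL_HZ = 17_734_475 / 18  # PAL C64 CPU clock derived from the PAL crystal


def effective_mhz(ref_hz: float, ref_seconds: float, test_seconds: float) -> float:
    """Clock (in MHz) the reference CPU would need to finish in test_seconds."""
    return ref_hz * (ref_seconds / test_seconds) / 1e6


if __name__ == "__main__":
    # 1000-digit times from the thread: C64/PAL 331 s, C128/PAL 175.5 s
    print(f"C128/PAL effective clock: {effective_mhz(C64_PAL_HZ, 331.0, 175.5):.2f} MHz")
```

This gives about 1.86 MHz, in line with the "about 1.9 MHz" estimate.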
The IBM PC version of this demo (http://litwr2.atspace.eu/ibm-5150/ibm-5150.html) is not as well optimized as the 6502 and z80 versions. I am sure it could be sped up by 50%. However, my tests with Xlife-8 show that the main advantage of the 8088 is its hardware division, and the calculation of π relies mostly on division. The 6502 and z80 demos use an unpublished, very fast 32-bit by 16-bit division routine, but it is still 3-5 times slower than the 8088's DIV instruction.
The same tests also show that, without division, the 8088 at 4.77 MHz is only about 20% faster than a 6502 at 2 MHz. The US market was closed to the fastest 6502 computers (CBM II, BBC Micro...). There were 6502s at 3 MHz even in 1981... There was also the problem of slow RAM: IMHO part of the 8088's power was consumed by it.
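The demo's own division routine is unpublished, so the following is not it. It is the textbook restoring shift-and-subtract division of a 32-bit value by a 16-bit one, as used on CPUs without a divide instruction, sketched in Python to show where the cost comes from: one shift, one compare and possibly one subtract per quotient bit, i.e. 32 iterations, each of which takes several byte-wide instructions on an 8-bit CPU, versus a single DIV on the 8088 (whose 16-bit DIV additionally requires the quotient to fit in 16 bits; this sketch returns the full 32-bit quotient).

```python
def div32_by_16(dividend: int, divisor: int):
    """Restoring shift-and-subtract division; returns (quotient, remainder)."""
    if not (0 <= dividend <= 0xFFFFFFFF and 0 < divisor <= 0xFFFF):
        raise ValueError("dividend must be 32-bit, divisor a non-zero 16-bit value")
    remainder = 0
    quotient = 0
    for bit in range(31, -1, -1):
        # Shift the next dividend bit (most significant first) into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # If the divisor fits, subtract it and record a 1 in the quotient.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


if __name__ == "__main__":
    print(div32_by_16(123456789, 1000))
```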
[UPDATE]
I've just optimized the IBM PC version, so it has become "official" too. It is now almost twice as fast and computes 1000 digits in 40.4 seconds. Is that less shocking? :-) This shows that the 8088 at 4.77 MHz is approximately 2 times faster than a 6502 at 2 MHz, and frequent use of division raises this ratio to 4.
I also made a version for the RT-11 OS on the PDP-11. It is not optimized yet. It computes 1000 digits in 21.8 seconds on a MicroPDP-11/83 (CPU at 18 MHz).


Posted By

MMS
on 2016-01-09
18:13:04
 Re: A mathematical demo

Many thanks for the efforts!
Yes, it is less shocking. happy It makes it easier to understand why IBM became so dominant with the PC.
If it were "only" 2 times faster than a "slow" Commodore, with weak CGA and sound, why would anyone call it professional?
But thinking a little more: I really got the years mixed up happy

The IBM PC 5150 came out in 1981, while the C+4 was released in 1984.
By 1984 IBM already had the IBM AT (I bought one too, in 1992) with a 286 processor, >1 MB memory and EGA.
So there is no way the two can be compared (+4 vs AT).

Posted By

Litwr
on 2016-01-11
12:26:19
 Re: A mathematical demo

Thanks for your evaluation. The 8088 is very slow in many ways: JMP takes 16 cycles, while it takes only 3 on the 6502 (12 on the z80). The 8088 needs 4-6 cycles to fetch a byte from memory, the 6502 only 1 (the z80 fewer than 4), and so on. So if a task doesn't fit into the registers, the 8088 in the PC XT is only about 20% faster than a 6502 at 2 MHz.
The IBM PC AT and the 80286 eliminated the speed advantage of the 6502, but only in 1984... The 6502 had not been upgraded since 1976... The CBM II and BBC Micro were available in 1982 and could have been rivals to the PC XT... One might even think that the rather aggressive negative publicity around the Plus/4 was a step in the same direction.
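Converting the cycle counts above into time makes the JMP comparison concrete. A quick sketch using the post's own figures (16 cycles for the 8088 at 4.77 MHz, 3 cycles for the 6502 at 2 MHz, 12 for the z80); the 4 MHz z80 clock is an assumption added for illustration.

```python
def instr_time_us(cycles: int, clock_mhz: float) -> float:
    """Duration of one instruction in microseconds (MHz = cycles per microsecond)."""
    return cycles / clock_mhz


JUMPS = {
    "8088 @ 4.77 MHz": (16, 4.77),
    "6502 @ 2 MHz": (3, 2.0),
    "z80 @ 4 MHz (assumed clock)": (12, 4.0),
}

if __name__ == "__main__":
    for name, (cycles, mhz) in JUMPS.items():
        print(f"{name}: {instr_time_us(cycles, mhz):.2f} us per JMP")
```

With these numbers a 6502 JMP takes 1.5 µs, versus about 3.35 µs on the 8088.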

Posted By

MMS
on 2016-01-12
01:13:26
 Re: A mathematical demo

- 65C02 with much higher speeds? Almost the same chip, but up to 7 MHz.
- 65C816, the 16-bit version (1983), which was planned for the C65 and is used in the SuperCPU. It could be a beast happy

Posted By

gerliczer
on 2016-01-12
04:50:47
 Re: A mathematical demo

OFF TOPIC

The 4510 (Victor) CPU of the C65 is not a 65816 derivative. It is based on 65CE02.

/OFF TOPIC

Posted By

MMS
on 2016-01-12
14:36:14
 Re: A mathematical demo

Hi, thanks for the clarification! The size of the addressable memory misled me. 😊

Posted By

Litwr
on 2016-01-17
03:01:29
 Re: A mathematical demo

The Pi demo helped to find a bug in the plus4emu emulator: in NTSC model emulation mode it runs 20% faster than it should. This bug can only be detected using an external timer. Is there a way to inform IstvanV about it?
BTW, YAPE works correctly in this case.

Posted By

gerliczer
on 2016-01-17
05:35:02
 Re: A mathematical demo

The member page of IstvanV has an e-mail address. Did you try it?

Posted By

Csabo
on 2016-01-18
13:57:42
 Re: A mathematical demo

I've just added a new table created by Gaia into the KB:
http://plus4world.powweb.com/plus4encyclopedia/500248

Posted By

MMS
on 2016-01-19
00:53:29
 Re: A mathematical demo

OK, last question: what are the defaults of the PAL and NTSC systems at startup? If those are the first PAL/PAL and NTSC/NTSC pairs, then sorry for asking.

Posted By

Gaia
on 2016-01-19
02:42:17
 Re: A mathematical demo

They are the first PAL/PAL and NTSC/NTSC pairs ("open" borders and slow bit off).

Posted By

Litwr
on 2016-01-20
06:20:37
 Re: A mathematical demo

Gaia and Csabo, thanks! I had also wanted to make such a table at some point. IMHO the C128 and even the C64 still lack such a thing...
gerliczer, I used this e-mail address several times, with no replies.

Posted By

gerliczer
on 2016-01-20
09:01:55
 Re: A mathematical demo

Litwr,

Did you change the clock frequency too? IIRC, it should be in the machine configuration window. The default frequency is usually 17734475 Hz; this should be replaced with 14318180 Hz for NTSC emulation.

Edit: You also need an NTSC KERNAL ROM image for proper operation.
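The two crystal values above also account for the "1% faster" NTSC CPU figure mentioned earlier, and plausibly for the ~20% NTSC error reported for plus4emu. A minimal sketch, assuming the commonly documented TED dividers for the double-speed CPU clock (crystal / 10 on PAL, crystal / 8 on NTSC). The "NTSC on the PAL crystal" case models NTSC timing left running on the default crystal value; that this was the cause is an assumption, not a confirmed diagnosis.

```python
PAL_CRYSTAL_HZ = 17_734_475   # plus4emu default crystal value
NTSC_CRYSTAL_HZ = 14_318_180  # value to enter for NTSC emulation

# Assumed TED dividers for the double-speed CPU clock.
PAL_DIVIDER = 10
NTSC_DIVIDER = 8


def ted_fast_clock_hz(crystal_hz: float, divider: int) -> float:
    """Double-speed CPU clock derived from the crystal frequency."""
    return crystal_hz / divider


pal = ted_fast_clock_hz(PAL_CRYSTAL_HZ, PAL_DIVIDER)
ntsc = ted_fast_clock_hz(NTSC_CRYSTAL_HZ, NTSC_DIVIDER)
# NTSC divider applied to the PAL crystal, i.e. the clock value was never changed.
misconfigured = ted_fast_clock_hz(PAL_CRYSTAL_HZ, NTSC_DIVIDER)

if __name__ == "__main__":
    print(f"PAL fast clock:  {pal / 1e6:.3f} MHz")
    print(f"NTSC fast clock: {ntsc / 1e6:.3f} MHz ({(ntsc / pal - 1) * 100:.1f}% faster)")
    print(f"NTSC timing on the PAL crystal: {(misconfigured / ntsc - 1) * 100:.1f}% too fast")
```

Under these assumptions NTSC comes out about 0.9% faster than PAL, and the misconfigured case about 24% too fast.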

Posted By

Litwr
on 2016-01-20
10:59:57
 Re: A mathematical demo

Wow! It requires entering 8 digits manually! So it is not an emulator bug but something close to a bug in the user interface. YAPE requires just one click.

Posted By

gerliczer
on 2016-01-20
11:14:41
 Re: A mathematical demo

OK, OK. Did it help or not?

Posted By

RobertB
on 2016-01-21
01:17:27
 Re: A mathematical demo

Is the C128 running in SLOW 1 MHz mode or FAST 2 MHz mode for this test?

Truly,
Robert Bernardo
Fresno Commodore User Group
http://www.dickestel.com/fcug.htm

Posted By

gerliczer
on 2016-01-21
03:01:06
 Re: A mathematical demo

@RobertB

"Commodore 64/PAL - 331 s,
Commodore 128/PAL - 175.5 s"

What do you think?

Posted By

RobertB
on 2016-01-21
23:06:50
 Re: A mathematical demo

I think it needs my SuperCPU 128. wink

Truly,
Robert Bernardo
Fresno Commodore User Group
http://www.dickestel.com/fcug.htm

Posted By

Litwr
on 2016-01-23
09:08:54
 Re: A mathematical demo

The SuperCPU 128 is faster than a PC AT... Did you try Pi with it?
Dear gerliczer, I wouldn't have thought of manually entering the big number 14318180 without your help. Thank you very much.




Posted By

gerliczer
on 2016-01-23
05:48:19
 Re: A mathematical demo

You're welcome anytime, Litwr. It took me a while too to figure out how to switch plus4emu over to NTSC mode. I guess it is properly documented in the help or readme file, but I tend to see reading them as a last resort.

I think it is not a user interface design error but an effort to provide maximum configurability. Having the TV encoding system, system clock frequency and KERNAL ROM configurable independently means, or at least that's what I think, that it is possible to emulate Drean systems too. (Those were sold in South America and use a strange mix of the PAL and NTSC TV standards.) YAPE does not have this option.

Posted By

Litwr
on 2016-01-23
09:46:52
 Re: A mathematical demo

plus4emu has a very flexible configuration system. It comes with a lot of ready-made hardware configuration files, and such a file can be loaded very quickly with two mouse clicks. Unfortunately, my build doesn't support loading ASCII configurations. With your help I made several binary configuration files and can now switch to PAL or NTSC very easily. happy BTW, it is not easy to build plus4emu with all features enabled; the C libraries keep changing...
I also got some more data for the 1000-digit π calculation:
Plus4/NTSC - 188.4 s
C128/NTSC - 168.5 s
SuperCPU64/NTSC - 16.3 s
The results for the C128 and SuperCPU64 were obtained with the VICE emulators. So the SuperCPU result is 2.5 times better than the PC XT's. It is a pity there is no PC AT emulator, but I estimate that a PC AT at 6 MHz would take about 25 s for this test.
I am also curious: is there any software for or from South America?

Posted By

gerliczer
on 2016-01-23
10:06:43
 Re: A mathematical demo

You may want to try http://pcem-emulator.co.uk/ for PC emulation. It is for Windows, so it may not help you if you are in the Linux or *BSD world.

Posted By

Litwr
on 2016-03-06
09:44:50
 Re: A mathematical demo

Thanks again. I had missed this emulator. It has a Linux port, and I use it. The test gives a somewhat surprising result: only 8.2 s for 1000 digits, which is faster than the SuperCPU. The explanation is easy: 80286 division is 7 times faster than on the 8088. However, I am not sure about this emulator's accuracy. IMHO it is faster than a real AT.
[UPDATE]
Thanks to RobertB :) I've made a SuperCPU version of this demo. It works in native 16-bit mode. It looks like a native-mode program can be up to two times shorter and up to 50% faster than a program for the emulation (C64) mode. My aim was to write code that would be faster than an IBM PC AT at 6 MHz, but I couldn't reach it. :( The 80286 has too fast a hardware division.
link to page with the latest results
BTW, the SuperCPU turns out to be about 21 times faster than a C64. It is an amazing piece of hardware! If the C264 line had been a bit luckier, such things would have been made for the +4 too...

Posted By

MMS
on 2016-05-04
18:27:57
 Re: A mathematical demo

A smarter guy once told me that the expansion port lacks the NMI line, which is required for an external processor to take full control. I also could not find it on the schematic (just 3 not-connected lines), so maybe he was right...

Posted By

gerliczer
on 2016-03-07
02:51:52
 Re: A mathematical demo


@MMS

I'm a bit sceptical that the missing NMI pin on the expansion port is the only obstacle. 1) Where would you connect it in a 264'er machine? 2) How would triggering an interrupt, even a Non-Maskable Interrupt, help in taking over the system? And we haven't even mentioned that NMI in the C64 is edge-triggered. 3) Wouldn't it be much more reasonable if we could control the BA (RDY) and AEC signals externally? That would make implementing a freezer+turbo card not too hard, or at least that's what I think. Maybe it could be done similarly to the C64, with a DMA pin on the expansion port, or with some insane trickery on those two existing signal pins.


Posted By

Litwr
on 2016-05-04
11:41:01
 Re: A mathematical demo

It is updated to v7: http://litwr2.atspace.eu/pi/pi-spigot-benchmark.html. The C+4 version is now 4; it is slightly faster than the previous one and makes the C+4's speed leadership in the 8-bit PC world firmer. wink There is also v2 for the SuperCPU, which is now faster than the IBM PC AT! happy The versions for the other Commodores are updated too.

Posted By

MMS
on 2016-05-21
06:34:45
 Re: A mathematical demo

OFF
Hi gerliczer,
I do not know the details. It goes back to the days when we discussed a possible external CPU; at that time I liked the idea of the eZ80 and a ready-to-run modern GUI called SymbOS.
I was told that the NMI line was necessary to be able to run a Z80-based CP/M card on the C64.

As I do not know what ELSE is required, I just stopped at the point that if there is no NMI line (while the CPU is very much the same as the 6502), there is no chance to create such a thing. Though... the fantastic video shown at the last AROK uses some smart data transfer and other magic we did not know about beforehand...

ON
It is just GREAT to see what a speed demon we ride:-D

Posted By

Litwr
on 2016-05-05
12:25:11
 Re: A mathematical demo

Hi, MMS! Thank you! happy Maybe the SAM Coupé or Atari Lynx are faster than our little demon, but the Lynx is not a PC and the SAM Coupé is too rare to be taken into account. The C65 is also 8-bit...
It was mentioned above that z80 systems suffer a major slowdown from the video system... So I have to say a few bitter and emotional words. The better Z80-based systems have a 10-20% slowdown, but ALL 6502-based systems have a 50-75% slowdown! I can understand this for the first PCs like the Apple ][ or Commodore PET, but how was it possible to make the C128 in 1985 with RAM running at only 1.9 MHz, even in VDC and RGBI monitor mode? The BBC Micro used RAM at 4 MHz in 1981. So a proper design should have allowed a CPU of up to 5 MHz in the C128. What was it? A global conspiracy against the 6502, or total stupidity? Another question: why was the CPU tied to the VIC-II? It looks so easy to give the VIC-II its own RAM, as was done for the VDC in the C128. I want to find an opportunity to ask Bil Herd about it.
[z80 on Plus4] IMHO it is possible. Many said a C64 accelerator was impossible due to the VIC-II, but the superb SuperCPU was made...

Posted By

MMS
on 2016-05-21
07:26:02
 Re: A mathematical demo

Hi Litwr,

As far as I know, the SAM Coupé suffered the biggest slowdown ever due to video refresh and shared RAM. (I remember how eagerly the Speccy fans waited for it to be the 8-bit Amiga...)
Despite its 6 MHz CPU it was only a few percent faster than a 3.5 MHz ZX Spectrum.

Posted By

MIRKOSOFT
on 2016-05-21
10:44:57
 Re: A mathematical demo

Hi!
I'm new on the Plussy scene, but I'm a real 128er.
You're writing about an extra CPU and the Z80.
Well, I own an SCPU128 (one extra CPU) and a CP/M cartridge (another extra CPU).
First, about the extra CPU:
At power-on the SCPU128 takes control of everything and makes the C128's Z80 inaccessible. So I bought a CP/M cartridge to give my C128 a Z80 back even when the SCPU is active. I tested it and the result was a surprise: the Z80 in the cartridge works even when the SCPU is active. If someone has only the CP/M cartridge and no SCPU, it gives the C128 one 8502 and two Z80 CPUs (one in the C128 and one in the cartridge). It is also possible to activate the external Z80 directly from the internal Z80 (not vice versa), but handling this has not been solved yet. I mean, NMI may be a problem for the Plussy, but that's off topic.
Downclocking the Z80: the Z80 in the C128 is downclocked from 4 to 2 MHz, and the one in the cartridge from 8 to 4 MHz (effectively 3 MHz). The reason is simple: the VIC-IIe chip! The VIC-IIe cannot handle the 8502's 2 MHz either, so stupid C64 compatibility makes the C128 less powerful. But it was a decision of Commodore marketing, which did not want the C128 to share the fate of the 264 series. In my eyes the Plussy is great, the C64 less so (please forgive me, C64 fans, it's only my opinion).
I also have one question:
Where can I get the Plussy's expansion port schematics, datasheets and other resources, and is the port used very frequently?
The reason is secret for now.
Thank you.
Miro

Posted By

gerliczer
on 2016-05-21
11:36:24
 Re: A mathematical demo

zimmers.net

Posted By

MMS
on 2016-05-21
16:26:31
 Re: A mathematical demo

Zimmers is good.

Some others (with strange navigation):
http://www.hardwarebook.info/C16/Plus4_Expansion_Bus

My friend BSZGG's (Hungarian) page with all the connectors and links to CPU, ACIA pin layout, etc.
You may need Google translate happy
http://bencsikszilveszter.hu/plus4/plus4/plus4.htm

On the left side, choose "Bekötesek" (Connections) in the middle happy
Under "Kapcsolási Rajzok" (Circuit diagrams) are the scanned schematics of the C16.


HINT HINT HINT
just secretly: please, please prefer the eZ80; it is 4x faster than a Z80 at the same frequency, and there are 20 MHz and 50 MHz versions
(the latter is equal to a 200 MHz Z80 or 8088; that is the category of an Amiga 1200)

Posted By

MIRKOSOFT
on 2016-05-21
18:59:51
 Re: A mathematical demo

About that secret: I'm sorry, the secret isn't a CPU. No more for now.
Miro

Posted By

MMS
on 2016-05-21
19:00:37
 Re: A mathematical demo

OK happy

Posted By

Litwr
on 2016-05-22
13:04:55
 Re: A mathematical demo

The history of 8-bit computers drives me crazy. IMHO it was mostly about passions, politics, marketing, etc.; technology was considered last. How can one answer the following questions and stay sane?
1. How was it possible that the slow and ugly z80 was so widely used? Although Bill Gates did show his personal preference for the z80...
2. How was it possible that the fastest 8-bit PC was made in 1981 (the BBC Micro) and all later computers were slower and slower? The C128 in VIC mode is even slower than the C64! The BBC Micro's video steals 50% of the CPU power, so, at the same technological level, a C128 with VDC might have been 2-3 times faster. Was it the result of the IBM PC and market protection measures? BTW, a 6502 at 4 MHz is faster than an 8088 at 5 MHz.
3. Why was CP/M for the 6502 not made? It is very easy to convert 8080 code to the 6502... The code size is small...
4. Why didn't Commodore try to make their expensive disk drives faster?
5. Why wasn't the VIC-II upgraded?
MIRKOSOFT, thanks for the information about two z80 CPUs in the C128. IMHO it was much better to have a C64 for VIC and SID software and an Amstrad CPC or PCW for CP/M and word processing, for almost the same price.

Posted By

MIRKOSOFT
on 2016-05-22
13:28:20
 Re: A mathematical demo

A small correction is needed here.
A CP/M alternative for the 6502 exists: it is called DOS/65. It's not a complete working CP/M but rather an alternative, and it cannot run CP/M80 apps.
CP/M exists for these CPU families: x80 (Z80 etc.), x86, 68k, plus the alternative mentioned above.
C128 with CP/M cartridge can run these:
CP/M80 v3.0 in C128 mode
CP/M80 v2.2 in C64 mode
'CP/M65' (real name DOS/65) in C64 mode: sadly it's a 6502 version, and I never found any version other than the C64 one.
Personally, I work with CP/M80, CP/M86 and DOS/65.
Why was the Z80 widely used? It was powerful: more efficient code at a lower memory cost. The 8080 lost the fight with the 8086, and that sealed its fate...
The VIC-II is a very, very wrong choice for the C128: the C128 has Color RAM twice, but this RAM is only a low nibble wide, so attributes like flash, underline, reverse and alternate are not available, only color. In graphics mode only the foreground color can be used; the background uses the VIC-II paper color at $D021...
But the C128 has the VIC-IIe only for C64 compatibility, and the 'e' means extended with extra keyboard keys, the 2 MHz mode switch and real interlace 320/160x400...
Miro

Posted By

Litwr
on 2016-05-22
15:41:19
 Re: A mathematical demo

>But C128 has VIC-IIe only for C64 compatibility
I can't agree. The main C128 mode (120 KB BASIC) is VIC-based. Even CP/M doesn't disable the VIC, and this makes it 10% slower. The absence of proper VDC support is one of the oddities of the C128.
The 8080 was replaced by the 8085, which Intel produced even in the 90s and which is still produced somewhere today. IMHO the 8085 is better than the z80, but politics, marketing, ... kept it out of PCs. The Japanese made the R800, as fast as the eZ80 but much earlier...
[CP/M] I wrote that there was no official 6502 CP/M. DOS/65 is interesting, but it is an amateur project that came too late.

Posted By

MMS
on 2016-05-22
16:02:55
 Re: A mathematical demo

About interesting 6502 computers: did you know that Acorn created a SLOW computer on the same base as the BBC Micro, which was mentioned as one of the fastest? They wanted to compete with the cheap ZX Spectrum, like Commodore's TED series.

The 4-bit-wide memory access (!) and very badly shared memory made the CPU run, in some cases, at only 1/6 of the speed.
It was Acorn Electron. https://en.wikipedia.org/wiki/Acorn_Electron

It had fantastic video modes and a huge 24 KB display memory (but only 8 + 8 flashing colors), and only 32 KB of memory in total.
These are the video modes, next to the 80x25 and 80x32 text modes (what a great thing for word processors; but there was not enough RAM, and it was slow):
160×256 (4 or 16 colours), 320×256 (2 or 4 colours), 640×256 (2 colours): the latter would be fantastic for a GUI.

The machine was slow and low on memory, but still ~1400 games were developed.
It would be interesting to see how it performs on the test.

Posted By

MIRKOSOFT
on 2016-05-22
16:33:12
 Re: A mathematical demo

It is true that C128 BASIC 7.0 is VIC-based, but that doesn't mean the VIC-IIe is there for BASIC.
The VDC was always left aside, and not because of its capabilities: the video RAM dedicated to the VDC in the C128 or C128D was only 16K (80x25 text, but 640x200 monochrome only; the VDC can do more, but 16K is 16K).
Digital RGBI can't be connected to a TV, but the VDC8563 also has a monochrome composite output.
The VDC8568 in the C128DCR has 64K of video RAM and no composite output; it supports EGA mode and resolutions from 320x200x16 up to 800x600x2, and the last one tested was 1024x296x2...
If the first C128 had had 64K of video RAM, BASIC 7.0 could have supported it.
CP/M has a utility to turn off the VIC and can then work on the VDC only.
Miro

Posted By

MMS
on 2016-06-01
10:11:04
 Re: A mathematical demo

Partially OFF:
Just a theoretical question: as discussed, the BBC Micro (due to its 4 MHz RAM) was one of the fastest 8-bit home computers in 1981 and since then.

I know it is very complicated, but is there any way to double the speed of the Plus/4 memory to reduce the time lost to bad lines and let the CPU run at the same speed as in the border?
Or is there any fast SRAM type that would fit (maybe externally) and, placed/mapped in place of the "normal" DRAM, need no RAS/CAS?

Would using faster DRAMs and hacking the memory controller make any difference?
Or would it not be any faster due to fixed wait states, or would it just kill the precise timing of the TED?

Posted By

Litwr
on 2016-06-01
15:19:35
 Re: A mathematical demo

If the BBC Micro had used the Plus/4's double-frequency idea, it would have been 50% faster. The C128 can use double frequency in the vertical borders via raster interrupts, which gives an average of 1.3 MHz for the CPU in VIC mode.
IMHO the best way to speed up the Plus/4 is to quadruple the CPU clock in the borders...
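The border trick amounts to a time-weighted average of two clock rates. A rough sketch for the C128 case, assuming a PAL frame of 312 raster lines of which 200 display lines run at about 1 MHz and the remaining 112 lines at about 2 MHz, and ignoring bad lines and the cost of the raster interrupts themselves (all simplifying assumptions, not measured values):

```python
def average_clock_mhz(segments):
    """Time-weighted mean CPU clock over one video frame.

    segments: list of (raster_lines, clock_mhz) pairs. Every raster line lasts
    the same wall-clock time, so each clock is weighted by its share of lines.
    """
    total_lines = sum(lines for lines, _ in segments)
    return sum(lines * mhz for lines, mhz in segments) / total_lines


# Assumed C128/PAL frame: 200 display lines in SLOW mode (~1 MHz) and
# 112 border/blanking lines in FAST mode (~2 MHz).
c128_vic_mode = average_clock_mhz([(200, 1.0), (112, 2.0)])

if __name__ == "__main__":
    print(f"C128 with FAST in the borders: ~{c128_vic_mode:.2f} MHz (upper bound)")
```

This gives about 1.36 MHz as an upper bound; bad lines and interrupt overhead would pull it down toward the 1.3 MHz quoted above.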

Posted By

MMS
on 2016-06-01
17:25:03
 Re: A mathematical demo

Hm, really good idea, as it will not mess with the picture content. Brilliant. How can we do that? grin

Posted By

Litwr
on 2016-07-10
10:43:27
 Re: A mathematical demo

It is updated with BBC Micro data. The C+4 is faster than the basic BBC Micro or even the Master. But the BBC Micro is like heaven for programmers: it provides the easiest way to add a second parallel processor (6502, 65016, z80, 80186, 32016, 6809, ...). So a BBC Micro with a second 6502 at 3 MHz (1984) is faster than the C+4.
http://litwr2.atspace.eu/pi/pi-spigot-benchmark.html

Posted By

MMS
on 2016-07-10
15:19:48
 Re: A mathematical demo

Hi,

I just noticed that the ZX Spectrum is not covered, although the Amstrad is pretty close (but it had special ICs for video and sound). Due to its different memory management, the BBC Micro should be significantly faster than the Speccy. It would be interesting to know how much faster a converted Speccy program would run on this machine.

Posted By

Litwr
on 2016-09-01
06:42:36
 Re: A mathematical demo

Thanks for the support! happy Help from Beeb enthusiasts inspired me to optimize the 6502 code more deeply. I even wrote a branch optimizer that minimizes the number of page-crossing branches. So PIPACK-12 has just been released. It is a kind of obsession...
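On the 6502 a relative branch costs 2 cycles when not taken, 3 when taken, and 4 when taken to a target on a different 256-byte page from the instruction that follows the branch; that extra cycle is what a branch optimizer like the one mentioned tries to avoid. A small Python sketch of the rule (the function name and example addresses are illustrative, not taken from PIPACK):

```python
def branch_cycles(branch_addr: int, offset_byte: int, taken: bool) -> int:
    """Cycle cost of a 6502 relative branch (BNE, BCC, ...).

    branch_addr: address of the 2-byte branch instruction.
    offset_byte: raw operand 0..255, interpreted as a signed 8-bit offset.
    """
    if not taken:
        return 2
    offset = offset_byte - 256 if offset_byte >= 0x80 else offset_byte
    next_pc = (branch_addr + 2) & 0xFFFF      # offset is relative to the next instruction
    target = (next_pc + offset) & 0xFFFF
    crosses_page = (next_pc & 0xFF00) != (target & 0xFF00)
    return 4 if crosses_page else 3


if __name__ == "__main__":
    print(branch_cycles(0x1000, 0x05, True))  # same page
    print(branch_cycles(0x10FD, 0x05, True))  # $10FF -> $1104 crosses a page
```

A branch optimizer would then try to place code so that hot, taken branches fall into the 3-cycle case.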
I know very little about the Speccy. However, what I know includes the fact that in some cases the Spectrum may be faster than the Amstrad... If my obsession continues, a scientific answer may come one day. happy
EDIT. The BBC Micro uses the ordinary way of working with RAM/ROM: it sacrifices 50% of its cycles to video. The C+4 is more advanced: it loses only about 25% of its cycles to video, which is close to the best z80 systems.
EDIT. PIPACK-15 has just been released. It contains valuable data from the BBC Micro world. I even managed to gather data for a very rare and expensive (about $8000!) first ARM system from 1986. It also contains data for the world's fastest 8-bit systems: a 6502 at 4 MHz (1986) and a Z80 at 6 MHz (1984). These are second processors for the BBC Micro. All data were acquired from real hardware.
It is curious that the Z80 at 6 MHz outperforms the 6502 at 3 MHz. This is because integer division (done in software) runs very well on the Z80. JamesC noted this fact.
It is also curious that the Commodore 128 in z80 mode is almost 4 times slower than the BBC Micro with a Z80. This puts the effective frequency of the C128's z80 at only about 1.6 MHz...
The only disadvantage of Acorn systems was their price.

Posted By

MMS
on 2016-10-09
15:35:50
 Re: A mathematical demo

OFF
Yeah, the Acorns were a fantastic series of computers, with crazy colored keys but very advanced technology.
Price like gold, speed like a beast, ahead of the others...
I once considered getting an Archimedes, but hey, I can't even find the time to get my Amiga working happy So it would just be more "junk" next to my lame MUPID-2.
ON
I think it was a miracle that they could put together the Z80, the 8502 and the VIC-II, knowing that the VIC-II can tolerate only 1 MHz. (The Z80 and VDC together could be an interesting option.) The Z80 was slowed down to 2 MHz, as far as I know.

UPDATE:
I recently presented your speed test (with all the copyright information :-) ) at the Ha'wangarda festival in Poland.
The listeners were mainly ZX Spectrum and Atari owners/fans, and people who like retro art. Because these two machines were not listed, I survived the presentation :-D
(in reality it was a fantastic meeting; the technical details about the Plussy were a little too much for some of the listeners, but afterwards I got a lot of questions, and a few statements about how much this machine is underestimated by the scene)

Posted By

Litwr
on 2016-10-10
13:51:59
 Re: A mathematical demo

Atari was very popular in 1980s Poland. Back then I could even have thought it was a more important producer of home computers than Commodore. happy
My demo uses small but tricky code, so it may be interesting for Atari or Speccy users to convert it to their platforms. The C+4 is the fastest... but I have doubts about the Dragon-32/64: a 6809 at 1.8 MHz may outperform a 6502 at 2.2 MHz. The Dragon can also use a 6309, which should definitely be much faster than the C+4, but it is not standard (like the Acorn second processors for the BBC Micro).
My demo shows that the C128's z80 effective frequency is about 1.6 MHz. I used VICE, so if Mirko (MIRKOSOFT) reads this message, please run the pi demo on real C128 hardware. Maybe VICE's emulation of the rarely used z80 mode is poor. BTW, the Amstrad CPC's z80 has an effective frequency of about 3.2 MHz.

Posted By

MMS
on 2016-10-10
15:49:22
 Re: A mathematical demo

Hi, yes, they are convinced of that in Poland; I was challenged on it several times and had to respond.
I think it is the right time to reactivate that nice Atari - Commodore "friendship" (just kidding)

C128 Z80: I think you are right. I read some more details on this subject from Bil Herd on a C64 forum. If I remember well, the hacked-in Z80 was required to be able to start up the machine (he also mentioned it in the video just linked by Luca), but later on, due to the 1 MHz speed limit of the VIC-II, the Z80 could not exceed 2 MHz in practice. It is strange, because the Z80 ran in CP/M mode, which required the VDC, so why the hell did the VIC need to work at all? Or maybe I just don't remember well (you know: age, I just turned 43 happy )

OFF
Well, probably I saw the world's most advanced Atari 130XE machine there.
It had the new advanced GFX card, 1MB RAM, SID card, 2.5" HDD, SuperCPU at 17MHz, port expander... all in one small 130XE housing.
It was kind of shocking.

Posted By

Litwr
on 2016-10-18
12:44:23
 Re: A mathematical demo

The most sophisticated hardware I have ever seen is a matchbox-sized "matchbox": http://mdfs.net/Software/Tube/Matchbox/
The following video shows its z80 at 112 MHz (!) running the mathematical demo.

Happy birthday! I wish I were 43 too. happy Some people here are over 50... Some people can work even at 90...
I spent some time with the 6809 datasheet. IMHO the 6502 may be faster... All Motorola chips are a bit too theoretical: with their good features, they are a bit clumsy for practical tasks.
EDIT. Pi-pack 18 has just been released. It contains newer, faster versions for 6502-based machines and a version for the 6809-based Dragon-32/64, so it is possible to compare the 6809 and 6502 at their limits. The results show that a 6809 at 1.78 MHz matches a 6502 at 2 MHz. The second accumulator gives the 6809 a big advantage. However, IMHO the 6809 was overestimated... So the C+4 is still the fastest. :)

Posted By

MMS
on 2016-10-18
14:40:40
 Re: A mathematical demo

Good to hear. At the Ha'wangarda festival I referred to the Plus/4/C16 as the 8-bit speed demon grin

Posted By

Litwr
on 2017-02-06
03:29:09
 Re: A mathematical demo

My obsession continues: pipack-23 is ready. The C+4 is represented by version 10; IMHO it is the most optimized program for the Plus/4. A program for the ZX Spectrum 48K has been added. I expected the Speccy to be 10% faster than the Amstrad, but it shows only a 2% speed gain. Maybe the emulators are not quite accurate. BTW, I hadn't used a ZX until 2017. It was a bit shocking that its firmware forbids using the IY register. It is odd that such poor hardware and firmware became so widespread.

Posted By

MMS
on 2017-02-06
17:50:16
 Re: A mathematical demo

Cheap, so OK. The 48K doesn't even have proper sound output, just a beeper. But I like it. happy

Update:
Maybe I was wrong.
The Spectrum user manual states that the EAR socket can drive earphones, while MIC provides a line out that you can connect to an amplifier.
Frankly speaking, I have never heard a 48K ZX Spectrum connected to an amplifier via the line output, though some really fantastic 2-4 channel music has been created for this 1-bit sound machine too.
https://www.youtube.com/watch?v=QZnOd_f9YjQ

Posted By

JamesD
on 2017-02-09
13:39:48
 Re: A mathematical demo

> I spent some time with 6809 datasheet. IMHO 6502 maybe faster... All Motorola chips are a bit too
> theoretical and with the good features they are a bit clumsy for the practical tasks

In my experience, even the Motorola 6803 is faster per MHz than a 6502 when performing the same task, and it only has one index register. The 6809 is easily faster than the 6803.

Posted By

Litwr
on 2017-02-11
15:49:35
 Re: A mathematical demo

Do you mean the 6800? It has two accumulators, which is a big advantage, but its other parts are slower. IMHO the 6502 is much better. The 6800 was almost never used.
I'm sure that MOS Technology was capable of making an upgrade to the 6502 that would have been much better than the 6809.
BTW, our program database is a bit inflexible when it comes to program updates. :(

Posted By

MMS
on 2017-02-11
17:14:11
 Re: A mathematical demo

Very impressive list by now! Congrats!

Actually, one question: is it possible to show the Plus/4 results with the screen ON too? As far as I know, they are with the screen OFF.
Thanks, pal! grin

Posted By

JamesD
on 2017-02-12
00:03:29
 Re: A mathematical demo

>Do you mean 6800? It has two accumulators. It is a big advantage but other components are slower. IMHO 6502 is much better. 6800 was almost never used.

I meant 6803.

https://www.youtube.com/watch?v=51fE-cdiG3g

>I'm sure that MOS Technology was capable to make an upgrade to 6502 much better than 6809.

Woulda, coulda, shoulda

Posted By

Litwr
on 2017-02-13
11:04:04
 Re: A mathematical demo

@JamesD I have found it - http://datasheets.chipdb.org/Motorola/mc6801_3.pdf. It was used in the very rare MC-10. It is interesting that this CPU has 64 bytes of RAM that can be kept alive on standby power, an unusual feature for such an old chip. The 6809 shows slightly better speed than the 6502, and the 6803 is slower than the 6809, so the 6502 should be very close to the 6803 in performance.
@MMS Thanks. happy The C+4/PAL is about two times slower (93%, to be more precise) in the standard screen-on mode. My sheet shows the maximum speed possible.

Posted By

MMS
on 2017-02-13
15:55:54
 Re: A mathematical demo

Actually, I think the only 8-bit CPU that was properly upgraded and still has some potential real-life use is the Z80.

While the 6800 was upgraded to the more complex 68000, the Z80 has a derivative with both speed and simplicity (the eZ80), not to mention pipelined instruction processing.
(But there were also less successful Z80 derivatives, like the almost never used 16-bit Z8000, or the Z80000, an i386 equivalent.)
The Motorola 6800 line died out with the 68060, and PowerPCs are rather different (though my Xbox 360 uses a derivative of that PowerPC line-up).
The MOS 6502 has the 65816 (still available), and it would be nice to compare it to the eZ80.
Too bad I cannot name a single computer using the 50 MHz eZ80, to compare this 8-bit beast to the CPUs mentioned or to the 32-bit 386DX line happy

Posted By

JamesD
on 2017-02-14
08:03:38
 Re: A mathematical demo

Please pardon the long-winded response.

The 68000 is not an upgrade to the 6800, it is completely different.
The Z8000 is not a derivative of the Z80, it is completely different.

The 68000 still lives on in the Coldfire series of chips, and the Apollo FPGA core which is many times faster than a 68060. The Vampire series of Amiga accelerator boards are based on the Apollo.

The Z8000 is pretty much a distant memory even though it was technically better than the Z80. A pipelined version would have been pretty fast, and very RISC-like.

The 6809 is not an upgrade to the 6800 either, it is partially source compatible with the 6800 though and is easy for a 6800 programmer to adapt to.
The Hitachi 6309 is an upgrade to the 6809 that adds an instruction prefetch, additional instructions, a divide, and some 32 bit support.

The 6801/6803, IS an update to the 6800, but it is a microcontroller with built in hardware.
It will run 6800 code that doesn't conflict with the built in hardware.
It optimizes many instructions for faster execution, adds the D register which combines A & B to form a 16 bit register, it adds a multiply instruction, and it simplified hardware design requirements.
It predates the 6809.
The 6803 was used as the main CPU in the MC-10 and Alice systems. It also served as a controller for every device that attached to the Adam, and as a keyboard controller in some Thomson systems.
The Hitachi 6303 is a slight upgrade to the 6803, it offers some new instructions.
The 6801/6803 was updated with the 68HC11 and 68HC12, which added some instructions and additional index register(s).

The 6800 was used heavily as a controller, especially in the automotive industry.
It was more of a competitor to the 8080, and was the basis of systems based on the Flex OS. Flex is pretty much CP/M for the 6800. It had simpler power requirements than the 8080 which made it cheaper to design for.
The APF Imagination machine was based on the 6800, as was the Panasonic JR-200, though the JR-200 uses a clone of the 6802 which has a few of the 6803 enhancements.
The automotive industry moved on to the 6809, 68HC11, 68HC12, etc... and derivatives of these are still used in some devices today, though they may be FPGA or custom ASIC based.

As for Z80 derivatives...
The Z280 has a 16 bit ALU and is pipelined. It was originally the Z800 which was announced in the 1983/4 data book, but there were some problems with it, and it didn't make it to market until 1987 with a name change.
The R800 in the MSX Turbo R was supposedly derived from the Z800, and it's pretty fast.
The Hitachi HD64180 was the first real Z80 upgrade, available in 1985. It was licensed back to Zilog as the Z180.
A 6 MHz 64180 is supposed to run the same speed as an 8 MHz Z80.
The Z80 has a 4 bit ALU, but the 64180 is 8 or 16 bit, I forget which. It also has an instruction prefetch, a multiply, built in ports and timers, built in DMA, and the ability to address 512K of RAM using an MMU.
The Z180 is slightly more compatible with the Z80 from an interface design standpoint, and is available in 30+ MHz versions to this day. While it is a microcontroller with built-in ports, you could set the address of the ports so they wouldn't conflict with existing hardware, at least if one of the selectable blocks was not taken.
Several CP/M systems were based on the 64180, as was one MSX machine (it also had a Z80 for compatibility). At least one of the CP/M systems was clocked at over 9 MHz... which would be about like a 12 MHz Z80. A later Z280 system was clocked at 12 MHz and would have been very fast for a Z80 system at that time.
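The "6 MHz 64180 ≈ 8 MHz Z80" rule of thumb is a fixed per-clock factor of 8/6, which is how a 9 MHz 64180 system comes out at roughly a 12 MHz Z80. A minimal Python sketch of that arithmetic, assuming the factor holds uniformly for all code (real speed-ups depend on the instruction mix); the function name is hypothetical:

```python
from fractions import Fraction

# Rule of thumb from the post: a 6 MHz HD64180 runs about as fast as an 8 MHz Z80.
# Assumption: the same per-clock factor applies to every program.
HD64180_PER_CLOCK = Fraction(8, 6)


def z80_equivalent_mhz(clock_mhz, per_clock_factor=HD64180_PER_CLOCK):
    """Clock rate a plain Z80 would need to match a CPU running at clock_mhz."""
    return clock_mhz * per_clock_factor


if __name__ == "__main__":
    print(float(z80_equivalent_mhz(9)))  # 12.0: the "over 9 MHz ~ 12 MHz Z80" estimate
```

Using `Fraction` keeps the 8/6 ratio exact, so 9 MHz maps to exactly 12 rather than a rounded float.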
Most modern Z80 derivatives use the additional instructions from the Z180.
The Z380 offered additional memory handling, multiple register banks, stack relative addressing, and a few other enhancements. I think it was aimed at multi-user machines running separate programs in their own memory. The stack relative addressing was certainly nice for compiler support, but I'm not sure any other CPUs support that.
The eZ80 is source compatible with the Z180, but it won't run Z80 object code.
Other pipelined implementations from several manufacturers are also significantly faster, though most only use a single interrupt or other changes so they can't just drop in place of a Z80.
The fastest Z80 derivatives are probably Verilog or VHDL cores. A couple of those are supposed to be able to work in custom silicon at 200 MHz or more, but I've never seen a stand alone CPU that fast.

As for the 65816...
This benchmark just measures one area of performance. It's an incomplete picture of what a CPU can really do. The mode switching required to go between 8 and 16 bits makes it slower than the 6809 for mixed 8- and 16-bit code. I don't think it would help this benchmark, but there are clearly situations where it is faster than the 6502.
The Atari version of my 64-column graphics text code with a 65816 scroll is about the same speed as the 3.5 MHz Z80 in the VZ200. The scroll loop that hurt the 6502 vs the slower-clocked 6803 now moves 16 bits at a time on the 65816, and its higher clock speed suddenly shows.

The 16 bit support, stack relative addressing, and larger stack make it much better for compiled languages.
One case I can think of where the 65816 would be much faster than the 6502, is Apple Pascal. Apple Pascal is actually UCSD Pascal (portable to about any machine with 64K RAM). The P-Code interpreter is heavily oriented towards 16 bits (as is Pascal), and the 65816 would require significantly fewer instructions than the 6502.
The larger memory support of the 65816 would also allow a full 64K to be dedicated just for code, and the larger stack could support a lot deeper recursion. Additional code modules could be cached in the rest of RAM and would be much quicker to access than paging RAM with the 6502.
I would think an overall speed improvement of 20% or more would be very likely, and 50% might not be out of the question... but I wouldn't bet on it.

Posted By

Litwr
on 2017-02-14
11:19:57
 Re: A mathematical demo

Thank you very much for the comprehensive yet concise information. Pi-spigot shows that the 65816 code is much faster per MHz (by about 45%) than the 6809 code. The 6502 is slow at memory copy and fill, so the scroller shows the 6502 in the worst circumstances. If we do more complicated operations on a large amount of memory, the 6502 is faster than the 6809.
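For readers who have not met it, a spigot algorithm produces π one decimal digit at a time using only integer multiply, divide and remainder on an array, which is why it makes a convenient benchmark for 8-bit CPUs. The sketch below is the textbook Rabinowitz-Wagon spigot in Python; it is not Litwr's code, and the assumption that pi-spigot follows the same scheme is ours:

```python
def pi_digits(n: int) -> str:
    """First n decimal digits of pi ('3141...') via the Rabinowitz-Wagon spigot."""
    size = 10 * n // 3 + 1
    # pi = 2 + 1/3*(2 + 2/5*(2 + 3/7*(2 + ...))): every mixed-radix digit starts at 2
    a = [2] * size
    out, predigit, nines = [], 0, 0
    for _ in range(n):
        q = 0
        for i in range(size, 0, -1):       # multiply by 10, normalise right to left
            x = 10 * a[i - 1] + q * i
            a[i - 1] = x % (2 * i - 1)
            q = x // (2 * i - 1)
        a[0] = q % 10
        q //= 10                            # next decimal digit, possibly 10
        if q == 9:                          # hold back: a later carry may bump it
            nines += 1
        elif q == 10:                       # carry into the held-back digits
            out.append(predigit + 1)
            out.extend([0] * nines)
            predigit, nines = 0, 0
        else:
            out.append(predigit)
            out.extend([9] * nines)
            predigit, nines = q, 0
    out.append(predigit)
    out.extend([9] * nines)
    return "".join(map(str, out[1:]))       # drop the placeholder 0 emitted first


if __name__ == "__main__":
    print(pi_digits(50))
```

The inner loop over the whole array is what dominates the run time, so the benchmark mostly measures how fast a CPU can do multi-byte multiply and divide-with-remainder.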

Posted By

JamesD
on 2017-02-14
20:52:22
 Re: A mathematical demo

What speed is the 65816 clocked at?

Posted By

Litwr
on 2017-02-15
11:56:41
 Re: A mathematical demo

Let's analyze some data. The SuperCPU's 65816 is clocked at 20 MHz and computes 3000 digits of π in 87 seconds (on a PAL system). So the reciprocal of the CPU's efficiency normalized to 1 MHz is 20*87 = 1740. The 6809 in the Dragon and Tandy Color Computers is clocked at 1.78 MHz and takes 1414 seconds for 3000 digits, so its reciprocal is 1414*1.78 = 2516.92. The latter value is about 45% bigger.
Here are the values for some other CPUs (lower is better):
6502 – ≈2800
Z80 – ≈5000
80286 – ≈500
ARM1 – ≈350
8088 – ≈2000
68000 – ≈1200
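The "reciprocal of normalized efficiency" is just clock rate × run time. Since MHz × seconds equals millions of clock cycles, it counts how many cycles (in millions) each CPU spent on 3000 digits, so lower is better. A small Python sketch reproducing the two worked figures (the helper name is hypothetical, not part of pi-pack):

```python
def megacycles(clock_mhz: float, seconds: float) -> float:
    """Clock rate (MHz) x run time (s) = millions of clock cycles used; lower is better."""
    return clock_mhz * seconds


# Figures quoted in the post for 3000 digits of pi with the screen off.
SUPERCPU_65816 = megacycles(20.0, 87.0)   # 1740.0
DRAGON_6809 = megacycles(1.78, 1414.0)    # about 2516.92

if __name__ == "__main__":
    ratio = DRAGON_6809 / SUPERCPU_65816
    print(f"65816: {SUPERCPU_65816:.0f}  6809: {DRAGON_6809:.2f}  ratio: {ratio:.2f}")
```

The metric is only as good as the clock value plugged in: using 0.89 MHz instead of 1.78 MHz for the Dragon would halve the 6809 figure, which is exactly the point disputed in the next posts.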

Posted By

JamesD
on 2017-02-15
12:37:23
 Re: A mathematical demo

The Dragon can only clock the CPU at 1.7 MHz when the ROM bank is accessed. That is how the SAM chip was designed. The actual benchmark is running at half that speed.
If the benchmark is using the ROM to print, then the print is running that fast, but nothing else.

Posted By

Litwr
on 2017-02-15
13:50:26
 Re: A mathematical demo

The data in the table is for a Dragon executing pi-demo in the fastest, screen-off mode at 1.78 MHz.

Posted By

JamesD
on 2017-02-15
16:18:18
 Re: A mathematical demo

I told you what that means.
This is from a CoCo page, but the Dragon is basically a clone of the CoCo. This is controlled by a chip called the SAM (Synchronous Address Multiplexer), aka the MC6883.

"POKE 65495,0
doubles speed of ROMs; Basic programs by 1 and a half. Does not work with all computers, or with disk, cassette or printer operations."

RAM still runs at 0.89 MHz.

The behavior is mentioned on page 11 of the datasheet.
The machine starts up in "Slow" mode.
The POKE enables the A.D. (address dependent) mode where high speed is enabled for ROM.
There is "Fast" 1.7 MHz mode, but it disables DRAM refresh and display updates.
In Fast mode, the machine locks up within seconds unless you modify it to have SRAM instead of DRAM... something I've never seen anyone do.

The CoCo 3 can run at a full 1.7 MHz since it's based on a gate array designed to do it.
Unless the benchmark was performed on a CoCo 3, it wasn't a full 1.7 MHz


You can find the data sheet here:
https://ia801708.us.archive.org/22/items/Motorola_MC6883_Synchronous_Address_Multiplexer_Advance_Sheet_19xx_Motorola/Motorola_MC6883_Synchronous_Address_Multiplexer_Advance_Sheet_19xx_Motorola.pdf

Posted By

MMS
on 2017-02-15
16:42:13
 Re: A mathematical demo

hi James,
many thanks for the detailed information, I learned a lot of new info from that.
Only one question on the eZ80 (my "fav", yep); I'm copying this from its user manual (I read it in the past, and that's why I already found it so interesting):

"The eZ80 CPU’s instruction set is a superset of the instruction sets for the Z80 and Z180
CPUs. The Z80 and Z180 programs are executed on an eZ80 CPU with little or no modification."
"The eZ80® CPU is capable of operating in two memory modes: Z80 mode and ADL
mode. For backward compatibility with legacy Z80 programs, the CPU operates in Z80
MEMORY mode with 16-bit addresses and 16-bit CPU registers."

So, as per the description, in the default mode the eZ80 CPU is almost completely equivalent to a Z80, just much faster due to the CPU pipeline.
Is that not true in practice? Just a promise?

Thanks in advance.
(just for reading, no follow-up needed: http://www.shaels.net/index.php/mic80/mic80-general/38-mico-overview
states that CP/M Pascal runs 30x faster on the 50 MHz eZ80-based computer than on a 4 MHz Z80)

As someone else stated, the Raspberry Pi killed the eZ80's market, as it is much cheaper and more flexible than the eZ80 could ever be.

Posted By

JamesD
on 2017-02-15
19:37:21
 Re: A mathematical demo

Ugh... huge mistake on my part.
The ez80 *is* binary compatible with the Z80/Z180.
It's the Rabbit that is only source code compatible.
Du-Oh!

*edit* I somehow deleted part of my reply. The other eZ80 memory mode uses more than 16 bits for addresses, and it requires code changes, but I haven't used it; I've only used the HD64180.

I did a little reading on the Z280. It has a supervisor mode where the R800 does not.
I don't know if the R800 is a stripped down Z800 or if the Z280 is a souped up Z800.

So... if the ez80 is 30 times faster than a 4 MHz Z80... 30 x 4 = roughly the equivalent of a 120 MHz Z80.
I don't know how the pipelined Z80 cores I mentioned would compare with a Z80, but since they are faster at the same clock speed and can be clocked above 120 MHz in a custom ASIC, they must be faster than the eZ80. But then there's no off-the-shelf part for them.

ARM, ColdFire, PowerPC, SPARC, Arduino... so many cheap microcontroller options with better compiler support than the old 8-bit chips.

*edit*
50 MHz eZ80 CP/M system.
http://noplabs.com/cpm50/cpm50.html

Posted By

Litwr
on 2017-02-16
11:39:57
 Re: A mathematical demo

Maybe the Dragon and the Tandy Color Computer have some differences. Here is a quotation from The Dragon 32 Dragon Companion by M. Jarvis:
"
PROCESSOR SPEEDS
Another feature of the Dragon which is very interesting, and indeed can be
extremely useful, is the fact that it has a variable processor clock rate. For those of
you who do not understand about clock rates a simple explanation is that the
central processing unit (in this case the 6809E) receives a regular tick from a timer
which tells it to move on to the next stage of obeying an instruction. These ticks
are measured in megahertz (millions of ticks per second) and the faster the tick the
faster the computer works. In the Dragon's case the clock rate is controlled by
SAM bits in locations 65494 to 65497. These four locations control two SAM bits
which should give us four clock rates. Table 5 sets out the functions of the
locations.
...
Table 5.
Two bits should give four speeds but the Dragon appears only to respond to three.
The default speed, set on switching on, is the slowest with both SAM bits cleared
(poking any value to 65494 and 65496 achieves the same effect). The next faster
speed is set by poking to 65495 and results in the execution of programs being 50%
faster. The slowest two speeds are the only ones which can be used and still retain
video synchronisation.
The next increase in speed is achieved by setting bit 1 and clearing bit 0. Execution
speeds are 100% faster than the default speed but video synchronisation is lost.
This speed would be useful if a program involves large amounts of computation
and where video is not important. Video synchronisation can always be regained
by slowing the processor down again after the burst of computation.
The final speed should be achieved when both bits are set but as I have said, the
Dragon does not appear to respond (see the example program).
One thing to remember about the faster speeds is that the cassette interface will
only work at the default speed"

BTW, the Wikipedia page mentions 1.79 MHz, not 1.7.
I agree that modern electronics is a kind of paradise...
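The book's description of the SAM rate bits can be modelled as a tiny state machine: within each address pair the even address clears a bit and the odd address sets it, with 65494/65495 driving the low bit (R0) and 65496/65497 the high bit (R1). A minimal Python sketch under that reading of the quote; the speed factors are the book's figures (which JamesD disputes for real hardware), and all names are hypothetical:

```python
# Assumed reading of the quote: each address pair drives one SAM rate bit;
# the even address clears it, the odd address sets it (the value poked is ignored).
SAM_RATE_ADDRESSES = {
    65494: ("R0", 0), 65495: ("R0", 1),
    65496: ("R1", 0), 65497: ("R1", 1),
}

# Speed relative to the power-on default, as described in the book; state is (R1, R0).
BOOK_SPEEDUP = {
    (0, 0): 1.0,   # default: video and cassette work
    (0, 1): 1.5,   # "50% faster", video synchronisation kept
    (1, 0): 2.0,   # "100% faster", video synchronisation lost
    (1, 1): None,  # "the Dragon does not appear to respond"
}


def poke(state, address):
    """Return the (R1, R0) state after a POKE to one of the SAM rate addresses."""
    bit, value = SAM_RATE_ADDRESSES[address]
    r1, r0 = state
    return (value, r0) if bit == "R1" else (r1, value)


if __name__ == "__main__":
    state = (0, 0)                        # power-on default
    for addr in (65495, 65494, 65497):    # R0 on, R0 off, then R1 on
        state = poke(state, addr)
        print(addr, state, BOOK_SPEEDUP[state])
```

Per JamesD's description of the SAM, the 1.5x figure for R0 is an average: only ROM accesses speed up, while RAM stays at the slow rate.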

Posted By

JamesD
on 2017-02-16
04:49:15
 Re: A mathematical demo

That would have required a new SAM chip or SRAM instead of DRAMs.
So no.

I've owned a CoCo since 1982 and a Tano Dragon for several years.
Many Dragons don't even work in the A.D. mode let alone the Fast mode.
Only the ROM code is running at 1.7 MHz

I say 1.7 because I'm being lazy.
It's 1.7897725 for Tandy and Dragon 32s, 1.785??? for Dragon 64s.
Yup... Dragon 64s are slower.


Notice the comment you quoted
"The final speed should be achieved when both bits are set but as I have said, the Dragon does not appear to respond"

Posted By

Litwr
on 2017-02-16
12:06:10
 Re: A mathematical demo

So this book gives false information. What is wrong with the 50% speed-up mode?
Pi-demo may refresh the DRAM by itself. Could you run pi-demo on your Tano Dragon? You may use the General topic at archive.worldofdragon.org to reply. If some modification to the demo is required, you may send me a PM directly. I can also try to make a special version for the CoCo. Thanks in advance.

Posted By

JamesD
on 2017-02-16
15:35:38
 Re: A mathematical demo

If they ran it at 1.7 MHz, it was on an emulator. Since the POKE in the source is the one to set the Fast mode, that's what they would have had to do.

One thing I do find troubling is that you choose to use lots of tables for the 6502/65816, but you don't offer similar optimizations for other CPUs. It tends to bias the result as the two approaches are very different.

Posted By

Litwr
on 2017-02-17
03:25:49
 Re: A mathematical demo

I used XRoar. Why are you so sure that a genuine Dragon can't run at 1.7 MHz? Why does the idea of self-refreshing code look wrong to you?
The pi program for the 65816 (and the Z80) uses tables for multiplication because the 65816 has no hardware multiplication. The 6809 has hardware multiplication, and with tables it is slower than the 6502. The 6502 also uses tables for division, but that gains almost nothing: less than 1% for 3000 digits. My aim is to make the best possible code for every CPU. If you have any idea how to improve the 6809 code, please share it. However, it would be more appropriate to discuss this matter on a 6809-dedicated forum.
BTW, 6502 table multiplication is a bit faster than 6809 hardware multiplication. The 6809 is faster at division, but the 65816 is faster still because it also has a 16-bit accumulator. However, the 6309 is very fast: it is 2.6 times faster than the 65816 with pi-spigot. I am curious why the Japanese didn't make a fast 6502 too. They made a fast Z80 (R800), a fast 6809 (6309), a fast 8088 (V20) and a fast 8086 (V30), but missed the 6502. :(
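A common way to multiply with tables on a CPU without a MUL instruction is the quarter-square method: a·b = floor((a+b)²/4) - floor((a-b)²/4). The post doesn't say which table method pi-spigot uses, so treat this as one well-known technique, not Litwr's code. A Python sketch for 8-bit operands:

```python
# Quarter-square table: SQ4[n] = floor(n*n / 4) for n = 0..510 (the largest a+b for bytes).
SQ4 = [n * n // 4 for n in range(511)]


def mul8(a: int, b: int) -> int:
    """8-bit x 8-bit -> 16-bit product using two table lookups and one subtraction."""
    if not (0 <= a <= 255 and 0 <= b <= 255):
        raise ValueError("operands must be bytes")
    if a < b:
        a, b = b, a                       # keep a - b non-negative for the table index
    # a+b and a-b have the same parity, so the fractions dropped by floor() cancel exactly.
    return SQ4[a + b] - SQ4[a - b]


if __name__ == "__main__":
    print(mul8(200, 123), 200 * 123)
```

On a real 6502 the 16-bit table entries would typically be split into separate low-byte and high-byte tables; that layout detail is an assumption here, not something stated in the thread.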

Posted By

JamesD
on 2017-02-17
10:47:26
 Re: A mathematical demo

> I used Xroar. Why are you so sure that a genuine Dragon can't run at 1.7MHz? Why the idea about self-refreshing code looks wrong to you?

I'm sure because someone spent weeks trying self refreshing code.
That doesn't mean it's not possible, just that we couldn't get it to work.
YMMV


>I am curious why Japanese didn't make fast 6502 too? They made fast z80 (R800), fast 6809 (6309), fast 8088 (v20), fast 8086 (V30) but missed 6502.

Hitachi was a 2nd source for chips from several companies at that time. That may be why you see support for certain chips and not others.
MOS was owned by Commodore, and Synertek was a 2nd source already.
Western Design had Rockwell as a 2nd source. Rockwell added additional instructions to the 65C02.
Plus, the 6502 was cheaper than the other chips, negating the need.

There probably wouldn't be a faster part for the 6502 anyway. It already had a prefetch and short instruction cycle times. The prefetch alone speeds up the 6309 and 64180 by a clock cycle for almost every instruction. Not sure how they could have added a multiply unless they used A and X.
They microcoded the HD64180, added a prefetch, and used a wider ALU. Speed optimization was easier to come by there.

Hitachi also had a 6303. It's a drop-in replacement for the 6803... though I vaguely remember 1 pin possibly being slightly different. Nobody's found a native mode switch for it like the 6309's, though I'm not sure anyone has even tried to find one. If it exists, it might add a prefetch.

I might get around to looking at the 6809 code, but I have a lot of other things I need to do first.

Posted By

JamesD
on 2017-02-17
16:29:16
 Re: A mathematical demo

I took a glance at your 6809 code.
Many instructions dealing with index registers require 2-byte opcodes.
Y is the worst for this.
So you want to use X for indexing most often.
U-oriented instructions also use 1-byte opcodes, so prefer U over Y when you can.

Posted By

Litwr
on 2017-02-18
01:56:02
 Re: A mathematical demo

The 4510 showed that it is possible to make the 6502 up to 30% faster. They could also have skipped many of the poor 65C02 extensions and added a second accumulator, multiplication, division... You mentioned the unreleased Synertek 6509, which might have been as fast as the 4510...
[6809] Thank you for looking at my code. However, it is a bit off-topic to discuss 6809 programming on a Commodore Plus/4 forum. There was a Commodore with a 6809, but it is an impossible rarity.
Your notes about the Y register are not correct. In the main loop between labels 'loop2' and 'l4', it is always used in a context where it is as fast as X.
You know that several programs mentioned on the Dragon forum work correctly at 1.7 MHz; they demonstrate self-refreshing code. Pi-spigot may be self-refreshing too.

Posted By

JamesD
on 2017-02-18
07:36:57
 Re: A mathematical demo

What makes discussing the 6809 appropriate is that the thread is about benchmarking the 6502 against the other processors. Your 65816 code is certainly faster than that 6809 code... but that 6809 code misses a simple, commonly discussed optimization.

> Your notes about Y register are not correct. It is always used in a context where it is so fast as X in the main loop between labels 'loop2' and 'l4'.

One additional clock cycle for the extra byte, look it up.
2 byte opcodes extracted from the table at the following address.
http://techheap.packetizer.com/processors/6809/6809Instructions.html

Notice that X is not mentioned in the 2 byte opcodes.
What really sucks, is that CMPD is also a 2 byte opcode.
I would have traded 3 of those with SWI, SYNC, and TST since those aren't used as often.

+----------------------------------------------------------------+
| Page 1 Instructions^                                           |
+------------+-------------+------------+--------+-------+-------+
| Opcode     |             | Addressing |        |       |       |
| Hex   Dec  | Instruction | Mode       | Cycles | Bytes | HNZVC |
+------------+-------------+------------+--------+-------+-------+
...
| 1083  4227 | CMPD        | IMMEDIATE  |   5    |   4   | -aaaa |
| 108C  4236 | CMPY        | IMMEDIATE  |   5    |   4   | -aaaa |
| 108E  4238 | LDY         | IMMEDIATE  |   4    |   4   | -aa0- |
| 1093  4243 | CMPD        | DIRECT     |   7    |   3   | -aaaa |
| 109C  4252 | CMPY        | DIRECT     |   7    |   3   | -aaaa |
| 109E  4254 | LDY         | DIRECT     |   6    |   3   | -aa0- |
| 109F  4255 | STY         | DIRECT     |   6    |   3   | -aa0- |
| 10A3  4259 | CMPD        | INDEXED    |   7    |   3   | -aaaa |
| 10AC  4268 | CMPY        | INDEXED    |   7    |   3   | -aaaa |
| 10AE  4270 | LDY         | INDEXED    |   6    |   3   | -aa0- |
| 10AF  4271 | STY         | INDEXED    |   6    |   3   | -aa0- |
| 10B3  4275 | CMPD        | EXTENDED   |   8    |   4   | -aaaa |
| 10BC  4284 | CMPY        | EXTENDED   |   8    |   4   | -aaaa |
| 10BE  4286 | LDY         | EXTENDED   |   7    |   4   | -aa0- |
| 10BF  4287 | STY         | EXTENDED   |   7    |   4   | -aa0- |
...
+------------+-------------+------------+--------+-------+-------+
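The table makes the prefix cost concrete: each LDY/STY/CMPY opcode is the matching X opcode behind a $10 prefix byte, and costs one extra byte and one extra cycle. A Python sketch that checks this against the X forms; the X cycle and byte figures come from the standard 6809 instruction tables rather than from the post, so verify them against the datasheet before relying on them:

```python
# Y entries from the table above: opcode -> (mnemonic, mode, cycles, bytes).
# CMPD is left out: its second byte ($83, $93, ...) is SUBD on the unprefixed page.
PAGE2_Y = {
    0x108C: ("CMPY", "IMMEDIATE", 5, 4), 0x108E: ("LDY", "IMMEDIATE", 4, 4),
    0x109C: ("CMPY", "DIRECT", 7, 3),    0x109E: ("LDY", "DIRECT", 6, 3),
    0x109F: ("STY", "DIRECT", 6, 3),     0x10AC: ("CMPY", "INDEXED", 7, 3),
    0x10AE: ("LDY", "INDEXED", 6, 3),    0x10AF: ("STY", "INDEXED", 6, 3),
    0x10BC: ("CMPY", "EXTENDED", 8, 4),  0x10BE: ("LDY", "EXTENDED", 7, 4),
    0x10BF: ("STY", "EXTENDED", 7, 4),
}

# X counterparts (no prefix); values from standard 6809 tables, not from the post.
PAGE1_X = {
    0x8C: ("CMPX", "IMMEDIATE", 4, 3), 0x8E: ("LDX", "IMMEDIATE", 3, 3),
    0x9C: ("CMPX", "DIRECT", 6, 2),    0x9E: ("LDX", "DIRECT", 5, 2),
    0x9F: ("STX", "DIRECT", 5, 2),     0xAC: ("CMPX", "INDEXED", 6, 2),
    0xAE: ("LDX", "INDEXED", 5, 2),    0xAF: ("STX", "INDEXED", 5, 2),
    0xBC: ("CMPX", "EXTENDED", 7, 3),  0xBE: ("LDX", "EXTENDED", 6, 3),
    0xBF: ("STX", "EXTENDED", 6, 3),
}


def prefix_overhead():
    """Map (Y mnemonic, mode) to (extra cycles, extra bytes) versus the X form."""
    result = {}
    for opcode, (mnem, mode, cycles, size) in PAGE2_Y.items():
        assert opcode >> 8 == 0x10, "expected a $10-prefixed opcode"
        x_mnem, x_mode, x_cycles, x_size = PAGE1_X[opcode & 0xFF]
        assert x_mode == mode and x_mnem[:-1] == mnem[:-1]   # e.g. LDX vs LDY
        result[(mnem, mode)] = (cycles - x_cycles, size - x_size)
    return result


if __name__ == "__main__":
    for (mnem, mode), (dc, db) in sorted(prefix_overhead().items()):
        print(f"{mnem:4} {mode:9} +{dc} cycle, +{db} byte")
```

Note that this overhead applies to instructions that name Y as the operand register; it says nothing yet about using Y as the base of an indexed address.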



The whole self refreshing thing is a bit quirky. As long as you access a block of RAM often enough, it gets refreshed. Any block you don't access often enough goes bye bye. What size that block is depends on the DRAM chips used. So, the benchmark might keep running, but the long interval between screen writes as primes get further apart might cause the screen RAM to take a dump.
Now, you might refresh the screen RAM by writing to a byte at regular intervals... but then that would be reflected in your benchmark.
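Whether "self-refreshing" code keeps DRAM alive is easier to reason about with a toy model: a row survives only if some access hits it within every refresh window. A Python sketch with hypothetical parameters (128 rows, a 2 ms window, the row picked by the low 7 address bits); the real Dragon's DRAM type and SAM address mapping may differ, and that mapping is exactly what decides whether screen RAM survives:

```python
# Hypothetical DRAM parameters, for illustration only.
ROWS = 128
REFRESH_WINDOW_US = 2000


def rows_lost(accesses, end_us, rows=ROWS, window_us=REFRESH_WINDOW_US):
    """accesses: iterable of (time_us, address); the row is address % rows (assumed).
    Return the set of rows that went longer than window_us without any access."""
    last = {}          # row -> time of its latest access; all rows fresh at t = 0
    lost = set()
    for t, addr in sorted(accesses):
        row = addr % rows
        if t - last.get(row, 0) > window_us:
            lost.add(row)
        last[row] = t
    for row in range(rows):               # rows left untouched until the end
        if end_us - last.get(row, 0) > window_us:
            lost.add(row)
    return lost


if __name__ == "__main__":
    # A loop walking 256 consecutive bytes every 1 ms touches every row in time...
    walk = [(t, 0x4000 + i) for t in range(0, 10_000, 1000) for i in range(256)]
    # ...while one that only touches 64 bytes leaves the other 64 rows to decay.
    tight = [(t, 0x4000 + i) for t in range(0, 10_000, 1000) for i in range(64)]
    print(len(rows_lost(walk, 10_000)), len(rows_lost(tight, 10_000)))
```

Under this assumed mapping, any loop that sweeps enough consecutive addresses often enough refreshes the whole chip; with a different mapping the same loop could leave some blocks, such as screen RAM, unrefreshed.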

Instead of arguing with me, just try running it on a real Dragon. If it works, great, if it doesn't... bummer, it's not the end of the world.


The thing about the 4510, 6509, 65816, etc... is that they don't speed up existing code like the 6309 and 64180. The 6309 and 64180 give you a 20-30% speed bump on *existing* code, which is what I meant but I didn't say it, so point taken.

I finally found the rest of the 6509 info btw. It was missing something I thought it was going to have, so... not quite as nice as I expected.
If you extract the best additions from the 4510, 6509, and 65816, you would have a pretty decent chip for sure. Relocatable code, movable direct page, multiply, stack relative addressing, 16 bit support, access to more than 64K (even though I'm not thrilled with how the 65816 does it), memory move instructions...
It would certainly support high level languages better. That and CPU addressable memory are probably the biggest advantage 16 bit machines had over the 8 bit machines. If you design a 6502 upgrade where you can just compile your program and it works with more than 64K with no effort, that is what the 6502 needed. The 65816 can certainly do that... but all the paging and mode switching makes it a bit uglier than it should have been.

BTW, the 65816 memory move instructions suck! 7 clock cycles per byte vs 3 for the 6309 equivalent. Even with the 24 bit addressing it should have only been 4 clock cycles at most. I think even 5 would have matched the unrolled loop I use to scroll the screen in my code. That is definitely something that could have been sped up.
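The per-byte figures translate directly into time. A Python sketch, assuming the quoted 7 cycles per byte for the 65816 block-move instructions (MVN/MVP) and 3 cycles per byte for the 6309 equivalent (its TFM instruction), and ignoring setup overhead; the 1 KB buffer and 1 MHz clock are illustrative only:

```python
def move_time_us(n_bytes: int, cycles_per_byte: int, clock_mhz: float) -> float:
    """Microseconds for a block move with a fixed per-byte cost (setup ignored)."""
    return n_bytes * cycles_per_byte / clock_mhz


MVN_65816 = 7   # cycles per byte, as quoted in the post
TFM_6309 = 3    # cycles per byte, as quoted in the post

if __name__ == "__main__":
    for name, cost in (("65816 MVN/MVP", MVN_65816), ("6309 TFM", TFM_6309)):
        print(f"{name}: {move_time_us(1024, cost, 1.0):.0f} us for 1 KB at 1 MHz")
```

At equal clock rates the ratio is simply 7/3, so the 6309 moves a block about 2.3 times faster regardless of the buffer size.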

Posted By

Litwr
on 2017-02-18
09:27:43
 Re: A mathematical demo

We are primarily discussing 6809 code optimization and the problem of DRAM refresh in SAM-based computers, not 6502 or even 65816 code optimization, so we are definitely OT here. In addition, moving our discussion to the Dragon forum may help us get access to genuine hardware.
Pi-demo uses only the following Y-related instructions
ldb 1,y
ldb ,y
std ,y
subd ,y
leay -2,y
in the main loop. All of them have the same timing as with X.
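Both posters are partly right, and an encoding sketch shows why: the $10 prefix applies to instructions such as LDY, STY, CMPY and CMPD that name Y (or D) as the operand register, while in indexed addressing the base register is chosen by two bits of the postbyte, so the ,X and ,Y forms of LDB, STD, SUBD or LEAY have the same length. A minimal Python sketch covering only the zero-offset and 5-bit-offset postbyte forms (other indexed forms are omitted); the opcode values are from the standard 6809 tables and the helper names are hypothetical:

```python
# Register field (bits 6-5) of the 6809 indexed-addressing postbyte.
REG_BITS = {"X": 0b00, "Y": 0b01, "U": 0b10, "S": 0b11}


def postbyte(reg: str, offset=None) -> int:
    """Indexed postbyte: ',reg' (1RR00100) or 5-bit signed 'offset,reg' (0RRnnnnn)."""
    rr = REG_BITS[reg] << 5
    if offset is None:
        return 0x84 | rr                     # zero-offset form
    if not -16 <= offset <= 15:
        raise ValueError("offset needs an 8- or 16-bit indexed form")
    return rr | (offset & 0x1F)


def encode(opcode: int, reg: str, offset=None) -> bytes:
    """Opcode byte followed by its indexed postbyte."""
    return bytes([opcode, postbyte(reg, offset)])


# The Y-related instructions listed above: (mnemonic, indexed opcode, offset).
LOOP_OPS = [
    ("ldb", 0xE6, 1),
    ("ldb", 0xE6, None),
    ("std", 0xED, None),
    ("subd", 0xA3, None),
    ("leay", 0x31, -2),
]

if __name__ == "__main__":
    for name, op, off in LOOP_OPS:
        y, x = encode(op, "Y", off), encode(op, "X", off)   # same opcode, base Y vs X
        print(f"{name:4} Y: {y.hex()}  X: {x.hex()}  same length: {len(x) == len(y)}")
```

Only the RR bits of the postbyte change between the two forms, so no prefix byte and no extra cycle is involved for these five instructions.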
I agree that the screen memory may be corrupted, but I am not sure. I don't have a Dragon, so I asked for help with genuine hardware.
The 4510 gives a speed-up of about 25% with existing code. The SY6516/6509 was supposed to give the same speed-up: http://plus4world.powweb.com/forum/32844#32961
IMHO the 65C02 and 65816/65802 were a shame. The upgrades might have been much better if MOS Technology had survived.

Posted By

MMS
on 2017-02-19
08:31:32
 Re: A mathematical demo

OT as usual happy

What is the main roadblock to getting a higher-speed 7501/8501 with current technology + SRAM?

Raising the frequency surely brings heat and unreliable operation, so the original technology does not leave much room.
But a smaller fabrication process, like 65 nm, would significantly increase the speed of the gates and reduce power consumption.
(SRAM is significantly (~50-100x) more expensive, but can provide 10 ns access times while current DRAMs have 60 ns; and we are not talking about gigabytes but megabytes, which means a cost of 40-80 USD/MB for SRAM.)

If you compare even the 130 nm technology used ~15 years ago in the Athlon 64 with the 2 micrometer (2000 nm!) HMOS/NMOS technology, it becomes evident that with a 100 nm fabrication process the 6510/7501/8501 could be very, very small and very fast CPUs allowing high frequencies.
(In fact, Bill Mensch stated that even with 2000 nm technology he had a few 6502s that could operate at 12 MHz!)

Do you think that, with the available CPU documents, anyone would be ready for such a task? And how much would it cost, given how well known and well documented this chip is? (I know that FPGAs are far more flexible and available, so it is just a point of discussion, for interest.)

Please do not be shy, tell me if I just talked BS! happy

Posted By

JamesD
on 2017-02-19
20:21:41
 Re: A mathematical demo

Running a 6502 that fast isn't a problem. There are 65C02s that can be clocked faster, and there were back then too.
RAM speeds that fast were more expensive though, and running the TED that fast might be a real problem.
Running the main system RAM at higher speed would also present a lot of problems.
However, you could use a 64K cache that runs full speed.
Look at the Zip Chip for the Apple II for an example of this, and the Apple IIc Plus which integrated the Zip chip onto the motherboard. With faster RAM, a faster CPU, and a faster clock oscillator, the IIc Plus has been run at 10 MHz. Some people may have run it even faster.

Posted By

Litwr
on 2017-02-22
17:06:12
 Re: A mathematical demo

I read somewhere that Bill Mensch said it is possible to make a 6502 run at more than 10 GHz...
BBC Micro users can use a 6502 at 200 MHz and more - https://www.youtube.com/watch?v=JjUb8g2kED0

Posted By

MMS
on 2017-06-01
15:25:58
 Re: A mathematical demo

Wow, so all these things have already been developed, and were even available in shops. Impressive!

Posted By

Litwr
on 2017-06-01
07:54:37
 Re: A mathematical demo

Updated.
OT. Can we imagine the MOS Technology domination - an alternative reality?





Copyright © Plus/4 World Team, 2001-2024