|
Posted By
Luca on 2020-11-01 08:52:22
| Re: ASCII Art Mandelbrot
Heh, the V2 BASIC is faster for sure: Commodore bought Microsoft BASIC, and all the later versions are basically the same BASIC pumped up with new commands. If the interpreter is the same, you simply give it more things to check at the same rate. Overall result: slower :-/
|
|
Posted By
MMS on 2020-11-01 09:24:38
| Re: ASCII Art Mandelbrot
My 2c: the C128 will probably be even slower than the +4 for the same reason.
The C128 is faster only in FAST mode, when you see nothing on the screen.
On the other hand (referring to the original topic), the speed of +4 BASIC is impacted by the continuous ROM/RAM switching that provides more RAM, while on the C64 the layout is fixed (but leaves much less free RAM for BASIC programs). The same is valid for the C128. So BASIC simply does not show the speed of the machine, but the speed of the architecture together with its BASIC interpreter. Worth mentioning: on Commodores the drivers/Kernal/BASIC did not take RAM away from the user. (The BBC Micro B offered 18K in gfx mode 6.)
|
|
Posted By
Litwr on 2020-11-01 12:54:59
| Re: ASCII Art Mandelbrot
@MMS you are right. I have added results for the C128: it is 3 (!) times slower than the C64! And I agree, the BBC Micro and Amstrad CPC have too little memory for their Basic.
|
|
Posted By
Gaia on 2020-11-01 13:53:53
| Re: ASCII Art Mandelbrot
Hi Litwr, nice to see you. One of the primary reasons why the +4 BASIC is so much slower is that on each memory read it jumps to the $04xx area. On a C16 this could be patched, since there is no overlap between the RAM and the ROM.
|
|
Posted By
MMS on 2020-11-01 14:11:08
| Re: ASCII Art Mandelbrot
Yeah, it is really hard to get a proper solution on an 8-bit machine, at least without a standardized and fast communication bus (like PCI or AGP).
When the memory is shared between the CPU and the video card (and sound card), you can do tricks and reallocate memory addresses, so you can create fake animations simply by changing character sets or the gfx memory address. On the other hand, you lose cycles: we have badlines, and the C64 has some too. The SAM Coupe ("256KB 8 bit ZX spectrum compatible Amiga killer machine upgradeable to 4MB, 6 channel stereo plus ") is a perfect example: even with a 6 MHz Z80 CPU it was only 15% faster than the ZX Spectrum, and it had trouble managing gfx screen updates. AFAIK the MSX machines with a similar Z80 CPU had more success and got a lot of nice arcade conversions (GFX-wise the first MSX1 series was not so great, but MSX2 was impressive).
If you have separate memory for GFX and sound (like the VDC in the C128), you may suffer greatly if you want to load massive data through a few registers. Certainly, if you have a fast communication channel (like VGA or sound cards had, even over the old PCI connection) you do not feel it, but the C128 was sluggish with the VDC in graphics mode.
Actually you can double the speed of the Amiga 1200 just by adding Fast RAM alongside the shared Chip RAM. So the answer is that separate RAM is better, but it requires a fast internal bus for data transfer to the special ICs.
|
|
Posted By
Litwr on 2021-03-03 12:10:45
| Re: ASCII Art Mandelbrot
Sorry, I made a typing error when I ran the program on the C64. The corrected results (in seconds) are:
Machine | Time (s)
BBC Master (mode 7) | 111.95
BBC Master (mode 6) | 112.04
BBC Micro B (mode 7) | 144.96
Amstrad CPC 6128 | 163.43
Commodore 128 (fast) | 297.57
Commodore 64 | 384.16
Atari 800XL | 394.12
Commodore +4 | 485.85
MSX2 | 554.98
Commodore 128 | 620.20
TI-99/4A | 757
|
The plus4 is not so bad now, and the C128 looks better too.
|
|
Posted By
MMS on 2020-11-01 16:35:27
| Re: ASCII Art Mandelbrot
Gaia, thanks for the info. Is there any way to run the same program on a patched C16? What is the method? Would be great to see the speed in a similar setup as a C64. (no bank switching at every command)
|
|
Posted By
gerliczer on 2020-11-02 02:21:59
| Re: ASCII Art Mandelbrot
Hi Litwr,
Could you please convert this test and run it on the Enterprise 64/128 and Videoton TVC computers?
|
|
Posted By
MMS on 2020-11-02 13:44:50
| Re: ASCII Art Mandelbrot
Litwr, thanks for the update. The MSX2 shocked me a little, as it has a special Yamaha video chip with its own memory, so its 3.58 MHz Z80 has no "badlines" (I suppose).
|
|
Posted By
Csabo on 2020-11-02 18:28:45
| Re: ASCII Art Mandelbrot
Hmm, I was surprised to hear that the Plus/4 BASIC is slower.
I did a little informal test and checked two things:
"Disabling" $0312 and $0314 led to about a 3% speed increase. I guess this is a pretty generic trick that any BASIC program could use if it's doing extended periods of computation.
Replacing the routine at $4B0 so that it doesn't do any paging required that I move the BASIC end to $7FFF. This way it never needs to read from the RAM under the ROM, so no paging is ever necessary; and this can still work for some smaller programs. However, it only led to an additional 1% speed increase. So... I would say it's not the paging that's the main culprit.
|
|
Posted By
Litwr on 2020-11-03 12:58:08
| Re: ASCII Art Mandelbrot
Good news! Thanks to Gaia's idea I have made the FBI+4 utility, which speeds up the C+4 Basic and makes it match the C64, BBC Micro, or Amstrad CPC Basics. The result for the ASCII Mandelbrot is about 35% faster now! It is 361.02s, which is faster than the C64! It is a bit odd that nobody wrote such a simple utility until now. This kind of optimization allows us to easily fix the C+4 bugs (like the RS-232 bug), optimize the firmware, add new features to it, increase the size of RAM available for Basic, accelerate Basic further (faster graphics, io, ...), ... We can completely rewrite the ROM and load its functions as a program.
@MMS The MSX2 uses extra cheap and slow DRAM chips which require wait states from the Z80. So the MSX2 generally matches the Amstrad CPC performance, but Locomotive Basic was newer and much faster than MSX Basic.
@gerliczer Sorry, I know little about this hardware. The program is freely available as text. If you have those machines or their emulators, it should be quite easy to run this program there.
BTW I still don't know how to run this program on the Atari 8-bit. It would be quite interesting to get results from there.
|
|
Posted By
George on 2020-11-03 13:04:08
| Re: ASCII Art Mandelbrot
@litwr: Great tool. I tried it with my 3D Engine 2.0 (v2.5) in Yape with full speed up, but actually it was even a second slower with the tool. Any ideas why?
|
|
Posted By
Litwr on 2020-11-03 13:13:20
| Re: ASCII Art Mandelbrot
@George Your results look rather impossible. It seems that you ran an ML program and got a random timing difference. FBI can't slow things down, it can only accelerate. However, in some cases, when you use complex function calculations and graphic output, the acceleration may be very small. I am ready to test your case thoroughly, just send me instructions on how to run it and get timings.
EDIT. I have just run all the small benchmarks from https://en.wikipedia.org/wiki/Rugg/Feldman_benchmarks - every case gets a 5-26% speed gain. The lowest speed-up, 5%, was for program #8 with trig functions.
|
|
Posted By
George on 2020-11-03 13:18:18
| Re: ASCII Art Mandelbrot
@Litwr: Yes, it should be impossible. I used Yape 1.1.6. It's a full BASIC program with heavy use of sequential files and graphics. No ML included.
* Download the v2.5 version
* Extract the folder
* Put drive 8 in iec level\performance mode, and set it to the extracted directory
* Put Yape in full throttle
* Copy & paste the code from 3dengine25.bas into Yape
* When the "!" appears after all dots are drawn, press "L" and the render starts
That's it!
|
|
Posted By
Litwr on 2020-11-03 14:18:19
| Re: ASCII Art Mandelbrot
It works! I am really impressed. However, it doesn't print timings, so I can't check them. I have just got an idea of what could make the program run slower: the string garbage collector. When we have less memory it has to work more often, and this causes a tiny slowdown. However, I still can't believe that this effect can outweigh the speed advantage provided by FBI+4. We need exact timings to compare.
|
|
Posted By
RoePipi on 2020-11-03 15:00:56
| Re: FBI+4
Wow, it's awesome! I tested it with two games of mine which use the TI variable for timing, Foxish and Bit Fox. The latter is always slow, but I felt a definite speed-up playing Foxish. (This is where you hunt down prey by playing a rhythm game.)
|
|
Posted By
George on 2020-11-03 19:34:38
| Re: ASCII Art Mandelbrot
@Litwr: Well, I have to correct myself (I measured by hand the previous time). Render times:
3502 secs (normal)
2573 secs (with Fast Basic)
1715 secs (with Fast Basic + screen off)
That's about 27% faster... congratulations on your tool. The heavy downside is the memory loss!
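The percentages quoted in this thread do not all follow the same convention, and a quick check makes that visible. The sketch below is Python, not part of the original posts, and the helper names are made up for illustration. It computes both the share of run time saved and the gain in speed from the timings posted in the thread.

```python
def time_saved(before, after):
    """Fraction of the original run time that disappears."""
    return (before - after) / before


def speed_gain(before, after):
    """How much more work gets done per second."""
    return before / after - 1


# (seconds before, seconds after), copied from the posts in this thread.
runs = {
    "3D Engine, Fast Basic": (3502, 2573),
    "3D Engine, Fast Basic + screen off": (3502, 1715),
    "ASCII Mandelbrot, FBI+4": (485.85, 361.02),
}

for name, (before, after) in runs.items():
    print(f"{name}: {time_saved(before, after):.1%} less time, "
          f"{speed_gain(before, after):.1%} faster")
```

By this reckoning the "about 27%" for the 3D Engine is a time saving (26.5%), while the "about 35%" quoted for the Mandelbrot is a speed gain (34.6%); the same Mandelbrot run saves about 25.7% of the time.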
|
|
Posted By
MMS on 2020-11-03 17:14:23
| Re: ASCII Art Mandelbrot
What a fantastic idea! I remember we spoke a lot in the past about the RAM/ROM switching loss, but this is real evidence. Thank you all for the efforts! I already tried it. Unfortunately the GSHAPE and SSHAPE commands do not speed up much; probably the functions that do a lot of local RAM calculations on the bitmap (CIRCLE and the block copy commands) do not gain much from it, but programs with heavy FOR loops or several subroutines will speed up greatly.
So it is fantastic stuff!
I think compilers also benefit a lot from the same effect: they avoid the RAM/ROM switching by putting the pre-written routines directly into RAM.
But is it really necessary to have such a big sized compiled BASIC code?
AFAIK the majority of the compiled code size is the linked-in runtime library, but in most cases it is exactly the SAME BASIC code that is already in the ROM, without any major change, just calling the routines directly, without the interpreter. (LITWR's compiler gets a big boost from real integers and 3-byte fast floats, but no gfx routines are allowed.)
So what if, instead of adding a huge code library (which could be necessary on the Apple II or BBC), the compiled program used something like FBI? FBI is only 2KB, compared with adding the full BASIC ROM to the file. And it could add a speed increase to the graphical commands too. (Certainly the code size would have to be less than 60K due to the upper RAM limit, but compiled code is known to have less RAM anyway.)
Or maybe I misunderstand something?
I can hardly believe that the compiled P-code (a certain kind of jump table) should be much bigger than the original BASIC code. It is said that P-code is more compact than the original BASIC code... So how does a 7-block BASIC program become a 38-block compiled program? The BASIC routines are added (but they are already in the ROM, and they could be copied into RAM instead of being added to the code as ballast). Unless it adds new functions, like LITWR's compiler with real integers and 3-byte faster floats.
|
|
Posted By
Csabo on 2020-11-04 17:28:24
| Re: ASCII Art Mandelbrot
Here's what I was talking about above (disabling $0312/$0314). This gives roughly a 3% speedup, doesn't reduce the available memory, and can be used with Litwr's program. The downside is that keyboard handling, etc., which are provided by those routines, are not available.
0 FORI=1TO68STEP2
1 V$=MID$("A2548AA003D006A942A20EA0CE788D12038C13038E14038C150358600E09FF4CBEFC",I,2)
2 POKE824+I/2,DEC(V$):NEXT
A little test for it:
4 SYS824
5 FORI=0TO200:V=INT(I/3):PRINTV;:NEXTI
6 SYS831
SYS824 turns those routines off, SYS831 puts everything back to normal.
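For anyone who wants to see what the loader actually pokes into memory, here is a rough Python sketch (not part of the original post) that decodes the hex string and prints it as 6502 instructions. The opcode table covers only the opcodes that occur in this payload, and the function name is made up for illustration.

```python
# Hex payload copied verbatim from the BASIC loader in the post.
PAYLOAD = "A2548AA003D006A942A20EA0CE788D12038C13038E14038C150358600E09FF4CBEFC"
BASE = 824  # first POKE target ($0338)

# Only the 6502 opcodes that occur in this payload: opcode -> (mnemonic, mode).
OPCODES = {
    0xA2: ("LDX", "imm"), 0xA0: ("LDY", "imm"), 0xA9: ("LDA", "imm"),
    0x8A: ("TXA", "imp"), 0x78: ("SEI", "imp"), 0x58: ("CLI", "imp"),
    0x60: ("RTS", "imp"), 0xD0: ("BNE", "rel"),
    0x8D: ("STA", "abs"), 0x8C: ("STY", "abs"), 0x8E: ("STX", "abs"),
    0x0E: ("ASL", "abs"), 0x4C: ("JMP", "abs"),
}


def disassemble(code, base):
    """Linear sweep over the bytes; returns a list of (address, text)."""
    out, pc = [], 0
    while pc < len(code):
        mnemonic, mode = OPCODES[code[pc]]
        addr = base + pc
        if mode == "imp":
            text, size = mnemonic, 1
        elif mode == "imm":
            text, size = f"{mnemonic} #${code[pc + 1]:02X}", 2
        elif mode == "rel":
            offset = code[pc + 1]
            if offset >= 0x80:          # branch offsets are signed bytes
                offset -= 0x100
            text, size = f"{mnemonic} ${addr + 2 + offset:04X}", 2
        else:                           # absolute, little-endian operand
            operand = code[pc + 1] | (code[pc + 2] << 8)
            text, size = f"{mnemonic} ${operand:04X}", 3
        out.append((addr, text))
        pc += size
    return out


code = bytes.fromhex(PAYLOAD)
listing = disassemble(code, BASE)
for addr, text in listing:
    print(f"${addr:04X} ({addr})  {text}")
```

If I read the listing right, SYS824 points both $0312/$0313 and $0314/$0315 at the small stub at $0354 (ASL $FF09, then JMP $FCBE), and SYS831 writes $CE42 and $CE0E back, which are presumably the ROM defaults.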
|
|
Posted By
Litwr on 2020-11-05 16:12:10
| Re: ASCII Art Mandelbrot
@Csabo Disabling interrupts disables TI too, so how can we check timings without it? It is odd that popular home computers produced in the SU didn't have timer support in system software - it was a large drawback. However, your observation is interesting: the 3% means about 800 cycles per interrupt. Maybe it is possible to optimize the interrupt handler routine; 800 seems too much. IMHO SSHAPE was made so slow intentionally, they didn't want sprites on the C+4. Marketing is a crazy thing.
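The 800-cycle figure can be turned around as a back-of-envelope check. The Python sketch below treats the 3% as the share of CPU cycles spent in the interrupt handler and assumes one interrupt per frame at 50 frames per second (a PAL assumption of mine, not stated in the post), then shows which effective CPU clock that implies.

```python
handler_share = 0.03          # "roughly a 3% speedup", taken as a share of cycles
cycles_per_interrupt = 800    # figure quoted in the post
interrupts_per_second = 50    # ASSUMPTION: one interrupt per PAL frame

# Cycles available between two interrupts if the handler eats 3% of them.
cycles_between_interrupts = cycles_per_interrupt / handler_share

# Effective CPU clock that would make these numbers consistent.
implied_clock_hz = cycles_between_interrupts * interrupts_per_second

print(f"{cycles_between_interrupts:.0f} cycles between interrupts")
print(f"{implied_clock_hz / 1e6:.2f} MHz implied effective clock")
```

Under these assumptions the numbers correspond to about 26,667 cycles per frame, or an effective clock of roughly 1.33 MHz; a different interrupt rate would scale the result accordingly.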
|
|
Posted By
MMS on 2020-11-05 16:43:27
| Re: ASCII Art Mandelbrot
Maybe they did not make it so slow intentionally. But a much faster one would not hurt.
I see they tried to implement a lot of functions. The worst thing is that you cannot predefine or save/load the string content, at least not easily.
I am just checking the disassembled ROM code (by Mike Dailly) at $DB35 to understand why it is so slow. Wow, so many jumps all over the memory.
|
|
Posted By
Csabo on 2020-11-06 07:33:19
| Re: ASCII Art Mandelbrot
Litwr, I wrote a small test which played a short sound, did some calculations, and then played another short sound. Then I recorded the output from YAPE to a WAV file and measured the time between the sounds. I think that was accurate enough.
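The idea of timing a run from the gap between two beeps can be sketched in a few lines of Python. This is not the tool used above, just an illustration on synthetic data: it assumes a clean recording where the beeps are square waves well above a threshold and the gap is near silence. The function name and the threshold are my own choices.

```python
def gap_seconds(samples, rate, threshold=0.1):
    """Length of the first quiet stretch that sits between two loud stretches."""
    loud = [abs(s) > threshold for s in samples]
    first_loud = loud.index(True)             # start of the first beep
    gap_start = loud.index(False, first_loud)  # first beep ends here
    gap_end = loud.index(True, gap_start)      # second beep starts here
    return (gap_end - gap_start) / rate


RATE = 8000  # samples per second of the synthetic "recording"

# 0.1 s square-wave beep; a square wave never passes through zero,
# so every sample of the beep counts as loud.
beep = [1.0 if (i // 10) % 2 == 0 else -1.0 for i in range(RATE // 10)]
silence = [0.0] * (2 * RATE)  # 2 seconds of "computation" between the beeps

recording = [0.0] * 100 + beep + silence + beep
print(gap_seconds(recording, RATE))
```

A real WAV capture would need the samples read from the file and probably a smoothed envelope instead of a per-sample threshold, since a recorded tone does cross zero.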
|
|
Posted By
SVS on 2020-11-06 15:21:54
| Re: ASCII Art Mandelbrot
@MMS: P-code needs such a large runtime because it really does use its own code for the BASIC commands. The constants are included inside the commands themselves, and the variables are addressed directly, without any scan over the variable area. The integer variables are 2 bytes long (instead of 5). The formula parsing is different, so that integers work as integers without being converted to FP. The structured branches are better too: the GOSUB block is smaller, so more levels can be used. FOR/NEXT is optimized too. The above info is for Austrospeed. I don't know other compilers in depth, except MicroComp, which supports only some BASIC commands anyway. Ah... if you disassemble the ROM from $BD35, pls supply me with the results for the next issue of UltimateMap!
@Litwr: nice work! Finally, new tools for coders in the 2020s! Regarding the SSHAPE command, I don't want to believe they slowed it down intentionally :-/ Too bad, I think they just added it as a last-minute command (it is bugged, as is known, maybe because there was no time to test).
|
|
Posted By
Litwr on 2020-11-07 01:11:07
| Re: ASCII Art Mandelbrot
Thanks for all remarks. I hope if somebody finds a way to improve the C+4 ROM he can use FBI+4 as a base for further development.
|
|
Posted By
MMS on 2020-11-07 03:13:54
| Re: ASCII Art Mandelbrot
SVS: yes, you are right, the speed increase of compiled code is much more than this. Especially using real integers is a great plus.
|
|
Posted By
Litwr on 2020-11-08 04:47:49
| Re: ASCII Art Mandelbrot
I have just added data for the Atari 800XL. Its Basic was really slow. It is interesting that Atari enthusiasts have completely rewritten the ROM (Altirra ROM Basic, 2014) and this Basic is about 160% faster for the ASCII Mandelbrot! So if anybody rewrites the Commodore ROM Basic, it could become maybe 100% faster.
|
|
Posted By
SVS on 2020-11-08 07:44:19
| Re: ASCII Art Mandelbrot
Hey Litwr, do you have comparison data for the ZX81 too? IMO it is the second (!) best home computer ever made (just 8K of ROM including trigonometric functions, and video generated by the CPU).
|
|
Posted By
Litwr on 2020-11-08 12:02:20
| Re: ASCII Art Mandelbrot
@SVS this Mandelbrot requires at least a 38x21 screen. However, this program is very flexible: we can set any screen size in the variables H and W. Maybe I made a wrong choice when I took this size. Maybe it should have been 31x15. Even the Tandy CoCo has only 32 chars per line, so I can't test it either.
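To show how the grid size drives the amount of work, here is a minimal Python sketch of an ASCII Mandelbrot with width and height as parameters. It is not the benchmark program discussed in this thread, whose listing is not posted here; the coordinate window, iteration limit, and character ramp are all assumptions chosen for illustration.

```python
def ascii_mandelbrot(w, h, max_iter=15):
    """Return h strings of w characters each (illustrative parameters only)."""
    chars = " .:-=+*#%@"  # assumed character ramp, '@' = never escaped
    rows = []
    for row in range(h):
        ci = -1.2 + 2.4 * row / (h - 1)        # imaginary part of c
        line = []
        for col in range(w):
            cr = -2.2 + 3.0 * col / (w - 1)    # real part of c
            zr = zi = 0.0
            n = 0
            # Iterate z = z*z + c until it escapes or the limit is hit.
            while n < max_iter and zr * zr + zi * zi <= 4.0:
                zr, zi = zr * zr - zi * zi + cr, 2 * zr * zi + ci
                n += 1
            line.append(chars[n * (len(chars) - 1) // max_iter])
        rows.append("".join(line))
    return rows


# The smaller grid suggested in the post; 38x21 works the same way.
for text_line in ascii_mandelbrot(31, 15):
    print(text_line)
```

A 38x21 grid has 798 points against 465 for 31x15, so timings taken at different sizes are not directly comparable.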
|
|
Posted By
MMS on 2020-11-08 15:02:15
| Re: ASCII Art Mandelbrot
@SVS: yeah, nothing limits this Z80 to perform (no music, no gfx )
|
|
Posted By
Litwr on 2021-03-03 12:16:55
| Re: ASCII Art Mandelbrot
I have just added data of the slowest result, it is from the TI-99/4A computer. It is interesting that its processor is 16-bit and rather fast. Jack Tramiel took revenge on Texas Instruments for his troubles in the 70s and killed this very unusual computer. IMHO it was the most unusual home computer ever.
|
|
Posted By
MMS on 2021-03-03 14:35:30
| Re: ASCII Art Mandelbrot
Thanks for the update. Did you know that the TI-99/4A has a cartridge with a fully working Dragon's Lair? Impressive (128MB ROM).
|
|
Posted By
Litwr on 2021-03-04 11:33:34
| Re: ASCII Art Mandelbrot
Thank you. IMHO it is possible to make such a ROM cartridge for the +4 too.
|
|
Posted By
SomeGuy on 2021-03-18 19:38:56
| Re: ASCII Art Mandelbrot
I've been comparing some old C64 books and Plussy memory maps.
I think one big stinker for the +4 BASIC speed is that CHRGET, the bedrock of parsing for the BASIC interpreter that gets the next character, was moved out of zero page (where it lives on the C64). It also has to deal with the banking too, of course, in order to access RAM under the ROMs on the +4, but I have to wonder why in the heck the Commodore guys wasted zero-page space on things like temp space for "RENUMBER" or increment values for the "AUTO" line numbering, etc., when they did the 3.5 updates. Part of the problem is that, owing to the switching of interrupts to deal with banking, CHRGET is also LONGER on the +4, which means it's too long to fit in the 24 bytes of ZP wasted on being reserved for the "speech" functions ($d0-).
I encourage everyone to use MONITOR to get the disassembly for +4 CHRGET at $473 to $493.
Note the code disabling interrupts and switching to RAM, then grabbing the next character, then switching back to ROM and enabling interrupts. Every. Single. CHRGET it does this. Talk about a drag compared to the C64 BASIC interpreter. Notice also that there are three branches that could benefit from ZP addressing, which would save 3 bytes and add a bit of speed. It's actually more interesting than that, because the CHRGET on the 64 (as disassembled in the "Mapping the C64" book), while mostly identical to the +4 version (except for the banking stuff), is also self-modifying, in that it actually stores the pointer to the next character in the operand of the equivalent of the LDA at $047F in the plus4 version. Doing the same trick on the Plussy still would not leave the routine small enough to fit in the ZP area wasted by being reserved for "speech" functions, largely due to the 8 bytes used by the code that cuts/restores interrupts and does the banking.
At any rate, that's GOT to be a lot of the problem in terms of the speed deficit on the +4: One of the most fundamental parts of the interpreter gets to do its work in ZP on the C64 and isn't burdened by the banking stuff. Not really sure what could be done about that, TBH. Still shaking my head on why little used (or not speed dependent) stuff like RENUMBER or AUTO have stuff clogging up ZP, not to mention the waste of space there for a speech functionality that was never actually released. Sigh.
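The "8 bytes" can be cross-checked with a small tally. The Python sketch below assumes the banking wrapper consists of SEI, a store to $FF3F (RAM in), a store to $FF3E (ROM back in) and CLI; that sequence is my assumption based on the description above, not a verified ROM listing. Sizes and cycle counts are the standard 6502 values for these instructions.

```python
# (mnemonic, size in bytes, cycles) using standard 6502 timings.
# ASSUMPTION: this is the banking/interrupt wrapper around the character fetch.
BANKING_OVERHEAD = [
    ("SEI",       1, 2),  # block interrupts while the ROM is paged out
    ("STA $FF3F", 3, 4),  # page in RAM
    ("STA $FF3E", 3, 4),  # page the ROM back in
    ("CLI",       1, 2),  # allow interrupts again
]

extra_bytes = sum(size for _, size, _ in BANKING_OVERHEAD)
extra_cycles = sum(cycles for _, _, cycles in BANKING_OVERHEAD)

print(f"{extra_bytes} extra bytes, {extra_cycles} extra cycles per CHRGET call")
```

That gives 8 bytes, matching the figure in the post, and 12 cycles on every call before any other difference (such as the routine living outside zero page) is counted.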
|
|
Posted By
RoePipi on 2021-03-19 05:07:44
| Re: ASCII Art Mandelbrot
@SomeGuy You have a point there. We already observed significant speed increases (5-35%) using FBI+4. Yes, the actual situation is a waste. As a child, I was surprised when I tried out the C64's BASIC after the Plus/4. A bit dumb, but quick! I thought it was because the C64 knows far fewer commands.
So! We could rewrite the interpreter, but couldn't maintain compatibility. We could run BASIC programs faster with optimized ROMs, but all the fellows with original hardware would be left out.
Myself, I would prefer smart BASIC compilers, which keep compatibility and provide a bigger speed gain than is possible by rewriting the ROM. Ones like the Austrospeed Compiler (3-5 times faster runs!). My only wish is that it could handle most BASIC programs.
|
|