Previous Messages
Posted By

Mad
on 2019-12-27
00:06:24
 Re: Assembly: fast plot and line algorithms for plus/4

Awesome! I saw this document some time ago. I was pretty "flashed" to see code written by him! :D

Posted By

MMS
on 2019-12-24
13:12:16
 Re: Assembly: fast plot and line algorithms for plus/4

And here are the floating point algorithms (they could be used for a C library) from none other than the legendary Steve Wozniak (Woz).

http://6502.org/source/floats/wozfp1.txt

PS
(Did I mention that for some time I worked at Philips Austria together with a colleague who was a relative of Woz? His name was Wozniak too, of course.)

Posted By

raze
on 2019-12-10
15:48:46
 Re: Assembly: fast plot and line algorithms for plus/4

Hello, I wrote a small game in C that uses a fast routine to draw pixels (the fastest I can think of). I extracted the small piece of code below; maybe you will find it useful. I didn't test it, but it should be OK.
---------------------------

// Pixel read-write fast routine
// By Giuseppe Mignogna 2019
// Not tested, should compile under cc65:
// cl65 -O -t plus4 demo.c -o demo.prg

// TODO: The instructions to set up the
// graphic mode must be added

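#include <peekpoke.h> // cc65 header that provides PEEK and POKE

#define BITMAP_START 0x2000
#define UCHAR unsigned char
#define UINT unsigned int

UINT g_xb[320]; // byte offset of the 8-pixel cell that holds column x
UINT g_yb[200]; // address of the character row that holds line y
UCHAR g_bt[8]; // bit mask for pixel (x & 7) inside a byte

void init_draw_pixel(void) {
    UINT i;

    for(i=0; i<200; i++) g_yb[i] = BITMAP_START + i/8*320; // i/8 -> int
    for(i=0; i<320; i++) g_xb[i] = i/8 * 8;
    for(i=0; i<8; i++) g_bt[i] = 1 << (7 - i);
}

// Note: the XOR toggles the pixel (sets a clear one, clears a set one)
void draw_pixel(UINT x, UCHAR y) {
    UINT b = 0;

    if(x > 319 || y > 199) return;
    b = g_yb[y] + g_xb[x] + (UCHAR)(y&7);
    POKE(b, PEEK( b ) ^ g_bt[x&7]);
}

// Returns non-zero if the pixel is set; the caller must keep x and y in range
UCHAR read_pixel(UINT x, UCHAR y) {
    UINT b = 0;

    // if(x > 319 || y > 199) return;
    b = g_yb[y] + g_xb[x] + (UCHAR)(y&7);
    return(PEEK( b ) & g_bt[x&7]);
}

int main (void) {
    UCHAR y=0;

    init_draw_pixel();

    // Draw a vertical line at x = 100
    for(y=0; y < 200; y++) {
        draw_pixel(100, y);
    }
    return 0;
}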

Posted By

KiCHY
on 2019-12-05
02:27:20
 Re: Assembly: fast plot and line algorithms for plus/4

...where it's worth mentioning that I just converted the same routine I pointed you to earlier, from codebase64.

Posted By

George
on 2019-12-03
15:36:48
 Re: Assembly: fast plot and line algorithms for plus/4

I made an encyclopedia entry of KiCHY's routine.
Thank you, KiCHY!

Posted By

KiCHY
on 2019-12-02
17:27:31
 Re: Assembly: fast plot and line algorithms for plus/4

@MMS: There are quite a few floating point math libraries for the 6502. Check out 6502.org for source codes.

Posted By

George
on 2019-12-02
17:20:27
 Re: Assembly: fast plot and line algorithms for plus/4

Some short feedback:
I succeeded in calling the procedure from BASIC after relocating the bitmap to $2000.
There is no essential speedup of my BASIC application.
I guess that's because of the BASIC context (integer arrays, ...) and not the line drawing itself (BASIC draws the lines at almost the same speed).

So I guess I have to get rid of the BASIC and do the project in assembly...

PS.
* The procedure has one restriction: the x-value can only go from 0 to 255 (I use hires mode), where 128 is the center of the screen.
* If you set x1 = v and x2 = (v+1), it also draws single points.

Either way: it's cool to have this procedure for the plus/4. It will be useful for other projects in the future.


Posted By

MMS
on 2019-12-02
16:34:35
 Re: Assembly: fast plot and line algorithms for plus/4

Actually, CC65 is out and seems to support the +4 too.
It reportedly compiles C into rather fast code (I want to benchmark it against compiled BASIC, but none of the compiled code contains the +4 specific GFX commands).

tgi.h contains most of the GFX routines, including line ("tgi_line") and fill ("tgi_bar")
https://www.cc65.org/doc/funcref-40.html

Too bad that floating point calculation (the FLOAT type) is not natively included in the CC65 package.
That is a big drawback here, as most of the 3D-->2D transformation is done in floating point calculations.

I found one library for CC65 from MRDUDZ that claims to provide it
https://github.com/mrdudz/cc65-floatlib

If the above does not work, what kind of workaround is possible?
1) Are the Kernal routines of any use to cover this missing feature?
2) Does anyone have workaround experience with big numbers (eh, DOUBLE is also missing) and shifting the result?
3) Or just do it with integers and accept that, after dividing, the value is in fact truncated and may become inaccurate?
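
One common shape for workarounds 2 and 3 is fixed-point arithmetic: keep the fraction in the low bits of an integer and shift (or divide) after each multiplication. The following is only a minimal sketch in C, assuming an 8.8 format; the names `fix8_8`, `fix_mul`, `fix_div` and `project` and the `focal`/`centre` parameters are made up for illustration and are not part of cc65 or of the floatlib linked above.

```c
#include <stdint.h>

/* 8.8 fixed point: high byte = integer part, low byte = fraction.
   All names here are illustrative, not part of cc65. */
typedef int16_t fix8_8;

#define FIX_ONE 256

static fix8_8 fix_from_int(int v) { return (fix8_8)(v * FIX_ONE); }
static int    fix_to_int(fix8_8 v) { return v / FIX_ONE; } /* truncates toward zero */

/* Multiply in 32 bits, then drop the extra 8 fraction bits. */
static fix8_8 fix_mul(fix8_8 a, fix8_8 b) {
    return (fix8_8)(((int32_t)a * b) / FIX_ONE);
}

/* Pre-scale the dividend so the quotient keeps 8 fraction bits.
   The caller must make sure b is not zero and that a/b stays below 128. */
static fix8_8 fix_div(fix8_8 a, fix8_8 b) {
    return (fix8_8)(((int32_t)a * FIX_ONE) / b);
}

/* Perspective projection of one coordinate:
   screen = centre + focal * c / z, with c and z in 8.8 and z > 0. */
static int project(fix8_8 c, fix8_8 z, int focal, int centre) {
    fix8_8 ratio = fix_div(c, z);
    return centre + (int)(((int32_t)ratio * focal) / FIX_ONE);
}
```

The truncation MMS worries about is still there, but it is confined to 1/256 of a unit per operation instead of a whole integer step. The 32-bit intermediate results are not cheap on a 6502, so whether this beats a floating point library would have to be measured.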

Posted By

George
on 2019-12-02
12:14:35
 Re: Assembly: fast plot and line algorithms for plus/4

@Mad: Thank you for your generous help. I supposed the demoscene had made some progress; that's why I kindly asked. I will try out KiCHY's algorithm first and give you all feedback.

My goal is to bring the rendering to real-time with a maximum of 30 mins for the big models.

Good luck with your projects too.

@KiCHY: Thank you for the hint. I had that clipping problem in the C64 or VIC20 port. I handled it from BASIC, AFAIK.

@bubis: The lines in theory can have every angle and length.



Posted By

KiCHY
on 2019-12-02
12:25:58
 Re: Assembly: fast plot and line algorithms for plus/4

@George: Be aware that these line drawing functions won't handle clipping! If your original BASIC code relied on the built-in clipping of the DRAW command, you have to clip your lines "manually".
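
For reference, "manual" clipping is often done with the Cohen-Sutherland algorithm, which is a different technique from anything in the routine itself. Below is a minimal sketch in C; the window limits (0..255 by 0..199, matching the range George reports for the routine) and the names `outcode` and `clip_line` are assumptions chosen for illustration.

```c
/* Cohen-Sutherland clipping against a hypothetical 0..255 x 0..199 window. */
#define XMIN 0
#define XMAX 255
#define YMIN 0
#define YMAX 199

enum { LEFT = 1, RIGHT = 2, TOP = 4, BOTTOM = 8 };

/* Bit mask telling on which side(s) of the window a point lies. */
static int outcode(int x, int y) {
    int c = 0;
    if (x < XMIN) c |= LEFT;  else if (x > XMAX) c |= RIGHT;
    if (y < YMIN) c |= TOP;   else if (y > YMAX) c |= BOTTOM;
    return c;
}

/* Clips the segment in place. Returns 1 if some part is visible, 0 if not. */
int clip_line(int *x0, int *y0, int *x1, int *y1) {
    int c0 = outcode(*x0, *y0);
    int c1 = outcode(*x1, *y1);

    for (;;) {
        int c, x, y;
        long dx, dy;

        if ((c0 | c1) == 0) return 1;   /* both ends inside */
        if ((c0 & c1) != 0) return 0;   /* both ends beyond the same edge */

        c = c0 ? c0 : c1;               /* pick an end that is outside */
        dx = (long)*x1 - *x0;
        dy = (long)*y1 - *y0;

        /* Move that end onto the window edge it violates. */
        if (c & LEFT)       { x = XMIN; y = (int)(*y0 + dy * (XMIN - *x0) / dx); }
        else if (c & RIGHT) { x = XMAX; y = (int)(*y0 + dy * (XMAX - *x0) / dx); }
        else if (c & TOP)   { y = YMIN; x = (int)(*x0 + dx * (YMIN - *y0) / dy); }
        else                { y = YMAX; x = (int)(*x0 + dx * (YMAX - *y0) / dy); }

        if (c == c0) { *x0 = x; *y0 = y; c0 = outcode(x, y); }
        else         { *x1 = x; *y1 = y; c1 = outcode(x, y); }
    }
}
```

A caller would run `clip_line` on each segment and only pass it on to the line routine when the function returns 1. The integer division rounds the intersection point, so a clipped line can differ by a pixel from what an unclipped Bresenham would have drawn.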

@MAD: I think your code is pretty advanced for a beginner.

Posted By

bubis
on 2019-12-02
11:56:46
 Re: Assembly: fast plot and line algorithms for plus/4

I have a line routine that uses speedcode for drawing 1,...,7,8,n wide/tall pixel blocks (n>8 is a separate case). So, when the slope is between 2 and 3, it draws 2- and 3-pixel blocks proportionally. It is not as accurate as Bresenham's, but you can't really notice that; it is pretty fast and fits into 8K.
Just an idea...

Posted By

Mad
on 2019-12-02
11:33:32
 Re: Assembly: fast plot and line algorithms for plus/4

If you have a lot of memory, then Graham of Oxyron's approach to drawing lines is probably the fastest. However, this approach involves a lot of code. I have a code generator for that somewhere. But I think a normal Bresenham, like the one KiCHY showed in this thread, is already pretty fast, at least a lot faster than BASIC line drawing. Line drawing is a topic where the demoscene has already got pretty far. I always feared that topic in the past. Maybe I can dig out that code generator and give it to you later on, if KiCHY's algorithm isn't enough for you.

As far as I remember, Graham does this per 8x8 block (it needs some tweaking; this is just an outline of the idea):

.again
sbc ZP_SLOPE
bcc .movepixelleftandcontinue1
sbc ZP_SLOPE
bcc .movepixelleftandcontinue2
sbc ZP_SLOPE
bcc .movepixelleftandcontinue3
sbc ZP_SLOPE
bcc .movepixelleftandcontinue4
sbc ZP_SLOPE
bcc .movepixelleftandcontinue5
sbc ZP_SLOPE
bcc .movepixelleftandcontinue6
sbc ZP_SLOPE
bcc .movepixelleftandcontinue7
sbc ZP_SLOPE
bcc .movepixelleftandcontinue8
do just drawAll8YPixelsOnSameXPosition here
jmp .again

.movepixelleftandcontinue1
dec pixelXPos
drawPixel
adc ZP_SLOPEORIGIN ; this is the bigger delta (x or y) (like in bresenham)
sbc ZP_SLOPE ; this is the smaller delta (x or y) (like in bresenham)
bcc .movepixelleftandcontinue2b
sbc ZP_SLOPE
bcc .movepixelleftandcontinue3b
sbc ZP_SLOPE
bcc .movepixelleftandcontinue4b
sbc ZP_SLOPE
bcc .movepixelleftandcontinue5b
sbc ZP_SLOPE
bcc .movepixelleftandcontinue6b
sbc ZP_SLOPE
bcc .movepixelleftandcontinue7b
sbc ZP_SLOPE
bcc .movepixelleftandcontinue8b
do just drawAll7LeftYPixelsOnSameXPosition here
jmp .again

This code later gets a lot longer because of all the .movepixelleftandcontinue jumps.

I hope KiCHY's code in this thread already does the performance trick.

Good luck with your project!!!

Posted By

George
on 2019-12-02
09:09:51
 Re: Assembly: fast plot and line algorithms for plus/4

As you may have guessed, I want to optimize (maybe rewrite) my 3D engine in assembler.
My first step is to call the routine from BASIC and see the difference.
(Side note: I already made an optimization for the next release, using Shellsort instead of Bubblesort, with a tremendous speedup.)
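
For readers who have not met Shellsort: it is an insertion sort that first moves elements across large gaps and then shrinks the gap down to 1, which avoids most of the long chains of neighbour swaps that make Bubblesort slow. A minimal sketch in C follows, assuming the simple halving gap sequence; George's actual BASIC implementation is not shown in the thread and may well differ.

```c
#include <stddef.h>

/* Shellsort with the simple gap sequence n/2, n/4, ..., 1.
   Sorts the array in place in ascending order. */
void shellsort(int *a, size_t n) {
    size_t gap, i, j;

    for (gap = n / 2; gap > 0; gap /= 2) {
        /* Insertion sort over elements that are 'gap' apart. */
        for (i = gap; i < n; i++) {
            int v = a[i];
            for (j = i; j >= gap && a[j - gap] > v; j -= gap) {
                a[j] = a[j - gap];
            }
            a[j] = v;
        }
    }
}
```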

@Csabo: I am at the stage where I want to get something working in assembler. I guess using a charset will be faster and the code will be smaller, while doing the same job in my engine. Or are there any severe differences in my case? I have already learned much from your work and your tools.

@KiCHY: Thank you. I have spent many hours on this topic. Every beginning is frustrating. I will try it out this evening.

Posted By

KiCHY
on 2019-12-02
08:36:20
 Re: Assembly: fast plot and line algorithms for plus/4

;Start this routine at $1100

;zp
plot_lo = $fe
plot_hi = $ff

;coords
x_1 = 155
x_2 = 0
y_1 = 0
y_2 = 100

* = $1100

;init screen

lda $ff06
ora #$20 ; enable bitmap mode
sta $ff06
lda #$d8 ; bitmap at $6000
sta $ff12
lda #$40 ; color/luma screen at $4000
sta $ff14

; Set up colors

ldx #0
loop1 lda #$70
sta $4000,x
sta $4100,x
sta $4200,x
sta $4300,x
lda #$01
sta $4400,x
sta $4500,x
sta $4600,x
sta $4700,x
inx
bne loop1

; Clear bitmap area. X is 0 here.

lda #$00
loop2 sta $6000,x
sta $6100,x
sta $6200,x
sta $6300,x
sta $6400,x
sta $6500,x
sta $6600,x
sta $6700,x
sta $6800,x
sta $6900,x
sta $6a00,x
sta $6b00,x
sta $6c00,x
sta $6d00,x
sta $6e00,x
sta $6f00,x
sta $7000,x
sta $7100,x
sta $7200,x
sta $7300,x
sta $7400,x
sta $7500,x
sta $7600,x
sta $7700,x
sta $7800,x
sta $7900,x
sta $7a00,x
sta $7b00,x
sta $7c00,x
sta $7d00,x
sta $7e00,x
sta $7f00,x
inx
bne loop2

; Draw a single line in infinite loop.

sei
loop3 lda #$cb
cmp $ff1d
bne *-3
dec $ff19
jsr draw_line
inc $ff19
jmp loop3

; ----------------------------------------------------------------------

draw_line

;init

ldx #$e8 ;inx
lda #y_2
sta to_y+1
sec
sbc #y_1
bcs skip1
eor #$ff
adc #1
ldx #$ca ;dex - change direction
skip1
sta d_y+1
sta t_y_1+1
sta t_y_2+1
stx incx1
stx incx2

ldx #$c8 ;iny
lda #x_2
sta to_x+1
sec
sbc #x_1
bcs skip2
eor #$ff
adc #1
ldx #$88 ;dey - change direction
skip2
stx incy1
stx incy2

ldy #x_1
ldx #y_1

;loop

;start y in x-register
;start x in y-register
;delta x in a-register

d_y cmp #0
bcc steep

sta t_x_1+1
lsr
sta errx+1
loopx
clc ;needed, as previous cmp could set carry. could be saved if we always count up and branch with bcc;
lda x_char,y
adc y_char_lo,x
sta plot_lo
lda y_char_hi,x
sta plot_hi

lda x_pixel_char,y
ora (plot_lo),y
sta (plot_lo),y
;Remember that the y_char_lo table in this example starts at $20 (which centres hires-mode plotting).
;If you lower the start of the table to below $08 (say, for multicolor purposes where x steps are in doubles),
;you will get high-byte issues when a value such as $FE from x_char is added in the adc before the sta (),y

errx lda #$00
sec
t_y_1 sbc #0
bcs skip3

;one might also swap cases (bcc here) and duplicate the loopend. saves more or less cycles as the subtract-case occurs more often than the add-case. Copying the whole loop to zeropage also save cycles as sta errx+1 is only 3 cycles then. (Bitbreaker)

t_x_1 adc #0
incx1 inx
skip3 sta errx+1

incy1 iny
to_x cpy #0
bne loopx
rts

steep
sta t_x_2+1
lsr
sta erry+1
loopy
clc ;needed, as previous cmp could set carry. could be saved if we always count up and branch with bcc;
lda x_char,y
adc y_char_lo,x
sta plot_lo
lda y_char_hi,x
sta plot_hi

lda x_pixel_char,y
ora (plot_lo),y
sta (plot_lo),y

erry lda #$00
sec
t_x_2 sbc #0
bcs skip4

t_y_2 adc #0
incy2 iny
skip4 sta erry+1

incx2 inx
to_y cpx #0
bne loopy
rts

y_char_lo
.byte $20,$21,$22,$23,$24,$25,$26,$27,$60,$61,$62,$63,$64,$65,$66,$67
.byte $a0,$a1,$a2,$a3,$a4,$a5,$a6,$a7,$e0,$e1,$e2,$e3,$e4,$e5,$e6,$e7
.byte $20,$21,$22,$23,$24,$25,$26,$27,$60,$61,$62,$63,$64,$65,$66,$67
.byte $a0,$a1,$a2,$a3,$a4,$a5,$a6,$a7,$e0,$e1,$e2,$e3,$e4,$e5,$e6,$e7
.byte $20,$21,$22,$23,$24,$25,$26,$27,$60,$61,$62,$63,$64,$65,$66,$67
.byte $a0,$a1,$a2,$a3,$a4,$a5,$a6,$a7,$e0,$e1,$e2,$e3,$e4,$e5,$e6,$e7
.byte $20,$21,$22,$23,$24,$25,$26,$27,$60,$61,$62,$63,$64,$65,$66,$67
.byte $a0,$a1,$a2,$a3,$a4,$a5,$a6,$a7,$e0,$e1,$e2,$e3,$e4,$e5,$e6,$e7
.byte $20,$21,$22,$23,$24,$25,$26,$27,$60,$61,$62,$63,$64,$65,$66,$67
.byte $a0,$a1,$a2,$a3,$a4,$a5,$a6,$a7,$e0,$e1,$e2,$e3,$e4,$e5,$e6,$e7
.byte $20,$21,$22,$23,$24,$25,$26,$27,$60,$61,$62,$63,$64,$65,$66,$67
.byte $a0,$a1,$a2,$a3,$a4,$a5,$a6,$a7,$e0,$e1,$e2,$e3,$e4,$e5,$e6,$e7
.byte $20,$21,$22,$23,$24,$25,$26,$27,$60,$61,$62,$63,$64,$65,$66,$67
.byte $60,$61,$62,$63,$64,$65,$66,$67,$60,$61,$62,$63,$64,$65,$66,$67
.byte $60,$61,$62,$63,$64,$65,$66,$67,$60,$61,$62,$63,$64,$65,$66,$67
.byte $60,$61,$62,$63,$64,$65,$66,$67,$60,$61,$62,$63,$64,$65,$66,$67

y_char_hi
.byte $60,$60,$60,$60,$60,$60,$60,$60,$61,$61,$61,$61,$61,$61,$61,$61
.byte $62,$62,$62,$62,$62,$62,$62,$62,$63,$63,$63,$63,$63,$63,$63,$63
.byte $65,$65,$65,$65,$65,$65,$65,$65,$66,$66,$66,$66,$66,$66,$66,$66
.byte $67,$67,$67,$67,$67,$67,$67,$67,$68,$68,$68,$68,$68,$68,$68,$68
.byte $6a,$6a,$6a,$6a,$6a,$6a,$6a,$6a,$6b,$6b,$6b,$6b,$6b,$6b,$6b,$6b
.byte $6c,$6c,$6c,$6c,$6c,$6c,$6c,$6c,$6d,$6d,$6d,$6d,$6d,$6d,$6d,$6d
.byte $6f,$6f,$6f,$6f,$6f,$6f,$6f,$6f,$70,$70,$70,$70,$70,$70,$70,$70
.byte $71,$71,$71,$71,$71,$71,$71,$71,$72,$72,$72,$72,$72,$72,$72,$72
.byte $74,$74,$74,$74,$74,$74,$74,$74,$75,$75,$75,$75,$75,$75,$75,$75
.byte $76,$76,$76,$76,$76,$76,$76,$76,$77,$77,$77,$77,$77,$77,$77,$77
.byte $79,$79,$79,$79,$79,$79,$79,$79,$7a,$7a,$7a,$7a,$7a,$7a,$7a,$7a
.byte $7b,$7b,$7b,$7b,$7b,$7b,$7b,$7b,$7c,$7c,$7c,$7c,$7c,$7c,$7c,$7c
.byte $7e,$7e,$7e,$7e,$7e,$7e,$7e,$7e,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f
.byte $7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f
.byte $7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f
.byte $7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f
x_char
.byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
.byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
.byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
.byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
.byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
.byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
.byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
.byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
.byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
.byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
.byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
.byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
.byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
.byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
.byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
.byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9

x_pixel_char
.byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
.byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
.byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
.byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
.byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
.byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
.byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
.byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
.byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
.byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
.byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
.byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
.byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
.byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
.byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
.byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
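
To make the listing above easier to follow, here is a rough C model of what it computes. It is a sketch, not a transcription: the names `bitmap`, `pixel_offset`, `plot` and `draw_line` are illustrative, the tables are replaced by the arithmetic they encode, and the self-modifying direction switches are replaced by the step variables `sx` and `sy`. The address arithmetic relies on `x_char[x]` being minus `(x & 7)`, so that adding the x coordinate back through the `(plot_lo),y` addressing leaves `x & ~7`, the byte offset of the 8-pixel cell.

```c
#include <stdint.h>

#define BITMAP_SIZE 8000   /* 40 x 25 cells of 8 bytes each */
#define X_OFFSET    32     /* the $20 start of y_char_lo: centres a 256-pixel window */

static uint8_t bitmap[BITMAP_SIZE];   /* stands in for the bitmap at $6000 */

/* Byte offset of pixel (x, y) within the bitmap, for x in 0..255, y in 0..199. */
static unsigned pixel_offset(unsigned x, unsigned y) {
    return (y / 8) * 320 + (y & 7) + X_OFFSET + (x & ~7u);
}

/* Sets one pixel, like the lda x_pixel_char / ora / sta sequence. */
static void plot(unsigned x, unsigned y) {
    bitmap[pixel_offset(x, y)] |= (uint8_t)(0x80 >> (x & 7));
}

/* Draws from (x1, y1) up to, but not including, (x2, y2). */
static void draw_line(int x1, int y1, int x2, int y2) {
    int dx = x2 > x1 ? x2 - x1 : x1 - x2;
    int dy = y2 > y1 ? y2 - y1 : y1 - y2;
    int sx = x2 >= x1 ? 1 : -1;
    int sy = y2 >= y1 ? 1 : -1;
    int err;

    if (dx >= dy) {                 /* shallow: x advances on every step */
        err = dx / 2;
        while (x1 != x2) {
            plot((unsigned)x1, (unsigned)y1);
            err -= dy;
            if (err < 0) { err += dx; y1 += sy; }
            x1 += sx;
        }
    } else {                        /* steep: y advances on every step */
        err = dy / 2;
        while (y1 != y2) {
            plot((unsigned)x1, (unsigned)y1);
            err -= dx;
            if (err < 0) { err += dy; x1 += sx; }
            y1 += sy;
        }
    }
}
```

Two differences from the assembly are worth knowing. The model tests the end condition before plotting, so a zero-length line draws nothing, whereas the assembly loop plots first and compares afterwards. Both leave the end point itself unplotted, which would explain why a line from x1 = v to x2 = v+1 comes out as a single point.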

Posted By

Csabo
on 2019-12-02
08:15:40
 Re: Assembly: fast plot and line algorithms for plus/4

Yeah, align 256 just aligns to the next 256 byte boundary. This is a nice way of making the code memory location independent, but it's easy to get around that as KiCHY explained. (If your code ends at let's say $1234, align 256 would continue from $1300, so you can do that manually instead.)
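
The arithmetic behind that is small enough to show. This is a sketch in C, with `align256` as a made-up helper name, of rounding an address up to the next 256-byte boundary:

```c
#include <stdint.h>

/* Round an address up to the next multiple of 256, as an "align 256"
   directive would. An address already on a boundary is left unchanged. */
static uint16_t align256(uint16_t addr) {
    return (uint16_t)((addr + 0xFFu) & 0xFF00u);
}
```

With the numbers from the example, `align256(0x1234)` gives `0x1300`. As far as I know, the two parameters ACME expects for `!align` are an AND mask and the value the masked program counter must equal, so `!align 255, 0` should have the same effect; please check the ACME documentation before relying on that.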

You didn't explicitly specify where you want to draw the plots and lines, but I'm going to assume on a graphics screen.

However, if you want to draw them in a charset (which would be perfect for demos), there's source code available. All my demos have their source released (with the hopes that someone will learn from them or use them), and two of them have fast line drawing code implemented. Check LOD Is Back (d_vector.asm) or Crackers' Demo 5 (line.asm). I use AS65 as the assembler.

This is trickier, but you could also try to disassemble Botticelli or Vector Victory; both have line drawing routines.

Posted By

George
on 2019-12-02
07:48:26
 Re: Assembly: fast plot and line algorithms for plus/4

I tried it yesterday and the procedure hangs... but I will give it another try with your hints.

- "*" works, and .byte = !byte in ACME
- I struggled with the align because in ACME it expects 2 params (I don't fully understand their meaning).
- I replaced cmp $d012 with the TED equivalent (waiting for a raster line)
- I commented out the border color settings (inc $d020)

Any further tips are welcome.

Posted By

KiCHY
on 2019-12-02
07:28:27
 Re: Assembly: fast plot and line algorithms for plus/4

You mentioned you already checked codebase64, but I suggest giving this algorithm another try:
https://codebase64.org/doku.php?id=base:bresenham_s_line_algorithm_2
I don't use ACME, but I think it can handle the "*" character and has some kind of ".byte" command too. Perhaps that ".align 256" needs some clarification, but it's easy to solve: the routine starts at $1000. If it is shorter than 256 bytes, you can change the ".align 256" to "* = $1100".

Posted By

George
on 2019-12-02
06:10:52
 Assembly: fast plot and line algorithms for plus/4

I am doing some research into graphics in assembly, plotting points and lines.
So far I have understood how to initialize the screen and draw some points in color. I also made some small working programs.

On codebase64 (and other resources) there are several algorithms for drawing points and lines.
I could not get them running with ACME so far.
Does anybody have some working code for the plus/4, or know where I can find some source?
I would appreciate it.

Which algorithms do you recommend (compromise between small and fast)?


Copyright © Plus/4 World Team, 2001-2024.