Previous Messages | Posted By
Mad on 2019-12-27 00:06:24
| Re: Assembly: fast plot and line algorithms for plus/4
Awesome! I saw this document some time ago. I was pretty "flashed" by seeing code written by him! :D
|
|
Posted By
MMS on 2019-12-24 13:12:16
| Re: Assembly: fast plot and line algorithms for plus/4
And here are the floating point algorithms (could be used for a C library) from none other than the legendary Steve Wozniak (Woz).
http://6502.org/source/floats/wozfp1.txt
PS (did I mention that for some time I worked at Philips Austria together with a colleague who was a relative of Woz? Of course he was called Wozniak too)
|
|
Posted By
raze on 2019-12-10 15:48:46
| Re: Assembly: fast plot and line algorithms for plus/4
Hello, I wrote a small game in C that uses a fast routine to draw pixels (the fastest I can think of). I extracted the small piece of code below; maybe you will find it useful. I didn't test it, but it should be OK.
---------------------------
// Pixel read-write fast routine
// By Giuseppe Mignogna 2019
// Not tested, should compile under cc65:
// cl65 -O -t plus4 demo.c -o demo.prg

// TODO: The instructions to set up the
// graphic mode must be added

#include <peekpoke.h>   // cc65 header that provides PEEK and POKE

#define BITMAP_START 0x2000
#define UCHAR unsigned char
#define UINT unsigned int

UINT g_xb[320];   // byte offset of the 8-pixel column that contains x
UINT g_yb[200];   // address of the character row that contains y
UCHAR g_bt[8];    // bit mask for pixel x&7 (bit 7 is the leftmost pixel)

void init_draw_pixel()
{
    UINT i;
    for(i=0; i<200; i++) g_yb[i] = BITMAP_START + i/8*320; // i/8 -> int
    for(i=0; i<320; i++) g_xb[i] = i/8 * 8;
    for(i=0; i<8; i++) g_bt[i] = 1 << (7 - i);
}

void draw_pixel(UINT x, UCHAR y)
{
    UINT b = 0;
    if(x > 319 || y > 199) return;
    b = g_yb[y] + g_xb[x] + (UCHAR)(y&7);
    POKE(b, PEEK( b ) ^ g_bt[x&7]);   // XOR: toggles the pixel
}

UCHAR read_pixel(UINT x, UCHAR y)
{
    UINT b = 0;
    // if(x > 319 || y > 199) return 0;
    b = g_yb[y] + g_xb[x] + (UCHAR)(y&7);
    return(PEEK( b ) & g_bt[x&7]);    // non-zero if the pixel is set
}

int main (void)
{
    UCHAR y=0;
    init_draw_pixel();
    for(y=0; y < 200; y++)
    {
        draw_pixel(100, y);
    }
    return 0;
}
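For comparison, the address that the three lookup tables above produce can also be computed directly. The sketch below is only an illustration of the hires bitmap layout the routine relies on (8 bytes per character cell, 320 bytes per character row); the names pixel_addr and pixel_mask are made up for this example, and the table version will be faster on a 6502 because it avoids the multiplications.

```c
#include <stdint.h>

#define BITMAP_START 0x2000u

/* Address of the bitmap byte that holds pixel (x, y), x 0..319, y 0..199.
   Each character row is 320 bytes, each character cell is 8 bytes. */
uint16_t pixel_addr(uint16_t x, uint8_t y)
{
    return (uint16_t)(BITMAP_START + (y / 8) * 320u + (x / 8) * 8u + (y & 7));
}

/* Bit mask of pixel x inside that byte; the leftmost pixel is bit 7. */
uint8_t pixel_mask(uint16_t x)
{
    return (uint8_t)(0x80u >> (x & 7));
}
```

pixel_addr(x, y) should equal g_yb[y] + g_xb[x] + (y&7) from the routine above, and pixel_mask(x) should equal g_bt[x&7].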
|
|
Posted By
KiCHY on 2019-12-05 02:27:20
| Re: Assembly: fast plot and line algorithms for plus/4
...though it is worth mentioning that I just converted the same routine I pointed you to earlier, from codebase64.
|
|
Posted By
George on 2019-12-03 15:36:48
| Re: Assembly: fast plot and line algorithms for plus/4
I made an encyclopedia entry of KiCHY's routine. Thank you, KiCHY!
|
|
Posted By
KiCHY on 2019-12-02 17:27:31
| Re: Assembly: fast plot and line algorithms for plus/4
@MMS: There are quite a few floating point math libraries for the 6502. Check out 6502.org for source code.
|
|
Posted By
George on 2019-12-02 17:20:27
| Re: Assembly: fast plot and line algorithms for plus/4
A short feedback: I succeeded in calling the procedure from BASIC after relocating the bitmap to $2000. There is no essential speedup of my BASIC application. I guess that is because of the BASIC context (integer arrays, ...) and not the line drawing itself (BASIC draws the lines at almost the same speed).
So I guess I have to get rid of the BASIC and do the project in assembly...
PS.
* The procedure has one restriction: the x-value can only go from 0 to 255 (I use hires mode), where 128 is the center of the screen.
* If you set x1 = v and x2 = (v+1), it also draws points.
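The 0 to 255 range seems to follow from the routine keeping x in a single 8-bit register, and the centering from the y_char_lo table starting at $20, which shifts every plot 32 pixels to the right. A small sketch of that mapping, assuming the table offset is unchanged (X_OFFSET and screen_x are names invented for this example):

```c
#include <stdint.h>

#define X_OFFSET 32u   /* assumption: y_char_lo still starts at $20 */

/* Screen column (0..319) that the routine's 8-bit x coordinate lands on. */
uint16_t screen_x(uint8_t x)
{
    return (uint16_t)(x + X_OFFSET);
}
```

With this offset, x = 128 maps to column 160, the middle of the 320-pixel hires screen, and the drawable window covers columns 32 to 287.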
Either way: it's cool to have this procedure for the plus/4. It will be useful for other projects in the future.
|
|
Posted By
MMS on 2019-12-02 16:34:35
| Re: Assembly: fast plot and line algorithms for plus/4
Actually, CC65 is out and seems to support the +4 too. It compiles rather fast code from C (at least so I am told; I want to run some benchmarks against compiled BASIC, but none of the compiled code contains the +4-specific GFX commands).
tgi.h contains most of the GFX routines, including line ("tgi_line") and fill ("tgi_bar"): https://www.cc65.org/doc/funcref-40.html
Too bad that floating point calculation (the FLOAT type) is missing natively from the CC65 package. It is a big defect here, as most of the 3D-->2D transformation happens in floating point calculations.
I found one library for CC65, from MRDUDZ, that claims to implement it: https://github.com/mrdudz/cc65-floatlib
If the above does not work, what kind of workaround is possible?
1) Are the Kernal routines of any use to cover this missing feature?
2) Any workaround experience with big numbers (eh, DOUBLE is also missing) and shifting the result?
3) Or just work with integers and accept that, after dividing, the result is in fact truncated and may become incorrect?
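One common answer to questions 2) and 3) is fixed-point arithmetic: keep every value as an integer scaled by a constant, do the multiplication in a wider type first, and divide last so that truncation happens only once. The sketch below uses an 8.8 format; the type fix8_8 and all function names are invented for this example and are not part of cc65 or of the float library linked above.

```c
#include <stdint.h>

typedef int16_t fix8_8;      /* 8 integer bits, 8 fraction bits */
#define FIX_ONE 256          /* the value 1.0 in 8.8 format */

fix8_8 fix_from_int(int16_t v)
{
    return (fix8_8)(v * FIX_ONE);
}

/* Multiply in 32 bits, then scale back down once. */
fix8_8 fix_mul(fix8_8 a, fix8_8 b)
{
    return (fix8_8)(((int32_t)a * b) / FIX_ONE);
}

/* Scale the dividend up first so the quotient keeps its fraction bits. */
fix8_8 fix_div(fix8_8 a, fix8_8 b)
{
    return (fix8_8)(((int32_t)a * FIX_ONE) / b);
}

/* Perspective projection with integers only: center + x * d / z.
   Multiplying before dividing limits the truncation error to less
   than one pixel. z must not be zero. */
int16_t project(int16_t x, int16_t z, int16_t d, int16_t center)
{
    return (int16_t)(center + ((int32_t)x * d) / z);
}
```

The 32-bit intermediate is what plays the role of the missing "big number" type here; on a 6502 it costs cycles, so a real engine would probably replace the division with table lookups.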
|
|
Posted By
George on 2019-12-02 12:14:35
| Re: Assembly: fast plot and line algorithms for plus/4
@Mad: Thank you for your generous help. I supposed the demoscene had made some progress, that's why I kindly asked. I will try out KiCHY's algorithm first and give you all feedback.
My goal is to bring the rendering to real-time with a maximum of 30 mins for the big models.
Good luck with your projects too.
@Kichy: Thank you for the hint. I had that clipping problem with the C64 or VIC20 port. I handled it from BASIC, AFAIK.
@bubis: The lines in theory can have every angle and length.
|
|
Posted By
KiCHY on 2019-12-02 12:25:58
| Re: Assembly: fast plot and line algorithms for plus/4
@George: Be aware these line drawing functions won't handle clipping! If your original basic code utilized the built-in clipping of the DRAW command, you have to "manually" clip your lines.
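One way to do that manual clipping is the Cohen-Sutherland algorithm, sketched below in C. This is a swapped-in textbook technique, not what the BASIC DRAW command does internally. The window limits are assumptions (x 0..255 to match the 8-bit x coordinate of the line routine, y 0..199), and all names are made up for this example.

```c
#define XMIN 0L
#define XMAX 255L   /* assumption: the routine's 8-bit x range */
#define YMIN 0L
#define YMAX 199L

enum { OUT_LEFT = 1, OUT_RIGHT = 2, OUT_TOP = 4, OUT_BOTTOM = 8 };

/* Bit set describing on which sides of the window a point lies. */
static int outcode(long x, long y)
{
    int c = 0;
    if (x < XMIN) c |= OUT_LEFT; else if (x > XMAX) c |= OUT_RIGHT;
    if (y < YMIN) c |= OUT_TOP;  else if (y > YMAX) c |= OUT_BOTTOM;
    return c;
}

/* Clips the segment in place.
   Returns 1 if some part is visible, 0 if the line is fully outside. */
int clip_line(long *x1, long *y1, long *x2, long *y2)
{
    int c1 = outcode(*x1, *y1);
    int c2 = outcode(*x2, *y2);

    while (c1 | c2) {
        long x, y;
        int c;
        if (c1 & c2) return 0;      /* both ends beyond the same edge */
        c = c1 ? c1 : c2;           /* pick an endpoint that is outside */
        if (c & OUT_LEFT) {
            x = XMIN;
            y = *y1 + (*y2 - *y1) * (XMIN - *x1) / (*x2 - *x1);
        } else if (c & OUT_RIGHT) {
            x = XMAX;
            y = *y1 + (*y2 - *y1) * (XMAX - *x1) / (*x2 - *x1);
        } else if (c & OUT_TOP) {
            y = YMIN;
            x = *x1 + (*x2 - *x1) * (YMIN - *y1) / (*y2 - *y1);
        } else {
            y = YMAX;
            x = *x1 + (*x2 - *x1) * (YMAX - *y1) / (*y2 - *y1);
        }
        if (c == c1) { *x1 = x; *y1 = y; c1 = outcode(x, y); }
        else         { *x2 = x; *y2 = y; c2 = outcode(x, y); }
    }
    return 1;
}
```

The coordinates are long so that the products cannot overflow a 16-bit int. Only lines for which clip_line returns 1 would then be handed to the drawing routine.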
@MAD: I think your code is pretty advanced for a ~beginner
|
|
Posted By
bubis on 2019-12-02 11:56:46
| Re: Assembly: fast plot and line algorithms for plus/4
I have a line routine that uses speedcode for drawing 1,...,7,8,n wide/tall pixel blocks (n>8 is a separate case). So, when the slope is between 2 and 3 it draws 2 and 3 pixel blocks proportionally. It is not as accurate as Bresenham's but you can't really notice that, it is pretty fast and fits into 8K. Just an idea...
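A rough C sketch of the run-length idea described above, not the actual routine: a line with major delta `major` and minor delta `minor` is cut into `minor` runs whose lengths are either floor(major/minor) or one more, distributed with an error accumulator. The function name and the way the runs are returned are assumptions made for this example.

```c
/* Fills runs[0..minor-1] with the length of each pixel run and
   returns the number of runs. Requires major >= minor > 0. */
int run_lengths(int major, int minor, int *runs)
{
    int base = major / minor;   /* short run length */
    int rem  = major % minor;   /* how many runs get one extra pixel */
    int err  = 0;
    int i;

    for (i = 0; i < minor; i++) {
        runs[i] = base;
        err += rem;
        if (err >= minor) {     /* time for a long run */
            err -= minor;
            runs[i]++;
        }
    }
    return minor;
}
```

For a slope of 2.5 (major 5, minor 2) this yields runs of 2 and 3 pixels, which is the "2 and 3 pixel blocks proportionally" behaviour. A speedcode version would then jump to a dedicated drawing routine per run length.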
|
|
Posted By
Mad on 2019-12-02 11:33:32
| Re: Assembly: fast plot and line algorithms for plus/4
If you do have a lot of memory, then the approach of Graham of Oxyron to painting lines is probably the fastest. However, this approach involves very much code. I do have a code generator for that somewhere. But I think a normal Bresenham like the one Kichy showed up there is already pretty fast, at least a lot faster than BASIC line drawing. Line drawing is a topic where the demoscene got pretty far already. I always feared that topic in the past.. Maybe I can dig out that code generator and give it to you later on, if Kichy's algorithm isn't enough for you..
Actually, Graham does this per 8x8 block as far as I remember (needs some tweaking, just as an outline of that idea):
.again
    sbc ZP_SLOPE
    bcc .movepixelleftandcontinue1
    sbc ZP_SLOPE
    bcc .movepixelleftandcontinue2
    sbc ZP_SLOPE
    bcc .movepixelleftandcontinue3
    sbc ZP_SLOPE
    bcc .movepixelleftandcontinue4
    sbc ZP_SLOPE
    bcc .movepixelleftandcontinue5
    sbc ZP_SLOPE
    bcc .movepixelleftandcontinue6
    sbc ZP_SLOPE
    bcc .movepixelleftandcontinue7
    sbc ZP_SLOPE
    bcc .movepixelleftandcontinue8
    do just drawAll8YPixelsOnSameXPosition here
    jmp .again

.movepixelleftandcontinue1
    dec pixelXPos
    drawPixel
    adc ZP_SLOPEORIGIN ; this is the bigger delta (x or y) (like in bresenham)
    sbc ZP_SLOPE       ; this is the smaller delta (x or y) (like in bresenham)
    bcc .movepixelleftandcontinue2b
    sbc ZP_SLOPE
    bcc .movepixelleftandcontinue3b
    sbc ZP_SLOPE
    bcc .movepixelleftandcontinue4b
    sbc ZP_SLOPE
    bcc .movepixelleftandcontinue5b
    sbc ZP_SLOPE
    bcc .movepixelleftandcontinue6b
    sbc ZP_SLOPE
    bcc .movepixelleftandcontinue7b
    sbc ZP_SLOPE
    bcc .movepixelleftandcontinue8b
    do just drawAll7LeftYPixelsOnSameXPosition here
    jmp .again
This code later gets a lot longer because of all the .movepixelleftandcontinue jumps ..
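For readers who want to check such a routine against a reference, the error-accumulator step that each sbc/bcc pair in the outline performs can be modelled in plain C. This is an ordinary Bresenham sketch, not the unrolled Oxyron code; the function name and the output arrays are assumptions made for this example.

```c
#include <stdlib.h>

/* Writes the pixels of the line (x0,y0)-(x1,y1) into xs[]/ys[] and
   returns how many were written (at most max). */
int bresenham_points(int x0, int y0, int x1, int y1,
                     int *xs, int *ys, int max)
{
    int dx = abs(x1 - x0), dy = abs(y1 - y0);
    int sx = (x0 < x1) ? 1 : -1;
    int sy = (y0 < y1) ? 1 : -1;
    int x_major = (dx >= dy);
    int major = x_major ? dx : dy;   /* the bigger delta */
    int minor = x_major ? dy : dx;   /* the smaller delta */
    int err = major / 2;
    int n;

    for (n = 0; n <= major && n < max; n++) {
        xs[n] = x0;
        ys[n] = y0;
        err -= minor;                /* like "sbc ZP_SLOPE" */
        if (err < 0) {               /* like the "bcc" being taken */
            err += major;            /* like "adc ZP_SLOPEORIGIN" */
            if (x_major) y0 += sy; else x0 += sx;
        }
        if (x_major) x0 += sx; else y0 += sy;
    }
    return n;
}
```

The unrolled version removes the loop counter and the axis test by generating one copy of this step per pixel position inside an 8x8 block, which is why the code grows so much.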
I hope Kichy's lines of code up there already do the performance trick..
Good luck with your project!!!
|
|
Posted By
George on 2019-12-02 09:09:51
| Re: Assembly: fast plot and line algorithms for plus/4
As you may have guessed, I want to optimize (maybe rewrite) my 3D engine in assembler. My first step is to call the routine from BASIC and see the differences. (Sidenote: I already did an optimization using Shellsort instead of Bubblesort, with a tremendous speedup, for the next release.)
@Csabo: I am at the stage where I want to get something to work in assembler. I guess using a charset will be faster and the code will be smaller, but will do the same job in my engine. Or are there any severe differences for my case? I have already learned much from your work and your tools.
@KiCHY: Thank you. I spent many hours on this topic. Every beginning is frustrating. I will try it out this evening.
|
|
Posted By
KiCHY on 2019-12-02 08:36:20
| Re: Assembly: fast plot and line algorithms for plus/4
;Start this routine at $1100

;zp
plot_lo = $fe
plot_hi = $ff

;coords
x_1 = 155
x_2 = 0
y_1 = 0
y_2 = 100

         * = $1100

;init screen
         lda $ff06
         ora #$20       ; enable bitmap mode
         sta $ff06
         lda #$d8       ; bitmap at $6000
         sta $ff12
         lda #$40       ; color/luma screen at $4000
         sta $ff14

; Set up colors
         ldx #0
loop1    lda #$70
         sta $4000,x
         sta $4100,x
         sta $4200,x
         sta $4300,x
         lda #$01
         sta $4400,x
         sta $4500,x
         sta $4600,x
         sta $4700,x
         inx
         bne loop1

; Clear bitmap area. X is 0 here.
         lda #$00
loop2    sta $6000,x
         sta $6100,x
         sta $6200,x
         sta $6300,x
         sta $6400,x
         sta $6500,x
         sta $6600,x
         sta $6700,x
         sta $6800,x
         sta $6900,x
         sta $6a00,x
         sta $6b00,x
         sta $6c00,x
         sta $6d00,x
         sta $6e00,x
         sta $6f00,x
         sta $7000,x
         sta $7100,x
         sta $7200,x
         sta $7300,x
         sta $7400,x
         sta $7500,x
         sta $7600,x
         sta $7700,x
         sta $7800,x
         sta $7900,x
         sta $7a00,x
         sta $7b00,x
         sta $7c00,x
         sta $7d00,x
         sta $7e00,x
         sta $7f00,x
         inx
         bne loop2

; Draw a single line in infinite loop.
         sei
loop3    lda #$cb
         cmp $ff1d
         bne *-3
         dec $ff19
         jsr draw_line
         inc $ff19
         jmp loop3
; ----------------------------------------------------------------------
draw_line
;init
         ldx #$e8       ;inx
         lda #y_2
         sta to_y+1
         sec
         sbc #y_1
         bcs skip1
         eor #$ff
         adc #1
         ldx #$ca       ;dex - change direction
skip1
         sta d_y+1
         sta t_y_1+1
         sta t_y_2+1
         stx incx1
         stx incx2

         ldx #$c8       ;iny
         lda #x_2
         sta to_x+1
         sec
         sbc #x_1
         bcs skip2
         eor #$ff
         adc #1
         ldx #$88       ;dey - change direction
skip2
         stx incy1
         stx incy2

         ldy #x_1
         ldx #y_1

;loop
;start y in x-register
;start x in y-register
;delta x in a-register
d_y      cmp #0
         bcc steep

         sta t_x_1+1
         lsr
         sta errx+1
loopx
         clc            ;needed, as previous cmp could set carry. could be saved if we always count up and branch with bcc
         lda x_char,y
         adc y_char_lo,x
         sta plot_lo
         lda y_char_hi,x
         sta plot_hi
         lda x_pixel_char,y
         ora (plot_lo),y
         sta (plot_lo),y
;Remember that the y_char_lo table in this example starts at $20 (which center hires mode plotting).
;If you lower the start of table to below $08 (say for multicolor purposes where x steps are in doubles),
;you will get high-byte issues when you $FE in the adc x_char with the sta (),y
errx     lda #$00
         sec
t_y_1    sbc #0
;one might also swap cases (bcc here) and duplicate the loopend. saves more or less cycles
;as the subtract-case occurs more often than the add-case. Copying the whole loop to zeropage
;also save cycles as sta errx+1 is only 3 cycles then. (Bitbreaker)
         bcs skip3
t_x_1    adc #0
incx1    inx
skip3
         sta errx+1
incy1    iny
to_x     cpy #0
         bne loopx
         rts

steep
         sta t_x_2+1
         lsr
         sta erry+1
loopy
         clc            ;needed, as previous cmp could set carry. could be saved if we always count up and branch with bcc
         lda x_char,y
         adc y_char_lo,x
         sta plot_lo
         lda y_char_hi,x
         sta plot_hi
         lda x_pixel_char,y
         ora (plot_lo),y
         sta (plot_lo),y
erry     lda #$00
         sec
t_x_2    sbc #0
         bcs skip4
t_y_2    adc #0
incy2    iny
skip4
         sta erry+1
incx2    inx
to_y     cpx #0
         bne loopy
         rts

y_char_lo
         .byte $20,$21,$22,$23,$24,$25,$26,$27,$60,$61,$62,$63,$64,$65,$66,$67
         .byte $a0,$a1,$a2,$a3,$a4,$a5,$a6,$a7,$e0,$e1,$e2,$e3,$e4,$e5,$e6,$e7
         .byte $20,$21,$22,$23,$24,$25,$26,$27,$60,$61,$62,$63,$64,$65,$66,$67
         .byte $a0,$a1,$a2,$a3,$a4,$a5,$a6,$a7,$e0,$e1,$e2,$e3,$e4,$e5,$e6,$e7
         .byte $20,$21,$22,$23,$24,$25,$26,$27,$60,$61,$62,$63,$64,$65,$66,$67
         .byte $a0,$a1,$a2,$a3,$a4,$a5,$a6,$a7,$e0,$e1,$e2,$e3,$e4,$e5,$e6,$e7
         .byte $20,$21,$22,$23,$24,$25,$26,$27,$60,$61,$62,$63,$64,$65,$66,$67
         .byte $a0,$a1,$a2,$a3,$a4,$a5,$a6,$a7,$e0,$e1,$e2,$e3,$e4,$e5,$e6,$e7
         .byte $20,$21,$22,$23,$24,$25,$26,$27,$60,$61,$62,$63,$64,$65,$66,$67
         .byte $a0,$a1,$a2,$a3,$a4,$a5,$a6,$a7,$e0,$e1,$e2,$e3,$e4,$e5,$e6,$e7
         .byte $20,$21,$22,$23,$24,$25,$26,$27,$60,$61,$62,$63,$64,$65,$66,$67
         .byte $a0,$a1,$a2,$a3,$a4,$a5,$a6,$a7,$e0,$e1,$e2,$e3,$e4,$e5,$e6,$e7
         .byte $20,$21,$22,$23,$24,$25,$26,$27,$60,$61,$62,$63,$64,$65,$66,$67
         .byte $60,$61,$62,$63,$64,$65,$66,$67,$60,$61,$62,$63,$64,$65,$66,$67
         .byte $60,$61,$62,$63,$64,$65,$66,$67,$60,$61,$62,$63,$64,$65,$66,$67
         .byte $60,$61,$62,$63,$64,$65,$66,$67,$60,$61,$62,$63,$64,$65,$66,$67

y_char_hi
         .byte $60,$60,$60,$60,$60,$60,$60,$60,$61,$61,$61,$61,$61,$61,$61,$61
         .byte $62,$62,$62,$62,$62,$62,$62,$62,$63,$63,$63,$63,$63,$63,$63,$63
         .byte $65,$65,$65,$65,$65,$65,$65,$65,$66,$66,$66,$66,$66,$66,$66,$66
         .byte $67,$67,$67,$67,$67,$67,$67,$67,$68,$68,$68,$68,$68,$68,$68,$68
         .byte $6a,$6a,$6a,$6a,$6a,$6a,$6a,$6a,$6b,$6b,$6b,$6b,$6b,$6b,$6b,$6b
         .byte $6c,$6c,$6c,$6c,$6c,$6c,$6c,$6c,$6d,$6d,$6d,$6d,$6d,$6d,$6d,$6d
         .byte $6f,$6f,$6f,$6f,$6f,$6f,$6f,$6f,$70,$70,$70,$70,$70,$70,$70,$70
         .byte $71,$71,$71,$71,$71,$71,$71,$71,$72,$72,$72,$72,$72,$72,$72,$72
         .byte $74,$74,$74,$74,$74,$74,$74,$74,$75,$75,$75,$75,$75,$75,$75,$75
         .byte $76,$76,$76,$76,$76,$76,$76,$76,$77,$77,$77,$77,$77,$77,$77,$77
         .byte $79,$79,$79,$79,$79,$79,$79,$79,$7a,$7a,$7a,$7a,$7a,$7a,$7a,$7a
         .byte $7b,$7b,$7b,$7b,$7b,$7b,$7b,$7b,$7c,$7c,$7c,$7c,$7c,$7c,$7c,$7c
         .byte $7e,$7e,$7e,$7e,$7e,$7e,$7e,$7e,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f
         .byte $7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f
         .byte $7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f
         .byte $7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f,$7f

x_char
         .byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
         .byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
         .byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
         .byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
         .byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
         .byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
         .byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
         .byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
         .byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
         .byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
         .byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
         .byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
         .byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
         .byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
         .byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9
         .byte $00,$ff,$fe,$fd,$fc,$fb,$fa,$f9,$00,$ff,$fe,$fd,$fc,$fb,$fa,$f9

x_pixel_char
         .byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
         .byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
         .byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
         .byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
         .byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
         .byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
         .byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
         .byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
         .byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
         .byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
         .byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
         .byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
         .byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
         .byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
         .byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
         .byte $80,$40,$20,$10,$08,$04,$02,$01,$80,$40,$20,$10,$08,$04,$02,$01
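The four tables above follow a regular pattern, so they can be regenerated instead of typed in, for example when the bitmap is relocated. The C sketch below is a reconstruction from the table values, assuming a bitmap at $6000 and the $20 horizontal offset; the names BITMAP_BASE, X_OFFSET and build_tables are invented for this example, and the padding rule for rows past 207 is inferred from the listed data.

```c
#include <stdint.h>

#define BITMAP_BASE 0x6000u   /* assumption: bitmap address used above */
#define X_OFFSET    0x20u     /* first value of y_char_lo */

uint8_t y_char_lo[256], y_char_hi[256], x_char[256], x_pixel_char[256];

void build_tables(void)
{
    unsigned i;
    for (i = 0; i < 256; i++) {
        /* Entries past row 207 repeat the values of rows 200..207. */
        unsigned y = (i < 208) ? i : 200 + (i & 7);
        unsigned addr = BITMAP_BASE + X_OFFSET + (y / 8) * 320u + (y & 7);

        y_char_lo[i] = (uint8_t)(addr & 0xff);
        y_char_hi[i] = (uint8_t)(addr >> 8);
        /* -(x & 7): cancels the low 3 bits that (plot_lo),y adds back */
        x_char[i] = (uint8_t)(0u - (i & 7));
        x_pixel_char[i] = (uint8_t)(0x80u >> (i & 7));
    }
}
```

The x_char entries explain the addressing trick: the routine adds -(x & 7) to the row address and then indexes with the full x, so the effective offset is x rounded down to a multiple of 8, which is the byte offset of the character cell.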
|
|
Posted By
Csabo on 2019-12-02 08:15:40
| Re: Assembly: fast plot and line algorithms for plus/4
Yeah, align 256 just aligns to the next 256 byte boundary. This is a nice way of making the code memory location independent, but it's easy to get around that as KiCHY explained. (If your code ends at let's say $1234, align 256 would continue from $1300, so you can do that manually instead.)
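The same rounding can be written as a one-line calculation, shown here in C purely to illustrate the arithmetic (align256 is a name made up for this example, not an assembler feature):

```c
#include <stdint.h>

/* Rounds addr up to the next multiple of 256; an address that is
   already on a page boundary stays where it is. */
uint16_t align256(uint16_t addr)
{
    return (uint16_t)((addr + 0xFFu) & 0xFF00u);
}
```

For the address in the example, align256(0x1234) gives 0x1300.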
You didn't explicitly specify where you want to draw the plots and lines, but I'm going to assume on a graphics screen.
However, if you want to draw them in a charset (which would be perfect for demos), there's source code available. All my demos have their source released (with the hopes that someone will learn from them or use them), and two of them have fast line drawing code implemented. Check LOD Is Back (d_vector.asm) or Crackers' Demo 5 (line.asm). I use AS65 as the assembler.
This is more tricky; but you could try to disassemble maybe Botticelli or Vector Victory, both have line drawing routines.
|
|
Posted By
George on 2019-12-02 07:48:26
| Re: Assembly: fast plot and line algorithms for plus/4
I tried it yesterday, but the procedure hangs... I will give it another try with your hints.
- "*" works, and .byte = !byte in ACME
- I struggled with the align because in ACME it expects 2 params (I don't fully understand the meaning).
- I replaced cmp $d012 with the TED equivalent (waiting for the raster line)
- I commented out the border color settings (inc $d020)
Any further tips are welcome.
|
|
Posted By
KiCHY on 2019-12-02 07:28:27
| Re: Assembly: fast plot and line algorithms for plus/4
You mentioned you already checked codebase64, but I suggest giving this algorithm another try: https://codebase64.org/doku.php?id=base:bresenham_s_line_algorithm_2 I don't use ACME, but I think it can handle the "*" character and has some kind of ".byte" command too. Perhaps that ".align 256" needs some clarification, but it's easy to solve: the routine starts at $1000. If it is less than 256 bytes, you can change the ".align 256" to "* = $1100".
|
|
Posted By
George on 2019-12-02 06:10:52
| Assembly: fast plot and line algorithms for plus/4
I am doing some research into graphics in assembly, specifically plotting points and lines. I have understood how to initialize the screen and draw some points in color so far. I also made some small working programs.
On codebase64 (and other resources) there are several algorithms for drawing points and lines. I could not get them running with ACME so far. Does anybody have some working code for the plus/4, or know where I can find some source? I would appreciate it.
Which algorithms do you recommend (compromise between small and fast)?
|
|
|