Posted By
Giuseppe on 2010-11-08 19:45:41
| Assembly and interrupts: my first attempt. Help needed!
OK, after a lot of reading, testing and headaches, I think I can post a decent result of my studies. To be honest, I'm posting it in the hope that someone will check it and give me some good suggestions!
The program is a very simple and raw prototype of the main routine I'll need to make a videogame (or maybe a demo). It shows some sprite-characters moving from right to left. The program is structured like this:
1. Sprites are written into a temporary screen buffer;
2. Sprite positions and delay values, which are stored in a table, are updated;
3. When all the sprites have been processed, a flag is set;
4. When the flag is set, a routine that is executed at each vertical blank copies all the data from the buffer to the screen memory;
5. The flag is reset by the vblank routine, and the process is re-executed from point 1.
I've put many comments in the source, hoping that it will be clear enough. Since I didn't find many examples showing how to use interrupts on the C16 (actually, I found only 3 or 4), I don't know whether the decisions I took are correct. For example, I don't know if it is a good idea to use a buffer to build the video screen, but I found that I can't build it during the vblank, because I have too little time! How much time? I don't know, but it looks like I avoided the flickering, because during the vblank I only have to copy the buffer into screen memory, without calling the time-consuming routines.
Another decision I took is to use a table to calculate the screen address from the X, Y coordinates. I would like to know if there is a faster technique. This one seemed quite fast to me, but I'm sure there is something faster.
You can download an archive containing the .asm and the .prg files from this address:
http://digilander.libero.it/raze2002/raster09.zip
In the directory Compiler I've put the AS65 compiler. If you want to compile the RASTER.ASM source, simply execute makeprg.bat (I checked as65.exe with virustotal.com and it is clean).
Please help me! I really need suggestions, examples or tutorials to make it better or to fix what I did wrong! Thank you!
|
|
Posted By
Rybags on 2010-11-09 08:41:17
| Re: Assembly and interrupts: my first attempt. Help needed!
Buffering then moving sprite data isn't really the done thing on our slow old computers.
You can fare better by doing one or more of the following:
1. Trigger your "VBlank" IRQ via a Raster Int request for a scanline near the bottom of the display window portion of the screen. That means you'll get maximum non-display time to do screen changes. If it's the case that you do stuff like read joysticks, move object x/y positions around etc you might find that task takes a good number of scanlines. As such, you could move your IRQ upwards such that these preliminary tasks take place during the display area. "Raster bars" as a debug aid can help here... e.g. have the border black, then change it temporarily to other colours during each stage of your per-frame processing.
2. "Double-buffering" - if you're moving lots of stuff around it can be more beneficial sometimes to just have 2 screens. Do all the erases and updates on the "non-active" screen copy while the "active" screen is being displayed, then switch to the non-active screen once a VBlank has occurred and that screen is ready.
3. Pre-Shifted sprites. Bigger memory cost, but much quicker to put to screen than having an object that you have to run through shift/rotate instructions.
4. The Brute-Force approach. Can cost a lot of memory, but the fastest way to do software sprites. Rather than reading from definitions by instructions like LDA (Sprite_ptr),Y - use immediate mode instructions to read the sprite data. Not necessarily valid for all your moving objects, but for little things like bullets and pickup objects could save you sufficient time to do plenty of other processing.
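To make point 4 concrete, here is a minimal sketch contrasting the two approaches for a few bytes of sprite data. The labels spritePtr and dest, the target address and the data bytes are all made up for illustration; they do not come from any program in this thread. The cycle counts are the standard 6502 timings.

```asm
; Hypothetical names and addresses, for illustration only.
spritePtr = $D0                 ; zero-page pointer to the sprite definition
dest      = $4000               ; where the sprite bytes are written

; Generic version: the data is read through a pointer (about 15 cycles per byte).
generic LDY #$07
copy    LDA (spritePtr),Y       ; 5 cycles (6 if a page boundary is crossed)
        STA dest,Y              ; 5 cycles
        DEY                     ; 2 cycles
        BPL copy                ; 3 cycles when taken
        RTS

; Brute-force version: the data lives inside the instructions (6 cycles per byte).
brute   LDA #$3C                ; 2 cycles, first sprite byte
        STA dest                ; 4 cycles
        LDA #$7E                ; second sprite byte
        STA dest+1
        LDA #$FF                ; third sprite byte, and so on for the rest
        STA dest+2
        RTS
```

The price is memory: every sprite byte now costs five bytes of code (two for the LDA, three for the STA) instead of one byte of data, which is why it tends to suit small objects.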
Additional to that, there's a method we call the "Stripe Method" where you just have a huge block of instructions.
Imagine the code block:
  lda screen,x
  and mask,y
  ora sprite,y
  sta screen,x
  iny
Repeat that code block such that every single possible vertical screen offset is catered for. So you'd have screen+1 through +7, then 320 to 327, 640 to 647 etc.
So, you end up with this big block of code - it must reside in RAM because you need to self-modify it. The entry point to the block of code is dependent on what the Sprite's Y position is. So, you calculate the offset needed, then self-modify a JSR instruction that calls the routine. You preload the Y register with the offset to your sprite "stripe" segment. Also before calling, you need to insert an RTS instruction, so calculate the offset for that one too. Save that calculated address so you can restore the "iny" instruction later. The limitation is that the block of code can only cater for 4x256 pixels worth of sprite data, any more and you need more code blocks pointing to different sprite/mask areas. Also the sprite data has to be pre-shifted, so effectively we lose out quite a bit there too. Set the X register to the "stripe number", i.e. for a 12 pixel wide sprite, you'd call the routine in succession with X=0 then 1 then 2. If the sprite isn't right-justified to a 4-pixel boundary, then you'd call it again with X=3.
|
|
Posted By
Giuseppe on 2010-11-09 16:43:23
| Re: Assembly and interrupts: my first attempt. Help needed!
Thank you Rybags for your time and reply! I found it very useful, so I'd like to comment on your points.
> 1. Trigger your "VBlank" IRQ via a Raster Int request for a scanline near the bottom of the display window portion of the screen. That means you'll get maximum non-display time to do screen changes.
Yes, it is a good idea, but I threw it away because I thought I would still have too little time for calculations. I don't mind if I have to spend two or three frames on them, I just want to avoid flickering. Anyway, I tried to calculate how much time there is between the last visible scanline (which I think is scanline nr. 250, more or less) and the first visible scanline (nr. ~50). Approximately it should be 1/150 seconds: is that correct?
> 2. "Double-buffering" - if you're moving lots of stuff around it can be more beneficial sometimes to just have 2 screens.
Do you mean using TED register $FF14? Yes, definitely that is a very good idea! I thought about it too, but I didn't use it because I had the feeling that, to keep things fast, I would have to write each routine twice, once for the first buffer and once for the second. But I'll give it a try.
> 3. Pre-Shifted sprites. Bigger memory cost, but much quicker to put to screen than having an object that you have to run through shift/rotate instructions.
I think you are talking about pixel-by-pixel sprite movements, right? I have not reached that step yet (it will be the next one), but I'll surely keep your suggestion in mind.
> 4. The Brute-Force approach. Can cost a lot of memory, but the fastest way to do software sprites. Rather than reading from definitions by instructions like LDA (Sprite_ptr),Y - use immediate mode instructions to read the sprite data.
Hehe, OK, I guess I got it. I don't like it very much, but it can be indeed fast!
> Additional to that, there's a method we call the "Stripe Method" where you just have a huge block of instructions. [...]
I must admit that I understood very little of this point, but maybe I'll understand it better in the future. Anyway, you gave me a very naughty suggestion: code that self-modifies a JSR! Haha, it is tremendous, you made my (few) hairs stand on end, but I understand that with such low power available, this method could save the day! Yeah, thank you!!!
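On the $FF14 double-buffering idea discussed above, a minimal sketch of the screen flip could look like this. It assumes screen A is the default video matrix ($FF14 = $08, attributes at $0800, characters at $0C00) and picks $FF14 = $30 for screen B (attributes at $3000, characters at $3400). The second address and the label flipScreen are arbitrary choices for illustration, not something taken from RASTER.ASM.

```asm
; Assumption: only two video matrix settings are ever used, $08 and $30.
; Bits 3-7 of $FF14 select the matrix base; the low three bits are left alone.
flipScreen
        LDA $FF14
        EOR #$38                ; toggles $08 <-> $30
        STA $FF14
        RTS
```

The flip would typically be called from the raster interrupt once the hidden screen is fully drawn. Two caveats under these assumptions: the attribute area of the second screen has to be filled as well, and a buffer kept at $3000 would overlap it. The drawing routines need not be duplicated if they take the high byte of their target from a variable instead of a constant.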
|
|
Posted By
Giuseppe on 2010-11-09 16:51:12
| Re: Assembly and interrupts: my first attempt. Help needed!
I post here the source code of my small program, so you guys can have a look at it without downloading the zip above:
; ** File: RASTER.ASM
; ** Version: v0.9 - 08/11/2010
; ** Author: Giuseppe Mignogna
;
; Code written in Assembly 6502 for the AS65 compiler.
; Load into Plus/4 emulator and run with "SYS 8192".
; This code is for studying purposes and it's not optimized at all!

; -- Macro macroSR ------------------------------
macroSR macro
        ; Save registers to stack
        PHA
        TXA
        PHA
        TYA
        PHA
        endm
; -----------------------------------------------

; -- Macro macroRR ------------------------------
macroRR macro
        ; Restore registers from stack
        PLA
        TAY
        PLA
        TAX
        PLA
        endm
; -----------------------------------------------

; -- Initializing -------------------------------
        ; Locate code at $2000
        org $2000
        ; Set the first sprite to be processed
        LDA #$00
        STA spriteIndex
        ; Set the buffer as not ready
        STA isBufferReady

        ; Modify the VBLANK vector to point at a custom routine
        SEI
        LDA #VBLANKIRQ & 255
        STA $0312
        LDA #VBLANKIRQ >> 8
        STA $0313
        CLI
; -----------------------------------------------
; -- Main ----------------------------------------
MAIN
        ; The MAIN routine is looped endlessly. It fills a screen buffer
        ; located at $3000 with characters (sprites) moving from right to
        ; left. Sprites data are defined into a table below. Some data is
        ; not used at the moment.

        ; Check if the buffer is ready to be copied into screen memory
        LDA isBufferReady
        BNE jumpOver02
        ; Buffer is not ready, then clear the screen buffer
        LDA #$20
        LDX #$00
erase01 STA $3000,X
        STA $3100,X
        STA $3200,X
        STA $3300,X
        DEX
        BNE erase01
drawSprites
        ; Calculate spriteindex*8
        LDA spriteIndex
        ASL A
        ASL A
        ASL A
        TAY
        ; Get screen address from X,Y coords stored into sprites table
        JSR xy2addr
        ; Put sprite character into screen buffer
        ; Address is stored at $d2,$d3
        LDX #$00
        LDA sprites+4,y         ; sprite character
        STA ($D2,X)

        ; Manage delay
        TYA
        TAX
        DEC sprites+6,x         ; delay counter
        BNE jumpOver01
        ; Restore delay value
        LDA sprites+5,x         ; original delay value
        STA sprites+6,x
        ; Decrease X coord (sprite moves left)
        DEC sprites,x
        BNE jumpOver01
        ; Restore X coord
        LDA #$27                ; column 39
        STA sprites,x

jumpOver01
        ; Set spriteindex to next sprite
        INC spriteIndex
        LDA spriteIndex
        CMP maxSprites
        BNE drawSprites

        ; Buffer has been filled with sprites and now
        ; it's ready to be written into screen memory
        ; by the VBLANK routine. Meanwhile, the MAIN
        ; program will do nothing, besides checking the
        ; isBufferReady variable to be zero again.
        LDA #$00
        STA spriteIndex
        LDA #$01
        STA isBufferReady
jumpOver02 JMP MAIN
; ------------------------------------------------
; -- VBLANK routine ------------------------------
VBLANKIRQ
        ; Check if the buffer is ready to be copied into screen memory
        LDA isBufferReady
        BEQ jumpOver04
        ; Copy the buffer into screen memory
        LDX #$00
copyBuffer LDA $3000,X
        STA $0c00,X
        LDA $3100,X
        STA $0d00,X
        LDA $3200,X
        STA $0e00,X
        LDA $3300,X
        STA $0f00,X
        DEX
        BNE copyBuffer

        ; Buffer now is not ready again and must be filled by the MAIN routine
        LDA #$00
        STA isBufferReady
jumpOver04 JMP $CE42
; ------------------------------------------------
; -- Subroutine xy2addr --------------------------
; Input: register Y = nr. sprite*8
; (each sprite has 8 bytes of data in the table)

xy2addr macroSR                 ; save registers

        ; Calculate the memory address corresponding to the
        ; X and Y coordinates using a table that stores all the
        ; 25 addresses of the column zero
        LDA sprites+1,y         ; Y coord from sprite table
        ASL A                   ; Y = Y * 2
        TAX
        LDA columnZero,x        ; low byte address
        STA $D2
        INX
        LDA columnZero,x        ; high byte address
        STA $D3
        ; Add X coord to address
        LDA sprites,y           ; X coord from sprite table
        ADC $D2                 ; add to low byte (carry is still clear from the ASL above)
        STA $D2
        BCC jumpOver03          ; if carry is clear, do nothing
        INC $D3                 ; on carry set, increase high byte (next page)

jumpOver03 macroRR              ; restore registers
        RTS
; ------------------------------------------------
; -- Data ---------------------------------------
columnZero
        DW $3000, $3028, $3050, $3078, $30A0, $30C8, $30F0, $3118
        DW $3140, $3168, $3190, $31B8, $31E0, $3208, $3230, $3258
        DW $3280, $32A8, $32D0, $32F8, $3320, $3348, $3370, $3398
        DW $33C0

maxSprites DB $10

sprites
        ;   x   y   ofs col chr dly dlv sts
        DB $27,$02,$00,$00,$D1,$14,$14,$00
        DB $27,$03,$00,$00,$D0,$03,$03,$00
        DB $27,$04,$00,$00,$D0,$04,$04,$00
        DB $27,$05,$00,$00,$D1,$07,$07,$00
        DB $27,$06,$00,$00,$D0,$01,$01,$00
        DB $27,$07,$00,$00,$D0,$02,$02,$00
        DB $27,$08,$00,$00,$D1,$06,$06,$00
        DB $27,$09,$00,$00,$D1,$01,$01,$00
        DB $27,$0A,$00,$00,$D0,$14,$14,$00
        DB $27,$0B,$00,$00,$D0,$03,$03,$00
        DB $27,$0C,$00,$00,$D1,$04,$04,$00
        DB $27,$0D,$00,$00,$D0,$07,$07,$00
        DB $27,$0E,$00,$00,$D1,$01,$01,$00
        DB $27,$0F,$00,$00,$D1,$02,$02,$00
        DB $27,$10,$00,$00,$D0,$06,$06,$00
        DB $27,$11,$00,$00,$D1,$01,$01,$00

spriteIndex DB $00

isBufferReady DB $00
; -----------------------------------------------
|
|
Posted By
Rybags on 2010-11-09 20:17:02
| Re: Assembly and interrupts: my first attempt. Help needed!
I may have rushed into that idea of "strip sprites".
It's actually a method that was discovered in the Atari game "Zone Ranger".
The method isn't fully useful on C64/Plus4 bitmap modes because of the way the graphics data is arranged. In a linear scheme like Atari uses, you can address an entire "Y" position with "STA xxxx,X" because the last byte in the line is only 39 bytes ahead of the first.
But on the C= machines mentioned, the last byte of a scanline is actually 312 bytes away from the first (39 character cells of 8 bytes each), so you can't reach it with an indexed addressing instruction.
But not all is lost - you could still use the method, but would need 2 big blocks of code instead of 1 so that you could reach the far side of the screen.
Re time per frame etc.
The normal display area takes 200 scanlines. A normal PAL frame should be 312 scanlines, or 262 if an NTSC system.
So, that gives 112 or 62 non-display scanlines depending on what system you're on.
But, the bonus of the non-display scanlines is that the CPU will be running full speed, no slowdown due to TED needing to do graphics data fetches.
So, e.g. you might have a bit of code that takes 20 scanlines to execute if run during the display area, but it might only take 15 or so scanlines if running offscreen.
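A quick way to see how many scanlines a routine really takes, on or off screen, is the border-colour trick mentioned earlier in the thread. The sketch below assumes a routine called updateSprites, which is only a placeholder name for whatever code is being timed; $FF19 is the TED border colour register, and the colour value is picked arbitrarily.

```asm
; Hypothetical timing harness: the coloured band in the border shows
; how many scanlines updateSprites takes.
timeIt  LDA #$32                ; any clearly visible colour
        STA $FF19               ; border colour on
        JSR updateSprites       ; the code being measured
        LDA #$00                ; black
        STA $FF19               ; border colour off
        RTS

updateSprites                   ; placeholder for the real routine
        RTS
```

Calling timeIt once from the display area and once from the off-screen area should show bands of different height for the same code, which is the effect described above.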
|
|
Posted By
Csabo on 2010-11-10 10:01:09
| Re: Assembly and interrupts: my first attempt. Help needed!
A few suggestions (unrelated to sprites):
- Change "ORG $2000" to "ORG $2002-2 / DW $2000". This will add the PRG file header, so your program is ready to be executed after compilation. No need for extra commands to copy the header in front of it.
- Name your file "RASTER_g2000.asm" instead. That way YAPE will autostart your program, no need to type the extra SYS command.
Perhaps you can consider using Plus4IDE. It lets you edit your ASM file and run it in YAPE with a single keypress.
Now, as for your program, I ran it, and it works fine. I'm a big fan of "if it's not broken, don't fix it". I read your initial post and I'm not sure what kind of answer you are looking for: this works, so you've solved your own problem (your own way). We don't know exactly what you envisioned the end product to be, so why don't you just continue? Once you run into something that doesn't work, then you can ask for very specific help.
|
|
Posted By
Giuseppe on 2010-11-10 16:28:41
| Re: Assembly and interrupts: my first attempt. Help needed!
Hi again! I don't want to annoy you, so I'll try to keep my replies short.
@Rybags: I didn't know CPU goes faster when raster is in the "dead" zones, it is interesting! Thank you for the hint!
@Csabo: First of all, thank you for your help. It is exactly the kind of suggestion I was looking for. I used Plus4IDE for a while (as you already suggested in a previous message), but I had some problems. So I moved to Notepad++, an editor I feel very comfortable with. I can run the compiler using the NPP Exec plugin. Besides that, the sources included in Plus4IDE were very useful to me!
About your "if it's not broken, don't fix it", I agree with you. But, since I'm not an expert 6502 programmer, I was wondering if there was a better way to achieve my same result. Rybags gave me some good suggestions.
At this moment, I have a very specific question, that is:
"what is the fastest way to obtain the text screen address from X,Y coordinates (with X=[0,39] and Y=[0,24])?"
|
|
Posted By
Rybags on 2010-11-11 01:48:10
| Re: Assembly and interrupts: my first attempt. Help needed!
Calculating a text screen address given X/Y - a good quick way is to just have a table with all the X=0, Y=n addresses, that only costs you 50 bytes.
Then you can just grab the address from the table, put it into your z-page pointer or self-modify your code, and use X or Y as an index using the "X position".
CPU speed - OK I couldn't find the proper info, so this is from memory and probably isn't 100% right.
A full PAL frame is 312 scanlines (262 if NTSC). Regardless of system you have 114 cycles per scanline.
On any scanline except a "badline", 5 cycles are lost for RAM Refresh (DRAM has to constantly be refreshed so it doesn't lose its contents).
In the onscreen area you have the additional "lost cycles":
- in any line including "badlines", you lose 40 cycles for the character bitmap data fetches.
- in a "badline" where TED has to fetch the attribute or character code bytes, you lose another 40 cycles.
There are 50 badlines in a normal display, they occur 1 scanline before and on the scanline where a row of characters begins.
* This is the part I'm not 100% on - since so many cycles are lost already during a badline, only 1 Refresh cycle occurs instead of the usual 5. Also, the CPU is throttled at 50% during the display window portion, but the net effect is as described above.
So, from a programming perspective, even though the offscreen area is barely half that of the onscreen area - 112 vs 200 in PAL, the amount of cycles available to the CPU is roughly equal for each. Different story for NTSC though, only 62 scanlines vs 200 - the significantly smaller amount of CPU cycles per frame is why many modern demos for 8-bit computers work on PAL but not NTSC.
|
|
Posted By
TLC on 2010-11-11 05:20:29
| Re: Assembly and interrupts: my first attempt. Help needed!
Rybags:
Scanline cycles:
a.) One scanline is 114 "double" clock cycles long.
b.) In practice, on most border scanlines, you have 114-5 = 109 cycles for the CPU (because of the TED dram refresh cycles). On a few border scanlines (before and after the first lines of the character screen) c.) applies. (So far so good).
c.) On any other scanlines minus badlines, you have 114-5-40-4 = 65 cpu cycles. 5: dram refresh, 40: TED bitmap fetch cycles, 4: additional cycles reserved by TED.
d.) On a badline, you have again 40 +0..3 cycles less, i.e. 22 to 25 cycles overall (depending on the memory operation performed by the CPU on the first cycle of the TED dma request).
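Putting these per-line figures together gives a rough cycle budget per frame. The sums below take the 200 display lines, the 112 (PAL) or 62 (NTSC) border lines and the 50 badlines from the posts above at face value, and ignore the few border lines where c.) applies, so treat them as an estimate only.

```latex
\begin{align*}
\text{border, PAL:}  \quad & 112 \times 109 = 12208 \\
\text{border, NTSC:} \quad & 62 \times 109 = 6758 \\
\text{display:}      \quad & 150 \times 65 + 50 \times (22 \ldots 25) = 10850 \ldots 11000 \\
\text{frame, PAL:}   \quad & 23058 \ldots 23208 \\
\text{frame, NTSC:}  \quad & 17608 \ldots 17758
\end{align*}
```

Under those assumptions the PAL border and the display area do offer a similar number of CPU cycles, and an NTSC frame has roughly a quarter fewer cycles than a PAL one.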
|
|
Posted By
Csabo on 2010-11-11 10:59:59
| Re: Assembly and interrupts: my first attempt. Help needed!
When X and Y contain the coordinates like you said, assuming you need the address on a zeropage vector, this is a pretty fast way of getting the address:
dest = $C0
start   LDX #15                 ; X = column
        LDY #15                 ; Y = row
        ; address = start of row Y + column X
        TXA
        CLC
        ADC screen_address_lo,y
        STA dest
        LDA screen_address_hi,y
        ADC #$00
        STA dest+1
        ; write a character through the pointer
        LDY #$00
        LDA #$51
        STA (dest),y
        ;
        JMP $f445

screen_address_lo DB $00,$28,$50,$78,$A0,$C8,$F0,$18,$40,$68,$90,$B8,$E0,$08,$30,$58,$80,$A8,$D0,$F8,$20,$48,$70,$98,$C0
screen_address_hi DB $0C,$0C,$0C,$0C,$0C,$0C,$0C,$0D,$0D,$0D,$0D,$0D,$0D,$0E,$0E,$0E,$0E,$0E,$0E,$0E,$0F,$0F,$0F,$0F,$0F
Though a faster way would be not to calculate the address at all, but rather to have it stored. If you want the characters to move from right to left like you had it in the first demo, you could just store the starting position of the row (e.g. $0C00, $0C28, etc) and store the position of the character within that row as a number which goes from 39 to 0. You already have your pointer (which never changes), you just need to index it with Y.
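The stored-pointer idea could be sketched roughly as follows for a single character moving along text row 1. The names rowPtr, spriteCol, initRow and moveLeft are invented for this example, and the erase/wrap logic is just one possible way to do it.

```asm
; Sketch only. rowPtr reuses the zero-page location $C0 that the
; example above uses for "dest".
rowPtr  = $C0

initRow LDA #$28                ; low byte of $0C28 (start of text row 1)
        STA rowPtr
        LDA #$0C                ; high byte
        STA rowPtr+1            ; set once, never recalculated
        RTS

moveLeft
        LDY spriteCol           ; current column, 0..39
        LDA #$20                ; screen code for space
        STA (rowPtr),Y          ; erase the old position
        DEY                     ; one column to the left
        BPL draw                ; still on the row?
        LDY #$27                ; no: wrap around to column 39
draw    STY spriteCol
        LDA #$51                ; same character code as in the example above
        STA (rowPtr),Y
        RTS

spriteCol DB $27                ; start in column 39
```

After one call to initRow, each call to moveLeft costs no address arithmetic at all. With several rows in play, one pointer per row (or a small table of them) would be needed.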
|
|
Posted By
Chicken on 2010-11-12 09:12:17
| Re: Assembly and interrupts: my first attempt. Help needed!
Short addition to Csabo's comment... You can find that table for the screen positions in ROM, too. In case ROM is banked in, you can just use that and save a few bytes.
Low byte table: $d802
High byte table: $d81a
|
|
Posted By
Luca on 2010-11-12 11:02:13
| Re: Assembly and interrupts: my first attempt. Help needed!
OMG at last! A thread about both demos and assembly! I've wished for one for years! And it comes at exactly the same time I've reloaded some stuff to code! Because of this, a little OT to Csabo, for a forthcoming Plus4IDE improvement (at his own request):
- an "include" in order to call .bin and .prg (+2 bytes, you know) stuff;
- macros in order to autogenerate speedcoding (unrolled) code;
- ASL A/ROR A/etc... -> ASL/ROR/etc..
- the last version of AS65 works on x64 machines too, but it exclusively demands lowercase text, why?
- when I coded The IT Crowd I realized I was not able to put code in some memory areas (e.g. under $0FFF), would it be possible to get around this?
- find/replace of a whole bunch of code (very good for speedcode retouching!).
The last one especially killed me: my closest friends here know I coded a slow fullscreen plasma years ago, and these days I'm trying to speed it up a bit more, changing that damn unrolled code by hand (I wanna die!). Apropos: it's now faster (last fix: no CLC before any ADC, using 3 sintable pointers for a sintable with values between 0 and $ff/3, die CLC, die!), but it still runs at 2.2 frames circa, what a lamer I am.
EDIT: 2.1 frame by now, still not enough :( Reducing the 40x25 fullscreen to 38x24 would help in keeping it under 2 frames and save something for an eventual little TEDtune, but meh :/
|
|
Posted By
Giuseppe on 2010-11-13 11:41:32
| Re: Assembly and interrupts: my first attempt. Help needed!
Thx everyone for the help about scanline timing, it is very useful information.
@csabo: about the "x/y to address" function, I was using a routine like the one you described, but it was not optimized like yours! I love the code you posted, it is perfect. I already modified mine and it works like a charm. TY! Oh, a little fix. You wrote:
"Change "ORG $2000" to "ORG $2002-2 / DW $2000""
I think it is:
"ORG $2000-2 / DW $2000"
In this way, it works fine (with PRG header) and the code is located at $2000.
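Laid out in a source file, the corrected form would look something like the sketch below. The label start and the lone RTS are placeholders, not part of RASTER.ASM.

```asm
        org $2000-2             ; begin two bytes early...
        dw  $2000               ; ...so the load address ($00,$20) comes first in the file
start   RTS                     ; placeholder: the first instruction assembles at $2000
```

The two bytes emitted by DW are the low and high byte of the load address, which is the header a .prg file is expected to carry.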
|
|
|