Previous Messages |
Posted By
Harry Potter on 2024-10-27 16:41:32
| Re: printtok being updated: ideas?
I was wrong, and I'm sorry! There was an error in the code. With the bug fixed, the 3-character-block technique gained me only one byte and, as such, isn't worth the extra complexity. I could really use some help here.
|
|
Posted By
Harry Potter on 2024-10-27 16:32:11
| Re: printtok being updated: ideas?
I added an LZ77-like approach that compresses only repeated 3-character blocks, and now the numbers I'm getting are too good to be true. The results are 120 bytes compressed and 51.9% compressibility.
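
For anyone following along, here is a minimal sketch of what an LZ77-like pass restricted to 3-character blocks might look like, paired with a decoder so the output can be round-trip checked before trusting the numbers. The byte layout is an assumption made for illustration, not PrintTok2's actual format: literals are 7-bit characters, and a byte with the high bit set means "copy 3 bytes from this many bytes back". The names pack3 and unpack3 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK    3    /* fixed match length */
#define MAX_DIST 127  /* distance must fit in the low 7 bits of a token */

/* Hypothetical format: 0x00-0x7F = literal, 0x81-0xFF = copy BLOCK bytes
   from (byte & 0x7F) positions back. Returns the number of bytes written. */
size_t pack3(const unsigned char *in, size_t n, unsigned char *out, size_t cap)
{
    size_t i = 0, o = 0;
    while (i < n) {
        size_t d, best = 0;
        assert(in[i] < 0x80);  /* literals must leave the high bit free */
        assert(o < cap);
        if (i + BLOCK <= n) {
            /* Look for the nearest earlier copy of the next three bytes. */
            for (d = 1; d <= MAX_DIST && d <= i; ++d) {
                if (memcmp(in + i, in + i - d, BLOCK) == 0) { best = d; break; }
            }
        }
        if (best) { out[o++] = (unsigned char)(0x80 | best); i += BLOCK; }
        else      { out[o++] = in[i++]; }
    }
    return o;
}

/* Inverse of pack3. Copies byte by byte so overlapping matches work. */
size_t unpack3(const unsigned char *in, size_t n, unsigned char *out, size_t cap)
{
    size_t i, k, o = 0;
    for (i = 0; i < n; ++i) {
        if (in[i] & 0x80) {
            size_t d = in[i] & 0x7F;
            assert(d >= 1 && d <= o && o + BLOCK <= cap);
            for (k = 0; k < BLOCK; ++k, ++o) out[o] = out[o - d];
        } else {
            assert(o < cap);
            out[o++] = in[i];
        }
    }
    return o;
}
```

Running unpack3 on the output of pack3 and comparing against the original text is a cheap way to tell a real gain from a bug in the compressor. In this layout each match saves two bytes, so the gain depends entirely on how often 3-character blocks repeat within 127 bytes.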
|
|
Posted By
Harry Potter on 2024-10-07 09:52:24
| Re: printtok being updated: ideas?
I'm currently working on PrintTok2 and believe it's doing pretty well, considering the test text is pretty small and poorly compressible. I've been working on a Z-Machine-style 5-bit approach with a lot of enhancements, such as tokenization; support for more punctuation marks in the letter dictionaries, at the cost of one bit on them and on several lesser-used letters; an extra bit on dictionary swaps to indicate whether the swap is for just the current char or for several chars; and the removal of an extra bit per word compressed. I am also working on a naive literals technique, where literals aren't compressed. I'm using tokenization and a form of BPE there. There are up to 128 one-byte and 128 two-byte tokens, and I can add more two-byte tokens. My version of BPE borrows up to 32 one-byte tokens to act as an offset to the last repeat of two chars. Does anybody here have any ideas to better this?
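
As a baseline for comparison, here is a rough sketch of the plain Z-Machine packing that the 5-bit approach starts from: three 5-bit codes per 16-bit word, with the top bit marking the last word. It covers only space and lowercase letters (space = 0, 'a'..'z' = 6..31, padding code 5) and none of the enhancements listed above. It also assumes an ASCII-like character set where 'a'..'z' are contiguous. The names zchar and zpack are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ZPAD 5u  /* padding code used to fill the last word */

/* Map space or a lowercase letter to a 5-bit code (alphabet A0 layout). */
static unsigned zchar(char c)
{
    if (c == ' ') return 0;
    assert(c >= 'a' && c <= 'z');
    return (unsigned)(c - 'a') + 6;
}

/* Pack n characters into big-endian 16-bit words, three codes per word.
   Returns the number of bytes written to out. */
size_t zpack(const char *s, size_t n, unsigned char *out, size_t cap)
{
    size_t i = 0, o = 0;
    do {
        unsigned a = i     < n ? zchar(s[i])     : ZPAD;
        unsigned b = i + 1 < n ? zchar(s[i + 1]) : ZPAD;
        unsigned c = i + 2 < n ? zchar(s[i + 2]) : ZPAD;
        unsigned w = (a << 10) | (b << 5) | c;
        i += 3;
        if (i >= n) w |= 0x8000u;  /* top bit marks the final word */
        assert(o + 2 <= cap);
        out[o++] = (unsigned char)(w >> 8);
        out[o++] = (unsigned char)(w & 0xFF);
    } while (i < n);
    return o;
}
```

With this layout, "hello" packs into 4 bytes instead of 5. Text made only of these characters approaches two thirds of its original size, and every shift or dictionary swap eats into that, which is why the per-swap and per-word bits matter so much on small texts.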
|
|
Posted By
Harry Potter on 2024-09-18 07:54:52
| Re: printtok being updated: ideas?
As of now, my experiments with PrintTok1 decreased the size of the database file of the C64 version of my Smir 3, 1 text adventure by 35.5%. That's not much, but it is something. I'm after 40% before updating to PrintTok2. PrintTok1 currently compresses using tokenization and RLE of spaces. I am working on PrintTok2, which is to provide several ways to compress literals; a version of BPE in which a previously seen two-char block is shortened to a byte that indicates how many bytes ago the repeat occurred; and automatic compression. I am asking if anybody here has any ideas on how to improve PrintTok, both as it is now and for a future update. BTW, the latest cc65 versions are at https://sourceforge.net/projects/cc65extra/files/ui/. Version 006 provides up to 64 one-byte tokens and 128 two-byte tokens, and version 007 provides 128 one-byte tokens and 128 two-byte tokens. A version for Super C 64 and 128 with up to 90 one-byte and 128 two-byte tokens is at https://sourceforge.net/projects/cc65extra/files/supercstuff/.
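
To judge whether the two-char back-reference idea is worth a slice of the token space, a quick estimator along these lines might help. This is a sketch under assumptions, not the PrintTok2 code: it scans greedily, counts a hit when the next two characters also appear as a pair within the last `window` bytes, and assumes each hit replaces two bytes with one. The name pair_savings and the window parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Estimate bytes saved by replacing a repeated two-char block with a
   one-byte "how many bytes ago" code. Only non-overlapping earlier
   pairs are considered (distance 2..window). */
size_t pair_savings(const unsigned char *s, size_t n, size_t window)
{
    size_t i = 0, saved = 0;
    assert(window >= 2);
    while (i + 1 < n) {
        size_t d;
        int hit = 0;
        for (d = 2; d <= window && d <= i; ++d) {
            if (s[i] == s[i - d] && s[i + 1] == s[i - d + 1]) { hit = 1; break; }
        }
        if (hit) { ++saved; i += 2; }  /* two bytes become one */
        else     { ++i; }
    }
    return saved;
}
```

Running this over the adventure's text with a few window sizes would show how much the offset range buys before any one-byte tokens are given up for it.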
|
|
Posted By
King Arthur on 2023-08-13 03:21:18
| Re: printtok being updated: ideas?
Thou hast offended my African swallow! It will no longer carry a coconut! You shall pay retribution by..., oh don't worry, it doesn't involve a shrubbery... by... never mentioning cc65 ever again! Ni! Ni! Ni! Ni!
|
|
Posted By
Harry Potter on 2023-07-27 18:19:45
| Re: printtok being updated: ideas?
I got a lot of work done with printtok2: I can now compile some test code to parse the strings being compressed, but the program crashes my system upon execution. The code is written in ANSI C and compiled with Digital Mars C (I don't know which version). If I post the main file, will anybody here help me debug it?
|
|
Posted By
Lord Voldemort on 2023-07-20 00:38:24
| Re: printtok being updated: ideas?
Master the wand, and I master Potter at last!
|
|
Posted By
Harry Potter on 2023-07-18 06:46:10
| Re: printtok being updated: ideas?
The aforementioned next version is still in development, but I have a version with 32 more one-byte tokens and better extended memory support: you can now print from both main memory and extended memory from the same program. It is at https://sourceforge.net/projects/cc65extra/files/ui/. Try it out!
|
|
Posted By
Harry Potter on 2023-07-11 08:43:39
| printtok being updated: ideas?
Hi! I am currently in the process of updating my printtok.c module for cc65. The current version is limited, as it only uses tokenization and RLE of spaces and requires manual compression. The next version is to include better tokenization support (up to 128 tokens, all of them one-byte tokens if literals are not compressed), ways to compress literals, and automatic compression support. If you have any other ideas on how to improve printtok.c, please reply.
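
For readers who haven't seen the module, here is a minimal sketch of how a decoder for tokenization plus RLE of spaces could work. The byte ranges are assumptions chosen for illustration and are not printtok.c's real layout: bytes below 0x80 are literals, 0x80-0xEF index a token table, and 0xF0-0xFF stand for a run of 2 to 17 spaces. The function name expand is hypothetical, and it writes to a buffer rather than the screen so it can be checked on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOK_BASE 0x80  /* assumed: first token code */
#define RUN_BASE 0xF0  /* assumed: first space-run code (run of 2) */

/* Expand a compressed string into out. Returns the expanded length. */
size_t expand(const unsigned char *in, size_t n,
              const char *const *tokens, size_t ntok,
              char *out, size_t cap)
{
    size_t i, o = 0;
    for (i = 0; i < n; ++i) {
        unsigned char b = in[i];
        if (b >= RUN_BASE) {
            /* Run of spaces: 0xF0 = 2 spaces, 0xF1 = 3, and so on. */
            size_t run = (size_t)(b - RUN_BASE) + 2;
            assert(o + run <= cap);
            memset(out + o, ' ', run);
            o += run;
        } else if (b >= TOK_BASE) {
            /* Token: copy the table entry in place of the code byte. */
            size_t t = (size_t)(b - TOK_BASE), len;
            assert(t < ntok);
            len = strlen(tokens[t]);
            assert(o + len <= cap);
            memcpy(out + o, tokens[t], len);
            o += len;
        } else {
            assert(o < cap);
            out[o++] = (char)b;  /* plain literal */
        }
    }
    return o;
}
```

With a table of { "You ", "the " }, the 14-byte sequence 0x80 "see " 0x81 "door." 0xF1 "Go" expands to the 22-character string "You see the door.   Go".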
|
|