Forum (#50656) - Plus/4 World




Posted by Harry Potter on 2024-10-07 09:52:24

Re: printtok being updated: ideas?

I'm currently working on PrintTok2, and I believe it's doing pretty well, considering that the test text is small and compresses poorly. I've been working on a Z-Machine-style 5-bit approach with a lot of enhancements: tokenization; support for more punctuation marks in the letter dictionaries, at the cost of one bit on them and on several lesser-used letters; an extra bit on dictionary swapping to indicate whether the swap is for just the current char or for several chars; and the removal of an extra bit per word compressed. I am also working on a naive literals technique, where literals aren't compressed. I'm using tokenization and a form of BPE (byte-pair encoding) there. There are up to 128 one-byte tokens and 128 two-byte tokens, and I can add more two-byte tokens. My version of BPE borrows up to 32 one-byte tokens to act as an offset to the last repeat of two chars. Does anybody here have any ideas for improving this?
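
For anyone following along who hasn't met BPE before, here is a minimal sketch of the plain textbook version in Python. It is not PrintTok2's code or its variant: there are no two-byte tokens and no borrowed offset tokens here. The names `bpe_compress`, `bpe_expand`, `FIRST_TOKEN` and `MAX_TOKENS` are made up for illustration, and the sketch assumes the text is 7-bit ASCII so that byte values 128 to 255 are free to serve as the 128 one-byte tokens.

```python
from collections import Counter

FIRST_TOKEN = 128   # assumption: input is 7-bit ASCII, so 128..255 are unused
MAX_TOKENS = 128    # assumption: one-byte tokens only, matching the 128 limit


def bpe_compress(data: bytes, max_tokens: int = MAX_TOKENS):
    """Return (packed bytes, table); table[i] is the pair behind token FIRST_TOKEN + i."""
    if any(b >= FIRST_TOKEN for b in data):
        raise ValueError("input must be 7-bit ASCII")
    seq = list(data)
    table = []
    while len(table) < max_tokens:
        # Count adjacent pairs (overlapping runs such as "aaa" are overcounted).
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats any more, so nothing left to gain
        token = FIRST_TOKEN + len(table)
        table.append(pair)
        # Replace every non-overlapping occurrence of the pair, left to right.
        out = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(token)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return bytes(seq), table


def bpe_expand(packed: bytes, table) -> bytes:
    """Undo bpe_compress; tokens may expand to other tokens, so use a stack."""
    out = []
    stack = list(reversed(packed))
    while stack:
        b = stack.pop()
        if b >= FIRST_TOKEN:
            left, right = table[b - FIRST_TOKEN]
            stack.append(right)  # pushed first so that left is popped first
            stack.append(left)
        else:
            out.append(b)
    return bytes(out)


if __name__ == "__main__":
    text = b"the cat sat on the mat, the rat sat on the hat"
    packed, table = bpe_compress(text)
    print(len(text), "->", len(packed), "bytes plus", len(table), "table entries")
```

Note that the table costs two bytes per entry, so on a small test text a real packer would want to stop adding pairs once an entry no longer pays for itself; this sketch only stops when no pair repeats.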