Posted By
Harry Potter on 2024-11-23 18:44:22
| Re: Pursue PrintTok2?
I currently have PrintTok in working order, and it gives a very good compression ratio, but I want to do better. I'm using a suggestion from Toldo, a member of the vogons forum. He uses a form of tokenization in which 12 bits represent every word in the text, and he assumes a space between words. On top of that, I add a mechanism that leaves words that rarely appear in the text untokenized, a form of static Huffman coding for letters and some punctuation, a bit that indicates whether a space follows a period or comma, and token codes that use only as many bits as are needed to specify a token. I am looking for other ideas.
|
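For anyone following along, here is a rough Python sketch of the word-tokenization part described in the post. It is not PrintTok's actual code: the function names, the `min_count` threshold, and the `("tok", ...)` / `("lit", ...)` output format are all assumptions made up for illustration. It shows two of the ideas: rare words are left as literals instead of getting a token, and the token width is the smallest number of bits that can address the dictionary (12 bits only when there are 4096 entries).

```python
from collections import Counter


def build_dictionary(text, min_count=2, max_tokens=4096):
    """Give a token to each word that appears at least min_count times.

    min_count and max_tokens are hypothetical tuning knobs; 4096 matches
    the 12-bit token limit mentioned in the post.
    """
    counts = Counter(text.split())
    frequent = [w for w, c in counts.most_common(max_tokens) if c >= min_count]
    return {word: index for index, word in enumerate(frequent)}


def token_bits(dictionary):
    """Smallest fixed width (in bits) that can address every token."""
    return max(1, (len(dictionary) - 1).bit_length())


def encode(text, dictionary):
    """Split on whitespace and emit ('tok', index) or ('lit', word) pairs.

    A space between words is assumed, so it is never stored explicitly.
    Literals would go to the letter-level coder (e.g. static Huffman),
    which is not shown here.
    """
    out = []
    for word in text.split():
        if word in dictionary:
            out.append(("tok", dictionary[word]))
        else:
            out.append(("lit", word))
    return out


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    d = build_dictionary(sample)
    print(d, token_bits(d))
    print(encode(sample, d))
```

In this sketch a decoder would need the dictionary and the token width stored ahead of the data, plus a flag to tell tokens from literals; how that overhead trades off against the savings depends on the text.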