Posted By
George on 2017-01-31 19:31:03
| Creating new Speech Data for the V364
I did some experiments with creating speech data. Maybe somebody can give me a little advice, because I am a newbie in assembler.
According to the info at this link
http://web.archive.org/web/20131206092645/http://www.stefan-uhlmann.de/cbm/MVM/Speechdata/index.html
I managed to produce new "speech data" (supposed to say "Yes") for the V364.
yes.PRG
Load the file in Yape (V364 mode), then type:
VOC 24576
SAY 1024
And we have a new "word" that, with a little imagination, sounds like a very quickly spoken "YES" (joking).
There is a tool named "SOX" that generates LPC10 data from a wave file. I did that with a short wave file "YES". I suppose the format is incompatible with the expected format, but there is a speech-like sound effect.
I am happy that I managed to get any new sound from the emulated 364. My question: did I place the sound data at the right position in the file according to Stefan's page? I have no example PRGs to compare against.
Maybe somebody who has some experience can answer the question. If you change the first $09 byte (supposed to set the synthesizing condition: frame length, bits/s, compression etc.) at the beginning of the last data block, you can hear some different effects.
|
|
Posted By
Gaia on 2017-02-01 02:41:54
| Re: Creating new Speech Data for the V364
LPC10, although very close, is not the same as PARCOR. I remember there was a python script somewhere created by the VICE team, one should test the raw parameters with that first. Here is the parameter dump (deciphered) of the V364 word 'ZERO':
0x00A,0x02B,0x227,0x3DE,0x02B,0x035,0x032,0x027,0x0FD,0x082,0x001,0x003,
0x012,0x02C,0x2B4,0x326,0x35A,0x0F8,0x004,0x02D,0x096,0x045,0x003,0x002,
0x012,0x02C,0x2B4,0x326,0x35A,0x0F8,0x004,0x02D,0x096,0x045,0x003,0x002,
0x01D,0x02C,0x3B7,0x321,0x37D,0x0DC,0x0F4,0x003,0x007,0x0C4,0x024,0x004,
0x01D,0x02C,0x3B7,0x321,0x37D,0x0DC,0x0F4,0x003,0x007,0x0C4,0x024,0x004,
0x01D,0x02B,0x3B7,0x321,0x37D,0x0DC,0x0F4,0x003,0x007,0x0C4,0x024,0x004,
0x023,0x02A,0x353,0x396,0x329,0x0F2,0x001,0x02B,0x082,0x0C0,0x020,0x001,
0x07E,0x029,0x353,0x396,0x329,0x0F2,0x001,0x02B,0x082,0x0C0,0x020,0x001,
0x07E,0x028,0x34D,0x004,0x316,0x013,0x01E,0x04B,0x07C,0x081,0x03E,0x03E,
0x07E,0x027,0x3C9,0x3BB,0x2AE,0x007,0x030,0x03E,0x010,0x0B9,0x020,0x03D,
0x07E,0x026,0x3C9,0x3BB,0x2AE,0x007,0x030,0x03E,0x010,0x0B9,0x020,0x03D,
0x07E,0x025,0x27E,0x3B3,0x3BF,0x0F6,0x015,0x02A,0x09A,0x07C,0x000,0x000,
0x053,0x024,0x254,0x0EA,0x013,0x017,0x000,0x0E1,0x095,0x001,0x021,0x001,
0x017,0x025,0x230,0x1A2,0x308,0x0FD,0x022,0x01F,0x096,0x000,0x000,0x03E,
0x02B,0x025,0x230,0x1A2,0x308,0x0FD,0x022,0x01F,0x096,0x000,0x000,0x03E,
0x07E,0x026,0x2D3,0x123,0x28F,0x044,0x019,0x0FE,0x098,0x038,0x000,0x001,
0x07E,0x027,0x317,0x0F9,0x306,0x003,0x03A,0x0FB,0x01C,0x07A,0x01E,0x003,
0x07E,0x028,0x297,0x13C,0x3FB,0x008,0x00E,0x0F7,0x0A9,0x0FC,0x03F,0x005,
0x07E,0x028,0x2EB,0x0A6,0x3D5,0x016,0x010,0x01C,0x092,0x038,0x001,0x002,
0x07E,0x028,0x279,0x14F,0x05F,0x00E,0x0EF,0x02B,0x09A,0x039,0x002,0x000,
0x07E,0x027,0x279,0x14F,0x05F,0x00E,0x0EF,0x02B,0x09A,0x039,0x002,0x000,
0x07E,0x026,0x279,0x14F,0x05F,0x00E,0x0EF,0x02B,0x09A,0x039,0x002,0x000,
0x066,0x025,0x294,0x09C,0x131,0x004,0x0DF,0x013,0x026,0x03E,0x000,0x002,
0x043,0x024,0x294,0x09C,0x131,0x004,0x0DF,0x013,0x026,0x03E,0x000,0x002,
0x01D,0x022,0x229,0x10E,0x092,0x0F1,0x0F9,0x023,0x09F,0x039,0x000,0x000,
0x00F,0x020,0x24B,0x1C5,0x025,0x0CC,0x002,0x031,0x01F,0x036,0x020,0x002,
0x008,0x01F,0x21B,0x194,0x018,0x0DF,0x000,0x030,0x096,0x07C,0x001,0x03E,
0x008,0x01D,0x229,0x10E,0x092,0x0F1,0x0F9,0x023,0x09F,0x039,0x000,0x000,
0x003,0x01D,0x21B,0x194,0x018,0x0DF,0x000,0x030,0x096,0x07C,0x001,0x03E
AFAIR these should be "stretched" linearly to 16-bit (I'll check YAPE's source). I also have found the pre-compiled binary of the SPTK from 2003 (wow...) but one needs cygwin to run it and I am not sure it'll run straight out of the box on a recent Cygwin. I'll let you know soon.
|
|
Posted By
SVS on 2017-02-01 05:12:46
| Re: Creating new Speech Data for the V364
Very interesting topic!
@George: the location where the file loads is correct (according to the VOC parameter). But I'm not sure about the content of the Yes.prg file. It seems that the sound data is located inside it at an offset of $41A from the start; the rest (apart from the header) is filled with zeroes. Maybe it is correct, since I see a $04 $1A in the header.
|
|
Posted By
George on 2017-02-01 06:45:50
| Re: Creating new Speech Data for the V364
Thank you for the replies.
Yes, I placed the sound data at $041A in yes.prg on purpose. I wasn't sure if I did it right. It starts with a "configuration" byte (in this case $09, just experimental on my part) and the rest is 1:1 LPC10 data converted from a wave file.
The config byte, low nibble:
Bit  Function                  Set 0           Set 1
0    Filter stages             10              8
1    Repeat                    available       not available
2    Frame length              20 ms/frame     10 ms/frame
3    Bits/frame                48 bits/frame   96 bits/frame

High nibble:
Bit  Function                  Set 0           Set 1
4    Loss effect calculation   none            available
5    Sound source shape        Pitch           Triangle
6    Speech data               Uncompressed    Compressed
7    ?                         not used, same as bit 6
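
A minimal Python sketch of the table above, useful for checking what a given config byte would select. The bit meanings are taken as listed (they come from Stefan's page and are not verified against the chip); the names `CONFIG_BITS` and `decode_config` are made up for this example.

```python
# Bit meanings as listed in the table above (from Stefan's page, unverified).
CONFIG_BITS = [
    # (bit, field name, meaning when 0, meaning when 1)
    (0, "filter stages", "10", "8"),
    (1, "repeat", "available", "not available"),
    (2, "frame length", "20 ms/frame", "10 ms/frame"),
    (3, "bits/frame", "48 bits/frame", "96 bits/frame"),
    (4, "loss effect calculation", "none", "available"),
    (5, "sound source shape", "pitch", "triangle"),
    (6, "speech data", "uncompressed", "compressed"),
]


def decode_config(byte):
    """Return {field name: meaning} for one configuration byte."""
    return {
        name: (if_one if (byte >> bit) & 1 else if_zero)
        for bit, name, if_zero, if_one in CONFIG_BITS
    }


# The experimental value used in yes.prg
for field, meaning in decode_config(0x09).items():
    print(f"{field}: {meaning}")
```

Bit 7 is left out because the table only says it is unused.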
|
|
Posted By
Mad on 2017-02-01 12:36:10
| Re: Creating new Speech Data for the V364
OFF @Gaia: if you want to compile some Linux stuff on Windows, there is the new Windows 10 bash. I already compiled Krill's loader that way!
|
|
Posted By
George on 2017-02-01 11:45:29
| Re: Creating new Speech Data for the V364
@Gaia: What do you mean by stretching to 16 bit?
a) 0x00A,0x02B -> 0x00,0x0A,0x00,0x2B (expand from 12 to 16 bit)
b) 0x00A,0x02B -> 0x00,0xA0,0x2B
It would be cool if somebody could provide a compiled version of SPTK, since I don't have a Unix environment and no Cygwin (my internet is too slow to download it). That toolkit has an LPC10 to PARCOR converter, as far as I know.
Does anybody have the speech data PRGs from Stefan's site? They can't be downloaded anymore.
Stefan's specification for decoding the speech data seems not to be 100% correct, since he doesn't mention PARCOR but LPC10; that's why I tried that first. The stop sequence also doesn't seem to be right when compared to the ZERO dump.
Let's see how far this experiment will go. I will document any new progress for those who are interested. Any help, especially in finding the needed (Windows) encoding tools for PARCOR, is welcome. Maybe it's also possible to write an encoder that imports a wave file and exports PARCOR format.
|
|
Posted By
JamesC on 2017-02-01 12:09:48
| Re: Creating new Speech Data for the V364
@George - I have the speech data ("ripped" from C64 programs) from Stefan's site. PM me an email address.
|
|
Posted By
Gaia on 2017-02-01 16:21:03
| Re: Creating new Speech Data for the V364
@George: drop me an email and I'll send you my exe build (no guarantees that it still works though). You may still need the Cygwin DLL which is a relatively small download (are you on a modem connection BTW? )
Stretching: neither a) nor b). Simply scale the parameter to 16 bit like: parameter_value / MAX_PARAM_VALUE * 65535
In the parameter data dump the first parameter is the "energy" (amplitude), the second is the "pitch" (tone), and the following 10 (or sometimes just 4) parameters are the actual PARCOR comb filter coefficients. There is a patent somewhere (#4209844) that describes the formulas, but you won't need those if you can get the SPTK to work for you.
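
The scaling and the frame layout described above can be sketched in Python as follows. The bit width of each parameter is not given in this post, so `bits` is an argument here and the 10-bit width used in the example call is only an assumption for illustration; integer division is used, so results are truncated rather than rounded.

```python
def scale_to_16bit(value, bits):
    """Linearly stretch an unsigned parameter of `bits` width to 0..65535."""
    max_value = (1 << bits) - 1  # MAX_PARAM_VALUE for this width
    if not 0 <= value <= max_value:
        raise ValueError(f"{value:#x} does not fit into {bits} bits")
    return value * 65535 // max_value


# First frame of the 'ZERO' dump: energy, pitch, then the filter coefficients.
frame = [0x00A, 0x02B, 0x227, 0x3DE, 0x02B, 0x035,
         0x032, 0x027, 0x0FD, 0x082, 0x001, 0x003]
energy, pitch, coeffs = frame[0], frame[1], frame[2:]

# Hypothetical width: treat the first coefficient as a 10-bit value.
print(len(coeffs), scale_to_16bit(coeffs[0], 10))
```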
|
|
Posted By
George on 2017-02-02 17:18:52
| Re: Creating new Speech Data for the V364
@Gaia: Yes, I have a smartphone connection (64 kb/s).
Well, after reading Stefan's site again (the parameter dump is there too) I could work out what you mean. The parameters have different lengths.
My next try uses the parameter dump of the word "Zero". I wrote a converter for the dump. I played around with the config byte and got different variations.
Zero_00.PRG Zero_09.PRG Zero_0A.PRG
Start the program and type:
VOC 24576
SAY 1024
We got a step further. At least we created something close to the word "Zero".
*******************************************************************
Edit 02.02.2017
*******************************************************************
@Gaia: The SPTK works in Windows. Thank you for the help.
I was able to make a PARCOR data file.
Here are the steps:
1.) Make a raw file from a wave file: wav2raw.exe data.wav data.raw
2.) Make a float file from the raw file: x2x +s data.raw > data.f
3.) Make a PARCOR file from the float file: frame < data.f | window | lpc -m 10 | lpc2par -m 10 > data.rc
This is the point where I am stuck. When you put data.rc into the PRG file, you get nice "alien" talk for the next Star Wars movie.
sample.PRG
The questions are:
a) Which format must the source wave file have?
b) The conversion to PARCOR is made in 3 steps (window, lpc and lpc2par commands). Each has further parameters which I don't understand. How must the parameters be set to match the parameters the Toshiba chip expects?
Without any further information from the designers, this might be hard to find out. Maybe somebody knows the right guys (Bil Herd, etc.) to ask?
|
|
|
Posted By
Gaia on 2017-02-03 03:08:04
| Re: Creating new Speech Data for the V364
The MV-demo was done by me as an initial Magic Voice --> V364 demo disk conversion. I sent it to Stefan because he had been really obsessed with the MV/V364 for a long time.
|
|
Posted By
George on 2017-02-03 07:58:55
| Re: Creating new Speech Data for the V364
I can understand Stefan's obsession. For me the topic is really fascinating.
The "abeec" demo doesn't work. It's the only demo that uses uncompressed data (config byte is set to 03); all other demos use compressed data (config byte 4A).
Somebody should check the demo on real hardware (C64 + MV and V364) to exclude the possibility that the emulation doesn't work right. If that's the case, everything stops here for me.
I don't think that the developer made wrong settings, because every sample begins with 03 (but who knows).
While writing this, I had the idea to ask the developers of the games (Gorf, etc.) how they made the sample data. Maybe they have some tech documentation? We should ask them, e.g. Eric Cotton made Gorf.
|
|
Posted By
Gaia on 2017-02-04 03:31:13
| Re: Creating new Speech Data for the V364
VICE has T6721A emulation too, and although I helped them decipher the speech parameters, it is a more or less independent implementation from YAPE, so you could try it as well.
Note that the 48-bit encoding is still not deciphered (the V364 is not using it, only one single word on the MV C64 as far as I remember). The best guess I have so far is this (parameter bit lengths):
4, 7, 10, 10, 7, 2, 2, 2, 2, 2 (voiced)
4, 7, 9, 9, 9, 2 (unvoiced)
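
For anyone who wants to test this guess against real data, here is a small Python sketch that splits a 6-byte frame into fields of the guessed widths. The bit order (most significant bit first) and the name `unpack_fields` are assumptions for this example, not known facts about the chip.

```python
# Guessed parameter widths from the post above (not confirmed).
VOICED_WIDTHS = [4, 7, 10, 10, 7, 2, 2, 2, 2, 2]
UNVOICED_WIDTHS = [4, 7, 9, 9, 9, 2]


def unpack_fields(frame_bytes, widths):
    """Split a frame into unsigned fields, reading bits MSB first (assumed)."""
    total_bits = len(frame_bytes) * 8
    if sum(widths) > total_bits:
        raise ValueError("widths need more bits than the frame holds")
    value = int.from_bytes(frame_bytes, "big")
    fields = []
    remaining = total_bits
    for width in widths:
        remaining -= width
        fields.append((value >> remaining) & ((1 << width) - 1))
    return fields


# Bits covered by each guess
print(sum(VOICED_WIDTHS), sum(UNVOICED_WIDTHS))
print(unpack_fields(bytes.fromhex("a00000000000"), VOICED_WIDTHS))
```

The voiced widths add up to exactly 48 bits, while the unvoiced ones add up to 40, so under this guess 8 bits of an unvoiced frame would be unaccounted for.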
Speech for games: according to the interview I made with Bil Herd in 2006, they compiled it on a VAX machine back in the day (search for 'V364' or 'VAX'). http://plus4world.powweb.com/features/Interview_with_Bil_Herd
|
|
Posted By
SVS on 2017-02-04 13:25:23
| Re: Creating new Speech Data for the V364
Maybe a bit OT:
The V364 does not have the same software as Magic Voice; in fact the embedded vocabulary is different (there are 4 words in MV that the V364 lacks). Furthermore, the V364 has 261 words against the 235 of MV. Since the V364 goes past the 255 barrier, maybe the system architecture is different too.
|
|
Posted By
JamesC on 2017-02-04 13:36:53
| Re: Creating new Speech Data for the V364
@SVS - we're discussing additional speech data. VOC(24576) and SAY(1024) stuff.
We're not trying to change the original Speech ROM. At least not yet. *evil laugh*
|
|
Posted By
George on 2017-02-04 14:09:03
| Re: Creating new Speech Data for the V364
@SVS: In the abeec demo I found the word "bee", which is the only word (so far) coded in the compressed format in that demo. It sounds exactly like "B" in the 364 standard vocabulary. The speech data coding is the same.
@Gaia: The 96-bit format matches an expected 12 x 8-bit (byte) format, which is exactly the number of parameters PARCOR needs. You can configure the output format in SPTK to short, int, float (default), double etc. My question to the round here: are the data types of the Plus/4 the same as in Unix/Windows? And what data type does the Toshiba chip expect? Is anything known about it? Is float the same on both systems? Did the VAX use big endian or little endian?
|
|
Posted By
MMS on 2017-02-04 23:08:07
| Re: Creating new Speech Data for the V364
"We're not trying to change the original Speech ROM. At least not yet."
Actually, I think this team may do a much better version with the current technology and sound editor tools we have in our hands...
|
|
Posted By
JamesC on 2017-02-04 23:40:59
| Re: Creating new Speech Data for the V364
"Actually, I think this team may do a much better version with the current technology and sound editor tools we have in our hands..."
I wouldn't object to fixing the bugs in the original 364 Speech ROM. Or putting additional words into a new ROM that can sit in C3 High. Or even additional words that can be loaded from disk.
But completely replacing the 364's built-in vocabulary? No, I'd like to keep that. I might want YAPE to say "I am the Commodore V-three-six-four" someday.
|
|
Posted By
MMS on 2017-02-05 06:09:59
| Re: Creating new Speech Data for the V364
I see your point, but you misunderstood me. The vocabulary should remain the same, but I notice bugs in the encoding of the words, as you can hear plopping sounds that should not be there. It degrades the overall user experience of the advanced sound of the Toshiba IC. (See SAM: it has no such problem, just the resolution of the sound is much lower due to missing hardware (and actually I was wrong previously).)
|
|
Posted By
George on 2017-02-05 15:10:44
| Re: Creating new Speech Data for the V364
Vice: The "abeec" demo in Vice produces some speech sound (no "alien" talk), but it's not very clear or loud (maybe timing problems?). Again the assumption is that the emulation is not 100% accurate. Somebody should really try real hardware. I haven't tried the C64 with MV (emulation) so far. The conclusion is that the speech data is probably not corrupted. It can be used as a reference.
SPTK: I made some mistakes:
- Extracting the raw data from a wave: since wav2raw is only in the newer SPTK, I used a tool from the internet. But it didn't cut the header from the wave and instead produced a much bigger file than the original.
- Frame length: PARCOR divides the data into frames (slices of N ms length). Each frame is encoded into those 12 parameters. You can set the Toshiba to a frame length of 20 ms or 10 ms. The parameters for that length in SPTK are -l 320 or -l 160 (16 points are 1 ms).
- I recorded a new source wave with 8 kHz mono 16 bit (the lowest I could find; does 8 bit exist?), because the Toshiba plays at 8 kHz too.
- When you apply the conversion now, you get a wonderful matrix of PARCOR parameters shrunk into a handful of bytes.
- The result is still incompatible, but it sounds more like converted speech than all tries before. And now we know that Yape doesn't play uncompressed data at all. So I have to continue with Vice, which I don't like.
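
A quick Python check of the frame-length arithmetic from the list above, assuming the -l value is a frame length counted in samples. "16 points are 1 ms" corresponds to a 16 kHz sample rate; for an 8 kHz recording the same 20 ms and 10 ms frames come out half as long.

```python
def samples_per_frame(sample_rate_hz, frame_ms):
    """Number of samples that make up one frame of the given duration."""
    return sample_rate_hz * frame_ms // 1000


for rate in (16000, 8000):
    for ms in (20, 10):
        print(f"{rate} Hz, {ms} ms/frame: {samples_per_frame(rate, ms)} samples")
```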
|
|
Posted By
Gaia on 2017-02-05 15:20:09
| Re: Creating new Speech Data for the V364
@George: go for the 'SHORT' type. The T6721A operates with 15 bits internally. You also need to match the bit resolution (9 bits, but go for 8 bits), the output frequency (8 kHz) and the bit rate (96 bits per frame like for most words as discussed, since this is the only known bit format). Actually the bit rate does not really matter, just make 12 new PARCOR parameters (1 pitch, 1 energy and 10 filter coeffs) for each frame (every 20 ms). You can scale them back to the appropriate bit format later. You also have to make sure you set the frame interval to 20 ms (i.e. 50 Hz). In between frames the parameters will be interpolated by the T6721A.
@MMS: It's not the tooling that is/was the bottleneck but the capabilities of PARCOR speech synthesis. It models the human vocal tract in an extremely condensed format with a low output frequency. The output is 9 bit, which is wonderful, so I believe we could use it for playing music rather than for vastly improving the speech quality. Remember, the Magic Voice demo could even sing.
@JamesC: You can already do that SAY"I":SAY"AM":SAY"THE":SAY"COMMODORE":SAY"V":SAY 3 :SAY 6 :SAY 20:SAY 4
EDIT: Yape should play uncompressed as well. So I need to have a look, most likely it's a bug. I haven't touched the code for more than 10 years now. Stay tuned.
EDIT2: OK, I checked and ABC uses the 48-bit format, which VICE does not support either, so I am a bit confused now how it could work for you.
|
|
Posted By
George on 2017-02-05 15:54:33
| Re: Creating new Speech Data for the V364
@Gaia: I played the ABeeC demo on both emulators again. Correction:
* You can hear something in Yape, but it is more noise than speech. (I had in mind that you hear nothing.)
* In Vice you can actually hear a voice, very high and fast, but it plays voice.
The unusual thing here is that the speech data has the 48 bits/frame format, but with no compression (bit 6 is set to 0; config byte: 03).
The config byte, low nibble:
Bit  Function                  Set 0           Set 1
0    Filter stages             10              8
1    Repeat                    available       not available
2    Frame length              20 ms/frame     10 ms/frame
3    Bits/frame                48 bits/frame   96 bits/frame

High nibble:
Bit  Function                  Set 0           Set 1
4    Loss effect calculation   none            available
5    Sound source shape        Pitch           Triangle
6    Speech data               Uncompressed    Compressed
7    ?                         not used, same as bit 6
|
|
Posted By
Gaia on 2017-02-05 16:14:37
| Re: Creating new Speech Data for the V364
ROM speech is also uncompressed. The differences in the ROM and ABeeC are only the filter length and the bit rate.
ROM:
Speed select : 1.100000
Synth condition #1 set : 00
- loss effect calculation : 0
- sound source shape pitch(0) or triangle wave(8) : 0
Synth condition #2 set : 0A
- filter stages : 10
- repeat available : 2
- frame length : 20 ms/frame
- bit rate : 96 bits/frame

ABeeC:
Speed select : 1.100000
Synth condition #1 set : 00
- loss effect calculation : 0
- sound source shape pitch(0) or triangle wave(8) : 0
Synth condition #2 set : 03
- filter stages : 8
- repeat available : 2
- frame length : 20 ms/frame
- bit rate : 48 bits/frame
|
|
Posted By
George on 2017-02-05 16:32:34
| Re: Creating new Speech Data for the V364
@Gaia: Did you read the parameters out while playing, or did you look directly into the speech ROM?
I don't have the speech ROM as a file (it would be cool if you could send it to me), so I didn't look there. The only thing I can say for sure is that in the speech data files of the working demos and for the "Bee" word (in ABC) the synth condition #1 is 04 (not 0 as in your example). According to Stefan's table this means: speech data compressed (but bit 3 is not listed in your table). Now I am confused too.
|
|
Posted By
Gaia on 2017-02-05 16:47:18
| Re: Creating new Speech Data for the V364
I dump directly from YAPE. I can send an EXE that dumps speech data if you want.
You can download the Speech ROM from many places, even from my homepage: http://yape.homeserver.hu/download/spk3cc4.bin
EDIT: FYI, about 13 years ago I used to convert 'zero.raw' to PARCOR with SPTK back and forth using the following script. It also dumps the parameters, which you can use to feed the Toshiba chip after the appropriate bit conversion:
x2x +sf < zero.raw > zero.float
frame.exe -l 320 -p 40 < zero.float | window.exe -l 320 | lpc.exe | lpc2par.exe > zero.lt
frame.exe -l 320 -p 40 < zero.float | window.exe -l 320 | pitch.exe -l 320 > zero.pitch
excite.exe -p 320 < zero.pitch > zero.exc
ltcdf zero.lt zero.exc > zero.syn
x2x +fs < zero.syn > zero.syn.raw
@echo Converting to short...
x2x +fs < zero.pitch > zero.pitch.short
x2x +fs < zero.lt > zero.lt.short
x2x +fs < zero.exc > zero.exc.short
'zero.raw' is the raw input and zero.syn.raw is the re-synthesized 'zero' word.
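
To inspect the float output of such a pipeline (e.g. zero.lt) before any bit conversion, a Python sketch like the following may help. It assumes the file holds raw 4-byte little-endian floats and that each frame consists of a gain value followed by `order` coefficients; both are assumptions to check against the SPTK documentation. The order is an argument because the script above does not pass -m to lpc, and `read_frames` is a name made up for this example.

```python
import struct


def read_frames(data, order):
    """Split raw little-endian float32 bytes into frames of order+1 values."""
    per_frame = order + 1  # assumed layout: gain, then `order` coefficients
    count = len(data) // 4
    values = struct.unpack(f"<{count}f", data[:count * 4])
    usable = count - count % per_frame  # drop an incomplete trailing frame
    return [values[i:i + per_frame] for i in range(0, usable, per_frame)]


# Synthetic bytes standing in for the contents of a file such as zero.lt
fake = struct.pack("<6f", 1.0, 0.5, -0.25, 2.0, 0.125, -0.5)
for frame in read_frames(fake, order=2):
    print(frame)
```

With a real file, the bytes would come from `open("zero.lt", "rb").read()`.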
On the 48-bit per frame: I opened the T6721A chip sheet, and alas, on page 40 it says, that 'Non linear conversion: available for 48 bits/frame'. This is bad news since it means it's not simple scaling like for 96 bpf, so there has to be a kind of a lookup table in the chip...
|
|
Posted By
George on 2017-02-05 17:28:18
| Re: Creating new Speech Data for the V364
Thanks for the link. As I wrote before, I have slow stone-age internet. You've got mail from me.
I got this screenshot directly from the ROM file. It's the standard word ZERO. It begins with $4A (blue); the second word (ONE) begins with $4A too. When you cut the sequence out and put it into a separate file, it works only with 4A (synth condition); 0A doesn't work. (Test it with my PRGs above and copy the sequence.)
Do you agree, or am I wrong somewhere?
Edit: Thanks for the script. Very interesting.
|
|
Posted By
Gaia on 2017-02-05 17:26:10
| Re: Creating new Speech Data for the V364
You are on the wrong track. You can not just copy paste speech data from the ROM... it has a yet unsolved SW decompression. It is better to feed the chip with raw parameters.
|
|
Posted By
George on 2017-02-05 17:35:58
| Re: Creating new Speech Data for the V364
I understand. But does "0A" not mean, SW-Compression is off and you feed the chip with RAW Parameters (instead of 4A)?
|
|
Posted By
Gaia on 2017-02-05 18:59:19
| Re: Creating new Speech Data for the V364
The Toshiba chip itself cannot have SW decompression. That's by definition HW decompression. The ROM is feeding the uncompressed data to the chip which it decompresses on the fly. There is no point at this stage in figuring out what it does exactly, since writing a small routine, although it would consume more space, could at least get you going. So don't use the SAY command and forget about BASIC. Write a small machine code routine that sets up the parameters, perhaps an IRQ, and then feeds the chip with raw uncompressed data that you obtained from the SPTK (and bit-formatted appropriately). Compression can be implemented later.
|
|
Posted By
George on 2017-02-06 15:11:18
| Re: Creating new Speech Data for the V364
I did not mean that the chip does the SW decompression. Since there is no official documentation, I assumed that by setting bit 6 to 1 (according to Stefan's site) everything after 4A is compressed.
By setting bit 6 to 0 (e.g. 03 in "ABC") everything is uncompressed, so you could perhaps copy the output of SPTK. (We are just experimenting here.)
Another thought I had today: when they built a chip that does the speech output, then there must be a chip from Toshiba that creates the input data, so that the data can be sent at a low bitrate to the speech synthesis chip.
Edit and last entry from me:
* Gaia is right. The speech data is compressed. You can't just copy it.
* I will stop here, because my assembler skills are insufficient. I can't write a player.
* I tested the ABC demo with Vice 64 + Magic Voice. It works fine with it.
The time was not lost. I learned a lot from this puzzle. Thank you all for your help.
|
|
Posted By
Gaia on 2017-02-06 17:31:59
| Re: Creating new Speech Data for the V364
Just checked the ABC cartridge in VICE's Magic Voice emulation (C64) but I am still getting incomprehensible garbage speech for the 48-bit data.
|
|
Posted By
George on 2023-08-18 15:57:15
| Re: Creating new Speech Data for the V364
A few years have passed and I am still interested in the challenge.
I want to try to feed the T6721 speech chip with raw data in assembly, as Gaia suggested. Maybe AI can help. Which registers are relevant for this task on the 364? The memory map I found is not very precise.
Any hints appreciated.
Edit: http://yape.homeserver.hu/download/dassspk3.txt
|
|
Posted By
SVS on 2023-08-19 04:40:02
| Re: Creating new Speech Data for the V364
You can find info about the V364 registers on my Ultimate map, "O.S. Map" section, addresses from $FD20 onwards...
|
|
|