I found this question two days ago: https://retrocomputing.stackexchange.com/questions/30984/what-adpcm-algorithm-did-nec-use-in-their-%CE%BCpd775x-chips
I hope someone will manage to write an encoder. The information there helps a lot!
As for the replies from Mr. Pawel Wozniak and Mr. Peter Benie, here is Peter Benie's message:
"Hi,
In our reverse engineering project, the original authors had simply discarded any code written by the people who built the CallText interface, and replaced it with their own byte-code interpreter for KlattTalk and supplied their own routines to generate the wave forms.
There was some digital processing done on the 7720, but I doubt it was any standard encoding, because there was no need to match any standard. The 7720 doesn’t have a sound encoder, so it has no special support for ADPCM.
The two things the 7720 has that make it special are:
- the 16 x 16-bit multiplier, which does one multiplication per clock cycle, and
- on each cycle, the operation code tells all the functional blocks what to do, not just one block.
This is how it gets high throughput, and is what makes it a DSP chip rather than a general purpose CPU.
That doesn’t help you though.
Something that stands out from the datasheet is that ADPCM is not a single standard – it is a description of a technique for encoding. It is a bit like saying a number is encoded as Floating Point; it may be true but that leaves open several arbitrary choices for the encoding of mantissa and exponent.
I couldn’t find a manual with details for the 7759, but judging from MAME's decoder at
https://github.com/mamedev/mame/blob/master/src/devices/sound/upd7759.cpp#L494
your repeated byte does indeed mean that 128 bytes come next, though the comments in the code more precisely say 256 nibbles. These are fed, one nibble at a time, into the ADPCM state machine in lines 327-365.
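To make the byte/nibble framing concrete, here is a minimal Python sketch that splits a block of bytes into 4-bit values. The helper name `unpack_nibbles` is hypothetical, and the high-nibble-first order is an assumption; check it against the state machine in upd7759.cpp before relying on it.

```python
def unpack_nibbles(data):
    """Split each byte into two 4-bit values (hypothetical helper).

    High nibble first is an assumption; verify against upd7759.cpp.
    """
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)    # high nibble (assumed to come first)
        nibbles.append(byte & 0x0F)  # low nibble
    return nibbles


block = bytes(128)                  # a dummy 128-byte block
print(len(unpack_nibbles(block)))   # 256
print(unpack_nibbles(b"\x3a\xf0"))  # [3, 10, 15, 0]
```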
You have to stare at the state machine for a while to figure out what it does, but when you see it, it turns out to be quite simple. It matches the block diagram on page 5 in this paper:
https://people.cs.ksu.edu/~tim/vox/dialogic_adpcm.pdf
with the exception that on the 7759, the step size and output size are 9 bits, not 12, so scale everything down accordingly.
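As a rough illustration of "scale everything down": going from 12 bits to 9 bits is a factor of 2^(12-9) = 8. The hypothetical helper `to_9_bit` below does this with an arithmetic right shift, which rounds toward minus infinity; that rounding choice is an assumption, and other choices would be equally reasonable.

```python
def to_9_bit(value_12_bit):
    """Rescale a signed 12-bit value to signed 9 bits (hypothetical helper).

    12 - 9 = 3 bits, so divide by 2**3 = 8; Python's >> on a negative int
    is an arithmetic shift, i.e. it rounds toward minus infinity.
    """
    if not -2048 <= value_12_bit <= 2047:
        raise ValueError("not a signed 12-bit value")
    return value_12_bit >> 3


print(to_9_bit(-2048), to_9_bit(2047))  # -256 255
```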
The key thing to note is that the values in the input data represent the differences between successive output values.
In order to get both fidelity and range, these differences are scaled according to one of the curves in this diagram.
![Diagram of the ADPCM step curves](https://i.ibb.co/gtdXTmJ/Diagram.png)
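The real curve values live in the step table in MAME's upd7759.cpp. Purely to have something runnable, the sketch below builds a made-up family with a hypothetical helper `make_curves`: 16 curves, each 25% steeper than the last, mapping magnitudes 0..7 to step sizes inside the 9-bit range. These are NOT the uPD7759 values.

```python
def make_curves(growth=1.25, gain=1.0):
    """Build a HYPOTHETICAL family of 16 curves (not the real uPD7759 table).

    curves[k][m] is the step size for magnitude m (0..7) when curve k is
    selected; the sign comes from the input value. Each curve is `growth`
    times steeper than the previous one, and the +0.25 gives magnitude 0 a
    small nonzero step on the steeper curves.
    """
    return [[round(gain * growth ** k * (m + 0.25)) for m in range(8)]
            for k in range(16)]


curves = make_curves()
print(curves[0])   # [0, 1, 2, 3, 4, 5, 6, 7]            shallowest curve
print(curves[15])  # [7, 36, 64, 92, 121, 149, 178, 206]  steepest curve
assert max(max(row) for row in curves) <= 255  # stays inside the 9-bit range
```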
To calculate the next output value, the decoder looks up the input value on the selected curve and adds that amount to the previous output value.
It then picks a new curve, depending on that input value. For small input values, it picks a shallower curve to use next time. For large input values, it chooses a steeper curve.
That’s all ADPCM is.
In more detail:
Let curves[0..15] be the set of curves in the diagram above, each mapping [-7..-0, +0..+7] to [-255..+255]
Let selected_curve = 0
Let output_value = 0
For each input_value:  # input_value is in the range [-7..-0, +0..+7]
    # Step 1: calculate the next output value
    output_value += curves[selected_curve][input_value]
    # Step 2: pick the next curve
    If |input_value| in {0, 1}: selected_curve -= 1  # pick a shallower curve
    If |input_value| in {2, 3}: do nothing
    If |input_value| == 4: selected_curve += 1  # pick a steeper curve
    If |input_value| in {5, 6}: selected_curve += 2
    If |input_value| == 7: selected_curve += 3  # pick a much steeper curve
    Clamp selected_curve between 0 and 15 inclusive
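The pseudocode above translates into the following Python sketch. Assumptions: bit 3 of each nibble is the sign (1 = negative) and bits 0-2 are the magnitude, and `make_curves` is a hypothetical stand-in for the real curve table (take the actual values from upd7759.cpp).

```python
def make_curves(growth=1.25, gain=1.0):
    """HYPOTHETICAL curve family (not the real uPD7759 table): curves[k][m]
    is the step size for magnitude m (0..7) on curve k (0..15)."""
    return [[round(gain * growth ** k * (m + 0.25)) for m in range(8)]
            for k in range(16)]


# How far to move selected_curve for each magnitude, per the rules above.
ADJUST = [-1, -1, 0, 0, 1, 2, 2, 3]


def decode(nibbles, curves):
    """Decode 4-bit sign-and-magnitude input values into output values."""
    selected_curve = 0
    output_value = 0
    outputs = []
    for nibble in nibbles:
        magnitude = nibble & 0b0111
        step = curves[selected_curve][magnitude]
        # Step 1: add the signed step to the previous output value.
        output_value += -step if nibble & 0b1000 else step
        outputs.append(output_value)
        # Step 2: move to a shallower/steeper curve, clamped to 0..15.
        selected_curve = min(15, max(0, selected_curve + ADJUST[magnitude]))
    return outputs


print(decode([7, 7, 7, 15, 0, 0], make_curves()))  # [7, 21, 49, -5, -1, 2]
```

Note how three +7 inputs in a row climb quickly through the curves (0, 3, 6, 9), and the following +0 inputs then back off one curve at a time.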
The input values are encoded as sign-and-magnitude, so -0 and +0 are distinct values, both with magnitude 0. The steeper curves treat -0 and +0 differently.
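A short sketch of the sign-and-magnitude split, again assuming bit 3 is the sign bit. The steep curve here is a made-up example whose magnitude-0 step is nonzero, so +0 (0b0000) and -0 (0b1000) move the output in opposite directions.

```python
def split_nibble(nibble):
    """Return (is_negative, magnitude) for a 4-bit sign-and-magnitude value.

    Assumes bit 3 is the sign bit and bits 0-2 hold the magnitude.
    """
    return bool(nibble & 0b1000), nibble & 0b0111


def signed_step(curve_row, nibble):
    """Apply the sign to the step size looked up on one curve (8 magnitudes)."""
    is_negative, magnitude = split_nibble(nibble)
    return -curve_row[magnitude] if is_negative else curve_row[magnitude]


steep_curve = [7, 36, 64, 92, 121, 149, 178, 206]  # hypothetical steep curve
print(split_nibble(0b0000), split_nibble(0b1000))  # (False, 0) (True, 0)
print(signed_step(steep_curve, 0b0000), signed_step(steep_curve, 0b1000))  # 7 -7
```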
An interesting feature is that the Dialogic algorithm and the 7759 algorithm both use 4-bit sign-and-magnitude data for the input stream, which raises the question: what would happen if you were to put the data intended for one algorithm into the other?
In general, positive values would yield a rising slope, and negative values a falling slope. Furthermore, large values would result in a steep slope and small values in a shallow one. But if the slopes were chosen differently, the shape would be wrong and the sound would be distorted.
There’s a very high chance that the fundamental mode would be clearly audible and at the correct pitch (modulo sample rate); you’d recognise which sound it was, even though it would sound ‘off’.
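The thought experiment can be tried directly: decode the same nibble stream with two different curve families (both hypothetical). Because the curve selection depends only on the nibbles, every step goes in the same direction in both decodings; only the step sizes differ, which fits the "recognisable but off" description. The random nibbles are arbitrary test data, not real audio, and bit 3 is assumed to be the sign.

```python
import random


def make_curves(growth, gain):
    """HYPOTHETICAL curve family: curves[k][m] = step for magnitude m on curve k."""
    return [[round(gain * growth ** k * (m + 0.25)) for m in range(8)]
            for k in range(16)]


ADJUST = [-1, -1, 0, 0, 1, 2, 2, 3]  # curve change per magnitude


def decode(nibbles, curves):
    """Decode sign-and-magnitude nibbles (bit 3 = sign, assumed)."""
    selected_curve, output_value, outputs = 0, 0, []
    for nibble in nibbles:
        step = curves[selected_curve][nibble & 7]
        output_value += -step if nibble & 8 else step
        outputs.append(output_value)
        selected_curve = min(15, max(0, selected_curve + ADJUST[nibble & 7]))
    return outputs


intended = make_curves(growth=1.25, gain=1.0)  # curves the data was "encoded" for
other = make_curves(growth=1.2, gain=1.5)      # a different, equally made-up family

rng = random.Random(1)
nibbles = [rng.randrange(16) for _ in range(1000)]  # arbitrary test data
a = decode(nibbles, intended)
b = decode(nibbles, other)

# Curve selection depends only on the nibbles, so each step has the same direction...
steps_a = [y - x for x, y in zip([0] + a, a)]
steps_b = [y - x for x, y in zip([0] + b, b)]
assert all(sa * sb >= 0 for sa, sb in zip(steps_a, steps_b))
# ...but the step sizes differ, so the waveforms (and their d.c. levels) drift apart.
print(max(abs(x - y) for x, y in zip(a, b)))
```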
You might also run the risk of overflow if the output values were larger than intended. In the MAME emulator, overflow of the output value results in undefined behaviour (only unsigned integers have defined overflow behaviour), but in practice it is likely to result in wraparound, which you would hear as a very loud pop.
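What wraparound means in practice: this sketch assumes a 16-bit signed output purely for illustration (the width MAME actually uses may differ, and since signed overflow is undefined in C++, real behaviour is not guaranteed). `wrap` mimics two's-complement wraparound; `saturate` is shown only as a contrasting design choice, not as what MAME does.

```python
def wrap(value, bits=16):
    """Reduce value to a signed two's-complement integer of the given width."""
    span = 1 << bits
    half = span >> 1
    return (value + half) % span - half


def saturate(value, bits=16):
    """Clamp value to the signed range of the given width instead of wrapping."""
    half = 1 << (bits - 1)
    return max(-half, min(half - 1, value))


raw = 32767 + 100   # one step past the top of a 16-bit range
print(wrap(raw))    # -32669: jumps across the whole range, heard as a pop
print(saturate(raw))  # 32767: clipped; distorted, but no full-scale jump
```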
You would almost certainly get a d.c. offset in the output, but you normally wouldn’t be able to hear that except on the transition to a silent block. This offset would look like a random walk – varying but spending most of the time going nowhere. But if you waited long enough, it would eventually be large enough to make normal data cause an overflow, which you would definitely hear.
Based on your description and on the above, I’m convinced that what is going on in the noisy sounds is that you are feeding in data intended for a different set of curves.
In the KSU paper, page 4 shows a block diagram of the simplest possible encoder, which will let you make new sounds.
The encoder can’t produce a perfect output – it will have some quantisation error. The important thing to note from the diagram is that the differences, d(n), are not the differences between successive input values; they are the differences between the current input and the immediately previous quantised output. This has the effect of carrying any error forward to the next calculation; the error is never lost, but it might take several cycles for it to be incorporated, depending on which curve is selected.
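A minimal encoder in the spirit of that diagram. Rather than reproducing the quantiser from the paper, it simply tries all 16 codes and keeps the one whose step lands closest to d(n), which works for any curve table. Note that d(n) is taken against `predicted`, the previous quantised output, so the error carries forward. The curves are a hypothetical stand-in, bit 3 is assumed to be the sign, and a small decoder is included only to check that it reproduces the encoder's reconstruction.

```python
import math


def make_curves(growth=1.25, gain=1.0):
    """HYPOTHETICAL curve family, not the real uPD7759 table:
    curves[k][m] is the step size for magnitude m (0..7) on curve k (0..15)."""
    return [[round(gain * growth ** k * (m + 0.25)) for m in range(8)]
            for k in range(16)]


ADJUST = [-1, -1, 0, 0, 1, 2, 2, 3]  # curve change per magnitude


def signed_step(row, nibble):
    """Step for a 4-bit sign-and-magnitude code (bit 3 = sign, assumed)."""
    return -row[nibble & 7] if nibble & 8 else row[nibble & 7]


def decode(nibbles, curves):
    selected_curve, output_value, outputs = 0, 0, []
    for nibble in nibbles:
        output_value += signed_step(curves[selected_curve], nibble)
        outputs.append(output_value)
        selected_curve = min(15, max(0, selected_curve + ADJUST[nibble & 7]))
    return outputs


def encode(samples, curves):
    """Greedy encoder: returns (nibbles, the outputs a decoder will reproduce)."""
    selected_curve = 0  # both pieces of state start at 0, like the decoder
    predicted = 0       # previous *quantised* output, not the previous input
    nibbles, reconstructed = [], []
    for x in samples:
        d = x - predicted  # d(n) in the encoder diagram
        row = curves[selected_curve]
        # Pick the code whose signed step comes closest to d(n).
        best = min(range(16), key=lambda n: abs(d - signed_step(row, n)))
        predicted += signed_step(row, best)  # leftover error feeds the next d(n)
        selected_curve = min(15, max(0, selected_curve + ADJUST[best & 7]))
        nibbles.append(best)
        reconstructed.append(predicted)
    return nibbles, reconstructed


samples = [round(200 * math.sin(2 * math.pi * i / 32)) for i in range(64)]
nibbles, reconstructed = encode(samples, make_curves())
assert decode(nibbles, make_curves()) == reconstructed  # encoder and decoder agree
print(max(abs(s - r) for s, r in zip(samples, reconstructed)))  # worst error
```

Trying every code is brute force, but with only 16 candidates per sample it is cheap and avoids depending on any particular shape of curve.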
I don’t know if you’ve come across the z-transform notation before. In the block diagrams, most of the boxes are assumed to take zero time and have no state. The z^-1 boxes are one-position shift registers, so they do have state: on each operation, a new value goes in and the old value is pushed out. This means that when you turn on the “machine”, there must already be a value in the shift registers. For both of them, use the value 0 to match the implementation of the 7759 decoder.
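To see why those starting values matter, this sketch lets the decoder's two registers (previous output and selected curve) start at arbitrary values. A wrong starting output only adds a constant offset; a wrong starting curve changes the step sizes and therefore the shape. The curve values are a hypothetical stand-in, not the uPD7759 table, and bit 3 is assumed to be the sign.

```python
def make_curves(growth=1.25, gain=1.0):
    """HYPOTHETICAL curve family (not the real uPD7759 table)."""
    return [[round(gain * growth ** k * (m + 0.25)) for m in range(8)]
            for k in range(16)]


ADJUST = [-1, -1, 0, 0, 1, 2, 2, 3]  # curve change per magnitude


def decode(nibbles, curves, initial_output=0, initial_curve=0):
    """Decoder whose two z^-1 registers (previous output, selected curve)
    start at the given values instead of always at 0."""
    output_value, selected_curve, outputs = initial_output, initial_curve, []
    for nibble in nibbles:
        step = curves[selected_curve][nibble & 7]
        output_value += -step if nibble & 8 else step
        outputs.append(output_value)
        selected_curve = min(15, max(0, selected_curve + ADJUST[nibble & 7]))
    return outputs


curves = make_curves()
data = [7, 7, 3, 11, 9, 0, 12, 4]
reference = decode(data, curves)                    # both registers start at 0
shifted = decode(data, curves, initial_output=100)  # wrong starting output
steeper = decode(data, curves, initial_curve=5)     # wrong starting curve

# A wrong starting output only adds a constant (d.c.) offset...
assert shifted == [v + 100 for v in reference]
# ...but a wrong starting curve changes the step sizes, so the shape changes too.
print(reference)  # [7, 21, 33, 21, 16, 17, 7, 20]
print(steeper)    # [22, 65, 103, 65, 50, 52, 20, 60]
```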
I hope that’s of some help.
Peter"