Embeddings
Just like the Behavioral API, the Deepfake Detection API offers a similar set of embeddings. Specifically, setting embeddings=true in the query parameters of the request includes the following embeddings in the response:
- Speaker Embedding: Defined identically to the Behavioral API.
- Deepfake Embedding: A 768-dimensional low-level feature representation of the audio signal, which can be used for downstream applications revolving around audio deepfake detection.
🌐 cURL Example
For embeddings to be included in the response, the user must set the embeddings query parameter in the submit audio request:
curl --request POST \
--url https://api.behavioralsignals.com/detection/clients/your-client-id/processes/audio \
--header 'X-Auth-Token: your-api-token' \
--header 'accept: application/json' \
--header 'content-type: multipart/form-data' \
--form name=my-awesome-audio \
--form embeddings=true \
--form 'meta={"key": "value"}'
🐍 Python SDK Example
Obtaining embeddings is as simple as setting embeddings=True when submitting the recording:
from behavioralsignals import Client
client = Client(YOUR_CID, YOUR_API_KEY)
response = client.deepfakes.upload_audio(file_path="audio.wav", embeddings=True)
Schema Response
When the embeddings flag is enabled, the schema response is augmented in a similar manner to the Behavioral API:
- the "embedding" field of the response item with "task": "diarization" will hold the d=192 speaker embedding serialized as a string, instead of null,
- the response will contain an additional item with "task": "features", whose "embedding" field holds the d=768 deepfake embedding serialized as a string.
Notice that, if you submit the same audio to both the Behavioral and Deepfake APIs, the speaker embedding will be the same (we use the same model in both APIs), but the embedding under "features" will be different, since these are representations from entirely different models (and are useful for entirely different purposes).
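One way to confirm that the speaker embeddings match across the two APIs is to compare the deserialized vectors with cosine similarity. A minimal sketch (the vectors below are illustrative stand-ins, not real API output):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical speaker embeddings yield a similarity of ~1.0
# (up to floating-point rounding):
emb_behavioral = [16.078125, 14.5, 12.0625]
emb_deepfake = [16.078125, 14.5, 12.0625]
print(cosine_similarity(emb_behavioral, emb_deepfake))
```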
Example (we have trimmed the serialized values with "..."):
{
"pid": 0,
"cid": 0,
"code": 2,
"message": "Processing Complete",
"results": [
{
"id": "0",
"startTime": "0.217",
"endTime": "5.026",
"task": "asr",
"prediction": [
{
"label": " Wait for my buddy. Wait for my buddy to get over here. Hey, we're just doing a social experiment. That's it.",
"posterior": null,
"dominantInSegments": []
}
],
"finalLabel": " Wait for my buddy. Wait for my buddy to get over here. Hey, we're just doing a social experiment. That's it.",
"level": "utterance",
"embedding": null,
"st": 0.217,
"et": 5.026
},
{
"id": "0",
"startTime": "0.217",
"endTime": "5.026",
"task": "diarization",
"prediction": [
{
"label": "SPEAKER_00",
"posterior": null,
"dominantInSegments": []
}
],
"finalLabel": "SPEAKER_00",
"level": "utterance",
"embedding": "[16.078125, ..., 12.0625]",
"st": 0.217,
"et": 5.026
},
{
"id": "0",
"startTime": "0.217",
"endTime": "5.026",
"task": "language",
"prediction": [
{
"label": "en",
"posterior": "0.9560546875",
"dominantInSegments": []
},
{
"label": "zh",
"posterior": "0.005016326904296875",
"dominantInSegments": []
},
{
"label": "ru",
"posterior": "0.0034465789794921875",
"dominantInSegments": []
}
],
"finalLabel": "en",
"level": "utterance",
"embedding": null,
"st": 0.217,
"et": 5.026
},
{
"id": "0",
"startTime": "0.217",
"endTime": "5.026",
"task": "features",
"prediction": [
{
"label": null,
"posterior": null,
"dominantInSegments": []
}
],
"finalLabel": null,
"level": "utterance",
"embedding": "[0.24414099752902985, ..., -0.6791989803314209]",
"st": 0.217,
"et": 5.026
},
{
"id": "0",
"startTime": "0.217",
"endTime": "5.026",
"task": "deepfake",
"prediction": [
{
"label": "bonafide",
"posterior": "0.9999",
"dominantInSegments": []
},
{
"label": "spoofed",
"posterior": "1e-04",
"dominantInSegments": []
}
],
"finalLabel": "bonafide",
"level": "utterance",
"embedding": null,
"st": 0.217,
"et": 5.026
}
]
}
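Given a parsed response like the one above, the embeddings can be pulled out by filtering the results items on their "task" field. A minimal sketch (extract_embeddings is a hypothetical helper, and the inline response is a truncated stand-in for the full payload):

```python
import json

def extract_embeddings(response):
    """Collect deserialized embeddings keyed by task, skipping null entries."""
    embeddings = {}
    for item in response.get("results", []):
        if item.get("embedding") is not None:
            embeddings[item["task"]] = json.loads(item["embedding"])
    return embeddings

# Minimal response mimicking the example above (vectors truncated):
response = {
    "results": [
        {"task": "diarization", "embedding": "[16.078125, 12.0625]"},
        {"task": "features", "embedding": "[0.2441, -0.6792]"},
        {"task": "deepfake", "embedding": None},
    ]
}
print(extract_embeddings(response))
```

Only the "diarization" and "features" items carry non-null embeddings, so the returned dictionary holds exactly the speaker and deepfake embeddings.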
