(Experimental) Generator Detection

Generator detection is an experimental feature that predicts which model was most likely used to generate a deepfake utterance. Five popular generators are currently supported:

elevenlabs
playht
speechify
xtts-v2
resembleai

When enabled, the results contain generator predictions only for utterances predicted as spoofed. Each prediction is either one of the supported generators listed above, or "others" if there isn't enough confidence that one of the supported generators was used.

🌐 cURL Example

You can enable generator detection by setting the enable_generator_detection parameter in the request to the audio submission endpoint:

curl --request POST \
     --url https://api.behavioralsignals.com/v5/detection/clients/<your-cid>/processes/audio \
     --header 'accept: application/json' \
     --header 'content-type: multipart/form-data' \
     --form enable_generator_detection=true

🐍 Python SDK Example

In the Python SDK, set enable_generator_detection=True in the upload_audio method (requires SDK version 0.3.0 or later):

from behavioralsignals import Client

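# Replace YOUR_CID and YOUR_API_KEY with your own client ID and API key.
client = Client(YOUR_CID, YOUR_API_KEY)

# Request generator predictions alongside the deepfake results.
response = client.deepfakes.upload_audio(file_path="audio.wav", enable_generator_detection=True)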

Response Example

Below is an example response for an ElevenLabs-generated utterance:

{
    "pid": <your-pid>,
    "cid": <your-cid>,
    "code": 2,
    "message": "Processing Complete",
    "results": [
        {
            "id": "0",
            "startTime": "0.031",
            "endTime": "13.396",
            "task": "asr",
            "prediction": [
                {
                    "label": " Oh, wow. Is this me? Am I actually talking? This is incredible. I mean, I've had thoughts, millions of them swirling around in here, you know?",
                    "posterior": null,
                    "dominantInSegments": []
                }
            ],
            "finalLabel": " Oh, wow. Is this me? Am I actually talking? This is incredible. I mean, I've had thoughts, millions of them swirling around in here, you know?",
            "level": "utterance",
            "embedding": null
        },
        {
            "id": "0",
            "startTime": "0.031",
            "endTime": "13.396",
            "task": "diarization",
            "prediction": [
                {
                    "label": "SPEAKER_00",
                    "posterior": null,
                    "dominantInSegments": []
                }
            ],
            "finalLabel": "SPEAKER_00",
            "level": "utterance",
            "embedding": null
        },
        {
            "id": "0",
            "startTime": "0.031",
            "endTime": "13.396",
            "task": "language",
            "prediction": [
                {
                    "label": "en",
                    "posterior": "0.99658203125",
                    "dominantInSegments": []
                },
                {
                    "label": "ja",
                    "posterior": "0.0008211135864257812",
                    "dominantInSegments": []
                },
                {
                    "label": "zh",
                    "posterior": "0.000713348388671875",
                    "dominantInSegments": []
                }
            ],
            "finalLabel": "en",
            "level": "utterance",
            "embedding": null
        },
        {
            "id": "0",
            "startTime": "0.031",
            "endTime": "13.396",
            "task": "features",
            "prediction": [
                {
                    "label": null,
                    "posterior": null,
                    "dominantInSegments": []
                }
            ],
            "finalLabel": null,
            "level": "utterance",
            "embedding": null
        },
        {
            "id": "0",
            "startTime": "0.031",
            "endTime": "13.396",
            "task": "deepfake",
            "prediction": [
                {
                    "label": "spoofed",
                    "posterior": "0.9999",
                    "dominantInSegments": []
                },
                {
                    "label": "bonafide",
                    "posterior": "1e-04",
                    "dominantInSegments": []
                }
            ],
            "finalLabel": "spoofed",
            "level": "utterance",
            "embedding": null
        },
        {
            "id": "0",
            "startTime": "0.031",
            "endTime": "13.396",
            "task": "generator",
            "prediction": [
                {
                    "label": "elevenlabs",
                    "posterior": "0.9991",
                    "dominantInSegments": []
                },
                {
                    "label": "playht",
                    "posterior": "0.0002",
                    "dominantInSegments": []
                },
                {
                    "label": "speechify",
                    "posterior": "0.0002",
                    "dominantInSegments": []
                },
                {
                    "label": "xtts-v2",
                    "posterior": "0.0002",
                    "dominantInSegments": []
                },
                {
                    "label": "resembleai",
                    "posterior": "0.0002",
                    "dominantInSegments": []
                },
                {
                    "label": "others",
                    "posterior": "0.0002",
                    "dominantInSegments": []
                }
            ],
            "finalLabel": "elevenlabs",
            "level": "utterance",
            "embedding": null
        }
    ]
}

Please note that utterances predicted as "bonafide" will not contain generator predictions.
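
As a minimal sketch of how a response like the one above might be consumed, the snippet below pairs each utterance's deepfake label with its generator label, if one is present. It assumes the response has already been parsed into a Python dict with the same fields as the example; summarize_utterances is a hypothetical helper name, not part of the SDK.

```python
def summarize_utterances(response: dict) -> dict:
    """Map each utterance id to its deepfake label and, if present, its generator label."""
    summary: dict = {}
    for result in response.get("results", []):
        task = result.get("task")
        if task not in ("deepfake", "generator"):
            continue  # skip asr, diarization, language, features
        entry = summary.setdefault(
            result["id"], {"deepfake": None, "generator": None}
        )
        entry[task] = result.get("finalLabel")
    return summary


# Trimmed-down stand-in for a parsed response (illustrative data only).
example = {
    "results": [
        {"id": "0", "task": "deepfake", "finalLabel": "spoofed"},
        {"id": "0", "task": "generator", "finalLabel": "elevenlabs"},
        {"id": "1", "task": "deepfake", "finalLabel": "bonafide"},
    ]
}

print(summarize_utterances(example))
```

In this sketch, a bonafide utterance simply keeps generator set to None, since no generator result is returned for it.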

Limitations

The set of supported generators is currently small; we are working on adding support for more.
Generator predictions frequently return the "others" label, because the system returns a specific generator only when it has high confidence (in technical terms, it is tuned for high precision). We are continuously working to detect supported generators more often (in technical terms, to improve recall), especially as new versions of the generators are released.
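
To get a rough sense of how often a specific generator is named on your own data, you could tally the generator labels in a response. This is only a sketch: it assumes a parsed response dict with the fields shown in the example response, and attributed_fraction is a hypothetical helper name rather than an SDK function.

```python
from collections import Counter
from typing import Optional


def generator_label_counts(response: dict) -> Counter:
    """Count the finalLabel of every generator result in a parsed response."""
    return Counter(
        result.get("finalLabel")
        for result in response.get("results", [])
        if result.get("task") == "generator"
    )


def attributed_fraction(response: dict) -> Optional[float]:
    """Fraction of generator results with a label other than "others".

    Returns None when the response holds no generator results at all.
    """
    counts = generator_label_counts(response)
    total = sum(counts.values())
    if total == 0:
        return None
    return (total - counts["others"]) / total


# Illustrative data only: four spoofed utterances, two of them unattributed.
sample = {
    "results": [
        {"id": "0", "task": "generator", "finalLabel": "elevenlabs"},
        {"id": "1", "task": "generator", "finalLabel": "others"},
        {"id": "2", "task": "generator", "finalLabel": "others"},
        {"id": "3", "task": "generator", "finalLabel": "playht"},
    ]
}

print(attributed_fraction(sample))  # 0.5
```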