Submit an audio file for processing
Use the batch API to submit recordings for deepfake detection analysis
Once you have created a Project (cid + API token), you can use the batch API to send recordings for analysis.
Using the REST API
You can use the low-level REST API if you want to integrate our API into your application in the language of your choice (Java, Node.js, etc.). For Python, we recommend using the SDK as shown in the next section.
Submit the recording
You can call the https://api.behavioralsignals.com/v5/detection/clients/your-cid-here/processes/audio endpoint to submit a .wav or .mp3 recording.
Example using curl:
curl --request POST \
--url https://api.behavioralsignals.com/v5/detection/clients/<your-cid>/processes/audio \
--header 'X-Auth-Token: <your-api-token>' \
--header 'accept: application/json' \
--header 'content-type: multipart/form-data' \
--form file='@/path/to/audio.wav'
Response:
{
"pid": <process-id>,
"cid": <your-cid>,
"name": null,
"status": 0,
"statusmsg": "Pending",
"duration": 0.0,
"datetime": "2025-07-29T09:14:18.261564618",
"meta": null
}
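If you are working in Python but not using the SDK, the same call can be made with the requests library. The sketch below mirrors the curl command above; the cid, API token, and file path are placeholders, and requests sets the multipart content type (including the boundary) on its own.

import requests

CID = "<your-cid>"
API_TOKEN = "<your-api-token>"

url = f"https://api.behavioralsignals.com/v5/detection/clients/{CID}/processes/audio"
headers = {
    "X-Auth-Token": API_TOKEN,
    "accept": "application/json",
    # requests adds the multipart/form-data content type (with boundary) automatically
}

with open("/path/to/audio.wav", "rb") as audio:
    response = requests.post(url, headers=headers, files={"file": audio})

response.raise_for_status()
process = response.json()
print(process["pid"], process["statusmsg"])  # e.g. <process-id> Pending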
Retrieve the results
After the processing is complete, you can retrieve the results.
Example using curl:
curl --request GET \
--url https://api.behavioralsignals.com/v5/detection/clients/<your-cid>/processes/<process-id>/results \
--header 'X-Auth-Token: <your-api-token>' \
--header 'accept: application/json'
Response:
{
"pid": <your-pid>,
"cid": <your-cid>,
"code": 2,
"message": "Processing Complete",
"results": [
{
"id": "0",
"startTime": "0.031",
"endTime": "13.151",
"task": "asr",
"prediction": [
{
"label": " This is deep fake example of what is possible with powerful computer and editing. It took around seventy two hours to create this example from scratch using extremely powerful GPU. It could improve with more computing time, but ninety percent people cannot tell the difference",
"posterior": null,
"dominantInSegments": []
}
],
"finalLabel": " This is deep fake example of what is possible with powerful computer and editing. It took around seventy two hours to create this example from scratch using extremely powerful GPU. It could improve with more computing time, but ninety percent people cannot tell the difference",
"level": "utterance",
"embedding": null
},
{
"id": "0",
"startTime": "0.031",
"endTime": "13.151",
"task": "diarization",
"prediction": [
{
"label": "SPEAKER_00",
"posterior": null,
"dominantInSegments": []
}
],
"finalLabel": "SPEAKER_00",
"level": "utterance",
"embedding": null
},
{
"id": "0",
"startTime": "0.031",
"endTime": "13.151",
"task": "language",
"prediction": [
{
"label": "en",
"posterior": "0.9990234375",
"dominantInSegments": []
},
{
"label": "es",
"posterior": "7.838010787963867e-05",
"dominantInSegments": []
},
{
"label": "pt",
"posterior": "7.361173629760742e-05",
"dominantInSegments": []
}
],
"finalLabel": "en",
"level": "utterance",
"embedding": null
},
{
"id": "0",
"startTime": "0.031",
"endTime": "13.151",
"task": "deepfake",
"prediction": [
{
"label": "spoofed",
"posterior": "0.9997",
"dominantInSegments": []
},
{
"label": "bonafide",
"posterior": "0.0003",
"dominantInSegments": []
}
],
"finalLabel": "spoofed",
"level": "utterance",
"embedding": null
}
]
}
- The results object is a list of predictions, one for each utterance detected in the audio file. The startTime and endTime fields mark the start and end of the utterance.
- Each result item corresponds to one of the tasks: asr, diarization, language, or deepfake.
- The "deepfake" task consists of two classes: "bonafide" (authentic) and "spoofed" (deepfake).
- The posterior denotes the probability of the corresponding class.
- The finalLabel field always displays the dominant class.
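For example, the deepfake verdict for each utterance can be pulled out of the response with a few lines of Python. This is a minimal sketch: the field names and the code value 2 for "Processing Complete" are taken from the example response above, and results.json is assumed to hold the saved output of the GET call.

import json

with open("results.json") as f:  # saved output of the GET call above
    data = json.load(f)

if data["code"] == 2:  # "Processing Complete" in the example response
    for item in data["results"]:
        if item["task"] == "deepfake":
            scores = {p["label"]: float(p["posterior"]) for p in item["prediction"]}
            print(item["startTime"], item["endTime"], item["finalLabel"], scores)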
Using the Python SDK
Submit the recording
You can submit a recording using the following method:
from behavioralsignals import Client
client = Client(YOUR_CID, YOUR_API_KEY)
response = client.deepfakes.upload_audio(file_path="audio.wav")
The response is an object of class ProcessItem, defined here.
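The process id is what you will need when retrieving results. Assuming ProcessItem exposes the same fields as the JSON response shown in the REST section, you can keep it with:

pid = response.pid  # assumed to mirror the "pid" field of the JSON response above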
Retrieve the results
Retrieve the results of a given process with:
output = client.deepfakes.get_result(pid=<pid-of-process>)
The output is an object of class ResultResponse, defined here.
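Processing is asynchronous, so a common pattern is to poll until the result is ready and then read the deepfake entry. The sketch below assumes ResultResponse mirrors the JSON shown in the REST section (a code attribute where 2 means "Processing Complete" and a results list whose items carry task, startTime, endTime, and finalLabel); the polling interval is an arbitrary choice.

import time

# Poll until processing completes; code == 2 corresponds to "Processing Complete"
# in the example response above (attribute names assumed to mirror the JSON fields).
output = client.deepfakes.get_result(pid=pid)
while output.code != 2:
    time.sleep(5)  # arbitrary polling interval
    output = client.deepfakes.get_result(pid=pid)

# Read the deepfake verdict for each utterance.
for item in output.results:
    if item.task == "deepfake":
        print(item.startTime, item.endTime, item.finalLabel)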