Retrieve results
Results at the file/call level
Request
When a processing request has completed successfully, you can fetch the results. The results are returned as a JSON object and are described in more detail on this page. The endpoint to call is:
curl --request GET \
--url https://api.behavioralsignals.com/v5/clients/your-client-id/processes/pid/results \
--header 'X-Auth-Token: your-api-token' \
--header 'accept: application/json'
The GET results method requires the client ID (long: {cid}) and the process ID (long: {pid}) to be passed as path parameters. On invocation, it returns the result of the processing in JSON format. If the specified cid or pid is not found, or the status of the job is not completed, a corresponding error response is returned.
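The same request can be issued from Python. A minimal sketch using only the standard library; the client ID, process ID, and token below are placeholders for your own credentials:

```python
import json
import urllib.request

API_BASE = "https://api.behavioralsignals.com/v5"

def results_url(cid, pid):
    """Build the results endpoint URL for a given client/process id."""
    return f"{API_BASE}/clients/{cid}/processes/{pid}/results"

def get_results(cid, pid, token):
    """Fetch the results of a completed process as a parsed JSON dict."""
    req = urllib.request.Request(
        results_url(cid, pid),
        headers={"X-Auth-Token": token, "accept": "application/json"},
    )
    # Raises urllib.error.HTTPError if cid/pid is unknown or the job
    # has not yet completed, mirroring the API's error responses.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```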
Response schema
The response is a JSON with the following structure:
{
  "pid": 0,
  "cid": 0,
  "code": 0,
  "message": "string",
  "results": [
    {
      "id": "0",
      "startTime": "0.209",
      "endTime": "7.681",
      "task": "<task>",
      "prediction": [
        {
          "label": "<label_1>",
          "posterior": "0.9576"
        },
        {
          "label": "<label_2>",
          "posterior": "0.0377"
        },
        ...
      ],
      "finalLabel": "<label_1>",
      "level": "utterance"
    },
    ...
  ]
}
`results` is an array, where each element corresponds to a prediction for a specific task and utterance/segment. The available tasks are the following:
- diarization: Contains the speaker label of the utterance, e.g. SPEAKER_00, SPEAKER_01, .... If the `embeddings` query param is defined, the speaker embeddings are also returned.
- asr: Contains the verbal content of the utterance
- gender: The sex of the speaker
- age: The age estimation of the speaker
- language: The detected language
- emotion: The detected emotion. Class labels: `happy`, `angry`, `neutral`, `sad`
- strength: The detected arousal of speech. Class labels: `weak`, `neutral`, `strong`
- positivity: The sentiment of speech. Class labels: `negative`, `neutral`, `positive`
- speaking_rate: How fast/slow the speaker talks. Class labels: `slow`, `normal`, `fast`
- hesitation: Whether there are signs of hesitation in speech. Class labels: `no`, `yes`
- engagement: Whether the speaker expresses engagement in their tone. Class labels: `engaged`, `neutral`, `withdrawn`
- features: This task is only present when the `embeddings` query param is defined. It contains the behavioral embeddings of the speaker.
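Since each element of `results` carries a `task` field, per-task predictions can be pulled out with a simple filter. A sketch over abbreviated, illustrative sample data:

```python
# Abbreviated sample of a `results` array; real elements carry more
# fields (startTime, endTime, prediction, finalLabel, level, ...).
results = [
    {"id": "0", "task": "emotion"},
    {"id": "0", "task": "asr"},
    {"id": "1", "task": "emotion"},
]

# Keep only the emotion predictions across all utterances.
emotion_results = [r for r in results if r["task"] == "emotion"]
```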
The `id` of each result indicates the utterance/segment id. The `startTime` and `endTime` fields indicate the start/end of the utterance/segment.
Each result has a `prediction` array. This includes the values of each class for the specific task. For example, in the case of the `emotion` task, an example `prediction` object would be:
"prediction": [
  {
    "label": "sad",
    "posterior": "0.7969"
  },
  {
    "label": "neutral",
    "posterior": "0.1931"
  },
  {
    "label": "happy",
    "posterior": "0.007"
  },
  {
    "label": "angry",
    "posterior": "0.0029"
  }
]
The `posterior` indicates the probability of this class label being present in the utterance/segment. In the example above, the dominant emotion of the utterance is sad, with a posterior of 0.7969.
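Note that posteriors are returned as JSON strings, so they must be converted before comparison. A sketch of picking the most probable label from a `prediction` array (this is the raw argmax; the API's `finalLabel`, discussed below, may differ because it applies classification thresholds):

```python
# The prediction array from the emotion example above.
prediction = [
    {"label": "sad", "posterior": "0.7969"},
    {"label": "neutral", "posterior": "0.1931"},
    {"label": "happy", "posterior": "0.007"},
    {"label": "angry", "posterior": "0.0029"},
]

# Posteriors are strings in the response, so convert to float first.
top = max(prediction, key=lambda p: float(p["posterior"]))
```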
The `finalLabel` in the result object indicates the dominant class label. Note that, at times, the `finalLabel` may differ from the label with the largest posterior, because the `finalLabel` is obtained after applying optimal classification thresholds.
The `level` field indicates whether this result corresponds to a "segment" or an "utterance". This field will always be "utterance" for this endpoint; the distinction between "segment" and "utterance" is more relevant for our streaming API.
The `embedding` field contains the speaker or behavioral embedding. This field is empty for all tasks except two:
- diarization: the `embedding` field corresponds to the speaker embedding
- features: the `embedding` field corresponds to the behavioral features

This field is present only when the `embeddings` query param is present.
Notes
- Due to model limitations, behavioral results (`emotion`, `strength`, etc.) are available only for utterances longer than 2 sec. For shorter utterances, only `diarization`, `asr`, `gender` and `language` are available.
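A consequence of this is that short utterances simply have no result entries for the behavioral tasks. A sketch that groups results by utterance id to check which tasks are present (sample data is illustrative):

```python
from collections import defaultdict

# Abbreviated sample: utterance "1" is shorter than 2 sec, so it has
# no behavioral results such as emotion.
results = [
    {"id": "0", "task": "asr"},
    {"id": "0", "task": "emotion"},
    {"id": "1", "task": "asr"},
]

tasks_by_utterance = defaultdict(set)
for r in results:
    tasks_by_utterance[r["id"]].add(r["task"])

# Utterances lacking an emotion result (likely too short).
no_emotion = [uid for uid, tasks in tasks_by_utterance.items()
              if "emotion" not in tasks]
```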