Create a panel

You can use the panel miniapp to develop an AI audio transcription and summary device panel based on the Ray framework.

Prerequisites

Set up your development environment. For more information, see Panel MiniApp > Set up environment.

Create a product

A product defines the data points (DPs) of the associated panel and device. Before you develop a panel, you must create a product, define the required DPs, and then implement these DPs on the panel.

Register and log in to the Tuya Developer Platform and create a product.

  1. In the left-side navigation pane, choose Product > Development > Create.
  2. On the Standard Category tab, select the desired product category and then select the product on the right. Select a smart mode and a solution, and complete the product information.
  3. Click Create.
  4. On the Add Standard Function page, select the desired DPs or keep the default settings, and click OK.
  5. On the product details page, copy the product ID (PID), and contact your project manager to configure and enable Product AI Capabilities for this PID.
  6. After this feature is enabled, the Product AI Capabilities tab appears in Function Definition. The Selected AI capabilities section lists the AI capabilities used by the current product. You can also click Add Product AI Capabilities to add other AI features.
  7. Under Selected AI agents > AI Agents on Panels, click Select AI Agent.
  8. On the Add AI Agents on Panels page, select Method 2: Create an AI agent based on a template to implement the agent.
  9. In AI Transcribe, select Create Agent from Template and click OK.
  10. In the Selected AI agents section, click View Details.
  11. Copy the agent ID and save it for later use.

Create a panel miniapp on the Smart MiniApp Developer Platform

Register and log in to the Smart MiniApp Developer Platform. For more information, see Create panel miniapp.

Create a project based on a template

Open Tuya MiniApp IDE and create a panel miniapp project based on the AI audio transcription and summary template. For more information, see Initialize project.

In the project, configure the agent ID that you copied in Create a product > Step 11. The template code is in src/pages/Home/index.

// The agent ID. Change it to the agent ID configured for the product.
const AGENT_ID = "xxx";

Additionally, you need to preconfigure the language code of the recorded audio or implement an interaction for language selection (a sketch follows the snippet below). Refer to the template code in src/pages/Home/index. The language code uses the locale format, such as en-US, zh-CN, fr-FR, or ja-JP.

// Start the transcription task for the uploaded audio file.
startAIAudioTranscriptionTask({
  devId,
  businessCode: AGENT_ID,
  key: storageKey.current,
  language: "zh-CN", // The language of the recorded audio.
  duration: Math.floor(duration / 1000),
  template: "default", // Valid value: default. It cannot be changed currently.
});
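
If you implement language selection, one simple approach is to keep the supported codes in a constant and store the user's choice in component state. The following is a minimal sketch; the option list, default value, and variable names are illustrative, not part of the template.

// Hypothetical language options for a selection UI. Align the list with
// the languages your agent actually supports.
const LANGUAGE_OPTIONS = ["en-US", "zh-CN", "fr-FR", "ja-JP"];
const [language, setLanguage] = useState("en-US");
// Pass the selected code to startAIAudioTranscriptionTask via `language`.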

You have now completed the basic configuration of the panel.

You can use the RecorderManager API, obtained through ty.getRecorderManager, to record audio.

import { useEffect, useRef, useState } from "react";

// The audio format: mp3.
const AUDIO_FILE_SUFFIX = "mp3";
const recorderManagerRef = useRef(null);
const [isRecording, setIsRecording] = useState(false);
const [tempAudioFilePath, setTempAudioFilePath] = useState("");
const [duration, setDuration] = useState(0); // The duration of the recording file, in milliseconds.
const tempPathRef = useRef(null); // The temporary path of the recording file.

useEffect(() => {
  recorderManagerRef.current = ty.getRecorderManager();
}, []);

// Start recording.
const handleStart = () => {
  try {
    recorderManagerRef.current.start({
      frameSize: undefined,
      sampleRate: 16000,
      numberOfChannels: 1,
      format: AUDIO_FILE_SUFFIX,
      success: (d) => {
        tempPathRef.current = d.tempFilePath;
        setIsRecording(true);
      },
      fail: (err) => {
        console.log("Failed to start recording", err);
      },
    });
  } catch (error) {
    console.log(error);
  }
};

// Stop recording.
const handleStop = () => {
  try {
    recorderManagerRef.current.stop({
      success: (d) => {
        // Delay briefly so the recorded file is fully written before it is read.
        setTimeout(() => {
          if (tempPathRef.current) {
            setTempAudioFilePath(tempPathRef.current);
          }
          ty.getAudioFileDuration({
            path: tempPathRef.current,
            success: (res) => {
              if (res.duration) {
                setDuration(res.duration);
              }
            },
            fail: (err) => {
              console.log("Failed to get the audio duration", err);
            },
          });
          setIsRecording(false);
        }, 1000);
      },
    });
  } catch (error) {
    console.log(error);
  }
};
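
Optionally, you can let users preview the recording before uploading it. The following is a minimal sketch, assuming ty.createInnerAudioContext is available in your base library; the handler name is illustrative and not part of the template.

// Preview the recorded audio. Sketch: assumes ty.createInnerAudioContext
// is available in your base library.
const handlePreview = () => {
  const audio = ty.createInnerAudioContext();
  audio.src = tempPathRef.current; // The temporary recording file path.
  audio.onEnded(() => {
    audio.destroy(); // Release the audio context when playback ends.
  });
  audio.play();
};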

After the recording is finished, get the temporary file path of the recording, upload the file, and then initiate transcription.

import {
  getAIAudioTranscriptionStorageConfig,
  startAIAudioTranscriptionTask,
} from "@ray-js/ray";

// The audio format: mp3.
const AUDIO_FILE_SUFFIX = "mp3";
// 0: Not started.
// 1: Uploading.
// 2: Uploaded successfully.
// 3: Transcribing.
// 4: Transcription is completed.
const [transferStatus, setTransferStatus] = useState(0);
// The key of the audio file.
const storageKey = useRef();

const handleStartTransfer = async () => {
  try {
    ty.showLoading({ title: "" });
    setTransferStatus(1);
    // Request an authorization token for cloud storage.
    const storageConfig = await getAIAudioTranscriptionStorageConfig({
      devId,
      name: `audio_${Math.round(Math.random() * 100)}_${new Date().getTime()}`,
      businessCode: AGENT_ID,
      suffix: AUDIO_FILE_SUFFIX,
    });
    const { headers, key, url } = storageConfig as any;
    storageKey.current = key;
    // Upload the file.
    const task = ty.uploadFile({
      url,
      filePath: tempAudioFilePath,
      name: key,
      header: headers,
      success: (res) => {
        setTransferStatus(2);
        startAIAudioTranscriptionTask({
          devId,
          businessCode: AGENT_ID,
          key: storageKey.current,
          language: "zh-CN", // The language of the recorded audio.
          duration: Math.floor(duration / 1000),
          template: "default",
        })
          .then(() => {
            setTransferStatus(3);
          })
          .catch((e) => {
            ty.showToast({ title: "File transcription failed" });
          });
      },
      fail: (err) => {
        console.log("Upload failed", err);
        ty.showToast({ title: "File upload failed" });
        setTransferStatus(0);
      },
    });
    task.onProgressUpdate((res) => {
      console.log("Upload progress", res.progress);
    });
    task.onHeadersReceived((res) => {
      console.log("Upload headers", res);
    });
    // Hide the loading toast once the upload has started. Upload progress
    // is reported through onProgressUpdate above.
    ty.hideLoading();
  } catch (error) {
    ty.hideLoading();
  }
};
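
To surface upload and transcription progress in the UI, you can map transferStatus to display text. A minimal sketch with illustrative labels that mirror the status values listed above:

// Illustrative display text for each transferStatus value.
const TRANSFER_STATUS_TEXT = {
  0: "Not started",
  1: "Uploading",
  2: "Uploaded successfully",
  3: "Transcribing",
  4: "Transcription completed",
};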

After the transcription task is initiated and the API returns a success response, poll the transcription status API to check the task's progress.

import { getAIAudioTranscriptionStatus } from "@ray-js/ray";

const intervalId = useRef(null);
const getTransferProcessStatus = useCallback(async () => {
  try {
    // Poll the uploaded recording files that are being transcribed.
    const transferStatusList = await getAIAudioTranscriptionStatus({
      devId,
      keys: storageKey.current,
      businessCode: AGENT_ID,
    });
    // 1: Uploaded.
    // 2: Being transcribed.
    // 9: Completed.
    // 100: An error occurred.
    if (transferStatusList?.[0]?.status === 9) {
      clearInterval(intervalId.current);
      setTransferStatus(4);
    } else if (transferStatusList?.[0]?.status === 100) {
      clearInterval(intervalId.current);
      setTransferStatus(0);
      ty.showToast({ title: "Transcription failed (code: 100)" });
    }
  } catch (error) {
    console.log(error);
    clearInterval(intervalId.current);
  }
}, []);
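
You can start polling after startAIAudioTranscriptionTask succeeds, for example in its .then() callback shown earlier. The following is a minimal sketch; the 3-second interval is an assumption, so tune it to your needs.

// Start polling the transcription status. The 3-second interval is an
// assumption; adjust it as needed.
intervalId.current = setInterval(getTransferProcessStatus, 3000);

// Clear the timer when the component unmounts.
useEffect(() => {
  return () => {
    if (intervalId.current) {
      clearInterval(intervalId.current);
    }
  };
}, []);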

When the polling API returns status 9, the transcription and summary are completed. You can then call the following APIs to query the transcription text and the summary, respectively.

import { getAIAudioTranscriptionSttText, getAIAudioTranscriptionSummary } from "@ray-js/ray";

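// State for the transcription segments and the summary text. These
// declarations are a sketch; the initial values are assumptions.
const [sttList, setSttList] = useState([]);
const [summary, setSummary] = useState("");
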
getAIAudioTranscriptionSttText({
  devId,
  key: storageKey.current,
  businessCode: AGENT_ID,
}).then((d: any) => {
  if (d?.length) {
    setSttList(d);
  }
});
getAIAudioTranscriptionSummary({
  devId,
  key: storageKey.current,
  businessCode: AGENT_ID,
}).then((d: any) => {
  if (d?.summary) {
    setSummary(d?.summary);
  }
});
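
After the results are fetched, you can render them directly. The following is a minimal sketch; the text field on each transcription item is an assumption, so inspect the actual response shape before using it.

import { View, Text } from "@ray-js/ray";

return (
  <View>
    {/* The `text` field name is an assumption; check the actual items. */}
    {sttList.map((item, index) => (
      <Text key={index}>{item.text}</Text>
    ))}
    <Text>{summary}</Text>
  </View>
);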