Last Updated on: 2025-08-13 08:54:00
Before integrating the AI stream, read and implement the Preparation.
After integration is completed, add the dependency to your Podfile:
target 'xxxxx' do
# Note: ThingSmartHomeKit requires v6.7.0 or later.
# ...
# Import the AI stream SDK
pod 'ThingSmartBusinessExtensionKitAIStreamExtra'
end
To enable the AI agent in your app, perform the following steps:
Create an AI agent. For more information, see AI Agent Dev Platform.
Publish and deploy the agent to your app. After configuration, click Publish at the top of the page.
Record the agent ID for later use in SDK session creation.
On the Deploy Agents page, click Deploy to Apps to assign the agent to your target app. Confirm and then click Confirm Info and Publish. Record the MiniApp ID for later use in SDK session creation.
The current version is in developer preview. Provide the recorded Agent ID and MiniApp ID to the Tuya Developer Team to activate the correct configuration for your app.
The channel currently supports two connection types: app and proxy device.
Sessions are established on top of channels. A single channel can host multiple sessions, which are required for data stream transmission.
When using different AI agents or configurations, you must request an agent token using the relevant information and create a session.
Before transmitting data streams, make sure a session has been created.
Key concepts:
- Event
- DataChannel
- ReuseDataChannel
- StreamFlag
Event
An event can be sent from the app to the server, or delivered from the server to the app.
When sending, it mainly includes the following events:
Event | Description |
---|---|
EventStart | Marks the start of an event. It must be sent before data stream transmission. An EventId is required, which must remain consistent until EventEnd or ChatBreak is sent. |
EventPayloadEnd | Indicates completion of a single data transmission within an event. The DataChannel of the data is required. |
EventEnd | Marks the end of an event. This event is sent after all data streams are transmitted. After receiving this event, the server delivers all data to the agent for processing. |
ChatBreak | Interrupts an event. It can be triggered during data stream transmission or reception. |
When receiving, it mainly includes the following events:
Event | Description |
---|---|
EventStart | Marks the start of an event, indicating the cloud is about to deliver data to the app. Includes an EventId corresponding to the one sent by the app. |
EventPayloadEnd | Indicates completion of a single data transmission within an event. |
EventEnd | Marks the end of an event. This event is sent after all data streams are transmitted. |
ChatBreak | Interrupts an event. The cloud is interrupting the app, and the app should stop sending or receiving data for this EventId (requires configuration). |
ServerVAD | Cloud-based voice activity detection (VAD). The cloud has detected the end of an audio segment from the app. Upon receiving this event, the app must stop sending audio data for this EventId. Even if the data is sent, the cloud will ignore it. Requires configuration. |
The channel supports multimodal data transmission, allowing delivery of multiple data types within a single event cycle. Refer to the agent’s configuration for specific compatible data types.
DataChannel
During session creation, the AgentToken response includes bizConfig with two key properties, sendData and revData, which indicate the data types. Based on bizConfig.sendData and bizConfig.revData, the SDK automatically generates sendDataChannels and recvDataChannels.
- sendAudioData and sendVideoData require no explicit data channel. The scenario where a dataChannel value must be passed to the data stream sending method is not currently available.
- sendEventPayloadsEnd always requires dataChannel, regardless of stream count.
ReuseDataChannel
The session creation interface includes a reuseDataChannel parameter, which defaults to false (disabled).
It is designed for real-time streaming scenarios (such as live audio/video streams), where two or more sessions require different processing of the same real-time data. It optimizes data timeliness and reduces transmission overhead.
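The snippet below is a minimal sketch of this option. It assumes that every session meant to share the same real-time stream is created with reuseDataChannel:YES and that one agent token can back multiple sessions (both are assumptions), and it reuses the session-creation APIs shown later on this page:
// Hedged sketch: create a session that shares a real-time data channel.
// Assumption: each session that should share the stream passes reuseDataChannel:YES.
ThingStreamQueryAgentTokenParams *params = [ThingStreamQueryAgentTokenParams new];
// Configure solutionCode, ownerId, api, and so on as in the session-creation example below.
[self.client queryAgentToken:params success:^(ThingStreamAgentTokenInfo *tokenInfo) {
    [self.client createSessionWithToken:tokenInfo
                                 bizTag:0
                       reuseDataChannel:YES  // reuse the real-time data channel
                              sessionId:nil
                          cacheBasePath:nil
                              userDatas:nil
                             completion:^(ThingStreamSessionInfo *result, NSError *error) {
        NSLog(@"Session with reused data channel: %@, error: %@", result.sessionID, error);
    }];
    // A second session that processes the same real-time data differently would be
    // created the same way, also with reuseDataChannel:YES (assumption).
} failure:^(NSError *error) {
    NSLog(@"Query token fail: %@", error);
}];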
StreamFlag
Marks the start or end of a specific data stream. There are four types:
- OnlyOne(0): Single packet only. It is used for text or small-sized images.
- StreamStart(1): Start of a data stream to send or receive. The first packet should include metadata, such as the sample rate, bit depth, and format for audio data. Packet data can be empty.
- Streaming(2): Data transmission is in progress.
- StreamEnd(3): End of the data stream. Packet data can be empty.

Import the header and create a stream client with either the device identity or the app identity:

#import <ThingSmartStreamChannelKit/ThingSmartStreamChannelKit.h>
// Use the device identity
self.client = [ThingSmartStreamClient clientForAgentDevice:@"device_id"];
// Use the app identity
self.client = [ThingSmartStreamClient clientForApp];
@interface ThingSmartStreamClient : NSObject
/// Adds a Stream client delegate.
/// - Parameter delegate: Delegate
- (void)addDelegate:(id<ThingSmartStreamClientDelegate>)delegate;
/// Removes a Stream client delegate.
/// - Parameter delegate: Delegate
- (void)removeDelegate:(id<ThingSmartStreamClientDelegate>)delegate;
@end
Example:
Before establishing a connection or receiving data, you must set up a delegate to receive messages.
- (void)addListener {
// self.client = [ThingSmartStreamClient clientForApp];
// self.client = [ThingSmartStreamClient clientForAgentDevice:@"xxx"];
[self.client addDelegate:self];
}
@protocol ThingSmartStreamClientDelegate <NSObject>
@optional
/// Connection state changed.
/// - Parameters:
/// - client: stream client object
/// - connectState: connect state
/// - error: Error info, set when connectState == ThingSmartStreamConnectStateClosedByServer
- (void)streamClient:(ThingSmartStreamClient *)client
connectState:(ThingSmartStreamConnectState)connectState
error:(nullable NSError *)error;
/// Session state changed.
/// - Parameters:
/// - sessionID: session id
/// - sessionState: session state
/// - error: Error info, set when sessionState != ThingSmartStreamSessionStateCreateSuccess
- (void)streamClientSessionId:(NSString *)sessionID
changedSessionState:(ThingSmartStreamSessionState)sessionState
error:(nullable NSError *)error;
@end
@protocol ThingSmartStreamClientDelegate <NSObject>
@optional
- (void)streamClientDidReceiveEvent:(ThingStreamEventPacketModel *)packet;
- (void)streamClientDidReceiveVideo:(ThingStreamVideoPacketModel *)packet;
- (void)streamClientDidReceiveAudio:(ThingStreamAudioPacketModel *)packet;
- (void)streamClientDidReceiveImage:(ThingStreamImagePacketModel *)packet;
- (void)streamClientDidReceiveFile:(ThingStreamFilePacketModel *)packet;
- (void)streamClientDidReceiveText:(ThingStreamTextPacketModel *)packet;
@end
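Example: a minimal sketch of a delegate implementation that only logs what it receives (reading the individual fields of each packet model is not shown here):
#pragma mark - ThingSmartStreamClientDelegate (sketch)

- (void)streamClient:(ThingSmartStreamClient *)client
        connectState:(ThingSmartStreamConnectState)connectState
               error:(nullable NSError *)error {
    NSLog(@"Connect state changed: %lu, error: %@", (unsigned long)connectState, error);
}

- (void)streamClientSessionId:(NSString *)sessionID
          changedSessionState:(ThingSmartStreamSessionState)sessionState
                        error:(nullable NSError *)error {
    NSLog(@"Session %@ state changed: %lu, error: %@", sessionID, (unsigned long)sessionState, error);
}

- (void)streamClientDidReceiveText:(ThingStreamTextPacketModel *)packet {
    // Update the chat UI with the received text packet.
    NSLog(@"Received text packet: %@", packet);
}

- (void)streamClientDidReceiveAudio:(ThingStreamAudioPacketModel *)packet {
    // Feed the received audio packet to a player, for example ThingStreamPlayer.
    NSLog(@"Received audio packet: %@", packet);
}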
How it works:
@interface ThingSmartStreamClient : NSObject
@property (nonatomic, assign, readonly) ThingSmartStreamClientType clientType;
/// The stream client for AgentDevice.
/// - Parameter devId: Device ID.
+ (nullable instancetype)clientForAgentDevice:(NSString *)devId;
/// The stream client for App
+ (instancetype)clientForApp;
@end
@interface ThingSmartStreamClient : NSObject
/// Connect to stream server.
- (void)connect;
/// - Parameter success: Callback with the connection ID when the connection is established.
- (void)connect:(ThingSuccessString)success;
@end
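Example (a minimal sketch; the success block receives the connection ID):
// Connect after the delegate has been added.
[self.client connect:^(NSString *connectionId) {
    NSLog(@"Stream channel connected, connection ID: %@", connectionId);
}];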
For device proxy mode (clientType == ThingSmartStreamClientTypeDevice), it is recommended to destroy the channel after exiting the device control page.
@interface ThingSmartStreamClient : NSObject
/// Disconnect from stream server.
- (void)disconnect;
/// Destroy the stream client. (Only for the AgentDevice client.)
- (void)destory;
@end
typedef NS_ENUM(NSUInteger, ThingSmartStreamConnectState) {
ThingSmartStreamConnectStateIdle = 0,
ThingSmartStreamConnectStateConnecting = 1,
ThingSmartStreamConnectStateAuthing = 2,
ThingSmartStreamConnectStateConnected = 3,
ThingSmartStreamConnectStateClosedByServer = 4,
ThingSmartStreamConnectStateClosed = 5
};
@interface ThingSmartStreamClient : NSObject
@property (atomic, assign, readonly) ThingSmartStreamConnectState state;
/// Checks whether the connection is established.
- (BOOL)isConnected;
@end
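Example (a sketch of a pre-send guard and of the teardown recommended above for device proxy mode):
// Reconnect if the channel is not connected before creating sessions or sending data.
if (![self.client isConnected]) {
    [self.client connect];
}

// Teardown, for example when leaving the device control page.
[self.client disconnect];
if (self.client.clientType == ThingSmartStreamClientTypeDevice) {
    // Only the device (proxy) client needs to be destroyed.
    [self.client destory];
}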
A single connection can maintain up to 20 concurrent sessions. Properly create or close sessions as needed.
There are two session creation methods.
Method 1: Normal method. First, get the agent token info and use it to create a session. Since getting the agent token info involves some latency, this method allows pre-fetching the token. Typically, the token is valid for 24 hours.
@interface ThingSmartStreamClient : NSObject
#pragma mark - Session Establish & Close
/// Query AI agent token.
/// - Parameters:
/// - params: Query params obj.
/// - success: Success handler.
/// - failure: Fail handler.
- (void)queryAgentToken:(ThingStreamQueryAgentTokenParams *)params
success:(void(^)(ThingStreamAgentTokenInfo *tokenResponse))success
failure:(ThingFailureError)failure;
/// Create a new session.
/// - Parameters:
/// - tokenInfo: Agent token info. (use -queryAgentToken:success:failure: to get token info)
/// - bizTag: Biz tag.
/// - reuseDataChannel: Reuse data channel. (Default is NO)
/// - sessionId: Session ID. If sessionId is nil, a new session will be created.
/// - cacheBasePath: Cache receive data path.
/// - userDatas: User data list.
/// - completion: Completion.
- (void)createSessionWithToken:(ThingStreamAgentTokenInfo *)token
bizTag:(uint64_t)bizTag
reuseDataChannel:(BOOL)reuseDataChannel
sessionId:(nullable NSString *)sessionId
cacheBasePath:(nullable NSString *)cacheBasePath
userDatas:(nullable NSArray<ThingStreamAttribute *> *)userDatas
completion:(nullable ThingSmartStreamSessionCompletion)completion;
@end
Method 2: Convenient method. Get the agent token info and automatically create a session.
@interface ThingSmartStreamClient : NSObject
/// Query AI agent token and create a new session.
/// - Parameters:
/// - params: Query params obj.
/// - reuseDataChannel: reuse data channel. (Default is NO)
/// - cacheBasePath: Cache receive data path. If cacheBasePath is nil, it won't cache data.
/// - userDatas: User data list.
/// - completion: Completion.
- (void)createSessionWithQueryParams:(ThingStreamQueryAgentTokenParams *)params
reuseDataChannel:(BOOL)reuseDataChannel
cacheBasePath:(nullable NSString *)cacheBasePath
userDatas:(nullable NSArray<ThingStreamAttribute *> *)userDatas
completion:(nullable ThingSmartStreamSessionCompletion)completion;
@end
A single connection can maintain up to 20 concurrent sessions. Close unnecessary sessions when appropriate.
@interface ThingSmartStreamClient : NSObject
/// Close a session
/// - Parameters:
/// - sessionID: session id.
/// - code: close reason.
/// - completion: Completion.
- (void)closeSession:(NSString *)sessionID
withCode:(ThingStreamAttributeConnectionSessionCode)code
completion:(nullable ThingSmartStreamCompletion)completion;
@end
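Example (a sketch: the close reason below is a placeholder value, so pick the appropriate ThingStreamAttributeConnectionSessionCode case from the SDK header, and sessionInfo is assumed to be the object returned when the session was created):
// Close a session that is no longer needed to stay under the 20-session limit.
ThingStreamAttributeConnectionSessionCode closeCode = (ThingStreamAttributeConnectionSessionCode)0; // placeholder reason
[self.client closeSession:sessionInfo.sessionID
                 withCode:closeCode
               completion:nil];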
To create a session, call the APIs with the relevant parameters. Here is an example of invoking the APIs, along with a description of the key parameters:
- aiSolutionCode: Corresponds to the Agent ID in the agent preparation process.
- miniProgramId: Corresponds to the Mini Program ID in the agent preparation process.

- (void)createSession {
// First, select an identity and ensure that the channel is connected
// self.client = [ThingSmartStreamClient clientForAgentDevice:@"your_device_id"];
// self.client = [ThingSmartStreamClient clientForApp];
ThingStreamQueryAgentTokenParams *params = [ThingStreamQueryAgentTokenParams new];
params.solutionCode = @"your_agent_id"; // agent id
params.ownerId = @"your_home_id"; // home id
params.api = @"m.life.ai.token.get";
params.apiVerion = @"1.0";
params.extParams = @{
@"miniProgramId": @"your_mini_app_id",
@"onlyAsr": @NO, // only ASR or not
@"needTts": @NO // need TTS or not
};
WEAKSELF_ThingSDK
// Normal: Request the agent token first and then create the session.
[self.client queryAgentToken:params success:^(ThingStreamAgentTokenInfo * _Nonnull tokenResponse) {
[weakSelf_ThingSDK.client createSessionWithToken:tokenResponse bizTag:0 reuseDataChannel:NO sessionId:nil cacheBasePath:nil userDatas:nil completion:^(ThingStreamSessionInfo * _Nullable result, NSError * _Nullable error) {
if (result) {
NSLog(@"Creat success: %@", result.sessionID);
} else {
NSLog(@"Create fail: %@", error);
}
}];
} failure:^(NSError *error) {
NSLog(@"Query token fail: %@", error);
}];
// // Convenient: Request the agent token to create a session in one function
// [self.client createSessionWithQueryParams:params reuseDataChannel:NO cacheBasePath:nil userDatas:nil completion:^(ThingStreamSessionInfo * _Nullable result, NSError * _Nullable error) {
// if (result) {
// NSLog(@"Creat success: %@", result.sessionID);
// } else {
// NSLog(@"Create fail: %@", error);
// }
// }];
}
The following events can be sent:
- EventStart: Marks the start of a dialogue round. You can either specify an EventId or let the SDK auto-generate one.
- EventPayloadEnd: Notifies the cloud that one data stream transmission in this dialogue is completed.
- EventEnd: Ends the dialogue round. After this event is sent, the AI will process all transmitted data and generate a response.
- ChatBreak: Interrupts a dialogue. This event can be triggered during sending or receiving.

@interface ThingSmartStreamClient : NSObject
#pragma mark - Send Event Packet
/// Send event start.
/// - Parameters:
/// - sessionId: Session id.
/// - attributes: Attributes of Event Packet.
/// - Optional attributes:
/// - UserData
/// - successHandler: Success handler.
/// - failure: Failure handler.
- (void)sendEventStart:(NSString *)sessionId
userDatas:(nullable NSArray<ThingStreamAttribute *> *)userDatas
success:(nullable void(^)(NSString *eventId))successHandler
failure:(nullable ThingFailureError)failure;
/// Send event start.
/// - Parameters:
/// - sessionId: Session id.
/// - eventId: Event id.
/// - attributes: Attributes of Event Packet.
/// - Optional attributes:
/// - UserData
/// - successHandler: Success handler.
/// - failure: Failure handler.
- (void)sendEventStart:(NSString *)sessionId
eventId:(nullable NSString *)eventId
userDatas:(nullable NSArray<ThingStreamAttribute *> *)userDatas
success:(nullable void(^)(NSString *eventId))successHandler
failure:(nullable ThingFailureError)failure;
/// Send event payloads end.
/// - Parameters:
/// - eventId: Event ID. Same as the value passed in when sending EventStart.
/// - sessionId: Session ID.
/// - dataChannel: The dataChannel used in the Video/Audio/Image/File/Text packet whose payload transmission has ended.
/// - attributes: Attributes of Event Packet.
/// - Optional attributes:
/// - UserData
/// - completion: Completion.
- (void)sendEventPayloadsEnd:(NSString *)eventId
sessionId:(NSString *)sessionId
dataChannel:(NSString *)dataChannel
userDatas:(nullable NSArray<ThingStreamAttribute *> *)userDatas
completion:(nullable ThingSmartStreamCompletion)completion;
/// Send event end.
/// - Parameters:
/// - eventId: Event ID. Same as the value passed in when sending EventStart.
/// - sessionId: Session ID.
/// - attributes: Attributes of Event Packet.
/// - Optional attributes:
/// - UserData
/// - EventTimestamp
/// - completion: Completion.
- (void)sendEventEnd:(NSString *)eventId
sessionId:(NSString *)sessionId
userDatas:(nullable NSArray<ThingStreamAttribute *> *)userDatas
completion:(nullable ThingSmartStreamCompletion)completion;
/// Send event chat break.
/// - Parameters:
/// - eventId: Event ID. Same as the value passed in when sending EventStart.
/// - sessionId: Session ID.
/// - completion: Completion.
- (void)sendEventChatBreak:(NSString *)eventId
sessionId:(NSString *)sessionId
completion:(nullable ThingSmartStreamCompletion)completion;
@end
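For example, a round can be interrupted when the user taps a stop button (a sketch; self.currentEventId and self.currentSessionId are assumed to be stored by your own code when the round starts):
- (void)stopCurrentRound {
    // Interrupt the ongoing dialogue round; both IDs are assumed to be tracked by your code.
    [self.client sendEventChatBreak:self.currentEventId
                          sessionId:self.currentSessionId
                         completion:nil];
}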
@interface ThingSmartStreamClient : NSObject
#pragma mark - Packet
#pragma mark Send Video/Audio/Image/Text/File Packet
/// Send video packet.
/// - Parameters:
/// - videoModel: Video packet model.
/// - completion: completion.
- (void)sendVideoData:(ThingStreamVideoPacketModel *)packet completion:(nullable ThingSmartStreamCompletion)completion;
/// Send audio packet.
/// - Parameters:
/// - audioModel: Audio packet model.
/// - completion: completion.
- (void)sendAudioData:(ThingStreamAudioPacketModel *)packet completion:(nullable ThingSmartStreamCompletion)completion;
/// Send text packet.
/// - Parameters:
/// - textModel: text packet model.
/// - completion: completion.
- (void)sendTextData:(ThingStreamTextPacketModel *)packet completion:(nullable ThingSmartStreamCompletion)completion;
/// Send image packet.
/// - Parameters:
/// - imageModel: image packet model.
/// - completion: completion.
- (void)sendImageData:(ThingStreamImagePacketModel *)packet
progress:(nullable void(^)(int progress))progress
completion:(nullable ThingSmartStreamCompletion)completion;
/// Send file packet. (Currently not supported by the server.)
/// - Parameters:
/// - fileModel: file packet model.
/// - completion: completion.
- (void)sendFileData:(ThingStreamFilePacketModel *)packet
progress:(nullable void(^)(int progress))progress
completion:(nullable ThingSmartStreamCompletion)completion;
@end
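Putting the event and packet APIs together, the following is a hedged sketch of one text dialogue round. The packet model is left unconfigured (its fields, such as the stream flag and text content, are defined in the SDK header), and the dataChannel string is assumed to come from the send data channels described in the DataChannel section:
- (void)sendTextRoundInSession:(NSString *)sessionId dataChannel:(NSString *)dataChannel {
    // 1. Start the round. The SDK generates an EventId if none is passed.
    [self.client sendEventStart:sessionId userDatas:nil success:^(NSString *eventId) {
        // 2. Send the data stream. The packet fields (text, stream flag, IDs) must be
        //    filled in from the SDK header; they are intentionally left out of this sketch.
        ThingStreamTextPacketModel *packet = [ThingStreamTextPacketModel new];
        [self.client sendTextData:packet completion:nil];

        // 3. Tell the cloud that this data channel has finished transmitting.
        [self.client sendEventPayloadsEnd:eventId
                                sessionId:sessionId
                              dataChannel:dataChannel
                                userDatas:nil
                               completion:nil];

        // 4. End the round; the agent starts processing and replies via the delegate callbacks.
        [self.client sendEventEnd:eventId sessionId:sessionId userDatas:nil completion:nil];
    } failure:^(NSError *error) {
        NSLog(@"Send EventStart fail: %@", error);
    }];
}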
For real-time audio recording and playback of received audio files, two utility classes are provided: ThingStreamRecorder.h and ThingStreamPlayer.h. Import them by adding #import <ThingSmartStreamBizKit/ThingSmartStreamBizKit.h>.
For more information, see the demo tuya-home-ios-sdk-sample-swift.
Procedure:
1. Open the StreamChatController.swift file.
2. In func createSession():
   - Set solutionCode to the Agent ID recorded in the prerequisites.
   - Set miniProgramId to the Mini Program ID recorded in the prerequisites.
3. In func createSession(), choose a session creation approach (Normal or Convenient) and uncomment the relevant code.

Error codes:

code | msg | Description | Remarks |
---|---|---|---|
39001 | / | A common error has occurred. | Applies to miscellaneous error cases. |
39002 | | Invalid parameter. | |
39003 | Cloud error messages | HTTP request failed. | Failed to get the agent token. The request for an agent token from the cloud interface failed. |
39004 | not connect | Connection is not established. | This error occurs when creating/closing a session, sending data, or sending an event. |
39005 | session is invalid | The SessionId does not exist. | This error occurs when closing a session, sending data, or sending an event. |
39006 | eventId is invalid | The EventId is empty. | This error occurs when sending an event. |
39007 | dataChannel is invalid | Invalid dataChannel. | This error occurs when sending data and PayloadEnd . |
39008 | packet is invalid | Invalid data packet. | This error occurs when sending data. For example, the first or only one packet of data requires fixed parameters, and the text content is empty. |
39009 | | An exception occurred while reading file data. | This error occurs when sending data. For example, the image or file does not exist, or an exception occurred during the reading process. TTT: The file required to play audio does not exist (startPlayAudio). |
39010 | | Failed to send the data. | Failed to send data via socket, containing socket error details. |
39012 | | The connection was closed by the remote end. | Detailed errors correspond to cloud error codes: 200, 400, 401, 404, 408, 500, 504, 601, 602, 603, 604, and 605. For TTT, 39012 does not exist. TTT devices report errors directly through onConnectStateChanged with codes such as 200 and 400. |
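A small hedged sketch of reacting to two of these codes in the session-creation completion follows. Whether these errors surface with exactly these codes in this callback is an assumption; only the numeric values come from the table above.
// params is configured as in the session-creation example earlier on this page.
ThingStreamQueryAgentTokenParams *params = [ThingStreamQueryAgentTokenParams new];
[self.client createSessionWithQueryParams:params
                         reuseDataChannel:NO
                            cacheBasePath:nil
                                userDatas:nil
                               completion:^(ThingStreamSessionInfo * _Nullable result, NSError * _Nullable error) {
    if (result) {
        return;
    }
    switch (error.code) {
        case 39004: // Connection is not established: reconnect and retry later.
            [self.client connect];
            break;
        case 39003: // The agent token request to the cloud failed.
            NSLog(@"Token request failed: %@", error);
            break;
        default:
            NSLog(@"Create session fail: %@", error);
            break;
    }
}];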
This section details the JSON formats and their processing logic in the AI chat component.
After a user sends a voice input, the system returns the ASR result in the following format:
{
"bizId": "asr-1754380053514",
"bizType": "ASR",
"eof": 0,
"data": {
"text": "what's the weather today?"
}
}
Field | Description | Field type | Example |
---|---|---|---|
bizId | The business ID used to identify an interaction. | string | asr-1754380053514 |
bizType | The business type. ASR indicates speech recognition results. | string | ASR |
eof | The end flag. Valid values: 0 means not ended, and 1 means ended. | int | 0 or 1 |
data | The returned ASR result object. | object | |
text | The recognized text content. | string | What’s the weather today? |
Treatment:
- When eof is 0, save the temporary result and display the intermediate recognition result.
- When eof is 1, display the final recognition result and add it to the chat list.

After the AI model responds to a user query, the data returned is in the following format:
{
"bizId": "nlg-1754380053514",
"bizType": "NLG",
"eof": 0,
"data": {
"appendMode": "append",
"reasoningContent": "Reasoning content",
"content": "Text content returned by the model",
"images": [
{
"url": "https://www.tuya.com/image1.jpg"
}
]
}
}
Field | Description | Field type | Example |
---|---|---|---|
bizId | The business ID, used to associate messages from the same interaction. | string | nlg-1754380053514 |
bizType | The business type. NLG stands for Natural Language Generation. | string | NLG |
eof | The end flag. Valid values: 0 means streaming is in progress, and 1 means it is finished. | int | 0 or 1 |
data | The data object returned. | object | |
appendMode | The text append mode. | string | Append or others |
reasoningContent | The LLM reasoning process (chain-of-thought). | string | The step-by-step thinking content |
content | The actual response content generated by the LLM. | string | Text response from the LLM |
images | The image data array returned by the LLM. | array | |
url | The URL of the specified image. | string | https://www.tuya.com/im***1.jpg |
Treatment:
- When appendMode is append, the new content is appended to the existing message. Otherwise, a new message is created.
- Extract each url from the images array, and render each URL as a rich-media image message.
- When eof is 1, it marks the end of NLG streaming.

AI can invoke various skills, such as displaying emojis.
This section only illustrates emoji-type skills. Other skills might vary depending on the data definition of the agent.
{
"bizId": "skill-1754380053514",
"bizType": "SKILL",
"eof": 0,
"data": {
"code": "llm_emo",
"skillContent": {
"text": "😀",
"startTime": 1000,
"endTime": 2000,
"sequence": 1
}
}
}
Field | Description | Field type | Example |
---|---|---|---|
bizId | The business ID. | string | skill-1754380053514 |
bizType | The business type. SKILL means to invoke skills. | string | SKILL |
eof | The end flag. | int | 0 or 1 |
data | The returned data. | object | - |
code | The code of a skill. | string | llm_emo |
skillContent | The content of a skill. | object | - |
text | The emoji. | string | 😀 |
startTime | The start time to display an emoji, in milliseconds. | long | 1000 |
endTime | The end time to stop displaying an emoji, in milliseconds. | long | 2000 |
sequence | The sequence number, starting with 1. | int | 1 |
Treatment:
- When sequence is 1, the existing emoji step list is cleared.
- When eof is 1, the emojis begin to be displayed in chronological order.

The following emoji types are supported:
Emotion type | Unicode | Display effect |
---|---|---|
SAD | \uD83D\uDE22 | 😢 |
ANGRY | \uD83D\uDE20 | 😠 |
NEUTRAL | \uD83D\uDE10 | 😐 |
FEARFUL | \uD83D\uDE28 | 😨 |
SURPRISE | \uD83D\uDE32 | 😲 |
CONFUSED | \uD83D\uDE15 | 😕 |
DISAPPOINTED | \uD83D\uDE1E | 😞 |
ANNOYED | \uD83D\uDE21 | 😡 |
THINKING | \uD83E\uDD14 | 🤔 |
HAPPY | \uD83D\uDE00 | 😀 |
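To tie the three formats together, here is a hedged sketch of a dispatcher for received chat payloads. It assumes the JSON shown above has already been parsed from the received packet into an NSDictionary (how the payload is extracted from the packet model is not covered here), and the handler methods it calls are placeholders for your own UI code:
- (void)handleChatPayload:(NSDictionary *)payload {
    NSString *bizType = payload[@"bizType"];
    BOOL finished = [payload[@"eof"] integerValue] == 1;
    NSDictionary *data = payload[@"data"];

    if ([bizType isEqualToString:@"ASR"]) {
        // Show the intermediate result while eof == 0; commit it to the chat list when eof == 1.
        if (finished) {
            [self appendFinalAsrText:data[@"text"]];   // placeholder handler
        } else {
            [self showPartialAsrText:data[@"text"]];   // placeholder handler
        }
    } else if ([bizType isEqualToString:@"NLG"]) {
        // Append streamed content when appendMode == "append"; otherwise start a new message.
        BOOL append = [data[@"appendMode"] isEqualToString:@"append"];
        [self renderNlgContent:data[@"content"]        // placeholder handler
                        append:append
                        images:data[@"images"]
                      finished:finished];
    } else if ([bizType isEqualToString:@"SKILL"] && [data[@"code"] isEqualToString:@"llm_emo"]) {
        // sequence == 1 clears the emoji queue; eof == 1 starts chronological playback.
        [self enqueueEmojiStep:data[@"skillContent"] finished:finished]; // placeholder handler
    }
}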