5 changes: 4 additions & 1 deletion samples/CameraAccess/CameraAccess.xcodeproj/project.pbxproj
@@ -128,6 +128,7 @@
9D3C69602F367CF700E641A5 /* iPhone */ = {isa = PBXFileSystemSynchronizedRootGroup; explicitFileTypes = {}; explicitFolders = (); path = iPhone; sourceTree = "<group>"; };
9D85EB992F35EC46006C44D1 /* OpenClaw */ = {isa = PBXFileSystemSynchronizedRootGroup; explicitFileTypes = {}; explicitFolders = (); name = OpenClaw; path = CameraAccess/OpenClaw; sourceTree = SOURCE_ROOT; };
E699CC962E8150670052C240 /* CameraAccessTests */ = {isa = PBXFileSystemSynchronizedRootGroup; explicitFileTypes = {}; explicitFolders = (); path = CameraAccessTests; sourceTree = "<group>"; };
A1C4D5E62F0B000100000001 /* Utilities */ = {isa = PBXFileSystemSynchronizedRootGroup; explicitFileTypes = {}; explicitFolders = (); path = Utilities; sourceTree = "<group>"; };
/* End PBXFileSystemSynchronizedRootGroup section */

/* Begin PBXFrameworksBuildPhase section */
@@ -216,6 +217,7 @@
8FD96B792E6F0A9800F56AB1 /* CameraAccessApp.swift */,
8FD96B7B2E6F0A9800F56AB1 /* Info.plist */,
9D85EB992F35EC46006C44D1 /* OpenClaw */,
A1C4D5E62F0B000100000001 /* Utilities */,
);
path = CameraAccess;
sourceTree = "<group>";
@@ -306,6 +308,7 @@
fileSystemSynchronizedGroups = (
9D3C69602F367CF700E641A5 /* iPhone */,
9D85EB992F35EC46006C44D1 /* OpenClaw */,
A1C4D5E62F0B000100000001 /* Utilities */,
);
name = CameraAccess;
productName = CameraAccess;
@@ -726,7 +729,7 @@
repositoryURL = "https://github.com/facebook/meta-wearables-dat-ios";
requirement = {
kind = exactVersion;
-				version = 0.4.0;
+				version = 0.5.0;
};
};
9DD6CAFC2F3C62DA00ED7098 /* XCRemoteSwiftPackageReference "WebRTC" */ = {
@@ -1,7 +1,7 @@
{
"images" : [
{
-      "filename" : "imagine_a_film_camera_in_the_style.jpeg",
+      "filename" : "AppIcon.png",
"idiom" : "universal",
"platform" : "ios",
"size" : "1024x1024"
Binary file not shown.
18 changes: 18 additions & 0 deletions samples/CameraAccess/CameraAccess/Gemini/GeminiConfig.swift
@@ -40,6 +40,24 @@ enum GeminiConfig {
Never call execute silently -- the user needs verbal confirmation that you heard them and are working on it. The tool may take several seconds to complete, so the acknowledgment lets them know something is happening.

For messages, confirm recipient and content before delegating unless clearly urgent.

You also have a save_photo tool. Use it when the user asks you to capture, save, snap, photograph, or take a picture of what they're looking at. In the description parameter, briefly describe what you see in the frame. This saves the current camera view directly to their iPhone photo library -- it's instant, no network needed.

You have a save_note tool. Use it to record observations, measurements, hazards, or action items as field notes. Always save important findings during inspections or when the worker mentions something worth recording. Categorize notes when appropriate: observation, hazard, measurement, or action_item. The worker may need these notes for their field report later.

You have access to the current job context injected at the start of this session, including the worker's name, job details, site address, and GPS location. Use this context to give relevant, job-aware responses. Address the worker by name. Reference the job and site when relevant.

You have a knowledge_lookup tool. Use it when the user says "look this up", "what is this", "find the specs", or asks about something they're looking at. First READ any visible text from the camera (part numbers, model names, labels, serial numbers), then call knowledge_lookup with a specific search query. Include the manufacturer and model number if visible. Results are automatically saved as reference notes.

You have a generate_report tool. Use it when the user says "generate my field report", "create a report", "compile my findings", "write up my notes", etc. This compiles all session data (job details, notes, photos, GPS, timestamps) into a professional PDF and opens the share sheet so they can immediately AirDrop, email, or save the report. Confirm that the report is being generated before calling the tool.

You have start_inspection and stop_inspection tools for proactive inspection mode. When the user says "start inspection", "begin inspection", "inspect this area", or similar, call start_inspection. If they mention a focus area (e.g. "focus on electrical" or "check for water damage"), include it in the focus parameter. When they say "stop inspection" or "end inspection", call stop_inspection.

During inspection mode, you will receive periodic [INSPECTION] prompts. IMPORTANT: Only respond if you genuinely see something the inspector should know about -- damage, wear, safety hazards, code violations, unusual conditions, or noteworthy changes. If nothing stands out in the current view, stay completely silent. Do NOT acknowledge the inspection prompt or say "everything looks fine". Keep observations brief, specific, and actionable.

You have start_safety_monitor and stop_safety_monitor tools. When the user says "enable safety", "watch for hazards", "start safety monitoring", or similar, call start_safety_monitor. When they say "stop safety" or "disable safety monitoring", call stop_safety_monitor. Safety monitoring runs independently from inspection mode — both can be active simultaneously.

During safety monitoring, you will receive periodic [SAFETY CHECK] prompts. ONLY speak if you see a GENUINE safety hazard — missing PPE, electrical dangers, fall risks, fire hazards, or OSHA violations. If nothing unsafe is visible, stay completely silent. When you DO spot a hazard, be urgent, clear, and specific. Always save hazards as notes with category "hazard".
"""

// User-configurable values (Settings screen overrides, falling back to Secrets.swift)
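The prompt text above tells the model *when* to call each tool, but the tools themselves also have to be declared to the Live API in the session setup message. A minimal sketch of what a `save_photo` declaration could look like — the shape follows Gemini's function-calling schema, and since the app's actual declarations are not shown in this hunk, the exact structure below is an assumption:

```swift
import Foundation

// Hypothetical sketch of a function declaration for the save_photo tool,
// shaped like Gemini's function-calling schema. The app's real declaration
// is not visible in this diff, so field names here are assumptions.
let savePhotoTool: [String: Any] = [
    "name": "save_photo",
    "description": "Save the current camera frame to the photo library.",
    "parameters": [
        "type": "OBJECT",
        "properties": [
            "description": [
                "type": "STRING",
                "description": "Brief description of what is in the frame."
            ]
        ],
        "required": ["description"]
    ]
]

// Declarations are grouped under functionDeclarations in the setup payload.
let tools: [[String: Any]] = [["functionDeclarations": [savePhotoTool]]]
```

A declaration like this is what makes the model emit a structured tool call (handled by `ToolCallRouter` later in the diff) instead of describing the action in prose.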
Expand Up @@ -33,6 +33,7 @@ class GeminiLiveService: ObservableObject {
private let delegate = WebSocketDelegate()
private var urlSession: URLSession!
private let sendQueue = DispatchQueue(label: "gemini.send", qos: .userInitiated)
var sessionContextString: String?

init() {
let config = URLSessionConfiguration.default
@@ -189,7 +190,13 @@
],
"systemInstruction": [
"parts": [
["text": GeminiConfig.systemInstruction]
["text": {
var instruction = GeminiConfig.systemInstruction
if let ctx = sessionContextString, !ctx.isEmpty {
instruction += "\n\n" + ctx
}
return instruction
}()]
]
],
"tools": [
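The immediately-invoked closure in the hunk above builds the system instruction inline, appending the optional session context after a blank line. The same logic can be factored into a plain helper, which makes the empty-context fallback easy to verify — names below are illustrative, not part of the PR:

```swift
import Foundation

// Illustrative helper mirroring the closure in the setup payload: append
// optional session context after a blank line, or return the base
// instruction unchanged when no context is available.
func buildSystemInstruction(base: String, context: String?) -> String {
    var instruction = base
    if let ctx = context, !ctx.isEmpty {
        instruction += "\n\n" + ctx
    }
    return instruction
}

let base = "You are a field assistant."
let withCtx = buildSystemInstruction(base: base,
                                     context: "Worker: Alex. Site: 12 Main St.")
let withoutCtx = buildSystemInstruction(base: base, context: nil)
print(withCtx.hasSuffix("12 Main St."))   // true
print(withoutCtx == base)                 // true
```

Guarding on `!ctx.isEmpty` matters: without it, a present-but-empty context string would leave a stray trailing blank line in the instruction sent to the model.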
147 changes: 147 additions & 0 deletions samples/CameraAccess/CameraAccess/Gemini/GeminiSessionViewModel.swift
@@ -11,13 +11,26 @@ class GeminiSessionViewModel: ObservableObject {
@Published var aiTranscript: String = ""
@Published var toolCallStatus: ToolCallStatus = .idle
@Published var openClawConnectionState: OpenClawConnectionState = .notConfigured
@Published var isInspectionActive: Bool = false
@Published var isSafetyMonitorActive: Bool = false
@Published var sessionContext: SessionContext?
@Published var reportURLToShare: URL?

weak var webrtcVM: WebRTCSessionViewModel?
var frameProvider: (() -> UIImage?)?

private let geminiService = GeminiLiveService()
private let openClawBridge = OpenClawBridge()
private var toolCallRouter: ToolCallRouter?
private let audioManager = AudioManager()
private let eventClient = OpenClawEventClient()
private var lastVideoFrameTime: Date = .distantPast
private var stateObservation: Task<Void, Never>?
private var inspectionTimer: Task<Void, Never>?
private var inspectionFocus: String?
private var safetyTimer: Task<Void, Never>?
private let locationService = LocationService()
@Published var spatialService: SpatialLocalizationService?

var streamingMode: StreamingMode = .glasses

@@ -31,6 +44,23 @@

isGeminiActive = true

// Initialize session context
let context = SessionContext()
sessionContext = context
locationService.requestPermissionAndStart()
if let coord = locationService.currentCoordinate {
context.coordinates = (lat: coord.latitude, lon: coord.longitude)
context.reverseGeocodedAddress = locationService.currentAddress
}

// Start spatial localization (Multiset VPS if configured, else GPS fallback)
let spatial = SpatialLocalizationService(locationService: locationService)
spatialService = spatial
spatial.start()
context.spatialService = spatial

geminiService.sessionContextString = context.contextString()

// Wire audio callbacks
audioManager.onAudioCaptured = { [weak self] data in
guard let self else { return }
@@ -64,13 +94,17 @@
Task { @MainActor in
self.userTranscript += text
self.aiTranscript = ""
// Broadcast to WebRTC viewers
self.webrtcVM?.broadcastTranscript(speaker: "User", text: text)
}
}

geminiService.onOutputTranscription = { [weak self] text in
guard let self else { return }
Task { @MainActor in
self.aiTranscript += text
// Broadcast to WebRTC viewers
self.webrtcVM?.broadcastTranscript(speaker: "AI", text: text)
}
}

@@ -91,6 +125,35 @@
// Wire tool call handling
toolCallRouter = ToolCallRouter(bridge: openClawBridge)

// Wire router handlers
toolCallRouter?.frameProvider = frameProvider
toolCallRouter?.inspectionHandler = { [weak self] action, focus in
guard let self else { return }
if action == "start" {
self.startInspection(focus: focus)
} else {
self.stopInspection()
}
}
toolCallRouter?.safetyHandler = { [weak self] action in
guard let self else { return }
if action == "start" {
self.startSafetyMonitor()
} else {
self.stopSafetyMonitor()
}
}
toolCallRouter?.noteHandler = { [weak self] note, category in
guard let self else { return }
self.sessionContext?.addNote(note, category: category ?? "general")
}
toolCallRouter?.sessionContextProvider = { [weak self] in
return self?.sessionContext
}
toolCallRouter?.reportShareHandler = { [weak self] url in
self?.reportURLToShare = url
}

geminiService.onToolCall = { [weak self] toolCall in
guard let self else { return }
Task { @MainActor in
@@ -119,6 +182,11 @@
self.isModelSpeaking = self.geminiService.isModelSpeaking
self.toolCallStatus = self.openClawBridge.lastToolCallStatus
self.openClawConnectionState = self.openClawBridge.connectionState
// Update location in context
if let coord = self.locationService.currentCoordinate {
self.sessionContext?.coordinates = (lat: coord.latitude, lon: coord.longitude)
self.sessionContext?.reverseGeocodedAddress = self.locationService.currentAddress
}
}
}

@@ -174,9 +242,28 @@
}
eventClient.connect()
}

// Auto-start inspection if configured
if SettingsManager.shared.inspectionAutoStart {
startInspection(focus: nil)
}

// Auto-start safety monitor if configured
if SettingsManager.shared.safetyMonitorAutoStart {
startSafetyMonitor()
}

// Enter collaborative mode on WebRTC if active
if let webrtc = webrtcVM, webrtc.isActive {
webrtc.enterCollaborativeMode()
}
}

func stopSession() {
stopInspection()
stopSafetyMonitor()
spatialService?.stop()
spatialService = nil
eventClient.disconnect()
toolCallRouter?.cancelAll()
toolCallRouter = nil
@@ -190,6 +277,7 @@
userTranscript = ""
aiTranscript = ""
toolCallStatus = .idle
sessionContext = nil
}

func sendVideoFrameIfThrottled(image: UIImage) {
@@ -201,4 +289,63 @@
geminiService.sendVideoFrame(image: image)
}

// MARK: - Inspection Mode

func startInspection(focus: String?) {
guard !isInspectionActive else { return }
isInspectionActive = true
inspectionFocus = focus
let interval = TimeInterval(SettingsManager.shared.inspectionInterval)
NSLog("[Inspection] Started (interval: %.0fs, focus: %@)", interval, focus ?? "general")

inspectionTimer = Task { [weak self] in
while !Task.isCancelled {
try? await Task.sleep(nanoseconds: UInt64(interval * 1_000_000_000))
guard !Task.isCancelled else { break }
                guard let self else { break }  // owner deallocated: exit instead of spinning forever
                guard self.isGeminiActive, self.connectionState == .ready else { continue }
var prompt = "[INSPECTION] Analyze the current camera view."
if let focus = self.inspectionFocus {
prompt += " Focus area: \(focus)."
}
prompt += " Only speak if you see something noteworthy. If nothing stands out, stay completely silent."
self.geminiService.sendTextMessage(prompt)
}
}
}

func stopInspection() {
guard isInspectionActive else { return }
inspectionTimer?.cancel()
inspectionTimer = nil
isInspectionActive = false
inspectionFocus = nil
NSLog("[Inspection] Stopped")
}

// MARK: - Safety Monitor

func startSafetyMonitor() {
guard !isSafetyMonitorActive else { return }
isSafetyMonitorActive = true
let interval = TimeInterval(SettingsManager.shared.safetyMonitorInterval)
NSLog("[Safety] Monitor started (interval: %.0fs)", interval)

safetyTimer = Task { [weak self] in
while !Task.isCancelled {
try? await Task.sleep(nanoseconds: UInt64(interval * 1_000_000_000))
guard !Task.isCancelled else { break }
                guard let self else { break }  // owner deallocated: exit instead of spinning forever
                guard self.isGeminiActive, self.connectionState == .ready else { continue }
let prompt = "[SAFETY CHECK] Scan the current view for safety hazards. ONLY speak if you see a genuine danger — missing PPE, electrical hazards, fall risks, fire risks, or OSHA violations. If everything looks safe, stay completely silent."
self.geminiService.sendTextMessage(prompt)
}
}
}

func stopSafetyMonitor() {
guard isSafetyMonitorActive else { return }
safetyTimer?.cancel()
safetyTimer = nil
isSafetyMonitorActive = false
NSLog("[Safety] Monitor stopped")
}
}
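Both `startInspection` and `startSafetyMonitor` above follow the same cancellable-timer pattern: a `Task` that sleeps, re-checks cancellation, and then fires. Factored out, the pattern looks roughly like this — a sketch, not code from the PR:

```swift
import Foundation

// Sketch of the periodic-prompt loop shared by the inspection and safety
// timers: sleep for the interval, re-check cancellation, fire the tick.
func makePeriodicTask(interval: TimeInterval,
                      tick: @escaping @Sendable () -> Void) -> Task<Void, Never> {
    Task {
        while !Task.isCancelled {
            // A cancelled Task.sleep throws, so try? lets cancellation
            // interrupt the sleep instead of waiting out the interval.
            try? await Task.sleep(nanoseconds: UInt64(interval * 1_000_000_000))
            guard !Task.isCancelled else { break }
            tick()
        }
    }
}
```

Cancelling the returned task (as `stopInspection`/`stopSafetyMonitor` do) ends the loop at the next wake-up, which is why the stop methods only need `cancel()` plus state cleanup.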
6 changes: 6 additions & 0 deletions samples/CameraAccess/CameraAccess/Info.plist
@@ -71,6 +71,12 @@
<string>This app uses the microphone to have voice conversations with the AI assistant while streaming from your glasses.</string>
<key>NSPhotoLibraryAddUsageDescription</key>
<string>This app needs access to save photos captured from your glasses.</string>
<key>NSLocationWhenInUseUsageDescription</key>
<string>VisionClaw uses your location to tag field reports with GPS coordinates and auto-fill site addresses for job context.</string>
<key>UIFileSharingEnabled</key>
<true/>
<key>LSSupportsOpeningDocumentsInPlace</key>
<true/>
<key>NSAppTransportSecurity</key>
<dict>
<key>NSAllowsLocalNetworking</key>