VFTCam is a browser-based Progressive Web App that creates 360° panoramic photospheres using structured camera capture and GPU-accelerated stitching. Built as a spiritual successor to Google's discontinued Street View app, it runs entirely in the browser with no build step and no backend. This technical deep dive explores the architecture, algorithms, and engineering decisions behind creating a production-ready photosphere capture tool for virtual field trips.
The Challenge
When Google discontinued their Street View Camera app in 2023, educators lost a valuable tool for creating immersive virtual field trips. A replacement was needed that could:
- Run entirely in mobile browsers without native app installation, bypassing app store gatekeeping
- Adopt a privacy-first approach, with no user data collection whatsoever
- Work offline in remote field locations
- Handle the memory constraints of mobile devices (especially iOS Safari's ~1.4GB limit)
- Provide real-time guidance for capturing aligned photos
- Stitch 36 high-resolution images into equirectangular panoramas
- Be maintainable without complex build toolchains
Architecture Overview
VFTCam is intentionally built as a "zero-build" application using pure ES6 modules. This decision prioritizes maintainability and debuggability over bundle size optimization. The entire codebase can be edited and tested with just a text editor and Python's built-in HTTP server.
Core Technologies
- ES6 Modules (no bundler)
- WebGL2 for GPU processing
- Three.js for 3D visualization
- IndexedDB for image storage
Device APIs
- WebRTC getUserMedia
- DeviceOrientationEvent
- DeviceMotionEvent
- Geolocation API
PWA Features
- Service Worker (Workbox)
- Web App Manifest
- OPFS for panoramas
- Web Share API
The Capture Pattern
The heart of VFTCam is its structured 36-point capture pattern, arranged in three rows:
- Upper row (pitch +45°): 12 points at 30° yaw intervals
- Equator (pitch 0°): 12 points at 30° yaw intervals
- Lower row (pitch -45°): 12 points at 30° yaw intervals
- Total coverage: 36 overlapping images covering the full sphere
Each capture point is defined by its spherical coordinates (yaw, pitch) and includes tolerance thresholds for alignment detection:
// Actual alignment detection from app.js
const APPROACH_THRESHOLD = 0.25; // ~14 degrees - show orange indicator
const ALIGNMENT_THRESHOLD = 0.08; // ~4.6 degrees - turn green, allow capture
// Get camera direction vector
const cameraDirection = new THREE.Vector3(0, 0, -1);
cameraDirection.applyQuaternion(this.scene.camera.quaternion);
// Find nearest uncaptured hotspot
let nearestHotspot = null;
let nearestDistance = Infinity;
this.hotspots.forEach(hotspot => {
if (!this.capturedHotspots.has(hotspot.id)) {
// Convert hotspot yaw/pitch to 3D position
const hotspotPosition = this.scene.hotspotToPosition(hotspot.yaw, hotspot.pitch);
const direction = hotspotPosition.clone().normalize();
const angle = cameraDirection.angleTo(direction); // Angle in radians
if (angle < nearestDistance) {
nearestDistance = angle;
nearestHotspot = hotspot;
}
}
});
// hotspotToPosition converts spherical to Cartesian coordinates:
// phi = (90 - pitch) * Math.PI / 180; // Angle from vertical
// theta = yaw * Math.PI / 180; // Angle around vertical
// return new THREE.Vector3(
// radius * Math.sin(phi) * Math.sin(theta), // X
// radius * Math.cos(phi), // Y (up)
// radius * Math.sin(phi) * Math.cos(theta) // Z
// );
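The commented-out conversion is easy to check in isolation. A dependency-free sketch of the same math, returning a plain object instead of a THREE.Vector3:

```javascript
// Spherical (yaw, pitch in degrees) to Cartesian, matching hotspotToPosition
function hotspotToPosition(yawDeg, pitchDeg, radius = 100) {
  const phi = (90 - pitchDeg) * Math.PI / 180;  // Angle from vertical (+Y)
  const theta = yawDeg * Math.PI / 180;         // Angle around vertical
  return {
    x: radius * Math.sin(phi) * Math.sin(theta),
    y: radius * Math.cos(phi),                  // Y is up
    z: radius * Math.sin(phi) * Math.cos(theta)
  };
}
```

Straight up (pitch 90°) lands on (0, r, 0), and the equator point at yaw 0° lands on (0, 0, r), confirming the Y-up, Z-forward convention used throughout the scene code.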
The Approach: Inside the Sphere
Unlike traditional panorama apps that show a flat grid or compass interface, VFTCam places the user inside a Three.js wireframe sphere. This approach transforms the abstract concept of spherical capture into an intuitive, spatial experience.
Three.js Sphere Architecture
The 3D visualization system creates a live, responsive environment that updates in real-time with device orientation:
// scene.js - Creating the capture sphere environment
export class Scene {
constructor(canvas) {
// Initialize Three.js scene inside a sphere
const aspect = canvas.clientWidth / canvas.clientHeight;
this.scene = new THREE.Scene();
this.camera = new THREE.PerspectiveCamera(75, aspect, 0.1, 1000);
// Create wireframe sphere (user is INSIDE looking out)
const sphereGeometry = new THREE.SphereGeometry(
100, // Radius - large enough to feel immersive
24, // Width segments
16 // Height segments
);
// Wireframe material - see through to understand structure
const sphereMaterial = new THREE.MeshBasicMaterial({
color: 0x666666,
wireframe: true,
opacity: 0.3,
transparent: true,
side: THREE.BackSide // CRITICAL: Render inside of sphere
});
this.sphere = new THREE.Mesh(sphereGeometry, sphereMaterial);
this.scene.add(this.sphere);
// Add 36 hotspot markers on the sphere surface
this.createHotspotMarkers();
}
createHotspotMarkers() {
this.hotspotMeshes = new Map();
// `hotspots` is the 36-point pattern imported from hotspots.js
hotspots.forEach(hotspot => {
// Convert spherical coordinates to 3D position
const position = this.hotspotToPosition(hotspot.yaw, hotspot.pitch);
// Create capture point indicator
const geometry = new THREE.SphereGeometry(3, 16, 16);
const material = new THREE.MeshBasicMaterial({
color: 0xff0000 // Red = not captured (MeshBasicMaterial is unlit, so no emissive)
});
const mesh = new THREE.Mesh(geometry, material);
mesh.position.copy(position);
// Add glow effect for better visibility
const glowGeometry = new THREE.SphereGeometry(4, 16, 16);
const glowMaterial = new THREE.MeshBasicMaterial({
color: 0xff0000,
transparent: true,
opacity: 0.3
});
const glowMesh = new THREE.Mesh(glowGeometry, glowMaterial);
mesh.add(glowMesh);
this.scene.add(mesh);
this.hotspotMeshes.set(hotspot.id, mesh);
});
}
}
Capturing Device Orientation Variations
The most innovative aspect of VFTCam's capture mechanism is how it handles the inevitable variations in device orientation. No human can hold a phone perfectly level or aligned—there's always some roll (tilt), pitch variation, and yaw drift. Rather than fighting these variations, VFTCam captures and stores them, using the data to improve stitching quality.
// Capturing actual device orientation with all variations
async captureImage(hotspot) {
// Store the ACTUAL device orientation, not the ideal
const captureData = {
hotspotId: hotspot.id,
// Ideal target position
targetYaw: hotspot.yaw,
targetPitch: hotspot.pitch,
// Actual device orientation at capture moment
actualYaw: this.deviceOrientation.alpha,
actualPitch: this.deviceOrientation.beta - 90, // Adjusted for portrait
actualRoll: this.deviceOrientation.gamma, // Device tilt/rotation
// Accelerometer data for precise tilt measurement
accelerometerX: this.lastAcceleration.x,
accelerometerY: this.lastAcceleration.y,
accelerometerZ: this.lastAcceleration.z,
// Computed device tilt from accelerometer (more accurate than gamma)
deviceTilt: Math.atan2(
this.lastAcceleration.x,
Math.sqrt(
this.lastAcceleration.y ** 2 +
this.lastAcceleration.z ** 2
)
) * 180 / Math.PI,
// Timestamp for motion interpolation
timestamp: Date.now(),
// Camera parameters
fov: 44, // Actual measured FOV
aspectRatio: canvas.height / canvas.width // Portrait orientation
};
// Store image with all orientation metadata
await this.database.saveCapture(imageBlob, captureData);
}
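The accelerometer-derived tilt above is a pure function of the acceleration vector, which makes it easy to verify on its own. A sketch of the same atan2 formula:

```javascript
// Device tilt in degrees from accelerometer readings,
// as computed for captureData.deviceTilt above
function deviceTiltDeg(ax, ay, az) {
  return Math.atan2(ax, Math.sqrt(ay * ay + az * az)) * 180 / Math.PI;
}
```

A phone lying flat (gravity entirely on the z axis) reads 0°, while gravity entirely on the x axis reads 90°, matching the intuition that the formula measures sideways lean.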
Roll Compensation and Normalization
The captured roll (device tilt) data becomes crucial during stitching. Instead of assuming all images are perfectly level, VFTCam uses the roll information to rotate each image back to its correct orientation before projection:
// Using roll data to normalize images during stitching
function normalizeImageOrientation(imageData, captureMetadata) {
const { actualRoll } = captureMetadata;
// Compute rotation matrix to compensate for device tilt
// This "unrolls" the image to its true horizon
const rollCompensation = -actualRoll * Math.PI / 180;
// In the WebGL shader, apply roll correction
const rotationMatrix = `
mat3 rollMatrix = mat3(
cos(roll), -sin(roll), 0.0,
sin(roll), cos(roll), 0.0,
0.0, 0.0, 1.0
);
// Apply roll correction before projection
vec3 correctedRay = rollMatrix * cameraRay;
`;
return rollCompensation;
}
// WebGL shader incorporating roll compensation
const fragmentShader = `
uniform float camRoll[36]; // Actual roll for each captured image
void main() {
// For each potential source image
for (int i = 0; i < 36; i++) {
// Get the camera's actual orientation
float yaw = camYaw[i];
float pitch = camPitch[i];
float roll = camRoll[i]; // This is the key innovation
// Create rotation matrix including roll
mat3 rotMatrix = createRotationMatrix(yaw, pitch, roll);
// Transform world ray to camera space with roll correction
vec3 camRay = rotMatrix * worldRay;
// Now the image is properly oriented for projection
vec2 uv = projectToImage(camRay);
}
}
`;
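One subtlety worth a CPU-side check: GLSL mat3 constructors are column-major, so the rollMatrix written above effectively computes x' = x·cos(roll) + y·sin(roll), y' = -x·sin(roll) + y·cos(roll), a rotation by -roll about the view axis. A JavaScript reference for verification (a sketch, not app code):

```javascript
// CPU reference for the shader's roll matrix as GLSL evaluates it
// (column-major constructor), applied to a 3-component ray
function applyRollMatrix([x, y, z], roll) {
  const c = Math.cos(roll), s = Math.sin(roll);
  return [c * x + s * y, -s * x + c * y, z]; // Rotation by -roll about z
}
```

Applying the matrix with roll θ and then with -θ restores the original ray, which is the round trip the "unroll" compensation relies on.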
Visual Feedback Loop
The actual visual feedback system in scene.js uses color changes (orange to green), opacity, and scale to indicate alignment status. Captured images are displayed as textured patches on the sphere with a subtle breathing animation:
// From scene.js - Actual hotspot highlighting when approaching
highlightHotspot(hotspotId, isAligned, isLevel) {
this.hotspotMeshes.forEach((marker, id) => {
if (id === hotspotId) {
// Highlight the approached hotspot
marker.material.color.setHex(isAligned ? 0x4CAF50 : 0xFFA726); // Green when aligned, orange when near
marker.material.opacity = isAligned ? 0.9 : 0.7;
marker.scale.setScalar(isAligned ? 1.5 : 1.2); // Grow when aligned
} else {
// Dim other hotspots
marker.material.color.setHex(0xFFA726); // Orange
marker.material.opacity = 0.6; // Dimmer
marker.scale.setScalar(1); // Normal size
}
});
}
// Hide hotspot marker after capture and show image patch
markHotspotCaptured(hotspotId) {
const marker = this.hotspotMeshes.get(hotspotId);
if (marker) {
// Hide marker to show it's been captured
marker.visible = false;
}
}
// Add captured image as a spherical patch on the sphere
addCapturedPatch(hotspot, imageData) {
const img = new Image();
img.onload = () => {
const texture = new THREE.Texture(img);
texture.needsUpdate = true;
// Create the patch mesh with this texture
const patch = this.createPatchMesh(hotspot, texture);
// Store hotspot reference on the patch for later use
patch.userData.hotspot = hotspot;
// Add to scene and array
this.scene.add(patch);
this.capturedPatches.push(patch);
};
img.src = imageData;
}
// Subtle "breathing" animation for captured patches
render() {
this.capturedPatches.forEach((patch, i) => {
// Oscillate scale by ±1% with offset per patch
const breathe = 1 + Math.sin(Date.now() * 0.0005 + i) * 0.01;
patch.scale.set(breathe, breathe, breathe);
});
this.renderer.render(this.scene, this.camera);
}
Why This Approach is Novel
Spatial Understanding
Users intuitively understand they're building a sphere from the inside, making the abstract concept of panoramic coverage concrete and visible
Orientation Tolerance
By capturing and using roll/tilt data rather than rejecting imperfect captures, the system works with human limitations instead of against them
Progressive Visualization
As images are captured, they appear on the sphere surface, giving immediate feedback about coverage and quality
Error Prevention
Users can see gaps in coverage before finishing, preventing the common problem of discovering missing areas after stitching
This "inside-out" approach, combined with comprehensive orientation tracking, transforms panoramic capture from a technical exercise into an intuitive spatial experience.
Understanding Field of View
One of the most critical parameters in photosphere stitching is the camera's field of view (FOV). Getting this wrong results in misaligned seams, distortion, or gaps in coverage. Mobile phone cameras typically have FOVs between 60-75°, but this varies by device and capture mode.
FOV Calculation from Camera Parameters:
FOV = 2 * arctan(sensorSize / (2 * focalLength))
For typical phone cameras:
- Horizontal FOV: ~65-70° (portrait)
- Vertical FOV: ~75-85° (portrait)
- Diagonal FOV: ~80-90°
The optimal FOV was determined through empirical testing:
// FOV configuration for portrait capture
// After extensive testing, these values work best
const HFOVdeg = 44; // Horizontal FOV in portrait (narrow dimension)
const VFOVdeg = 73; // Vertical FOV in portrait (wide dimension)
// Calculate overlap for 12 images per row
const spacing = 360 / 12; // 30° between capture points
const overlap = HFOVdeg - spacing; // 44° - 30° = 14° overlap (~32%)
The discrepancy between video and photo FOV required careful calibration:
// WebRTC video stream is typically CROPPED compared to photo mode
// We empirically tested with grid patterns to find actual coverage
const FOV_CALIBRATION = {
// Initial attempts with wider FOVs created distortion
firstAttempt: { h: 65, v: 85, result: 'Too wide - severe warping' },
secondAttempt: { h: 55, v: 78, result: 'Still distorted at edges' },
// Final calibrated values that work across devices
final: {
horizontal: 44, // Narrower than expected due to video crop
vertical: 73, // Portrait orientation
result: 'Clean stitching without gaps'
}
};
// These conservative FOV values ensure no gaps between images
// Better to have more overlap than risk missing coverage
The 36-point pattern with empirically-determined 44° horizontal FOV provides approximately 32% overlap between adjacent images. While this may seem conservative, it ensures complete coverage across all devices we tested, from iPhones to Samsung Galaxy devices to Google Pixels. The narrower FOV accounts for the significant cropping that occurs in WebRTC video streams compared to native photo capture.
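The overlap arithmetic above generalizes to any row configuration. A small helper (hypothetical, not in the app) makes the trade-off between FOV and point count explicit:

```javascript
// Overlap between adjacent captures in a row, given the horizontal FOV
// and the number of capture points (illustrative helper)
function rowOverlap(hfovDeg, pointsPerRow) {
  const spacing = 360 / pointsPerRow;   // Yaw step between captures
  const overlapDeg = hfovDeg - spacing; // Angular width shared by neighbors
  return { spacing, overlapDeg, fraction: overlapDeg / hfovDeg };
}
```

With the production values, `rowOverlap(44, 12)` gives 30° spacing and 14° of overlap, about 32% of each frame, which is the margin the best-pixel selector depends on.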
Coordinate System and Projections
Understanding the mathematical relationship between camera coordinates and equirectangular projection was absolutely crucial for successful stitching. This isn't just about placing images—it's about correctly mapping every pixel from a perspective projection (how cameras see) to a spherical projection (how panoramas are stored).
The core challenge is that we're dealing with three different coordinate systems that must align perfectly:
- Device Orientation Space: Alpha (0-360°), Beta (-180 to 180°), Gamma (-90 to 90°)
- Camera Space: Yaw, Pitch, Roll with portrait adjustments
- Equirectangular Space: 2D image where x maps to longitude, y maps to latitude
The Complete Transformation Pipeline (from actual shader code):
1. ERP Pixel to World Direction:
// Convert pixel to equirectangular coordinates
float lng = v_uv.x * 2.0 * PI - PI; // Longitude: -π to π
float lat = (1.0 - v_uv.y) * PI - PI/2; // Latitude: -π/2 to π/2
// Convert to 3D world direction
vec3 worldDir = vec3(
cos(lat) * sin(lng), // x
sin(lat), // y (up/down)
cos(lat) * cos(lng) // z
);
2. World to Camera Space (for each of 36 images):
// Apply inverse rotations: yaw, then pitch, then roll
float cy = cos(yaw), sy = sin(yaw);
vec3 temp = vec3(
cy * worldDir.x + sy * worldDir.z,
worldDir.y,
-sy * worldDir.x + cy * worldDir.z
);
// ... pitch and roll rotations follow
3. Perspective Projection to Image:
if (camDir.z <= 0.0) continue; // Behind camera
// Project using actual FOV
float xn = camDir.x / camDir.z;
float yn = camDir.y / camDir.z;
const float HFOVdeg = 44.0, VFOVdeg = 73.0;
float tanHalfHF = tan(PI * HFOVdeg / 360.0);
float tanHalfVF = tan(PI * VFOVdeg / 360.0);
// Convert to texture coordinates [0,1]
float u = (xn / (2.0*tanHalfHF)) + 0.5;
float v = (yn / (2.0*tanHalfVF)) + 0.5;
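Chained together, and simplified to yaw-only rotation (pitch and roll follow the same pattern), the three steps form a single testable mapping from ERP pixel to source-image uv. A JavaScript sketch mirroring the shader math:

```javascript
// ERP uv -> world direction -> camera space (yaw only) -> image uv.
// Returns null when the ray falls behind the camera or outside the FOV.
function erpToImageUV(u, v, yawRad, hfovDeg = 44, vfovDeg = 73) {
  const PI = Math.PI;
  const lng = u * 2 * PI - PI;                // Longitude: -pi..pi
  const lat = (1 - v) * PI - PI / 2;          // Latitude: -pi/2..pi/2
  const wd = [
    Math.cos(lat) * Math.sin(lng),
    Math.sin(lat),
    Math.cos(lat) * Math.cos(lng)
  ];
  const cy = Math.cos(yawRad), sy = Math.sin(yawRad);
  const cam = [cy * wd[0] + sy * wd[2], wd[1], -sy * wd[0] + cy * wd[2]];
  if (cam[2] <= 0) return null;               // Behind the camera
  const xn = cam[0] / cam[2], yn = cam[1] / cam[2];
  const tanH = Math.tan(PI * hfovDeg / 360);
  const tanV = Math.tan(PI * vfovDeg / 360);
  if (Math.abs(xn) > tanH || Math.abs(yn) > tanV) return null; // Outside FOV
  return { u: xn / (2 * tanH) + 0.5, v: yn / (2 * tanV) + 0.5 };
}
```

The center of the panorama (uv 0.5, 0.5) maps to the center of a yaw-0 camera's image (uv 0.5, 0.5), and rays on the far side of the sphere are correctly rejected.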
The critical insight was that equirectangular projection preserves angles but distorts areas—the poles get stretched horizontally. This is why our pole-filling algorithm was necessary:
// The stretching factor at any latitude
function getStretchFactor(latitude) {
// At equator (lat=0): no stretch
// At poles (lat=±90°): infinite stretch
return 1 / Math.cos(latitude * Math.PI / 180);
}
// This is why we don't capture at the poles
// Even a 1-pixel wide feature would stretch to fill the entire top row
The coordinate system also determined our capture pattern. The 45° pitch for upper/lower rows wasn't arbitrary—it's the sweet spot where:
- Coverage reaches close enough to the poles (90° - 45° - VFOV/2 = 90° - 45° - 36.5° ≈ 8.5° from the pole)
- Distortion remains manageable (stretch factor of ~1.4 vs infinite at poles)
- Overlap between rows provides redundancy for the best-pixel algorithm
// Actual capture pattern from hotspots.js
static getHotspots() {
const hotspots = [];
const pitchAngles = [45, 0, -45]; // Upper, middle, lower
const yawStep = 30; // 360° / 12 points per row
let id = 1;
for (const pitch of pitchAngles) {
for (let i = 0; i < 12; i++) {
const yaw = i * yawStep;
hotspots.push({
id: id++,
pitch: pitch,
yaw: yaw,
captured: false
});
}
}
return hotspots;
}
// The 45° pitch was chosen because:
// 1. Covers from horizon to near-pole (45° + VFOV/2 = 45° + 36.5° ≈ 81.5°)
// 2. Provides good overlap between rows
// 3. Avoids extreme distortion near poles (>70°)
// 4. Three rows give complete sphere coverage with redundancy
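Dropping the class wrapper, the generator's coverage properties are easy to sanity-check:

```javascript
// Generate the 36-point capture pattern (same logic as hotspots.js)
function getHotspots() {
  const hotspots = [];
  const pitchAngles = [45, 0, -45]; // Upper, middle, lower rows
  const yawStep = 30;               // 360 / 12 points per row
  let id = 1;
  for (const pitch of pitchAngles) {
    for (let i = 0; i < 12; i++) {
      hotspots.push({ id: id++, pitch, yaw: i * yawStep, captured: false });
    }
  }
  return hotspots;
}
```

The result is exactly 36 points with sequential ids, 12 per row, and no duplicated (yaw, pitch) pairs.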
Without this deep understanding of coordinate transformations and projections, stitching would be blind guesswork. Instead, the algorithm can predict exactly where each pixel should appear and why, making debugging and optimization possible.
WebGL2 Best-Pixel Stitching
The stitching pipeline uses WebGL2 shaders to select the sharpest pixel from overlapping images. This GPU-accelerated approach processes 36 images into a 4096×2048 equirectangular panorama in under 5 seconds on mobile devices.
The Algorithm
- Equirectangular to World: Convert output pixel (u,v) to longitude/latitude, then to 3D world direction
- World to Camera: Apply inverse rotations (yaw, pitch, roll) to transform world ray to each camera's space
- Perspective Projection: Project 3D ray onto 2D image plane using actual FOV (44°×73°)
- Quality Scoring: Rate each pixel by distance from image center (best quality at center, worst at edges)
- Voronoi Stabilization: Add bias to prefer the "closest" camera to prevent flickering seams
- Best Pixel or Blend: Use the highest-scoring pixel, or blend multiple pixels as fallback
// Fragment shader for best-pixel selection (actual implementation)
#version 300 es
precision highp float;
uniform sampler2DArray tiles; // All 36 images as texture array
uniform float camYaw[36], camPitch[36], camRoll[36];
uniform float gain[36]; // Per-image exposure compensation
uniform float threshold;
uniform float voronoiBias; // Stabilizes seams
uniform vec2 layerCenterUV[36]; // ERP-space center of each image
in vec2 v_uv; // Output pixel in [0,1] equirectangular space
out vec4 frag;
const float PI = 3.141592653589793;
// applyRotation() and erpDistance() are helper functions omitted here
// Convert between linear and sRGB color spaces
vec3 toLin(vec3 c) { return pow(max(c, vec3(0.0)), vec3(2.2)); }
vec3 toSRGB(vec3 c) { return pow(max(c, vec3(0.0)), vec3(1.0/2.2)); }
// Quality based on distance from image center
float quality(float xn, float yn, float tanH, float tanV) {
float dx = abs(xn) / tanH;
float dy = abs(yn) / tanV;
float dist = sqrt(dx*dx + dy*dy);
return 1.0 - dist; // 1.0 at center, 0.0 at edge
}
void main() {
// Convert pixel to equirectangular coordinates
float lng = v_uv.x * 2.0 * PI - PI;
float lat = (1.0 - v_uv.y) * PI - PI/2.0;
// Convert to 3D world direction
vec3 worldDir = vec3(
cos(lat) * sin(lng),
sin(lat),
cos(lat) * cos(lng)
);
float bestScore = -1.0;
vec3 bestColor = vec3(0.0);
vec3 accumColor = vec3(0.0);
float accumWeight = 0.0;
// Check each captured image
for (int l = 0; l < 36; ++l) {
// Apply inverse rotation to get camera direction
// This is the actual rotation math from the code
float yaw = camYaw[l], pitch = camPitch[l], roll = camRoll[l];
// Rotate world direction to camera space (yaw, pitch, roll)
vec3 camDir = applyRotation(worldDir, yaw, pitch, roll);
if (camDir.z <= 0.0) continue; // Behind camera
// Project onto image plane using FOV
float xn = camDir.x / camDir.z;
float yn = camDir.y / camDir.z;
const float HFOVdeg = 44.0, VFOVdeg = 73.0;
float tanHalfHF = tan(PI * HFOVdeg / 360.0);
float tanHalfVF = tan(PI * VFOVdeg / 360.0);
if (abs(xn) > tanHalfHF || abs(yn) > tanHalfVF) continue;
// Convert to texture coordinates
float u = (xn / (2.0*tanHalfHF)) + 0.5;
float v = (yn / (2.0*tanHalfVF)) + 0.5;
// Sample the image and apply gain
vec3 color = toLin(texture(tiles, vec3(u, v, float(l))).rgb);
color *= gain[l];
// Calculate quality (distance from center)
float q = quality(xn, yn, tanHalfHF, tanHalfVF);
// Voronoi-biased score for stable seams
float score = q - voronoiBias * erpDistance(v_uv, layerCenterUV[l]);
if (score > bestScore) {
bestScore = score;
bestColor = color;
}
// Soft fallback for blending
if (q > 0.0) {
float w = pow(q, 4.0); // Strong center weighting
accumColor += color * w;
accumWeight += w;
}
}
// Choose crisp best pixel or blended fallback
vec3 finalColor = (bestScore > threshold) ? bestColor :
(accumWeight > 0.0) ? accumColor / accumWeight :
vec3(0.0);
frag = vec4(toSRGB(finalColor), 1.0);
}
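The center-weighted quality term ports directly to JavaScript, which makes the scoring behavior easy to unit-test (the Voronoi bias and erpDistance are omitted here):

```javascript
// Pixel quality by normalized distance from the image center (shader port):
// 1.0 at the exact center, 0.0 at an edge midpoint, negative in the corners
function quality(xn, yn, tanH, tanV) {
  const dx = Math.abs(xn) / tanH;
  const dy = Math.abs(yn) / tanV;
  return 1 - Math.sqrt(dx * dx + dy * dy);
}
```

Because corners score below zero, a pixel visible only in the extreme corner of one image will usually lose to a more central pixel from a neighboring image, which is exactly what pushes seams toward image centers.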
Pole Filling with WebGL
The equirectangular projection creates distortion at the poles, where no images are captured. The resulting black circles are filled by stretching edge pixels across the pole region and then smoothing the seam with a WebGL-based Gaussian blur:
// Actual pole filling implementation from pole-fill-simple.js
export function fillPolesSimple(gl, canvas) {
const w = canvas.width;
const h = canvas.height;
// Read pixels from WebGL canvas
const pixels = new Uint8Array(w * h * 4);
gl.readPixels(0, 0, w, h, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
// CONFIGURATION
const TOP_THRESHOLD = 105; // Where pole ends at top
const BOTTOM_THRESHOLD = h - 105; // Where pole ends at bottom
const STRETCH_HEIGHT = 120; // Height of region to fill
// Fill poles by stretching edge pixels
for (let x = 0; x < w; x++) {
// Sample colors at the edge of content
const topIdx = (TOP_THRESHOLD * w + x) * 4;
const topColor = [pixels[topIdx], pixels[topIdx+1], pixels[topIdx+2], 255];
const bottomIdx = (BOTTOM_THRESHOLD * w + x) * 4;
const bottomColor = [pixels[bottomIdx], pixels[bottomIdx+1], pixels[bottomIdx+2], 255];
// Fill top pole (y=0 to STRETCH_HEIGHT)
for (let y = 0; y < STRETCH_HEIGHT; y++) {
const idx = (y * w + x) * 4;
pixels.set(topColor, idx);
}
// Fill bottom pole (y=h-STRETCH_HEIGHT to h)
for (let y = h - STRETCH_HEIGHT; y < h; y++) {
const idx = (y * w + x) * 4;
pixels.set(bottomColor, idx);
}
}
// Apply WebGL Gaussian blur (30px radius, two-pass separable)
applyWebGLBlur(gl, pixels, w, h, 30);
}
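The stretch step reduces to copying one sampled row over the pole region, column by column. A miniature version on a synthetic RGBA buffer (toy dimensions, not the real 105/120-pixel thresholds):

```javascript
// Fill the top pole rows of an RGBA buffer by stretching the row of
// pixels at `threshold` upward (toy version of the top-pole pass above)
function fillTopPole(pixels, w, threshold, stretchHeight) {
  for (let x = 0; x < w; x++) {
    const srcIdx = (threshold * w + x) * 4;       // Edge-of-content pixel
    const color = pixels.slice(srcIdx, srcIdx + 4);
    for (let y = 0; y < stretchHeight; y++) {
      pixels.set(color, (y * w + x) * 4);         // Stretch up the column
    }
  }
  return pixels;
}
```

Each column inherits its own edge color, so the filled cap keeps the horizontal color variation of the surrounding sky or ceiling before the blur softens it.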
Real-Time Device Orientation
Converting device orientation events to camera angles requires careful handling of coordinate systems and gimbal lock:
// Actual device orientation handling from app.js
handleDeviceOrientation(event) {
// Store raw values
this.deviceOrientation = {
alpha: event.alpha, // 0-360 compass direction
beta: event.beta, // -180 to 180 front-back tilt
gamma: event.gamma, // -90 to 90 left-right tilt
absolute: event.absolute
};
// iOS provides true compass heading directly
if (typeof event.webkitCompassHeading === 'number') {
this.deviceOrientation.absolute = true;
this.deviceOrientation.compassHeading = event.webkitCompassHeading;
if (this.compassOffset === null) {
this.compassOffset = event.webkitCompassHeading;
console.log(`iOS Compass calibrated: ${event.webkitCompassHeading}°`);
}
}
// Convert to camera angles (portrait mode)
const yaw = (event.alpha || 0) * Math.PI / 180;
const pitch = ((event.beta || 0) - 90) * Math.PI / 180;
const roll = (event.gamma || 0) * Math.PI / 180;
}
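The final three lines are a pure conversion and can be factored out on their own (a sketch; the real handler also folds in the compass offset):

```javascript
// DeviceOrientationEvent angles (degrees) -> camera angles (radians).
// Portrait convention: beta = 90 means the phone is upright at the horizon.
function orientationToCamera({ alpha, beta, gamma }) {
  const d2r = Math.PI / 180;
  return {
    yaw: (alpha || 0) * d2r,
    pitch: ((beta || 0) - 90) * d2r, // Upright portrait -> pitch 0
    roll: (gamma || 0) * d2r
  };
}
```

An upright phone facing south (alpha 180°, beta 90°, gamma 0°) yields yaw π, pitch 0, roll 0, which is the orientation the alignment check compares against the hotspot targets.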
Memory Management on Mobile
iOS Safari's aggressive memory limits (~1.4GB) required careful optimization strategies:
Resolution Scaling
iOS devices capture at 720×1280 while Android uses 1080×1920, cutting per-image memory to roughly 44% of the Android size (a 56% reduction)
Immediate Cleanup
WebGL textures and blob URLs are released immediately after use
Smart Image Loading
Custom loader maintains a 50-image LRU cache with automatic eviction
Crash Recovery
Stitch jobs are tracked in IndexedDB for recovery after memory crashes
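The loader's LRU behavior can be sketched with a Map, whose insertion order doubles as recency order (names hypothetical; the real loader also revokes blob URLs on eviction):

```javascript
// Minimal LRU cache: Map iteration order serves as the recency order
class LRUCache {
  constructor(maxSize = 50) {
    this.maxSize = maxSize;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);               // Re-insert to mark most recently used
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxSize) {
      const oldest = this.map.keys().next().value; // Least recently used
      this.map.delete(oldest);          // Real loader would revoke its blob URL
    }
  }
}
```

Reading an entry refreshes it, so the 50-image budget always evicts the image the viewer touched least recently.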
// Actual memory management from memory-utils.js
export class MemoryMonitor {
constructor() {
this.warningThreshold = 0.8; // Warn at 80% memory usage
this.criticalThreshold = 0.9; // Critical at 90%
// iOS memory limits based on device detection
this.iosMemoryLimits = {
small: 400, // MB - older devices (iPhone 6-8)
medium: 800, // MB - mid-range (iPhone X-12)
large: 1200 // MB - newer/pro (iPhone 13-15 Pro)
};
this.deviceMemoryLimit = this.detectDeviceMemoryLimit();
}
detectDeviceMemoryLimit() {
const isIOS = /iPad|iPhone|iPod/.test(navigator.userAgent);
const screenWidth = window.screen.width;
const screenHeight = window.screen.height;
if (isIOS) {
// Pro Max models: 430x932 logical pixels
if (screenWidth >= 428 && screenHeight >= 926) {
return this.iosMemoryLimits.large;
}
// Standard/Pro models
if (screenWidth >= 390) {
return this.iosMemoryLimits.medium;
}
// Older/smaller devices
return this.iosMemoryLimits.small;
}
// Non-iOS: use navigator.deviceMemory if available
if (navigator.deviceMemory) {
return navigator.deviceMemory * 1024 * 0.15; // 15% of device RAM
}
return 2048; // Default 2GB for desktop
}
}
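Factored free of `navigator` and `window`, the heuristic becomes directly testable (same thresholds as above):

```javascript
// Memory budget in MB from device class
// (pure-function version of detectDeviceMemoryLimit)
function memoryLimitMB({ isIOS, screenWidth, screenHeight, deviceMemoryGB }) {
  if (isIOS) {
    if (screenWidth >= 428 && screenHeight >= 926) return 1200; // Pro Max class
    if (screenWidth >= 390) return 800;                         // Standard/Pro
    return 400;                                                 // Older devices
  }
  if (deviceMemoryGB) return deviceMemoryGB * 1024 * 0.15;      // 15% of RAM
  return 2048;                                                  // Desktop default
}
```

A Pro Max-sized screen gets the 1200 MB budget, an older small iPhone gets 400 MB, and an Android phone reporting 8 GB via `navigator.deviceMemory` gets about 1.2 GB.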
Web Workers for Memory Pressure Reduction
One of the most critical architectural decisions was moving all heavy image processing to Web Workers. Mobile browsers, especially iOS Safari, will aggressively reload tabs when memory pressure increases. By offloading processing to workers, the main thread stays responsive and the browser is less likely to kill the tab.
The Memory Pressure Problem
During stitching, VFTCam must simultaneously handle:
- 36 captured images in memory (5-10MB as blobs)
- Decoded ImageBitmaps for each image (36 × ~8MB = 288MB)
- WebGL textures for the texture array (36 × ~8MB = 288MB)
- Output canvas at 4096×2048 (32MB)
- Total peak memory: ~600MB+
On iOS devices with a 1.4GB heap limit, this leaves little room for the browser's own overhead. Without workers, the main thread memory spike often triggers a tab reload, losing all progress.
Worker Architecture
// stitch-worker.js - Offloaded heavy processing
let isProcessing = false;
let shouldCancel = false;
// Send heartbeat every 2 seconds to show we're alive
const heartbeatInterval = setInterval(() => {
if (isProcessing) {
self.postMessage({
type: 'HEARTBEAT',
timestamp: Date.now(),
memoryUsage: performance.memory?.usedJSHeapSize || 0
});
}
}, 2000);
async function processImages(captures, targetWidth, targetHeight) {
const layers = [];
const failedImages = [];
// Process images one by one to control memory usage
for (let i = 0; i < captures.length; i++) {
if (shouldCancel) break;
const capture = captures[i];
try {
// Process individual image with 10-second timeout
const layer = await Promise.race([
processImage(capture, targetWidth, targetHeight),
timeout(10000, `Image ${capture.hotspotId} processing timeout`)
]);
if (layer) {
layers.push(layer);
}
// Report progress
self.postMessage({
type: 'PROGRESS',
current: i + 1,
total: captures.length,
message: `Processing image ${i + 1} of ${captures.length}`
});
// Yield every 3 images to prevent blocking
if (i % 3 === 0) {
await delay(10);
}
} catch (error) {
console.warn(`Failed to process image ${capture.hotspotId}:`, error);
failedImages.push({
hotspotId: capture.hotspotId,
error: error.message
});
}
}
// Check if we have minimum viable set (at least 12 images)
if (layers.length < 12) {
throw new Error(`Only ${layers.length} images processed successfully.`);
}
return layers;
}
Worker Management and Retry Logic
The StitchProcessor class manages worker lifecycle with automatic retry and fallback:
// stitch-processor.js - Resilient worker management
export class StitchProcessor {
constructor(app) {
this.worker = null;
this.maxWorkerRetries = 2;
this.fallbackToMainThread = false;
this.metrics = {
workerAttempts: 0,
mainThreadFallback: false,
emergencyMode: false
};
}
async processWithWorker(captures) {
return new Promise((resolve, reject) => {
// Create fresh worker for each attempt
this.worker = new Worker('/js/workers/stitch-worker.js');
// Set timeout for worker response
let timeout = setTimeout(() => {
reject(new Error('Worker timeout'));
this.terminateWorker();
}, 120000); // 2 minute timeout
// Handle worker messages
this.worker.onmessage = (e) => {
const { type, data } = e.data;
switch (type) {
case 'COMPLETE':
clearTimeout(timeout);
resolve(data.layers);
break;
case 'ERROR':
clearTimeout(timeout);
reject(new Error(data.message));
break;
case 'HEARTBEAT':
// Reset timeout on heartbeat
clearTimeout(timeout);
timeout = setTimeout(() => {
reject(new Error('Worker stopped responding'));
}, 10000);
break;
case 'PROGRESS':
this.updateProgress(e.data.current, e.data.total);
break;
}
};
// Send work to worker
this.worker.postMessage({
type: 'PROCESS',
data: { captures, targetWidth: 1920, targetHeight: 1080 }
});
});
}
}
Memory Benefits
Isolated Memory Space
Workers have their own heap, reducing main thread pressure by ~300MB during processing
Sequential Processing
Images processed one by one with periodic yielding to prevent blocking
Transferable Objects
ImageBitmaps transferred (not copied) to main thread, saving memory and time
Graceful Degradation
Falls back to main thread processing if workers fail repeatedly
Stitch Jobs for Crash Recovery
Mobile browsers can spontaneously refresh or crash during intensive operations. To prevent users from losing their captured images and having to restart, VFTCam implements a sophisticated stitch job tracking system that enables seamless recovery.
The Problem: Spontaneous Refreshes
iOS Safari in particular will refresh tabs when:
- Memory pressure exceeds ~80% of the 1.4GB limit
- The user switches apps and returns
- JavaScript execution takes too long (>10 seconds without yielding)
- WebGL contexts exceed memory quotas
- The device enters low power mode during processing
Without job tracking, users would frustratingly lose 2-3 minutes of capture work and have to start over.
Stitch Job Architecture
// Database schema for stitch jobs (database.js)
if (!db.objectStoreNames.contains('stitch_jobs')) {
const stitchJobStore = db.createObjectStore('stitch_jobs', {
keyPath: 'id' // UUID for each job
});
stitchJobStore.createIndex('status', 'status', { unique: false });
stitchJobStore.createIndex('startedAt', 'startedAt', { unique: false });
}
// Stitch job structure
const stitchJob = {
id: generateUUID(),
status: 'pending', // pending → processing → complete → failed
captureIds: [1, 2, 3, /* … */ 36], // Hotspot IDs of captured images
startedAt: Date.now(),
updatedAt: Date.now(),
progress: {
stage: 'loading', // loading → decoding → stitching → saving
processed: 0,
total: 36
},
config: {
targetWidth: 4096,
targetHeight: 2048,
method: 'best-pixel',
poleBlur: true
},
result: null, // Panorama ID when complete
error: null // Error message if failed
};
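A small guard (hypothetical; not shown in the app's code) keeps the status field moving only along the pending → processing → complete/failed path:

```javascript
// Allowed stitch-job status transitions (hypothetical helper)
const TRANSITIONS = {
  pending: ['processing'],
  processing: ['complete', 'failed'],
  complete: [],   // Terminal states
  failed: []
};

// True only when `to` is a legal next status after `from`
function canTransition(from, to) {
  return (TRANSITIONS[from] || []).includes(to);
}
```

Rejecting illegal transitions (such as reviving a complete job into processing) keeps the recovery logic from resuming work that already finished.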
Job Lifecycle Management
// app.js - Creating and tracking stitch jobs
async startStitching() {
// Check for existing incomplete job first
const existingJob = await this.checkForIncompleteJob();
if (existingJob) {
const resume = await this.cardUI.confirm(
'Previous stitching was interrupted. Resume?',
'Recovery Available'
);
if (resume) {
return this.resumeStitchJob(existingJob);
}
}
// Create new stitch job
const jobId = await this.database.saveStitchJob({
id: this.generateJobId(),
status: 'pending',
captureIds: Array.from(this.capturedHotspots),
startedAt: Date.now(),
config: {
targetWidth: 4096,
targetHeight: 2048,
method: 'best-pixel'
}
});
// Store job ID in session storage for quick recovery
sessionStorage.setItem('activeStitchJob', jobId);
try {
// Update job status as we progress
await this.database.updateStitchJob(jobId, {
status: 'processing',
progress: { stage: 'loading', processed: 0, total: 36 }
});
const result = await this.processStitching(jobId);
// Mark job complete
await this.database.updateStitchJob(jobId, {
status: 'complete',
result: result.panoramaId,
completedAt: Date.now()
});
} catch (error) {
// Save error state for debugging
await this.database.updateStitchJob(jobId, {
status: 'failed',
error: error.message,
failedAt: Date.now()
});
throw error;
} finally {
sessionStorage.removeItem('activeStitchJob');
}
}
Recovery After Refresh
// app.js - Recovery on page load
async checkForIncompleteJob() {
// Check session storage first (fastest)
const activeJobId = sessionStorage.getItem('activeStitchJob');
if (activeJobId) {
const job = await this.database.getStitchJob(activeJobId);
if (job && job.status !== 'complete') {
return job;
}
}
// Check for any recent incomplete jobs
const recentJobs = await this.database.getRecentStitchJobs(24 * 60 * 60 * 1000); // Last 24 hours
for (const job of recentJobs) {
if (job.status === 'processing' || job.status === 'pending') {
// Verify captures still exist
const capturesExist = await this.verifyCapturesExist(job.captureIds);
if (capturesExist) {
return job;
}
}
}
return null;
}
async resumeStitchJob(job) {
  console.log(`Resuming stitch job ${job.id} from ${job.progress.stage}`);
  // Restore UI state
  this.showProcessingCard();
  this.updateProgress(job.progress.processed, job.progress.total);
  // Resume from the last checkpoint
  switch (job.progress.stage) {
    case 'loading':
      // Start over - images need to be reloaded
      return this.processStitching(job.id);
    case 'decoding': {
      // Resume from the decoding phase (braces scope the declaration to this case)
      const captures = await this.loadCapturesFromJob(job);
      return this.continueFromDecoding(job.id, captures);
    }
    case 'stitching':
      // WebGL context was lost - need to restart stitching
      return this.restartStitching(job.id);
    case 'saving':
      // Panorama was created but not saved
      return this.retrySaving(job.id);
    default:
      // Unknown checkpoint - safest to start over
      return this.processStitching(job.id);
  }
}
Automatic Cleanup
// database.js - Cleanup old jobs
async cleanupOldJobs(maxAge = 7 * 24 * 60 * 60 * 1000) {
  const transaction = this.db.transaction(['stitch_jobs'], 'readwrite');
  const store = transaction.objectStore('stitch_jobs');
  const index = store.index('startedAt');
  const cutoff = Date.now() - maxAge;
  const range = IDBKeyRange.upperBound(cutoff);
  // openCursor() returns an IDBRequest, not the cursor itself
  const request = index.openCursor(range);
  request.onsuccess = (event) => {
    const cursor = event.target.result;
    if (cursor) {
      // Only delete completed or failed jobs
      if (cursor.value.status === 'complete' ||
          cursor.value.status === 'failed') {
        store.delete(cursor.value.id);
      }
      cursor.continue();
    }
  };
}
VR Viewing with Google Cardboard/VR Headsets
Every photosphere captured with VFTCam can be viewed in VR using Google Cardboard or similar mobile VR viewers. Rather than rely on third-party libraries, VFTCam includes a custom WebGL2-based VR viewer built from scratch to ensure optimal performance and control over the viewing experience.
The custom VR implementation renders two viewports with proper stereoscopic separation and barrel distortion correction:
// Custom VR viewer with stereoscopic rendering
class VRViewer {
  constructor(gl, canvas) {
    this.gl = gl;          // WebGL2 context, created by the caller
    this.canvas = canvas;
    this.fovDeg = 90;      // Field of view for each eye
    this.ipdYaw = 0.100;   // Eye separation expressed as a yaw offset (radians)
    this.yaw = 0;          // Current viewing direction
    this.pitch = 0;
    // Device orientation tracking
    this.calibrated = false;
    this.yawOffset = 0;
  }
  // Render stereoscopic view for VR headsets
  render() {
    const { gl, canvas } = this;
    // Left eye viewport
    gl.viewport(0, 0, canvas.width / 2, canvas.height);
    this.renderEye(-this.ipdYaw / 2);
    // Right eye viewport
    gl.viewport(canvas.width / 2, 0, canvas.width / 2, canvas.height);
    this.renderEye(this.ipdYaw / 2);
    // Apply barrel distortion mask for Cardboard lenses
    this.applyDistortionMask();
  }
  renderEye(eyeOffset) {
    const gl = this.gl;
    // Set uniforms for eye-specific rendering
    gl.uniform1f(this.yawLoc, this.yaw + eyeOffset);
    gl.uniform1f(this.pitchLoc, this.pitch);
    gl.uniform1f(this.fovLoc, this.fovDeg * Math.PI / 180);
    // Draw the photosphere as a full-screen quad; the shader does the projection
    gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
  }
}
The VR implementation includes several key features:
- Stereoscopic Rendering: Dual viewport rendering with adjustable IPD (inter-pupillary distance)
- Barrel Distortion Mask: Pre-rendered PNG overlay compensates for Cardboard lens distortion
- Head Tracking: DeviceOrientationEvent provides natural look-around control
- Calibration System: One-tap recentering to correct for drift
- Landscape Enforcement: Automatic prompt to rotate device for proper VR viewing
- Adjustable IPD: User can fine-tune eye separation for comfort (saved in localStorage)
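The one-tap recentering above reduces to storing the device's current compass yaw as an offset and subtracting it on every orientation update. A minimal sketch of that idea (the function and property names here are illustrative, not VFTCam's actual API):

```javascript
// Illustrative recentering math (not VFTCam's actual code): store the
// device alpha at tap time as the new "forward", then subtract it on
// every orientation update, wrapping the result to [-PI, PI).
const DEG2RAD = Math.PI / 180;

function calibratedYaw(alphaDeg, yawOffsetDeg) {
  let yaw = (alphaDeg - yawOffsetDeg) * DEG2RAD;
  while (yaw >= Math.PI) yaw -= 2 * Math.PI; // wrap to the shortest rotation
  while (yaw < -Math.PI) yaw += 2 * Math.PI;
  return yaw;
}

// On tap: remember the current alpha as the zero reference
function recenter(viewer, currentAlphaDeg) {
  viewer.yawOffset = currentAlphaDeg;
  viewer.calibrated = true;
}
```

With this in place, drift accumulated by DeviceOrientationEvent can be zeroed at any time without restarting the viewer.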
The barrel distortion correction uses a pre-computed mask image that's overlaid on the rendered scene:
// Apply barrel distortion mask for Google Cardboard
applyDistortionMask() {
const ctx = this.maskCanvas.getContext('2d');
// Clear and draw the mask
ctx.clearRect(0, 0, this.maskCanvas.width, this.maskCanvas.height);
if (this.maskImg && this.maskImg.complete) {
// Draw mask with proper scaling for device
const scale = window.devicePixelRatio || 1;
ctx.drawImage(this.maskImg,
0, 0,
this.maskCanvas.width / scale,
this.maskCanvas.height / scale
);
}
}
// Handle device orientation for head tracking
handleOrientation(event) {
  // DeviceOrientationEvent reports Tait-Bryan (Euler) angles in degrees
  const { alpha, beta } = event;
  // Apply calibration offset
  this.yaw = (alpha - this.yawOffset) * Math.PI / 180;
  this.pitch = (beta - 90) * Math.PI / 180; // Adjust for landscape
  // Clamp pitch to prevent over-rotation
  this.pitch = Math.max(-Math.PI / 2, Math.min(Math.PI / 2, this.pitch));
}
Proper XMP metadata is also embedded so photospheres are recognized by other VR viewers:
// Add photosphere XMP metadata for VR compatibility
function addPhotosphereMetadata(imageBlob) {
const xmpData = `
<rdf:Description rdf:about="" xmlns:GPano="http://ns.google.com/photos/1.0/panorama/">
<GPano:ProjectionType>equirectangular</GPano:ProjectionType>
<GPano:UsePanoramaViewer>True</GPano:UsePanoramaViewer>
<GPano:CroppedAreaImageWidthPixels>4096</GPano:CroppedAreaImageWidthPixels>
<GPano:CroppedAreaImageHeightPixels>2048</GPano:CroppedAreaImageHeightPixels>
<GPano:FullPanoWidthPixels>4096</GPano:FullPanoWidthPixels>
<GPano:FullPanoHeightPixels>2048</GPano:FullPanoHeightPixels>
<GPano:CroppedAreaLeftPixels>0</GPano:CroppedAreaLeftPixels>
<GPano:CroppedAreaTopPixels>0</GPano:CroppedAreaTopPixels>
</rdf:Description>`;
return embedXMP(imageBlob, xmpData);
}
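The `embedXMP` helper itself is not shown above. At the byte level, embedding XMP in a JPEG means wrapping the packet in an APP1 segment and splicing it in near the start of the file. A simplified sketch under that assumption (real code would insert after any existing APP0/EXIF segments rather than directly after SOI):

```javascript
// Hedged sketch of XMP embedding: wrap the packet in a JPEG APP1
// segment and splice it in right after the SOI marker.
const XMP_HEADER = 'http://ns.adobe.com/xap/1.0/\u0000';

function embedXMPBytes(jpegBytes, xmpString) {
  if (jpegBytes[0] !== 0xFF || jpegBytes[1] !== 0xD8) {
    throw new Error('Not a JPEG (missing SOI marker)');
  }
  const payload = new TextEncoder().encode(XMP_HEADER + xmpString);
  const segLength = payload.length + 2; // length field counts itself
  if (segLength > 0xFFFF) throw new Error('XMP packet too large for one APP1 segment');
  const segment = new Uint8Array(4 + payload.length);
  segment[0] = 0xFF; segment[1] = 0xE1;     // APP1 marker
  segment[2] = (segLength >> 8) & 0xFF;     // big-endian segment length
  segment[3] = segLength & 0xFF;
  segment.set(payload, 4);
  const out = new Uint8Array(jpegBytes.length + segment.length);
  out.set(jpegBytes.subarray(0, 2), 0);     // SOI
  out.set(segment, 2);                      // APP1 with XMP
  out.set(jpegBytes.subarray(2), 2 + segment.length);
  return out;
}
```

This is the same mechanism EXIF uses (a different APP1 payload, headed `Exif\0\0`), which is why the XMP and GPS metadata described below can coexist in one file.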
Location Data and GPS Embedding
Location data is a critical component for educational photospheres. When students create virtual field trips, the GPS coordinates automatically link their captures to real-world locations, enabling powerful features in tour-building software like Pano2VR.
Pre-Capture Location Request
VFTCam requests location permission before starting the capture process, ensuring GPS data is available for embedding in the final photosphere:
// From capture.js - Request location before capture starts
if ('geolocation' in navigator) {
console.log('Requesting geolocation permission...');
try {
const position = await new Promise((resolve, reject) => {
navigator.geolocation.getCurrentPosition(resolve, reject, {
enableHighAccuracy: false, // False for faster response
timeout: 10000,
maximumAge: 0
});
});
// Store location in app instance for later use
window.sphereCapture.captureLocation = {
latitude: position.coords.latitude,
longitude: position.coords.longitude,
altitude: position.coords.altitude,
accuracy: position.coords.accuracy
};
// Store permission state for the permissions UI
localStorage.setItem('locationPermissionState', 'granted');
} catch (error) {
console.warn('Location permission denied or unavailable');
// Continue without location - it's optional
}
}
EXIF GPS Data Embedding with Piexifjs
Using piexifjs, VFTCam embeds precise GPS coordinates into the JPEG EXIF data, following the standard GPS IFD format:
// From metadata-utils.js - Convert and embed GPS coordinates
if (options.latitude !== undefined && options.longitude !== undefined) {
// Convert decimal degrees to degrees, minutes, seconds for EXIF
exifObj["GPS"][piexif.GPSIFD.GPSLatitude] = this.degToDmsRational(options.latitude);
exifObj["GPS"][piexif.GPSIFD.GPSLatitudeRef] = options.latitude < 0 ? 'S' : 'N';
exifObj["GPS"][piexif.GPSIFD.GPSLongitude] = this.degToDmsRational(options.longitude);
exifObj["GPS"][piexif.GPSIFD.GPSLongitudeRef] = options.longitude < 0 ? 'W' : 'E';
if (options.altitude !== undefined) {
exifObj["GPS"][piexif.GPSIFD.GPSAltitude] = [Math.abs(Math.round(options.altitude * 100)), 100];
exifObj["GPS"][piexif.GPSIFD.GPSAltitudeRef] = options.altitude < 0 ? 1 : 0;
}
// Add GPS timestamp
const gpsDate = options.captureDate || new Date();
exifObj["GPS"][piexif.GPSIFD.GPSDateStamp] = this.formatGPSDate(gpsDate);
exifObj["GPS"][piexif.GPSIFD.GPSTimeStamp] = this.formatGPSTime(gpsDate);
}
// Convert decimal degrees to EXIF DMS (degrees, minutes, seconds) format
degToDmsRational(deg) {
const absolute = Math.abs(deg);
const degrees = Math.floor(absolute);
const minutesFloat = (absolute - degrees) * 60;
const minutes = Math.floor(minutesFloat);
const seconds = Math.round((minutesFloat - minutes) * 60 * 100);
return [
[degrees, 1], // Degrees as rational
[minutes, 1], // Minutes as rational
[seconds, 100] // Seconds as rational (multiplied by 100 for precision)
];
}
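One way to sanity-check this encoding is to invert it and compare against the input: the ×100 seconds scaling gives 1/100-arcsecond resolution, roughly 0.3 m at the equator. The inverse helper below is illustrative, not part of VFTCam:

```javascript
// Round-trip check for the DMS rational encoding. degToDmsRational is
// copied from above; dmsRationalToDeg is an illustrative inverse. Note
// the sign is dropped: EXIF stores it separately in the Ref tags.
function degToDmsRational(deg) {
  const absolute = Math.abs(deg);
  const degrees = Math.floor(absolute);
  const minutesFloat = (absolute - degrees) * 60;
  const minutes = Math.floor(minutesFloat);
  const seconds = Math.round((minutesFloat - minutes) * 60 * 100);
  return [[degrees, 1], [minutes, 1], [seconds, 100]];
}

function dmsRationalToDeg(dms) {
  const [[d, dd], [m, md], [s, sd]] = dms;
  return d / dd + m / md / 60 + s / sd / 3600;
}

// 1/100-arcsecond resolution is about 0.3 m of latitude
const lat = 37.42736;
const roundTripped = dmsRationalToDeg(degToDmsRational(lat));
```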
Integration with Tour Building Software
The embedded GPS data provides powerful automation in professional tour-building tools:
- Automatic Linking: "Closest Node" linking connects photospheres based on GPS proximity
- Ghost Hotspots: Gray hotspots appear automatically between nearby photospheres
- Tour Map Generation: GPS data automatically places nodes on the tour map
- Google Street View Ready: Proper GPS formatting meets Street View requirements
- Sequential Tours: GPS ordering helps create logical navigation paths
When students import their VFTCam photospheres into Pano2VR, the software automatically reads the GPS EXIF data and:
- Places each photosphere on the tour map at its correct location
- Calculates distances between nodes for automatic linking
- Generates navigation hotspots pointing toward nearby photospheres
- Creates a logical tour flow based on geographic proximity
Educational Benefits
For virtual field trips, embedded GPS data enables:
- Scientific Documentation: Precise location recording for geological or ecological studies
- Historical Context: Link photospheres to exact historical sites
- Multi-Site Comparison: Students can compare locations across different field trips
- Collaborative Mapping: Multiple students' captures automatically organize by location
- Research Reproducibility: Future researchers can return to exact capture locations
Glass-morphic UI Design
The UI uses a glass-morphic design system with backdrop filters for a modern, professional appearance that works well over the camera feed:
.glass-card {
background: rgba(255, 255, 255, 0.1);
backdrop-filter: blur(20px);
-webkit-backdrop-filter: blur(20px);
border: 1px solid rgba(255, 255, 255, 0.2);
border-radius: 20px;
box-shadow: 0 8px 32px rgba(0, 0, 0, 0.1);
}
/* Fallback for browsers without backdrop-filter */
@supports not (backdrop-filter: blur(20px)) {
.glass-card {
background: rgba(255, 255, 255, 0.95);
}
}
Saving and Sharing Photospheres
Once a photosphere is created, users need flexible ways to save and share their work. VFTCam implements native sharing through the Web Share API on mobile devices and provides multiple export options for all platforms.
Native Share Sheet Integration
On mobile devices, VFTCam leverages the Web Share API to present the native share sheet, allowing users to send photospheres directly to any app on their device:
// From share-utils.js - Native sharing implementation
async tryWebShare(blob, filename) {
if (!this.supportsFileSharing) return false;
// Create File object from blob
const file = new File([blob], filename, { type: 'image/jpeg' });
const shareData = {
files: [file],
title: '360° Panorama',
text: '360° panorama captured with Photosphere Camera'
};
// Verify that the browser can share this type of data
if (navigator.canShare && navigator.canShare(shareData)) {
try {
await navigator.share(shareData);
return true;
} catch (error) {
if (error.name === 'AbortError') {
// User cancelled - this is not an error
return true;
}
console.warn('Web Share failed:', error);
}
}
// Fallback: Try sharing as data URL if file sharing isn't supported
if (this.supportsWebShare) {
try {
// Convert blob to data URL
const reader = new FileReader();
const dataUrl = await new Promise((resolve) => {
reader.onloadend = () => resolve(reader.result);
reader.readAsDataURL(blob);
});
// Share as URL instead of file
await navigator.share({
title: '360° Panorama',
text: '360° panorama captured with Photosphere Camera',
url: dataUrl // Some apps can handle data URLs
});
return true;
} catch (error) {
console.warn('Web Share without files failed:', error);
}
}
return false;
}
The native share sheet provides access to:
- AirDrop: Send directly to nearby Apple devices
- Messages/WhatsApp: Share in conversations
- Social Media: Post to Instagram, Facebook, Twitter
- Cloud Storage: Save to Google Photos, iCloud, Dropbox
- Email: Attach to email messages
- Other Apps: Open in specialized 360° viewers or editors
Direct Download Fallback
When Web Share API isn't available, VFTCam falls back to direct download:
// From share-utils.js - Download implementation
downloadBlob(blob, filename) {
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = filename;
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
URL.revokeObjectURL(url); // Clean up memory
}
Bulk Export with ZIP Archives
Users can download all their photospheres at once as a ZIP archive, perfect for backing up field work or transferring to desktop software:
// From camera-roll.js - ZIP export implementation
async downloadAll() {
// Load JSZip dynamically only when needed
if (!window.JSZip) {
await new Promise((resolve, reject) => {
const script = document.createElement('script');
script.src = 'https://cdnjs.cloudflare.com/ajax/libs/jszip/3.10.1/jszip.min.js';
script.onload = resolve;
script.onerror = reject;
document.head.appendChild(script);
});
}
const panoramas = await this.database.loadAllPanoramas();
const zip = new JSZip();
const photospheresFolder = zip.folder('photospheres');
// Add each panorama to the ZIP with progress updates
for (let i = 0; i < panoramas.length; i++) {
const pano = panoramas[i];
const timestamp = new Date(pano.timestamp).toISOString()
.replace(/[:.]/g, '-').slice(0, -5);
const filename = `photosphere_${timestamp}.jpg`;
if (pano.imageBlob) {
photospheresFolder.file(filename, pano.imageBlob);
} else if (pano.imageData) {
// Convert base64 to blob if needed (legacy format)
const base64 = pano.imageData.replace(/^data:image\/\w+;base64,/, '');
photospheresFolder.file(filename, base64, { base64: true });
}
// Update progress bar
const progress = Math.round(((i + 1) / panoramas.length) * 100);
const progressBar = document.getElementById('download-progress-bar');
if (progressBar) {
progressBar.style.width = `${progress}%`;
}
}
// Add metadata file
const metadata = {
app: 'VFT Photosphere Camera',
version: '1.5.1',
exportDate: new Date().toISOString(),
panoramaCount: panoramas.length,
totalSizeBytes: panoramas.reduce((sum, p) =>
sum + (p.imageBlob?.size || 0), 0)
};
zip.file('metadata.json', JSON.stringify(metadata, null, 2));
// Generate and download ZIP
const content = await zip.generateAsync({
type: 'blob',
compression: 'DEFLATE',
compressionOptions: { level: 6 }
});
const url = URL.createObjectURL(content);
const link = document.createElement('a');
link.href = url;
link.download = `photospheres_${new Date().toISOString().slice(0, 10)}.zip`;
link.click();
URL.revokeObjectURL(url);
}
Clipboard Integration
Modern browsers support copying images directly to the system clipboard, allowing users to paste photospheres into other applications:
// From share-utils.js - Clipboard API implementation
async copyImageToClipboard(blob) {
// Check for Clipboard API support
if (!navigator.clipboard || !navigator.clipboard.write) {
return false;
}
try {
// Create clipboard item with MIME type
const clipboardItem = new ClipboardItem({
[blob.type]: blob
});
await navigator.clipboard.write([clipboardItem]);
return true;
} catch (error) {
console.warn('Clipboard copy failed:', error);
return false;
}
}
- Native share sheet on iOS/Android for seamless app integration
- AirDrop support for instant Apple device transfer
- Direct download for desktop browsers
- Bulk ZIP export with metadata preservation
- Clipboard copy for quick pasting into documents
- Automatic filename generation with timestamps
Progressive Web App Implementation
VFTCam is a full Progressive Web App with offline support, making it reliable in remote field locations. The PWA implementation required solving unique challenges around module loading, iOS compatibility, and storage persistence.
Service Worker Architecture
The service worker (v1.5.1) uses Workbox 7.0.0 with custom strategies for different resource types:
// service-worker.js - Version-based cache management
const CACHE_VERSION = '1.5.1';
const BUILD_TIMESTAMP = '1757869480994'; // Updated by Python script
// Import Workbox from CDN
importScripts('https://storage.googleapis.com/workbox-cdn/releases/7.0.0/workbox-sw.js');
// Configure cache names with version
workbox.core.setCacheNameDetails({
prefix: 'vftcam',
suffix: CACHE_VERSION,
precache: 'precache',
runtime: 'runtime'
});
// Precache 80+ critical assets
const precacheManifest = [
{ url: '/', revision: BUILD_TIMESTAMP },
{ url: '/index.html', revision: BUILD_TIMESTAMP },
{ url: '/js/modules/app.js', revision: BUILD_TIMESTAMP },
{ url: '/js/modules/camera.js', revision: BUILD_TIMESTAMP },
{ url: '/js/modules/database.js', revision: BUILD_TIMESTAMP },
// ... 75 more files
];
workbox.precaching.precacheAndRoute(precacheManifest);
// Custom script handler for iOS Safari module compatibility
workbox.routing.registerRoute(
  ({ request }) => request.destination === 'script' ||
    request.url.includes('/js/'),
  async ({ request }) => {
    const cache = await caches.open('vftcam-scripts');
    // Normalize relative module specifiers (./js/ vs /js/) to one cache key
    const normalizedUrl = request.url.replace('./js/', '/js/');
    const cachedResponse = await cache.match(normalizedUrl);
    if (cachedResponse) {
      // Clone and modify headers for iOS Safari
      const headers = new Headers(cachedResponse.headers);
      headers.set('Content-Type', 'application/javascript');
      headers.set('Cache-Control', 'no-cache'); // Force revalidation
      const blob = await cachedResponse.blob();
      return new Response(blob, {
        status: cachedResponse.status,
        headers: headers
      });
    }
    // Fallback to network
    const response = await fetch(request);
    await cache.put(normalizedUrl, response.clone());
    return response;
  }
);
Installation Detection and Storage Persistence
The PWAStatus module tracks installation state and storage risks across different platforms:
// pwa-status.js - Multi-method installation detection
checkInstallation() {
// Three different methods to detect standalone mode
return window.matchMedia('(display-mode: standalone)').matches ||
window.navigator.standalone === true || // iOS Safari
document.referrer.includes('android-app://'); // Android TWA
}
async checkPersistence() {
if (!navigator.storage || !navigator.storage.persisted) {
return false;
}
try {
const isPersistent = await navigator.storage.persisted();
if (!isPersistent && this.isInstalled) {
// Request persistence if installed but not persistent
return await navigator.storage.persist();
}
return isPersistent;
} catch (error) {
// iOS Safari may throw here
console.error('Persistence check failed:', error);
return false;
}
}
getStorageWarning() {
const status = this.getStatus();
if (status.storageRisk === 'high') {
return {
level: 'danger',
title: 'Storage at Risk',
message: 'Your browser may delete your photospheres after 7 days of inactivity. Install as a Progressive Web App (Add to Home Screen) to protect your data.',
action: 'Install to Protect'
};
}
// ... other warning levels
}
Offline Fallback Strategy
A custom offline page lists available features when disconnected:
// Offline fallback with navigation preload
const OFFLINE_URL = '/offline.html';
// Cache offline page during install
self.addEventListener('install', event => {
event.waitUntil(
caches.open('vftcam-offline').then(cache => {
return cache.add(OFFLINE_URL);
})
);
});
// Serve offline page for navigation failures
workbox.routing.registerRoute(
  new workbox.routing.NavigationRoute(async ({ event }) => {
    try {
      // Try network first for navigation
      return await fetch(event.request);
    } catch (error) {
      // Fallback to offline page
      const cache = await caches.open('vftcam-offline');
      const cachedResponse = await cache.match(OFFLINE_URL);
      return cachedResponse || new Response('Offline', {
        status: 503,
        statusText: 'Service Unavailable'
      });
    }
  })
);
Storage Architecture
VFTCam implements a multi-tier storage system designed to hold large photosphere data on mobile devices while working around browser storage eviction policies. The architecture prioritizes data persistence and performance through automatic fallback mechanisms.
Database Schema
The IndexedDB database uses two primary object stores with optimized schemas:
// Database structure from database.js
const DB_NAME = 'PhotosphereDB';
const DB_VERSION = 5;
// Object stores configuration
const stores = {
'captures': { // Temporary storage during capture session
keyPath: 'hotspotId', // Primary key (1-36)
schema: {
hotspotId: 'number', // Position identifier
imageBlob: 'Blob', // AVIF/WebP blob (~150-300KB)
yaw: 'number', // Target horizontal angle
pitch: 'number', // Target vertical angle
actualYaw: 'number', // Actual device yaw when captured
actualPitch: 'number', // Actual device pitch
roll: 'number', // Device tilt/roll
fov: 'number', // Field of view (typically 67°)
timestamp: 'string' // ISO timestamp
}
},
'panoramas': { // Persistent storage for completed photospheres
keyPath: 'id',
autoIncrement: true,
schema: {
id: 'number', // Auto-generated unique ID
imageBlob: 'Blob', // JPEG panorama (~1.5-3MB)
timestamp: 'string', // Creation time
imageCount: 'number', // Source images used (typically 36)
type: 'string', // Stitching method
width: 'number', // Panorama dimensions
height: 'number',
opfsId: 'string?' // Optional OPFS backup reference
}
}
}
Three-Tier Storage Strategy
The storage system implements automatic fallbacks to ensure data persistence across different browser environments and storage quotas:
// Actual storage implementation with fallback chain
export class Database {
constructor() {
this.db = null;
this.DB_NAME = 'PhotosphereDB';
this.DB_VERSION = 5;
this.STORE_NAME = 'captures';
this.useOPFS = false;
this.opfs = opfsStorage; // OPFS integration for better persistence
}
async savePanorama(blob, metadata) {
  // Validate size constraints (max 50MB)
  if (blob.size > 50 * 1024 * 1024) {
    const sizeMB = (blob.size / (1024 * 1024)).toFixed(1);
    throw new Error(`Panorama too large (${sizeMB}MB). Maximum size is 50MB.`);
  }
  const panorama = {
    imageBlob: blob, // Direct blob storage (no base64)
    timestamp: new Date().toISOString(),
    imageCount: metadata.imageCount || 36,
    type: metadata.type || 'best-pixel',
    width: metadata.width || 4096,
    height: metadata.height || 2048
  };
  // TIER 2: OPFS (Origin Private File System) for better persistence.
  // This await happens BEFORE the IndexedDB transaction is opened: an
  // idle IndexedDB transaction auto-commits while unrelated async work
  // is pending, so the transaction must be created last.
  if (this.useOPFS && navigator.storage?.getDirectory) {
    try {
      const id = await this.opfs.savePanorama(blob);
      panorama.opfsId = id; // Link OPFS backup to IndexedDB record
      console.log('Panorama backed up to OPFS for persistence');
    } catch (e) {
      console.warn('OPFS save failed, using IndexedDB only:', e);
    }
  }
  // TIER 1: IndexedDB (primary storage)
  const tx = this.db.transaction(['panoramas'], 'readwrite');
  const store = tx.objectStore('panoramas');
  const request = store.add(panorama);
  await new Promise((resolve, reject) => {
    request.onsuccess = resolve;
    request.onerror = () => reject(request.error);
  });
  // TIER 3: Cache API fallback (for service worker access)
  try {
    const cache = await caches.open('vftcam-panoramas');
    const response = new Response(blob, {
      headers: {
        'Content-Type': 'image/jpeg',
        'X-Panorama-ID': String(request.result)
      }
    });
    await cache.put(`/panorama/${request.result}`, response);
  } catch (e) {
    console.warn('Cache API backup failed:', e);
  }
  return request.result;
}
}
OPFS Integration for Safari/iOS
The Origin Private File System provides better persistence guarantees, especially on iOS Safari which aggressively deletes IndexedDB data after 7 days of inactivity:
// OPFS storage module with Safari worker compatibility
class OPFSStorage {
constructor() {
this.worker = null; // Required for Safari - only works in workers
this.initialized = false;
this.isSupported = 'storage' in navigator &&
'getDirectory' in navigator.storage;
}
async init() {
if (!this.isSupported) return false;
// Safari requires OPFS access through a Web Worker
this.worker = new Worker('/js/workers/opfs-worker.js');
// Worker handles actual file operations
this.worker.postMessage({
action: 'init',
directories: ['panoramas', 'captures', 'temp']
});
return true;
}
async savePanorama(blob) {
// Generate unique filename with timestamp
const filename = `panorama_${Date.now()}.jpg`;
// Send blob to worker for OPFS storage
return new Promise((resolve, reject) => {
const channel = new MessageChannel();
channel.port1.onmessage = (e) => {
if (e.data.success) resolve(e.data.id);
else reject(e.data.error);
};
this.worker.postMessage({
action: 'save',
filename,
blob
}, [channel.port2]);
});
}
}
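The worker side of this exchange (`opfs-worker.js`) is not shown above. A sketch of what it presumably looks like, assuming the message protocol implied by the main-thread code: `createSyncAccessHandle()` is the worker-only API that makes OPFS writes work on Safari.

```javascript
// Sketch of the OPFS worker side (the real opfs-worker.js is not shown).
// The protocol mirrors what the main thread sends: { action, filename,
// blob } plus a transferred MessageChannel port for the reply.

// Pure helper: derive the stored id/filename from a timestamp
function makePanoramaId(timestamp) {
  return `panorama_${timestamp}.jpg`;
}

// createSyncAccessHandle() is only available inside dedicated workers,
// which is why Safari requires routing OPFS writes through one.
async function saveToOPFS(filename, blob) {
  const root = await navigator.storage.getDirectory();
  const dir = await root.getDirectoryHandle('panoramas', { create: true });
  const fileHandle = await dir.getFileHandle(filename, { create: true });
  const access = await fileHandle.createSyncAccessHandle();
  const bytes = new Uint8Array(await blob.arrayBuffer());
  access.write(bytes, { at: 0 });
  access.flush();
  access.close();
  return filename;
}

// Guarded so the definitions above are also loadable outside a worker
if (typeof self !== 'undefined' && typeof self.postMessage === 'function') {
  self.onmessage = async (e) => {
    const { action, filename, blob } = e.data;
    const reply = e.ports[0]; // port2 of the main thread's MessageChannel
    if (action === 'save') {
      try {
        reply.postMessage({ success: true, id: await saveToOPFS(filename, blob) });
      } catch (err) {
        reply.postMessage({ success: false, error: String(err) });
      }
    }
  };
}
```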
Storage Quotas and Limits
Different browsers and platforms impose varying storage limits that the system must navigate:
- iOS Safari: ~1GB limit for web apps; 7-day eviction policy; OPFS provides better persistence
- Android Chrome: 6% of free disk space; Persistent Storage API; no time-based eviction
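A quick way to see where a device sits relative to these limits is `navigator.storage.estimate()`. A sketch with illustrative warning thresholds (the 80%/95% cutoffs are assumptions, not VFTCam's actual values):

```javascript
// Sketch: inspect the storage budget via the Storage API. The pure
// classification logic is separated from the browser call so the
// thresholds are easy to test; the cutoffs are illustrative.
function classifyUsage(usage, quota) {
  if (!quota) return 'unknown';
  const ratio = usage / quota;
  if (ratio >= 0.95) return 'critical';
  if (ratio >= 0.80) return 'warning';
  return 'ok';
}

async function checkStorageBudget() {
  if (!navigator.storage || !navigator.storage.estimate) return 'unknown';
  const { usage = 0, quota = 0 } = await navigator.storage.estimate();
  const level = classifyUsage(usage, quota);
  console.log(`Storage: ${(usage / 1048576).toFixed(1)}MB of ` +
              `${(quota / 1048576).toFixed(1)}MB (${level})`);
  return level;
}
```

Calling something like `checkStorageBudget()` before a capture session would surface the eviction risks described above before any photos are taken.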
Blob Storage Optimization
Moving from base64 to direct blob storage provided significant benefits, though testing revealed iOS Safari silently converts AVIF/WebP to PNG, making JPEG the pragmatic choice:
// From camera.js - actual implementation with Safari workaround
async encodeCanvas(canvas, quality = 0.85) {
  // Use JPEG everywhere for consistency and reliability.
  // iOS Safari silently substitutes PNG when AVIF/WebP is requested
  // (1360KB vs 195KB for JPEG!), so probing formats isn't worth it
  // for a mobile-first app.
  const blob = await new Promise(resolve =>
    canvas.toBlob(resolve, 'image/jpeg', quality)
  );
  if (!blob || blob.size === 0) {
    throw new Error('Failed to encode image');
  }
  console.log(`Encoded ${blob.type}, ${(blob.size / 1024).toFixed(1)}KB`);
  return blob;
}
// From app.js - storage implementation
const blob = await this.camera.capturePhoto(maxWidth, maxHeight);
await this.database.saveCapture({
hotspotId: hotspot.id,
imageBlob: blob, // Direct binary storage, no base64
yaw: Math.round(yaw),
pitch: Math.round(pitch),
actualPitch: Math.round(actualPitch),
actualYaw: Math.round(actualYaw),
roll: Math.round(roll),
timestamp: new Date().toISOString() // ISO string, matching the captures schema
});
// Size comparison for typical capture:
// Base64 JPEG: ~300KB per image × 36 = 10.8MB (plus UTF-16 overhead = ~21.6MB in memory)
// JPEG Blob: ~195KB per image × 36 = 7MB (no string overhead)
// Reduction: 67% less memory used
PWA Update Strategy
Updates are handled through a Python script that versions the service worker:
# update-sw.py - Automatic cache busting
import time
import re
def update_service_worker():
timestamp = str(int(time.time() * 1000))
with open('public/service-worker.js', 'r') as f:
content = f.read()
# Update BUILD_TIMESTAMP
content = re.sub(
r"const BUILD_TIMESTAMP = '\d+';",
f"const BUILD_TIMESTAMP = '{timestamp}';",
content
)
# Update CACHE_VERSION
version = time.strftime('%Y%m%d.%H%M%S')
content = re.sub(
r"const CACHE_VERSION = '[^']+';",
f"const CACHE_VERSION = '{version}';",
content
)
with open('public/service-worker.js', 'w') as f:
f.write(content)
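Because cache names carry the version as a suffix, cleaning up stale caches during service worker activation is a simple prefix/suffix check. A hedged sketch (VFTCam's actual activate handler is not shown):

```javascript
// Sketch of stale-cache cleanup on activation. With version-suffixed
// cache names, anything with the vftcam prefix but an old suffix can
// go. The predicate is pure so it's easy to test.
const CACHE_VERSION = '1.5.1'; // mirrors service-worker.js

function isStaleCache(name, currentVersion) {
  return name.startsWith('vftcam') && !name.endsWith(currentVersion);
}

// Only meaningful inside a service worker scope
if (typeof self !== 'undefined' && 'caches' in self) {
  self.addEventListener('activate', (event) => {
    event.waitUntil(
      caches.keys().then((names) =>
        Promise.all(
          names
            .filter((n) => isStaleCache(n, CACHE_VERSION))
            .map((n) => caches.delete(n))
        )
      )
    );
  });
}
```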
iOS Safari remains the most constrained platform for PWAs:
- Storage cleared after 7 days unless installed to home screen
- Service Worker limitations in WKWebView
- No background sync or push notifications
- Module caching ignores standard cache headers
Lessons Learned
What Worked Well
- Zero-build architecture: Dramatically simplified debugging and maintenance
- WebGL2 for stitching: 10x faster than Canvas 2D or WebAssembly approaches
- Structured capture pattern: More reliable than feature-matching approaches
- Visual feedback: Real-time 3D sphere showing capture progress
Challenges and Solutions
Performance Metrics
Capture Performance
- Alignment detection: <100ms
- Auto-capture delay: 1000ms
- Frame rate: 60fps maintained
Stitching Performance
- 36 images: <5s on iPhone 12
- Output: 4096×2048 pixels
- Memory peak: <500MB
Storage Efficiency
- Capture: ~300KB per image
- Panorama: ~2-3MB JPEG
- Total for session: ~13MB
Privacy by Design
In an era of pervasive data collection, VFTCam takes a radically different approach: no user data is collected whatsoever. This isn't just a policy decision—it's architecturally impossible.
- 100% Client-Side: All processing happens in your browser using WebGL2
- No Analytics: No Google Analytics, no telemetry, no tracking pixels
- No Server Communication: After initial load, the app never contacts any server
- Fully Auditable: As a static website, all code is transparent and inspectable
Being a static website with no backend means several important privacy guarantees:
- Your photos never leave your device - All 36 capture images and stitched panoramas remain in your browser's IndexedDB
- Sensor data stays local - Device orientation and GPS (if enabled) are used only for capture guidance and metadata
- No user accounts - No registration, no profiles, no data linking across sessions
- Complete data control - Delete individual photospheres or clear everything instantly
- Offline-first design - Once cached, works entirely without internet connection
The source code transparency of a static web app means anyone can verify these privacy claims. Every line of JavaScript is readable in browser DevTools, and network monitoring confirms no external requests are made during operation.
Full Privacy Policy: For complete details, see vftcam.stanford.edu/privacy-policy.html
Accessibility
While VFTCam is primarily a visual and spatial application, significant effort went into making it as accessible as possible. The app includes support for reduced motion preferences, high contrast modes, and comprehensive ARIA labeling throughout.
Actual Accessibility Implementation
Rather than building complex accessibility managers, we focused on semantic HTML and CSS media queries that respect user preferences:
ARIA Labels and Semantic HTML
<!-- From index.html - Every interactive element has proper labels -->
<button id="start-btn"
onclick="startApp()"
aria-label="Start capturing photosphere">
<img src="./img/camera.svg" alt="Camera icon">
Start Capturing
</button>
<div id="camera-viewport" aria-label="Camera viewfinder">
<div id="alignment-indicator" aria-hidden="true"></div>
<div id="roll-indicator"
aria-label="Device level indicator"
aria-hidden="true">
</div>
</div>
<!-- Screen reader announcements for capture progress -->
<div id="sr-announcements"
class="sr-only"
role="status"
aria-live="polite"
aria-atomic="true"></div>
<!-- Modal dialogs with proper ARIA attributes -->
<div id="stitching-overlay"
class="ui-card"
role="dialog"
aria-labelledby="stitch-title"
aria-modal="true"
aria-hidden="true"></div>
Respecting User Preferences with CSS
CSS media queries automatically adapt the interface based on user system preferences:
/* From cards.css - Reduced motion support */
@media (prefers-reduced-motion: reduce) {
.ui-card {
transition: opacity 150ms ease-in-out;
}
* {
animation-duration: 0.01ms !important;
animation-iteration-count: 1 !important;
transition-duration: 0.01ms !important;
scroll-behavior: auto !important;
}
}
/* From cards.css - High contrast mode support */
@media (prefers-contrast: high) {
.ui-card {
background: rgba(0, 0, 0, 0.95) !important;
border: 2px solid white !important;
}
.control-btn {
border: 2px solid white !important;
background: black !important;
}
#camera-viewport {
border: 3px solid white !important;
}
#alignment-indicator {
border: 4px solid white !important;
background: transparent !important;
}
}
Touch Target Sizing
All buttons meet or exceed WCAG's 44×44px minimum touch target size:
/* From cards.css - Close buttons and action buttons */
.card-close-btn {
width: 44px;
height: 44px;
border-radius: 50%;
}
.action-btn {
width: 44px;
height: 44px;
border-radius: 50%;
}
/* From capture.css - Control buttons */
.control-btn {
width: 50px;
height: 50px;
border-radius: 50%;
}
/* Mobile adjustments for smaller screens */
@media (max-width: 375px) {
.control-btn {
width: 45px;
height: 45px;
}
}
Screen Reader Announcements
The hidden live region in index.html provides real-time updates to screen reader users:
/* From capture.css - Screen reader only content */
.sr-only {
position: absolute;
width: 1px;
height: 1px;
padding: 0;
margin: -1px;
overflow: hidden;
clip: rect(0, 0, 0, 0);
white-space: nowrap;
border: 0;
}
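Populating that live region from script might look like the following sketch; announce() is an illustrative helper name, not necessarily what the VFTCam source uses.

```javascript
// Illustrative sketch: writing capture-progress messages into the
// aria-live="polite" region (#sr-announcements) declared in index.html.
// Clearing the region first helps screen readers re-announce a message
// even when its text is identical to the previous one.
function announce(message, liveRegion) {
  if (!liveRegion) return; // no-op outside the DOM (e.g. during tests)
  liveRegion.textContent = "";
  liveRegion.textContent = message;
}

// Browser usage (hypothetical wiring), e.g. after each of the 36 captures:
// announce("Photo 12 of 36 captured",
//          document.getElementById("sr-announcements"));
```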
What We Actually Achieved
- ARIA Labels: Every button and interactive element has a descriptive aria-label
- Motion Preferences: Animations are automatically disabled for users with vestibular disorders
- High Contrast: The UI adapts to high contrast mode with solid borders and backgrounds
- Touch Targets: All buttons are at least 44×44px with adequate spacing
While we didn't build complex accessibility managers or focus traps, the combination of semantic HTML, ARIA attributes, and CSS media queries creates an interface that respects user preferences and works with assistive technologies. By following WCAG 2.1 guidelines for touch targets, color contrast, and semantic markup, we ensured the app is usable by people with disabilities.
Educational Impact
VFTCam transforms smartphones into powerful tools for creating virtual field trips, enabling students and educators to capture and share immersive experiences from anywhere in the world. Whether documenting geological formations, historical sites, or ecological habitats, VFTCam democratizes the creation of educational 360° content.
Key benefits for virtual field trips:
- Accessibility: Students who can't physically visit locations can experience them immersively
- Documentation: Preserve field sites exactly as they appear for future classes
- Remote Learning: Share field experiences with distance learners or partner schools
- Scientific Record: Create timestamped, geolocated documentation of changing environments
- Student Agency: Empower students to be creators, not just consumers, of educational content
- No Connectivity Required: Works offline in remote locations without cell service
By removing the technical and financial barriers to creating photospheres, VFTCam enables any classroom to build their own library of virtual field trips, turning every excursion into a reusable educational resource that can be experienced by students for years to come.
External Libraries
VFTCam stands on the shoulders of excellent open source projects; libraries such as Three.js (3D visualization) and Workbox (service worker caching) made the zero-build approach possible.
Each library was carefully selected for its reliability, performance, and compatibility with our zero-build philosophy. All are included as standalone files without modification, making updates and debugging straightforward.
Deployment
VFTCam requires no build process and can be deployed to any static web server as a collection of HTML, CSS, and JavaScript files.
Conclusion
Building VFTCam demonstrated that modern web platform APIs are powerful enough to create sophisticated imaging applications without native code. By embracing web standards and avoiding complex toolchains, we created a maintainable, educational tool that will serve students for years to come.
The key insight was recognizing that the web platform's "limitations" often lead to better architectural decisions. Memory constraints forced efficient algorithms. The lack of native APIs pushed us toward creative solutions using existing web standards.
As web capabilities continue to expand, projects like VFTCam show that the browser is not just a document viewer but a powerful platform for computational photography, computer vision, and immersive media creation.
Created by:
Reuben Thiessen
Emerging Technology Lead
Accelerator Studio within the Stanford Accelerator for Learning