Architecture Details
Understand the technical architecture that makes Probably’s local-first approach possible while maintaining powerful AI capabilities.
System Architecture Overview
High-Level Components
┌─────────────────────────────────────────────────────────────────────────────┐
│                            User Interface Layer                             │
├─────────────────────────────────────────────────────────────────────────────┤
│ Spreadsheet Interface   │ Visualizations          │ AI Agent Chat           │
└─────────────────────────┴─────────────────────────┴─────────────────────────┘
                                       │
┌─────────────────────────────────────────────────────────────────────────────┐
│                           Application Logic Layer                           │
├─────────────────────────────────────────────────────────────────────────────┤
│ PXL Runtime Engine      │ Analysis Coordinator    │ Result Processor        │
└─────────────────────────┴─────────────────────────┴─────────────────────────┘
                                       │
┌─────────────────────────────────────────────────────────────────────────────┐
│                            Data Processing Layer                            │
├─────────────────────────────────────────────────────────────────────────────┤
│ Local Data Engine │ Cache Manager    │ Query Optimizer  │ Type System       │
└───────────────────┴──────────────────┴──────────────────┴───────────────────┘
                                       │
┌─────────────────────────────────────────────────────────────────────────────┐
│                          External Interface Layer                           │
├─────────────────────────────────────────────────────────────────────────────┤
│ Database Connectors     │ File System Access      │ Encrypted AI Gateway    │
└─────────────────────────┴─────────────────────────┴─────────────────────────┘
                                       │
┌─────────────────────────────────────────────────────────────────────────────┐
│                             External Resources                              │
├─────────────────────────────────────────────────────────────────────────────┤
│ Data Sources            │ Local Storage           │ AI Providers            │
│ ┌─────────────────────┐ │ ┌─────────────────────┐ │ ┌─────────────────────┐ │
│ │ Databases           │ │ │ Cache & Metadata    │ │ │ OpenAI              │ │
│ │ Files               │ │ │ User Configurations │ │ │ Anthropic           │ │
│ │ APIs                │ │ │ Analysis Results    │ │ │ Google              │ │
│ └─────────────────────┘ │ └─────────────────────┘ │ └─────────────────────┘ │
└─────────────────────────┴─────────────────────────┴─────────────────────────┘
Core Architectural Principles
Local Execution
- All data processing happens on the user’s machine
- No server-side computation dependencies
- Direct database connections without proxies
- Local file system access and management
Selective External Interaction
- Encrypted AI queries when explicitly requested
- Minimal data transmission to external services
- User-controlled external service integration
- Transparent audit trail of all external calls
Modular Design
- Pluggable data source connectors
- Extensible AI provider interfaces
- Configurable processing pipelines
- Isolated security boundaries
Data Processing Architecture
Local Data Engine
┌─────────────────────────────────────────────────────────────────────────────┐
│                              Local Data Engine                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│ ┌─────────────────────┐  ┌───────────────────┐  ┌─────────────────────────┐ │
│ │ Query Parser        │  │ Data Catalog      │  │ Schema Manager          │ │
│ │                     │  │                   │  │                         │ │
│ │ • PXL → SQL         │  │ • Data Sources    │  │ • Type Inference        │ │
│ │ • Optimization      │  │ • Relationships   │  │ • Schema Validation     │ │
│ │ • Validation        │  │ • Lineage         │  │ • Format Detection      │ │
│ └─────────────────────┘  └───────────────────┘  └─────────────────────────┘ │
│                                    │                                        │
│ ┌─────────────────────┐  ┌───────────────────┐  ┌─────────────────────────┐ │
│ │ Execution Engine    │  │ Memory Manager    │  │ Result Processor        │ │
│ │                     │  │                   │  │                         │ │
│ │ • Query Execution   │  │ • Buffer Pools    │  │ • Format Conversion     │ │
│ │ • Parallel Ops      │  │ • GC Management   │  │ • Aggregation           │ │
│ │ • Stream Processing │  │ • Spill to Disk   │  │ • Materialization       │ │
│ └─────────────────────┘  └───────────────────┘  └─────────────────────────┘ │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
Query Processing Flow
- Parse: Convert PXL expressions to internal representation
- Plan: Generate an optimized execution plan
- Execute: Run queries against local/remote data sources
- Cache: Store results for performance
- Present: Format results for user interface
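The five stages above can be sketched as a small pipeline. This is an illustration only: the `QueryPipeline` class, its method names, and the toy `<table> where <column> = <value>` syntax are assumptions made for the example and are not actual PXL or Probably's internal API.

```python
from dataclasses import dataclass, field

Rows = list[dict]  # hypothetical stand-in for a result set


@dataclass
class QueryPipeline:
    """Toy parse -> plan -> execute -> cache -> present pipeline."""

    tables: dict[str, Rows]
    cache: dict[str, Rows] = field(default_factory=dict)
    executions: int = 0  # how many times the execute stage actually ran

    def parse(self, expr: str) -> tuple[str, str, str]:
        # Assumed toy syntax: "<table> where <column> = <value>"
        table, _, condition = expr.partition(" where ")
        column, _, value = condition.partition(" = ")
        return table.strip(), column.strip(), value.strip()

    def plan(self, parsed: tuple[str, str, str]):
        table, column, value = parsed
        # The "plan" here is just one table plus a filter predicate.
        return table, (lambda row: str(row.get(column)) == value)

    def execute(self, plan) -> Rows:
        table, predicate = plan
        self.executions += 1
        return [row for row in self.tables[table] if predicate(row)]

    def present(self, rows: Rows) -> str:
        return "\n".join(
            ", ".join(f"{key}={value}" for key, value in row.items())
            for row in rows
        )

    def run(self, expr: str) -> str:
        # Cache stage: only parse, plan, and execute on a cache miss.
        if expr not in self.cache:
            self.cache[expr] = self.execute(self.plan(self.parse(expr)))
        return self.present(self.cache[expr])
```

Running the same expression twice executes the query once; the second call is served from the cache.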
Memory Management
- Streaming Processing: Handle datasets larger than memory
- Lazy Evaluation: Compute only what’s needed
- Garbage Collection: Automatic memory cleanup
- Spill Handling: Graceful handling of memory limits
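As a rough illustration of streaming and lazy evaluation, the sketch below computes a mean over a row source while holding at most one chunk in memory. The function names and the chunk size are assumptions for the example, not part of the product.

```python
from itertools import islice
from typing import Iterable, Iterator


def read_chunks(rows: Iterable, chunk_size: int) -> Iterator[list]:
    """Yield fixed-size chunks lazily; nothing is read until iteration starts."""
    iterator = iter(rows)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


def streaming_mean(rows: Iterable[dict], column: str, chunk_size: int = 1000) -> float:
    """Compute a mean while keeping only one chunk of rows in memory."""
    total, count = 0.0, 0
    for chunk in read_chunks(rows, chunk_size):
        total += sum(row[column] for row in chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no rows to aggregate")
    return total / count
```

Because `rows` can be a generator, the full dataset never has to fit in memory at once.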
Data Source Integration
Database Connectors
┌────────────────────────────────────────────────────────────┐
│                  Database Connector Layer                  │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  ┌─────────────┐  ┌─────────────────────────────────────┐  │
│  │ Snowflake   │  │ Additional Database                 │  │
│  │ Connector   │  │ Connectors (Coming Soon)            │  │
│  └─────────────┘  └─────────────────────────────────────┘  │
│                              │                             │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ Unified Data Source Interface                        │  │
│  │                                                      │  │
│  │ • Connection Management                              │  │
│  │ • Query Translation                                  │  │
│  │ • Result Set Handling                                │  │
│  │ • Error Management                                   │  │
│  │ • Security Integration                               │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
Connection Management
- Pool Management: Efficient connection reuse
- Health Monitoring: Automatic connection health checks
- Retry Logic: Robust error handling and recovery
- Security: SSL/TLS encryption and authentication
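The retry behaviour can be pictured with a minimal sketch that retries a failing operation with exponential backoff. The helper name `with_retry`, the choice of `ConnectionError` as the retryable error, and the delay values are assumptions for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.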
File System Integration
- Format Support: CSV, Parquet, Excel, JSON, etc.
- Streaming Readers: Memory-efficient file processing
- Schema Detection: Automatic format and type inference
- Encoding Handling: Unicode and character set support
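Schema detection of this kind can be approximated by sampling rows and picking the narrowest type that fits every value. The sketch below does this for CSV text with the standard library; the function names, the three-type lattice (`int`, `float`, `str`), and the sample size are assumptions, and a production detector would cover dates, booleans, and nulls as well.

```python
import csv
import io


def infer_type(values: list[str]) -> str:
    """Return the narrowest of int, float, or str that fits every non-empty value."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "str"
    for name, caster in (("int", int), ("float", float)):
        try:
            for value in non_empty:
                caster(value)
            return name
        except ValueError:
            continue  # this type does not fit; try the next, wider one
    return "str"


def infer_schema(csv_text: str, sample_rows: int = 100) -> dict[str, str]:
    """Infer a column -> type mapping from the first `sample_rows` rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    sample = [row for _, row in zip(range(sample_rows), reader)]
    return {
        column: infer_type([(row[column] or "") for row in sample])
        for column in (reader.fieldnames or [])
    }
```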
AI Integration Architecture
Secure AI Gateway
┌─────────────────────────────────────────────────────────────────────────────┐
│                              AI Gateway Layer                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│ ┌─────────────────────┐  ┌───────────────────┐  ┌─────────────────────────┐ │
│ │ Request Router      │  │ Load Balancer     │  │ Response Cache          │ │
│ │                     │  │                   │  │                         │ │
│ │ • Provider Select   │  │ • Rate Limiting   │  │ • Semantic Caching      │ │
│ │ • Model Selection   │  │ • Failover        │  │ • TTL Management        │ │
│ │ • Context Prep      │  │ • Performance     │  │ • Cache Invalidation    │ │
│ └─────────────────────┘  └───────────────────┘  └─────────────────────────┘ │
│                                    │                                        │
│ ┌─────────────────────┐  ┌───────────────────┐  ┌─────────────────────────┐ │
│ │ Security Layer      │  │ Audit Logger      │  │ Error Handler           │ │
│ │                     │  │                   │  │                         │ │
│ │ • Encryption        │  │ • Request Logs    │  │ • Retry Logic           │ │
│ │ • Authentication    │  │ • Response Logs   │  │ • Fallback Strategies   │ │
│ │ • Data Filtering    │  │ • Performance     │  │ • Error Classification  │ │
│ └─────────────────────┘  └───────────────────┘  └─────────────────────────┘ │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
Request Processing
- Context Preparation: Extract relevant data context
- Privacy Filtering: Remove sensitive information
- Provider Selection: Choose optimal AI provider
- Encryption: Secure data transmission
- Response Processing: Parse and validate AI responses
Multi-Provider Support
- Load Balancing: Distribute requests across providers
- Failover: Automatic switching on provider issues
- Rate Limit Management: Respect provider API limits
- Cost Optimization: Choose cost-effective providers
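Failover across providers can be sketched as trying each provider in priority order until one succeeds. The `Provider` shape (a name plus a callable) and the `ask_with_failover` helper are hypothetical; no real provider SDK is called here.

```python
from typing import Callable, Sequence

# Hypothetical provider shape: (name, function that answers a prompt).
Provider = tuple[str, Callable[[str], str]]


class AllProvidersFailed(Exception):
    """Raised when every configured provider returned an error."""


def ask_with_failover(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in priority order; return (provider_name, answer)."""
    errors: dict[str, Exception] = {}
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real gateway would catch narrower error types
            errors[name] = exc
    raise AllProvidersFailed(errors)
```

Ordering the `providers` sequence by cost or measured latency would be one simple way to combine failover with cost optimization.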
AI Security Model
Data Protection Layers
User Data    → Privacy Filter → Context Extraction → Encryption  → AI Provider
    ↓              ↓                ↓                    ↓             ↓
[Raw Data]     [Sanitized]      [Minimal Context]    [Encrypted]   [AI Response]
    ↓              ↓                ↓                    ↓             ↓
[Local Only]   [No PII]         [Question Only]      [TLS 1.3]     [Processed Locally]
Privacy Guarantees
- No Raw Data Transmission: Only processed queries sent
- PII Filtering: Automatic removal of personal information
- Context Minimization: Send only necessary context
- Response Sanitization: Clean AI responses before local use
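To make the PII filtering step concrete, here is a minimal regex-based redaction sketch. The two patterns and the `redact` function are illustrative assumptions; real PII detection needs far broader coverage (names, addresses, national ID formats) than two regular expressions.

```python
import re

# Illustrative patterns only, not a complete PII catalogue.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each matched value with a bracketed label before text leaves the machine."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```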
Storage and Caching Architecture
Local Storage Strategy
┌─────────────────────────────────────────────────────────────────────────────┐
│                             Local Storage Layer                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│ ┌─────────────────────┐  ┌───────────────────┐  ┌─────────────────────────┐ │
│ │ Metadata Store      │  │ Query Cache       │  │ Configuration           │ │
│ │                     │  │                   │  │                         │ │
│ │ • Schema Info       │  │ • Result Sets     │  │ • User Preferences      │ │
│ │ • Data Lineage      │  │ • Query Plans     │  │ • Connection Settings   │ │
│ │ • Column Stats      │  │ • AI Responses    │  │ • Security Keys         │ │
│ └─────────────────────┘  └───────────────────┘  └─────────────────────────┘ │
│                                    │                                        │
│ ┌─────────────────────┐  ┌───────────────────┐  ┌─────────────────────────┐ │
│ │ Session Data        │  │ Temp Storage      │  │ Backup Manager          │ │
│ │                     │  │                   │  │                         │ │
│ │ • Current State     │  │ • Intermediate    │  │ • Auto Backup           │ │
│ │ • Undo History      │  │ • Spill Files     │  │ • Version Control       │ │
│ │ • User Context      │  │ • Export Queue    │  │ • Recovery Procedures   │ │
│ └─────────────────────┘  └───────────────────┘  └─────────────────────────┘ │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
Cache Management
- Intelligent Caching: Smart cache key generation
- LRU Eviction: Least recently used cache replacement
- Size Limits: Configurable cache size constraints
- Persistence: Cache survival across application restarts
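A size-limited LRU cache with a deterministic key can be sketched in a few lines. The `cache_key` normalization (collapsing whitespace and hashing with SHA-256) and the `LRUCache` class are assumptions chosen for the example; they do not describe the product's actual key scheme or storage format.

```python
import hashlib
from collections import OrderedDict


def cache_key(query: str, source_id: str) -> str:
    """Build a stable key: whitespace differences in the query do not matter."""
    normalized = " ".join(query.split())
    return hashlib.sha256(f"{source_id}\n{normalized}".encode("utf-8")).hexdigest()


class LRUCache:
    """Evicts the least recently used entry once `max_entries` is exceeded."""

    def __init__(self, max_entries: int):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max = max_entries
        self._data: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # drop the least recently used entry
```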
Data Lifecycle
- Import: Efficient data ingestion from sources
- Process: In-memory or disk-based processing
- Cache: Results caching for performance
- Export: Multiple export format support
- Cleanup: Automatic temporary file management
Security Architecture
Multi-Layer Security Model
┌─────────────────────────────────────────────────────────────────────────────┐
│                            Application Security                             │
├─────────────────────────────────────────────────────────────────────────────┤
│ Access Control  │ Data Encryption       │ Network Security  │ Audit Logging │
│                 │                       │                   │               │
│ • User Auth     │ • At-Rest Encryption  │ • TLS 1.3         │ • All Actions │
│ • Role-Based    │ • In-Transit Crypto   │ • Certificate     │ • Data Access │
│ • API Key Mgmt  │ • Key Management      │   Validation      │ • API Calls   │
└─────────────────┴───────────────────────┴───────────────────┴───────────────┘
                                       │
┌─────────────────────────────────────────────────────────────────────────────┐
│                               System Security                               │
├─────────────────────────────────────────────────────────────────────────────┤
│ Process Isolation  │ Memory Protection   │ File System Access │ Network     │
│                    │                     │                    │ Boundaries  │
│ • Sandboxing       │ • Buffer Overflow   │ • Permission Model │ • Firewall  │
│ • Resource Limits  │   Protection        │ • Path Validation  │   Rules     │
│ • Privilege Drop   │ • Stack Protection  │ • Secure Temp Dirs │ • Port Mgmt │
└────────────────────┴─────────────────────┴────────────────────┴─────────────┘
Encryption Standards
- Data at Rest: AES-256 encryption for stored data
- Data in Transit: TLS 1.3 for all network communications
- Key Management: Secure key derivation and storage
- Certificate Validation: Strict certificate chain validation
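Two of these standards can be illustrated with the Python standard library alone: deriving a 256-bit key from a passphrase, and building a client TLS context that refuses anything below TLS 1.3 and validates certificates. The function names, the choice of PBKDF2-HMAC-SHA256, and the iteration count are assumptions for this sketch; the source does not say which key derivation function is actually used.

```python
import hashlib
import ssl


def derive_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte (256-bit) key with PBKDF2-HMAC-SHA256 (assumed KDF)."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


def strict_tls_context() -> ssl.SSLContext:
    """Client context that requires TLS 1.3 and verifies the certificate chain."""
    context = ssl.create_default_context()  # CERT_REQUIRED and hostname checks on
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context
```

The salt should be random and stored alongside the encrypted data, for example 16 bytes from `os.urandom(16)`.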
Performance Architecture
Optimization Strategies
Query Optimization
- Query Planning: Cost-based query optimization
- Predicate Pushdown: Filter early in processing pipeline
- Column Pruning: Read only necessary columns
- Join Optimization: Efficient join algorithms
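Predicate pushdown and column pruning both mean doing less work at scan time. The sketch below applies the filter and the column selection while rows are being read, so downstream steps only see what they need; `scan` and `total_by` are hypothetical helpers, not the engine's API.

```python
from typing import Callable, Iterable, Iterator

Row = dict


def scan(
    source: Iterable[Row],
    columns: list[str],
    predicate: Callable[[Row], bool],
) -> Iterator[Row]:
    """Filter rows and drop unused columns as they are read."""
    for row in source:
        if predicate(row):                              # predicate pushdown
            yield {name: row[name] for name in columns}  # column pruning


def total_by(rows: Iterable[Row], key: str, value: str) -> dict:
    """Sum `value` per distinct `key` over the already-reduced rows."""
    totals: dict = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals
```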
Parallel Processing
- Thread Pool Management: Configurable worker threads
- Task Scheduling: Priority-based task scheduling
- Resource Allocation: CPU and memory resource management
- Backpressure Handling: Graceful handling of resource limits
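One common way to combine a configurable thread pool with backpressure is to cap the number of in-flight tasks with a semaphore, so the producer blocks instead of queueing unbounded work. This is a sketch of that general technique; `run_bounded` and its parameters are assumptions, not the scheduler the product uses.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def run_bounded(
    tasks: Iterable,
    worker: Callable,
    max_workers: int = 4,
    max_in_flight: int = 8,
) -> list:
    """Run worker(task) on a pool; block the producer when too many tasks are pending."""
    slots = threading.BoundedSemaphore(max_in_flight)
    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for task in tasks:
            slots.acquire()  # backpressure: wait for a free slot before submitting
            future = pool.submit(worker, task)
            future.add_done_callback(lambda _done: slots.release())
            futures.append(future)
        # Results come back in submission order; worker exceptions re-raise here.
        return [future.result() for future in futures]
```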
Memory Optimization
- Streaming Operations: Process data in chunks
- Lazy Loading: Load data on demand
- Memory Mapping: Efficient file access patterns
- Garbage Collection: Tuned GC parameters
Deployment Architecture
Installation Models
Standalone Application
- Single executable with all dependencies
- Local database for metadata and cache
- File-based configuration management
- Auto-update mechanism for security patches
Enterprise Deployment
- Centralized license management
- Shared configuration templates
- Enterprise identity integration
- Compliance reporting tools
System Integration
Database Integration
- Native database driver support
- Connection pooling and management
- Query result streaming
- Transaction support where applicable
File System Integration
- Cross-platform file access
- Efficient large file handling
- Automatic format detection
- Secure temporary file management
Network Integration
- Proxy and firewall compatibility
- Corporate network policy compliance
- SSL/TLS certificate management
- DNS and service discovery
Extensibility Architecture
Plugin System
Data Source Plugins
- Standardized connector interface
- Authentication method abstraction
- Query translation framework
- Error handling standardization
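A standardized connector interface might look like the abstract base class below. The class and method names are hypothetical, and SQLite is used only because it ships with Python and makes the example runnable; it is not one of the connectors listed in this document.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Iterator


class DataSourceConnector(ABC):
    """Hypothetical connector contract; the method names are assumptions."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def execute(self, sql: str) -> Iterator[tuple]: ...

    @abstractmethod
    def close(self) -> None: ...


class SQLiteConnector(DataSourceConnector):
    """Stand-in implementation backed by the standard-library sqlite3 module."""

    def __init__(self, path: str = ":memory:"):
        self._path = path
        self._conn = None

    def connect(self) -> None:
        self._conn = sqlite3.connect(self._path)

    def execute(self, sql: str) -> Iterator[tuple]:
        if self._conn is None:
            raise RuntimeError("call connect() before execute()")
        return iter(self._conn.execute(sql))  # cursor rows are streamed lazily

    def close(self) -> None:
        if self._conn is not None:
            self._conn.close()
            self._conn = None


# Plugins register under a short name so callers never import them directly.
REGISTRY: dict[str, type[DataSourceConnector]] = {"sqlite": SQLiteConnector}
```

Callers depend only on `DataSourceConnector`, so adding a connector means adding one class and one registry entry.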
AI Provider Plugins
- Provider-agnostic API interface
- Model capability discovery
- Cost and performance optimization
- Failover and load balancing
Export Format Plugins
- Extensible export format support
- Custom template systems
- Batch export processing
- Format-specific optimizations
Configuration Management
Hierarchical Configuration
- System defaults
- User preferences
- Project-specific settings
- Environment-based overrides
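Layered configuration of this kind maps naturally onto `collections.ChainMap`. The sketch assumes that the list above runs from lowest to highest precedence (environment overrides win over project settings, which win over user preferences, which win over system defaults); the source does not state the order explicitly, and the keys shown are made up.

```python
from collections import ChainMap


def resolve_config(
    system_defaults: dict,
    user_prefs: dict,
    project_settings: dict,
    env_overrides: dict,
) -> ChainMap:
    """Return a layered view; lookups hit the most specific layer first."""
    # ChainMap searches its maps left to right, so the highest-priority layer goes first.
    return ChainMap(env_overrides, project_settings, user_prefs, system_defaults)
```

For example, with `{"threads": 4, "theme": "light"}` as defaults and `{"theme": "dark"}` as user preferences, `resolve_config(...)["theme"]` is `"dark"` while `"threads"` still resolves to `4`.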
Dynamic Reconfiguration
- Runtime configuration updates
- Hot-reloading of settings
- Configuration validation
- Rollback capabilities
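Validation and rollback can be combined by validating a candidate configuration before applying it and keeping earlier versions in a history list. `ConfigStore` and `validate_threads` below are hypothetical names used only to illustrate the idea.

```python
from typing import Callable


def validate_threads(config: dict) -> None:
    """Example validator (assumed rule): thread count must be between 1 and 64."""
    if not 1 <= config["threads"] <= 64:
        raise ValueError(f"threads out of range: {config['threads']}")


class ConfigStore:
    """Applies updates at runtime, rejecting invalid ones and allowing rollback."""

    def __init__(self, initial: dict, validate: Callable[[dict], None]):
        validate(initial)
        self._validate = validate
        self._history = [dict(initial)]

    @property
    def current(self) -> dict:
        return dict(self._history[-1])

    def update(self, changes: dict) -> None:
        candidate = {**self._history[-1], **changes}
        self._validate(candidate)  # raises before anything is applied
        self._history.append(candidate)

    def rollback(self) -> None:
        if len(self._history) > 1:
            self._history.pop()  # return to the previous valid configuration
```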
What’s Next?
Security Details
Explore the comprehensive security measures protecting your data and privacy.
Performance Benefits
Learn how the architecture delivers superior performance and efficiency.