
Architecture Details

Understand the technical architecture that makes Probably’s local-first approach possible while maintaining powerful AI capabilities.

┌─────────────────────────────────────────────────────────────────────────────┐
│                            User Interface Layer                             │
├─────────────────────────┬─────────────────────────┬─────────────────────────┤
│ Spreadsheet Interface   │ Visualizations          │ AI Agent Chat           │
└─────────────────────────┴─────────────────────────┴─────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                           Application Logic Layer                           │
├─────────────────────────┬─────────────────────────┬─────────────────────────┤
│ PXL Runtime Engine      │ Analysis Coordinator    │ Result Processor        │
└─────────────────────────┴─────────────────────────┴─────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                            Data Processing Layer                            │
├───────────────────┬───────────────────┬───────────────────┬─────────────────┤
│ Local Data Engine │ Cache Manager     │ Query Optimizer   │ Type System     │
└───────────────────┴───────────────────┴───────────────────┴─────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                          External Interface Layer                           │
├─────────────────────────┬─────────────────────────┬─────────────────────────┤
│ Database Connectors     │ File System Access      │ Encrypted AI Gateway    │
└─────────────────────────┴─────────────────────────┴─────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                             External Resources                              │
├─────────────────────────┬─────────────────────────┬─────────────────────────┤
│ Data Sources            │ Local Storage           │ AI Providers            │
│ ┌─────────────────────┐ │ ┌─────────────────────┐ │ ┌─────────────────────┐ │
│ │ Databases           │ │ │ Cache & Metadata    │ │ │ OpenAI              │ │
│ │ Files               │ │ │ User Configurations │ │ │ Anthropic           │ │
│ │ APIs                │ │ │ Analysis Results    │ │ │ Google              │ │
│ └─────────────────────┘ │ └─────────────────────┘ │ └─────────────────────┘ │
└─────────────────────────┴─────────────────────────┴─────────────────────────┘

Local Execution

  • All data processing happens on the user’s machine
  • No server-side computation dependencies
  • Direct database connections without proxies
  • Local file system access and management

Selective External Interaction

  • Encrypted AI queries when explicitly requested
  • Minimal data transmission to external services
  • User-controlled external service integration
  • Transparent audit trail of all external calls

Modular Design

  • Pluggable data source connectors
  • Extensible AI provider interfaces
  • Configurable processing pipelines
  • Isolated security boundaries
┌─────────────────────────────────────────────────────────────────────────────┐
│                              Local Data Engine                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────┐  ┌──────────────────┐  ┌────────────────────────┐  │
│  │ Query Parser        │  │ Data Catalog     │  │ Schema Manager         │  │
│  │                     │  │                  │  │                        │  │
│  │ • PXL → SQL         │  │ • Data Sources   │  │ • Type Inference       │  │
│  │ • Optimization      │  │ • Relationships  │  │ • Schema Validation    │  │
│  │ • Validation        │  │ • Lineage        │  │ • Format Detection     │  │
│  └─────────────────────┘  └──────────────────┘  └────────────────────────┘  │
│                                      │                                      │
│  ┌─────────────────────┐  ┌──────────────────┐  ┌────────────────────────┐  │
│  │ Execution Engine    │  │ Memory Manager   │  │ Result Processor       │  │
│  │                     │  │                  │  │                        │  │
│  │ • Query Execution   │  │ • Buffer Pools   │  │ • Format Conversion    │  │
│  │ • Parallel Ops      │  │ • GC Management  │  │ • Aggregation          │  │
│  │ • Stream Processing │  │ • Spill to Disk  │  │ • Materialization      │  │
│  └─────────────────────┘  └──────────────────┘  └────────────────────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Query Processing Flow

  1. Parse: Convert PXL expressions to internal representation
  2. Plan: Generate a cost-based execution plan
  3. Execute: Run queries against local/remote data sources
  4. Cache: Store results so repeated queries can be served without re-execution
  5. Present: Format results for user interface
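
The five steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not Probably’s actual engine: the one-line query grammar is a toy stand-in for PXL, and `Plan`, `QueryPipeline`, and the in-memory `sources` dictionary are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict[str, object]

@dataclass
class Plan:
    """Internal representation of a parsed query."""
    source: str
    columns: list[str]
    predicate: Callable[[Row], bool]

@dataclass
class QueryPipeline:
    sources: dict[str, list[Row]]                      # stand-in for real data sources
    cache: dict[str, list[Row]] = field(default_factory=dict)

    def parse(self, expr: str) -> Plan:
        # Toy grammar (NOT real PXL): "<source> | <col, col> | <col> > <number>"
        source, cols, cond = (part.strip() for part in expr.split("|"))
        col, threshold = (part.strip() for part in cond.split(">"))
        limit = float(threshold)
        return Plan(source, [c.strip() for c in cols.split(",")],
                    lambda row: float(row[col]) > limit)

    def execute(self, plan: Plan) -> list[Row]:
        rows = self.sources[plan.source]
        return [{c: r[c] for c in plan.columns} for r in rows if plan.predicate(r)]

    @staticmethod
    def present(rows: list[Row]) -> str:
        return "\n".join(", ".join(f"{k}={v}" for k, v in r.items()) for r in rows)

    def run(self, expr: str) -> str:
        if expr not in self.cache:                     # 4. Cache: reuse earlier results
            plan = self.parse(expr)                    # 1. Parse and 2. Plan (trivial here)
            self.cache[expr] = self.execute(plan)      # 3. Execute
        return self.present(self.cache[expr])          # 5. Present
```

Calling `run` twice with the same expression executes the query once; the second call is answered from `cache`.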

Memory Management

  • Streaming Processing: Handle datasets larger than memory
  • Lazy Evaluation: Compute only what’s needed
  • Garbage Collection: Automatic memory cleanup
  • Spill Handling: Graceful handling of memory limits
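
Streaming and lazy evaluation can be illustrated with Python generators. The sketch below assumes CSV input and hypothetical helper names (`stream_rows`, `where`, `total`); it shows the general idea, not the engine’s real implementation.

```python
import csv
from typing import Iterable, Iterator

def stream_rows(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield one parsed row at a time; the full dataset is never held in memory."""
    yield from csv.DictReader(lines)

def where(rows, column, minimum):
    """Lazy filter: nothing is read until a consumer asks for the next row."""
    return (row for row in rows if float(row[column]) >= minimum)

def total(rows, column):
    """Consume the stream and keep only a running sum."""
    return sum(float(row[column]) for row in rows)
```

Passing an open file handle to `stream_rows` and chaining `where` and `total` processes a file of any size with roughly constant memory, because only one row is alive at a time.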

Database Connectors

┌───────────────────────────────────────────────────────────┐
│                 Database Connector Layer                  │
├───────────────────────────────────────────────────────────┤
│                                                           │
│ ┌─────────────┐   ┌─────────────────────────────────────┐ │
│ │ Snowflake   │   │ Additional Database                 │ │
│ │ Connector   │   │ Connectors (Coming Soon)            │ │
│ └─────────────┘   └─────────────────────────────────────┘ │
│                             │                             │
│ ┌───────────────────────────────────────────────────────┐ │
│ │             Unified Data Source Interface             │ │
│ │                                                       │ │
│ │ • Connection Management                               │ │
│ │ • Query Translation                                   │ │
│ │ • Result Set Handling                                 │ │
│ │ • Error Management                                    │ │
│ │ • Security Integration                                │ │
│ └───────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────┘

Connection Management

  • Pool Management: Efficient connection reuse
  • Health Monitoring: Automatic connection health checks
  • Retry Logic: Robust error handling and recovery
  • Security: SSL/TLS encryption and authentication
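
Pooling, health checks, and retries can be sketched as follows. `ConnectionPool` and `with_retry` are hypothetical names, and the `connect` and `is_healthy` callables stand in for whatever a real database driver provides.

```python
import queue
import time
from typing import Callable

class ConnectionPool:
    """Reuse idle connections; replace any that fail a health check."""

    def __init__(self, connect: Callable[[], object],
                 is_healthy: Callable[[object], bool], size: int = 4):
        self._connect = connect
        self._is_healthy = is_healthy
        self._idle = queue.LifoQueue(maxsize=size)

    def acquire(self):
        try:
            conn = self._idle.get_nowait()
        except queue.Empty:
            return self._connect()              # nothing idle: open a new connection
        return conn if self._is_healthy(conn) else self._connect()

    def release(self, conn) -> None:
        try:
            self._idle.put_nowait(conn)
        except queue.Full:
            pass                                # pool is full: let the extra one go

def with_retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry on ConnectionError with exponential backoff; re-raise on the last try."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the backoff schedule can be tested without waiting.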

File System Integration

  • Format Support: CSV, Parquet, Excel, JSON, etc.
  • Streaming Readers: Memory-efficient file processing
  • Schema Detection: Automatic format and type inference
  • Encoding Handling: Unicode and character set support
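
Format and type detection might look like the sketch below. The extension map and the `detect_format` and `infer_type` helpers are assumptions made for illustration; a production detector would also inspect file contents.

```python
from pathlib import Path

# Hypothetical extension map; the formats are the ones listed above.
FORMATS = {".csv": "csv", ".parquet": "parquet", ".xlsx": "excel", ".json": "json"}

def detect_format(path: str) -> str:
    """Guess the file format from its extension, ignoring case."""
    return FORMATS.get(Path(path).suffix.lower(), "unknown")

def infer_type(values: list[str]) -> str:
    """Return the narrowest of integer, float, or string that fits every non-empty value."""
    present = [v for v in values if v != ""]
    for name, cast in (("integer", int), ("float", float)):
        try:
            for value in present:
                cast(value)
            return name
        except ValueError:
            continue
    return "string"
```

Trying the narrowest type first means a column such as `["1", "2.5"]` is widened to float only when an integer parse fails.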
┌─────────────────────────────────────────────────────────────────────────────┐
│                              AI Gateway Layer                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────┐  ┌──────────────────┐  ┌────────────────────────┐  │
│  │ Request Router      │  │ Load Balancer    │  │ Response Cache         │  │
│  │                     │  │                  │  │                        │  │
│  │ • Provider Select   │  │ • Rate Limiting  │  │ • Semantic Caching     │  │
│  │ • Model Selection   │  │ • Failover       │  │ • TTL Management       │  │
│  │ • Context Prep      │  │ • Performance    │  │ • Cache Invalidation   │  │
│  └─────────────────────┘  └──────────────────┘  └────────────────────────┘  │
│                                      │                                      │
│  ┌─────────────────────┐  ┌──────────────────┐  ┌────────────────────────┐  │
│  │ Security Layer      │  │ Audit Logger     │  │ Error Handler          │  │
│  │                     │  │                  │  │                        │  │
│  │ • Encryption        │  │ • Request Logs   │  │ • Retry Logic          │  │
│  │ • Authentication    │  │ • Response Logs  │  │ • Fallback Strategies  │  │
│  │ • Data Filtering    │  │ • Performance    │  │ • Error Classification │  │
│  └─────────────────────┘  └──────────────────┘  └────────────────────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Request Processing

  1. Context Preparation: Extract relevant data context
  2. Privacy Filtering: Remove sensitive information
  3. Provider Selection: Choose optimal AI provider
  4. Encryption: Secure data transmission
  5. Response Processing: Parse and validate AI responses

Multi-Provider Support

  • Load Balancing: Distribute requests across providers
  • Failover: Automatic switching on provider issues
  • Rate Limit Management: Respect provider API limits
  • Cost Optimization: Choose cost-effective providers
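
Cost-aware failover across providers can be sketched in a few lines. `Provider`, `ProviderError`, and `ask` are hypothetical names, and `cost_per_request` is an illustrative number rather than real pricing.

```python
from dataclasses import dataclass
from typing import Callable

class ProviderError(Exception):
    """Raised by a provider call that failed or was rate limited."""

@dataclass
class Provider:
    name: str
    cost_per_request: float                 # illustrative unit, not real pricing
    send: Callable[[str], str]

def ask(prompt: str, providers: list[Provider]) -> tuple[str, str]:
    """Try providers from cheapest to most expensive; fail over on ProviderError."""
    failures = {}
    for provider in sorted(providers, key=lambda p: p.cost_per_request):
        try:
            return provider.name, provider.send(prompt)
        except ProviderError as exc:
            failures[provider.name] = str(exc)
    raise ProviderError(f"all providers failed: {failures}")
```

Collecting the per-provider failure messages keeps the final error useful for the audit trail when every provider is unavailable.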

Data Protection Layers

User Data    → Privacy Filter → Context Extraction → Encryption  → AI Provider
    ↓              ↓                ↓                    ↓             ↓
[Raw Data]     [Sanitized]      [Minimal Context]    [Encrypted]   [AI Response]
    ↓              ↓                ↓                    ↓             ↓
[Local Only]   [No PII]         [Question Only]      [TLS 1.3]     [Processed Locally]

Privacy Guarantees

  • No Raw Data Transmission: Only processed queries are sent to AI providers
  • PII Filtering: Automatic removal of personal information
  • Context Minimization: Send only necessary context
  • Response Sanitization: Clean AI responses before local use
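
PII filtering and context minimization could be approximated as below. The two regular expressions are illustrative only and would miss many kinds of personal data; `redact` and `build_context` are hypothetical helpers, not Probably’s actual filter.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace each match with a bracketed label such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def build_context(question: str, schema: dict[str, str]) -> dict:
    """Send the redacted question plus column names and types, never row values."""
    return {"question": redact(question), "columns": schema}
```

Because `build_context` accepts only a schema mapping, row values cannot reach the payload by accident.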
┌─────────────────────────────────────────────────────────────────────────────┐
│                             Local Storage Layer                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────┐  ┌──────────────────┐  ┌────────────────────────┐  │
│  │ Metadata Store      │  │ Query Cache      │  │ Configuration          │  │
│  │                     │  │                  │  │                        │  │
│  │ • Schema Info       │  │ • Result Sets    │  │ • User Preferences     │  │
│  │ • Data Lineage      │  │ • Query Plans    │  │ • Connection Settings  │  │
│  │ • Column Stats      │  │ • AI Responses   │  │ • Security Keys        │  │
│  └─────────────────────┘  └──────────────────┘  └────────────────────────┘  │
│                                      │                                      │
│  ┌─────────────────────┐  ┌──────────────────┐  ┌────────────────────────┐  │
│  │ Session Data        │  │ Temp Storage     │  │ Backup Manager         │  │
│  │                     │  │                  │  │                        │  │
│  │ • Current State     │  │ • Intermediate   │  │ • Auto Backup          │  │
│  │ • Undo History      │  │ • Spill Files    │  │ • Version Control      │  │
│  │ • User Context      │  │ • Export Queue   │  │ • Recovery Procedures  │  │
│  └─────────────────────┘  └──────────────────┘  └────────────────────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Cache Management

  • Intelligent Caching: Smart cache key generation
  • LRU Eviction: Least recently used cache replacement
  • Size Limits: Configurable cache size constraints
  • Persistence: Cache survival across application restarts
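
Cache key generation and LRU eviction can be sketched with the standard library. `cache_key` and `LRUCache` are hypothetical names, and normalizing whitespace in the query text is an assumption about what makes two queries equivalent.

```python
import hashlib
import json
from collections import OrderedDict

def cache_key(query: str, params: dict) -> str:
    """Queries that differ only in whitespace or parameter order share a key."""
    normalized = " ".join(query.split())
    payload = json.dumps([normalized, params], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

class LRUCache:
    """Evict the least recently used entry once max_entries is exceeded."""

    def __init__(self, max_entries: int = 128):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)            # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)     # drop the oldest entry
```

Persistence across restarts is not shown; it would require writing the entries to local storage.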

Data Lifecycle

  • Import: Efficient data ingestion from sources
  • Process: In-memory or disk-based processing
  • Cache: Results caching for performance
  • Export: Multiple export format support
  • Cleanup: Automatic temporary file management
┌─────────────────────────────────────────────────────────────────────────────┐
│                            Application Security                             │
├─────────────────┬───────────────────────┬───────────────────┬───────────────┤
│ Access Control  │ Data Encryption       │ Network Security  │ Audit Logging │
│                 │                       │                   │               │
│ • User Auth     │ • At-Rest Encryption  │ • TLS 1.3         │ • All Actions │
│ • Role-Based    │ • In-Transit Crypto   │ • Certificate     │ • Data Access │
│ • API Key Mgmt  │ • Key Management      │   Validation      │ • API Calls   │
└─────────────────┴───────────────────────┴───────────────────┴───────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                               System Security                               │
├────────────────────┬─────────────────────┬────────────────────┬─────────────┤
│ Process Isolation  │ Memory Protection   │ File System Access │ Network     │
│                    │                     │                    │ Boundaries  │
│ • Sandboxing       │ • Buffer Overflow   │ • Permission Model │ • Firewall  │
│ • Resource Limits  │   Protection        │ • Path Validation  │   Rules     │
│ • Privilege Drop   │ • Stack Protection  │ • Secure Temp Dirs │ • Port Mgmt │
└────────────────────┴─────────────────────┴────────────────────┴─────────────┘

Encryption Standards

  • Data at Rest: AES-256 encryption for stored data
  • Data in Transit: TLS 1.3 for all network communications
  • Key Management: Secure key derivation and storage
  • Certificate Validation: Strict certificate chain validation
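
Two of these standards can be demonstrated with Python’s standard library: enforcing TLS 1.3 with certificate validation, and deriving a 256-bit key from a passphrase. The function names are hypothetical, and PBKDF2 is one possible key derivation function chosen for this sketch; the document does not say which one Probably uses. AES-256 itself is not shown because it requires a third-party library.

```python
import hashlib
import ssl

def strict_tls_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.3."""
    context = ssl.create_default_context()       # certificate and hostname checks on
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context

def derive_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key (suitable for AES-256) with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, iterations, dklen=32)
```

The salt should be random and stored alongside the encrypted data, for example 16 bytes from `os.urandom`.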

Query Optimization

  • Query Planning: Cost-based query optimization
  • Predicate Pushdown: Filter early in processing pipeline
  • Column Pruning: Read only necessary columns
  • Join Optimization: Efficient join algorithms
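
Predicate pushdown and column pruning are easiest to see by comparing two ways of answering the same question. The sketch uses an in-memory SQLite table as a stand-in data source; the `sales` table and function names are invented for the example.

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL, notes TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                     [(1, "EU", 120.0, "x"), (2, "US", 80.0, "y"), (3, "APAC", 300.0, "z")])
    return conn

def fetch_then_filter(conn, minimum):
    """Naive: pull every row and column, then filter locally."""
    rows = conn.execute("SELECT * FROM sales").fetchall()
    return [(region, amount) for _id, region, amount, _notes in rows if amount > minimum]

def pushed_down(conn, minimum):
    """Pushdown and pruning: the source returns only matching rows and needed columns."""
    sql = "SELECT region, amount FROM sales WHERE amount > ?"
    return conn.execute(sql, (minimum,)).fetchall()
```

Both functions return the same rows, but `pushed_down` transfers two columns of the matching rows instead of the whole table, which matters most when the source is a remote database.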

Parallel Processing

  • Thread Pool Management: Configurable worker threads
  • Task Scheduling: Priority-based task scheduling
  • Resource Allocation: CPU and memory resource management
  • Backpressure Handling: Graceful handling of resource limits
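
A configurable worker pool with backpressure can be sketched with `concurrent.futures` and a semaphore. `map_bounded` is a hypothetical helper; the semaphore caps how many tasks are queued or running so a fast producer cannot flood the pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def map_bounded(fn, items, workers: int = 4, max_in_flight: int = 8):
    """Apply fn to each item in parallel, blocking the producer when the pool is saturated."""
    slots = threading.BoundedSemaphore(max_in_flight)
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for item in items:
            slots.acquire()                                   # wait for a free slot
            future = pool.submit(fn, item)
            future.add_done_callback(lambda _f: slots.release())
            futures.append(future)
        return [f.result() for f in futures]                  # results in input order
```

Results come back in input order, and an exception raised inside `fn` is re-raised when its result is collected.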

Memory Optimization

  • Streaming Operations: Process data in chunks
  • Lazy Loading: Load data on demand
  • Memory Mapping: Efficient file access patterns
  • Garbage Collection: Tuned GC parameters

Standalone Application

  • Single executable with all dependencies
  • Local database for metadata and cache
  • File-based configuration management
  • Auto-update mechanism for security patches

Enterprise Deployment

  • Centralized license management
  • Shared configuration templates
  • Enterprise identity integration
  • Compliance reporting tools

Database Integration

  • Native database driver support
  • Connection pooling and management
  • Query result streaming
  • Transaction support where applicable

File System Integration

  • Cross-platform file access
  • Efficient large file handling
  • Automatic format detection
  • Secure temporary file management

Network Integration

  • Proxy and firewall compatibility
  • Corporate network policy compliance
  • SSL/TLS certificate management
  • DNS and service discovery

Data Source Plugins

  • Standardized connector interface
  • Authentication method abstraction
  • Query translation framework
  • Error handling standardization
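
A standardized connector interface might resemble the abstract base class below. The method names, the `CONNECTORS` registry, and `InMemoryConnector` are all hypothetical; they illustrate the plugin pattern rather than Probably’s actual API.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator

CONNECTORS: dict[str, type] = {}

class DataSourceConnector(ABC):
    """Hypothetical connector contract; method names are illustrative."""
    name: str

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        CONNECTORS[cls.name] = cls            # plugins register themselves on definition

    @abstractmethod
    def authenticate(self, credentials: dict[str, str]) -> None: ...

    @abstractmethod
    def translate(self, query: str) -> str: ...

    @abstractmethod
    def execute(self, native_query: str) -> Iterator[dict]: ...

class InMemoryConnector(DataSourceConnector):
    name = "memory"

    def __init__(self, rows: Iterable[dict]):
        self._rows = list(rows)

    def authenticate(self, credentials):
        pass                                  # nothing to authenticate locally

    def translate(self, query):
        return query.strip().lower()          # the "native query" here is a column name

    def execute(self, native_query):
        return ({native_query: row[native_query]} for row in self._rows)
```

Callers depend only on `DataSourceConnector`, so adding a new source means adding one subclass.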

AI Provider Plugins

  • Provider-agnostic API interface
  • Model capability discovery
  • Cost and performance optimization
  • Failover and load balancing

Export Format Plugins

  • Extensible export format support
  • Custom template systems
  • Batch export processing
  • Format-specific optimizations

Hierarchical Configuration

  • System defaults
  • User preferences
  • Project-specific settings
  • Environment-based overrides
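
Layered configuration maps naturally onto `collections.ChainMap`. The sketch assumes the four levels are listed from lowest to highest precedence, and the `PROBABLY_` environment variable prefix is invented for illustration.

```python
import os
from collections import ChainMap
from typing import Mapping

def load_config(system: Mapping, user: Mapping, project: Mapping,
                environ: Mapping = os.environ, prefix: str = "PROBABLY_") -> ChainMap:
    """Layer settings so later levels in the list above override earlier ones."""
    env = {key[len(prefix):].lower(): value
           for key, value in environ.items() if key.startswith(prefix)}
    return ChainMap(env, dict(project), dict(user), dict(system))   # first match wins
```

A lookup walks the layers in order, so a value set in the environment hides the same key in the project, user, and system layers without modifying them.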

Dynamic Reconfiguration

  • Runtime configuration updates
  • Hot-reloading of settings
  • Configuration validation
  • Rollback capabilities
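
Validation and rollback for runtime updates can be sketched with a small history stack. `ConfigStore`, `validate`, and the `worker_threads` setting are hypothetical; hot-reloading from disk is not shown.

```python
from typing import Callable

class ConfigStore:
    """Keep a history of validated configurations so updates can be undone."""

    def __init__(self, initial: dict, validate: Callable[[dict], None]):
        validate(initial)
        self._validate = validate
        self._history = [dict(initial)]

    @property
    def current(self) -> dict:
        return dict(self._history[-1])

    def update(self, **changes) -> None:
        candidate = {**self._history[-1], **changes}
        self._validate(candidate)             # raises before anything is applied
        self._history.append(candidate)

    def rollback(self) -> None:
        if len(self._history) > 1:
            self._history.pop()

def validate(config: dict) -> None:
    if not 1 <= config.get("worker_threads", 1) <= 64:   # hypothetical setting
        raise ValueError("worker_threads must be between 1 and 64")
```

An invalid update leaves the active configuration untouched, and `rollback` restores the previous validated state.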

Security Details

Explore the comprehensive security measures protecting your data and privacy.

Performance Benefits

Learn how the architecture delivers superior performance and efficiency.