
Performance Benefits

Probably’s local-first architecture keeps data processing on your machine and contacts AI services only for small, targeted requests. The result is consistently low latency and predictable performance compared with cloud-based tools:

✅ Local-First Benefits

  • Zero network latency for data operations
  • Native CPU and memory utilization
  • Direct database connections
  • Instant response for most operations
  • Predictable performance characteristics
  • No dependency on external services

❌ Cloud-Based Limitations

  • Network latency affects every operation
  • Shared resources reduce performance
  • Data transfer bottlenecks
  • Variable response times
  • Dependency on internet connectivity
  • Third-party service reliability

Data Processing Speed

Operation       | Local (Probably)                   | Cloud-Based
----------------|------------------------------------|------------------------------------------
Data Processing | 1-5 ms (direct memory/disk access) | 50-500 ms (network + processing + transfer)
Query Execution | 10-100 ms (local database)         | 200-2000 ms (upload + process + download)
AI Operations   | 100-1000 ms (context-only requests)| 200-3000 ms (full data upload + processing)

DuckDB Local Performance

  • Fast columnar database for local data processing
  • Simple connection pooling (16 connections maximum)
  • Standard SQL query execution
  • Memory-mapped file access

Connection Management

  • Basic connection pool implementation
  • Simple connection reuse
  • Automatic connection cleanup
  • Standard DuckDB performance characteristics
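The pooling behavior described above can be sketched in plain Python. `ConnectionPool` below is an illustrative sketch, not Probably's actual implementation; in practice the `factory` would be something like `lambda: duckdb.connect("data.db")`, and the demo uses a counting stand-in factory so the reuse behavior is visible:

```python
import queue

class ConnectionPool:
    """Minimal pool: create connections lazily up to max_size, then reuse them."""

    def __init__(self, factory, max_size=16):
        self._factory = factory      # e.g. lambda: duckdb.connect("data.db")
        self._max_size = max_size
        self._created = 0
        self._idle = queue.Queue()

    def acquire(self):
        try:
            return self._idle.get_nowait()   # reuse an idle connection
        except queue.Empty:
            if self._created < self._max_size:
                self._created += 1
                return self._factory()       # create a new one lazily
            return self._idle.get()          # at the cap: block until one is released

    def release(self, conn):
        self._idle.put(conn)                 # return the connection for reuse

# Demo with a counting stand-in factory to show reuse:
made = []
pool = ConnectionPool(lambda: made.append(1) or object(), max_size=16)
c1 = pool.acquire()
pool.release(c1)
c2 = pool.acquire()          # reuses c1 instead of creating a second connection
assert c1 is c2 and len(made) == 1
```

Releasing back to the pool rather than closing is what makes "simple connection reuse" cheap: the cost of opening a connection is paid at most 16 times per session.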

File Operations

  • Standard file reading through Python libraries
  • CSV parsing using the Polars library
  • Parquet support through DuckDB
  • Basic file type detection

Supported Formats

  • CSV files with automatic type inference
  • Parquet files for faster loading
  • Basic Excel file support
  • JSON processing capabilities
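"Basic file type detection" can be as simple as an extension lookup that dispatches to the right loader. The mapping below is a sketch based on the lists above (Polars for CSV, DuckDB for Parquet); the function name and return values are illustrative, not Probably's API:

```python
from pathlib import Path

# Extension -> loader hint; which library handles each format follows the text above.
LOADERS = {
    ".csv": "polars",      # CSV parsing via Polars with automatic type inference
    ".parquet": "duckdb",  # Parquet read directly by DuckDB
    ".xlsx": "excel",      # basic Excel support
    ".json": "json",       # JSON processing
}

def detect_loader(path: str) -> str:
    """Pick a loader from the file extension; reject unknown formats."""
    suffix = Path(path).suffix.lower()
    try:
        return LOADERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix}")

print(detect_loader("sales.parquet"))  # -> duckdb
```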

Standard Memory Handling

  • DuckDB manages memory for query processing
  • Python handles object lifecycle
  • Basic memory usage monitoring through system metrics
  • Standard garbage collection

Large Dataset Handling

  • DuckDB handles datasets larger than memory
  • Automatic spilling to disk when needed
  • Memory usage depends on query complexity
  • Performance scales with available RAM
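DuckDB's spill-to-disk behavior is tunable with its standard settings; for example (the values and path here are illustrative, not Probably's defaults):

```sql
-- Cap DuckDB's working memory; queries exceeding it spill to disk.
SET memory_limit = '4GB';
-- Directory used for temporary spill files.
SET temp_directory = '/tmp/duckdb_spill';
```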

Simple Caching Strategy

  • Basic function result caching in cache.py
  • DuckDB internal query caching
  • Simple file-based cache storage
  • Manual cache management

Cache Features

  • Function results cached to avoid recomputation
  • Cache invalidation on data changes
  • Simple hash-based cache keys
  • Cache stored in local filesystem
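A hash-keyed, filesystem-backed function cache along these lines can be sketched with the standard library. This is an assumption about the general shape of `cache.py`, not its actual code (the temporary cache directory and `sha256` key derivation are illustrative choices):

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

CACHE_DIR = Path(tempfile.mkdtemp(prefix="probably_cache_"))  # illustrative location

def cached(fn):
    """Cache results on disk, keyed by a hash of the function name and arguments."""
    def wrapper(*args, **kwargs):
        key_bytes = pickle.dumps((fn.__name__, args, sorted(kwargs.items())))
        key = hashlib.sha256(key_bytes).hexdigest()
        path = CACHE_DIR / f"{key}.pkl"
        if path.exists():                        # cache hit: skip recomputation
            return pickle.loads(path.read_bytes())
        result = fn(*args, **kwargs)
        path.write_bytes(pickle.dumps(result))   # cache miss: store the result
        return result
    return wrapper

calls = []

@cached
def expensive_sum(n):
    calls.append(n)          # track how often the body actually runs
    return sum(range(n))

expensive_sum(1000)
expensive_sum(1000)          # served from disk; the function body ran only once
assert len(calls) == 1
```

Invalidation on data changes would then amount to including a fingerprint of the input data (e.g. file mtime or content hash) in the key bytes.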

Context-Only Requests

  • Send only relevant data context to AI providers
  • Local data analysis provides context for AI queries
  • AI responses processed and integrated locally
  • Multiple provider support for redundancy

AI Request Pattern

Traditional Approach: Upload Full Dataset → Process → Download Results
(High Latency, Expensive, Privacy Concerns)
Probably's Approach: Extract Context → Send Query → Process Response
(Low Latency, Cost Effective, Privacy Preserved)
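The "Extract Context" step can be sketched as building a compact schema-plus-summary payload in place of the raw rows. The field names and structure below are assumptions for illustration, not Probably's actual wire format:

```python
import json

def extract_context(columns, rows):
    """Summarize a dataset into a few KB of context for an AI prompt."""
    context = {"columns": columns, "row_count": len(rows), "stats": {}}
    for i, col in enumerate(columns):
        values = [r[i] for r in rows if r[i] is not None]
        if values and all(isinstance(v, (int, float)) for v in values):
            context["stats"][col] = {       # numeric summary, not raw data
                "min": min(values),
                "max": max(values),
                "mean": sum(values) / len(values),
            }
        else:
            context["stats"][col] = {"sample": values[:3]}  # tiny sample only
    return json.dumps(context)

rows = [("north", 120.0), ("south", 80.0), ("north", 100.0)]
payload = extract_context(["region", "revenue"], rows)
print(len(payload), "bytes of context instead of the full dataset")
```

The payload stays in the ~1-10 KB range regardless of row count, which is what keeps the AI request cheap and the raw data on the local machine.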

Basic Parallelism

  • DuckDB provides internal parallelism for queries
  • Python asyncio for concurrent operations
  • Simple thread pool for I/O operations
  • Standard CPU core utilization
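The asyncio-plus-thread-pool pattern above can be sketched as follows; `read_file_blocking` is a stand-in for a real blocking read, and `asyncio.to_thread` hands each call to the default thread pool so the reads overlap:

```python
import asyncio
import time

def read_file_blocking(name):
    """Stand-in for a blocking I/O operation (e.g. reading a CSV)."""
    time.sleep(0.1)
    return f"{name}: loaded"

async def load_all(names):
    # asyncio.to_thread runs each blocking read on the default thread pool,
    # so the three reads run concurrently instead of back to back.
    return await asyncio.gather(
        *(asyncio.to_thread(read_file_blocking, n) for n in names)
    )

start = time.perf_counter()
results = asyncio.run(load_all(["a.csv", "b.csv", "c.csv"]))
elapsed = time.perf_counter() - start
print(results, f"in {elapsed:.2f}s")   # ~0.1s total, not 0.3s sequential
```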

Resource Usage

  • Automatic CPU core detection and usage
  • Memory usage scales with data size
  • Basic I/O optimization through libraries
  • Standard operating system resource management

Network Optimization

  • Request Compression: Compress AI requests and responses
  • Connection Reuse: Persistent connections to AI providers
  • Parallel Requests: Multiple concurrent AI requests
  • Regional Optimization: Use geographically closer AI endpoints
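Request compression is a transparent round trip: compress before sending, decompress on receipt. A minimal sketch with gzip (the payload contents are mock data; on small payloads the savings are modest, and compression pays off mainly on larger contexts):

```python
import gzip
import json

# Mock AI request payload; a real one would hold the query plus extracted context.
request = json.dumps({
    "query": "summarize revenue by region",
    "context": {"columns": ["region", "revenue"], "row_count": 3},
}).encode()

compressed = gzip.compress(request)

# The provider-side decompression restores the payload byte for byte.
assert gzip.decompress(compressed) == request
print(f"{len(request)} bytes -> {len(compressed)} bytes on the wire")
```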

Bandwidth Efficiency

Data Operation Type | Network Usage | Performance Impact
------------------------|---------------|-------------------
Local Data Processing | 0 MB | Zero latency
Database Queries | 0 MB | Native speed
File Operations | 0 MB | Disk speed
AI Context Requests | ~1-10 KB | Minimal impact
AI Response Processing | ~1-50 KB | Low latency

Degraded Service Model

  • Core Analytics: Full functionality without internet
  • Cached AI Responses: Reuse previous AI insights
  • Local-Only Mode: Complete data analysis offline
  • Sync on Reconnect: Update when connection restored
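The degraded-service behavior (try the live AI service, fall back to cached insights offline) can be sketched as a small wrapper. The function and the in-memory cache below are illustrative, not Probably's implementation:

```python
def ask_ai(query, fetch, cache):
    """Try the live AI service; fall back to a cached response when offline."""
    try:
        answer = fetch(query)            # network call to the provider
        cache[query] = answer            # remember the insight for offline reuse
        return answer
    except OSError:                      # connection failed: degrade gracefully
        if query in cache:
            return cache[query]          # reuse a previous AI insight
        raise RuntimeError("offline and no cached response for this query")

cache = {}
ask_ai("trend?", lambda q: "upward", cache)            # online: answer is cached

def offline(q):
    raise OSError("no network")

assert ask_ai("trend?", offline, cache) == "upward"    # offline: cache hit
```

Core analytics never enters this path at all, since local queries touch no network.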

System Metrics

  • Available memory detection for configuration
  • Basic logging of performance issues
  • No real-time performance dashboard
  • Standard application logging

Performance Characteristics

  • Query performance depends on DuckDB capabilities
  • Memory usage monitoring through system tools
  • No built-in performance profiling
  • Standard error logging and handling

Local Resource Utilization

  • Multi-Core Processing: Scale with additional CPU cores
  • Memory Expansion: Handle larger datasets with more RAM
  • Storage Performance: Benefit from faster SSD storage
  • Network Bandwidth: Optimize AI service usage

Single-User Application

  • Desktop application for individual users
  • Performance scales with local hardware
  • No distributed processing capabilities
  • Standard single-machine performance

Hardware Recommendations

  • CPU: Multi-core processor with high single-thread performance
  • Memory: 16GB+ RAM for large dataset processing
  • Storage: SSD storage for optimal I/O performance
  • Network: Stable internet connection for AI services

Software Optimization

  • Use Parquet files for better performance than CSV
  • Ensure adequate RAM for large datasets
  • Use SSD storage for better I/O performance
  • Close unused datasets to free memory
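Converting an existing CSV to Parquet is a one-statement job in DuckDB (file names here are placeholders):

```sql
-- Read the CSV with automatic type inference and rewrite it as Parquet.
COPY (SELECT * FROM read_csv_auto('sales.csv'))
  TO 'sales.parquet' (FORMAT PARQUET);
```

Subsequent loads then benefit from Parquet's columnar layout and compression.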

Efficient Workflows

  • Incremental Analysis: Build analysis incrementally
  • Reuse Results: Leverage cached results and AI responses
  • Batch Operations: Group similar operations for efficiency
  • Filter Early: Apply filters before expensive operations

Performance Monitoring

  • Track Metrics: Monitor query execution times and resource usage
  • Identify Bottlenecks: Find and address performance limitations
  • Optimize Queries: Improve slow-running queries
  • Tune Configuration: Adjust settings based on usage patterns
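Since there is no built-in profiler, tracking query execution times can be as simple as a timing decorator over standard application logging. A minimal sketch (the decorator and `run_query` stand-in are illustrative, not part of Probably):

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)

def timed(fn):
    """Log wall-clock execution time so slow operations can be spotted."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logging.info("%s took %.1f ms", fn.__name__, elapsed_ms)
    return wrapper

@timed
def run_query(sql):
    time.sleep(0.01)        # stand-in for real query execution
    return f"executed: {sql}"

print(run_query("SELECT 1"))
```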

Scientific Method

Learn how Probably implements scientific rigor in data analysis workflows.

Large Datasets

Discover specialized techniques for handling very large datasets efficiently.