File Formats
Probably supports multiple file formats optimized for different use cases. Choose the right format for your data size and performance needs.
Supported File Formats
CSV Files
Best for: Small to medium datasets, data exchange, human-readable format
Most Common Format
- Universal compatibility with all tools
- Human-readable and easy to edit
- Works well for datasets under 100MB
- Loads and processes quickly at these sizes
CSV Features in Probably
- Smart type inference for columns
- Handles quoted fields and escaped characters
- UTF-8 encoding for text data
- Chunked upload for large files
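To make the type inference and quoted-field handling above concrete, here is a minimal sketch using only Python's standard `csv` module. It is not Probably's implementation: the helper names `infer_type` and `infer_csv_schema` are hypothetical, and the rule (try `int`, then `float`, else `str`) is an assumed simplification of what a real inference engine would do.

```python
import csv
import io


def infer_type(values):
    """Return 'int', 'float', or 'str' for a column of raw CSV strings."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "str"
    # Try the narrowest type first; fall through on the first failure.
    for name, caster in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"


def infer_csv_schema(text):
    """Map each header name to an inferred type (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text))  # handles quotes and escaped quotes
    header = next(reader)
    columns = [[] for _ in header]
    for row in reader:
        for i, cell in enumerate(row[: len(header)]):
            columns[i].append(cell)
    return dict(zip(header, (infer_type(c) for c in columns)))


sample = 'id,price,note\n1,9.5,"hello, world"\n2,10,"say ""hi"""\n'
schema = infer_csv_schema(sample)
```

The `note` column contains a comma and an escaped quote inside quoted fields, so the sample only parses into three columns if quoting is handled correctly.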
Parquet Format
Best for: Large datasets, analytical workloads, cloud storage
High-Performance Choice
- 10-20x faster loading than CSV
- Columnar storage for analytical queries
- Built-in compression reduces file size
- Preserves data types and metadata
Why Choose Parquet
- Speed: Columnar format optimized for analytics
- Compression: Typically 50-80% smaller than CSV
- Type Safety: Preserves exact data types
- Schema Evolution: Supports adding/removing columns
- Cross-Platform: Works across Python, R, Spark, and more
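As a rough illustration of why columnar storage suits analytical queries, the sketch below pivots row records into per-column lists so that an aggregate reads one column only. This is a toy model in plain Python, not the Parquet encoding itself; `to_columns` and the sample data are made up for the example, and it assumes every row has the same keys.

```python
from statistics import mean


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


rows = [
    {"city": "Oslo", "temp_c": 4.0},
    {"city": "Lima", "temp_c": 19.0},
    {"city": "Pune", "temp_c": 28.0},
]

columns = to_columns(rows)
# The aggregate touches only the "temp_c" values, never the "city" strings.
average_temp = mean(columns["temp_c"])
```

Storing values of one type next to each other is also what lets columnar formats compress well, since neighboring values tend to be similar.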
Feather/Arrow Format
Best for: Maximum performance, memory efficiency, cross-language compatibility
Fastest Format
- Fastest loading speeds of the supported formats (25-50x faster than CSV)
- Zero-copy memory mapping
- Exact type preservation
- Cross-language standard (Python, R, JavaScript)
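The idea behind zero-copy memory mapping can be sketched with Python's standard `mmap` module: the file's bytes are viewed in place as typed values instead of being parsed and copied into new objects. This is an illustration of the general technique under simplifying assumptions (a single raw `float64` column in native byte order, with no header or metadata), not the Feather/Arrow file layout. The function names are hypothetical.

```python
import mmap
from array import array


def write_column(path, values):
    """Write floats to disk as raw native-endian float64 values."""
    with open(path, "wb") as f:
        array("d", values).tofile(f)


def sum_column_mmap(path):
    """Sum a float64 column by viewing the mapped file in place."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Reinterpret the mapped bytes as doubles without copying them.
            with memoryview(mm) as raw, raw.cast("d") as column:
                return sum(column)
```

Because the operating system pages data in on demand, a reader built this way only pays for the parts of the file it actually touches.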
Loading Local Files
Drag and Drop Interface
Simply drag files into Probably for instant loading:
- Single Files: Drop individual files anywhere
- Multiple Files: Select and drop multiple files at once
- Folders: Drop entire folders (automatically detects compatible files)
- Compressed Files: .zip and .gz files are automatically extracted
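The folder and compressed-file behavior described above could be approximated outside the app with a short standard-library script. This is a sketch under stated assumptions: the set of compatible extensions is a guess based on the formats on this page, and `extract_if_compressed` and `find_compatible_files` are hypothetical helpers, not part of Probably.

```python
import gzip
import shutil
import zipfile
from pathlib import Path

# Assumed list of loadable extensions; the real list may differ.
COMPATIBLE_SUFFIXES = {".csv", ".parquet", ".feather", ".arrow"}


def extract_if_compressed(path, dest_dir):
    """Extract .zip/.gz into dest_dir; return resulting file paths."""
    path, dest_dir = Path(path), Path(dest_dir)
    suffix = path.suffix.lower()
    if suffix == ".zip":
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest_dir)
            return [dest_dir / n for n in zf.namelist() if not n.endswith("/")]
    if suffix == ".gz":
        target = dest_dir / path.stem  # data.csv.gz -> data.csv
        with gzip.open(path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return [target]
    return [path]  # not compressed: pass through unchanged


def find_compatible_files(folder, scratch_dir):
    """Walk a folder, extract archives, and keep files with known suffixes."""
    found = []
    for entry in sorted(Path(folder).rglob("*")):
        if not entry.is_file():
            continue
        for candidate in extract_if_compressed(entry, scratch_dir):
            if candidate.suffix.lower() in COMPATIBLE_SUFFIXES:
                found.append(candidate)
    return found
```

Extracting into a separate scratch directory keeps the dropped folder untouched and avoids rescanning files that were just unpacked.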
File Picker Options
Use the file picker for more control:
- Browse to specific file locations
- Filter by file type
- Preview file contents before loading
- Batch selection of multiple files
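A preview step like the one listed above usually reads only the first few lines rather than the whole file. The sketch below shows one way to do that for text formats such as CSV. `preview_lines` is a hypothetical helper and the default of five lines is an arbitrary choice.

```python
from itertools import islice


def preview_lines(path, limit=5, encoding="utf-8"):
    """Return up to `limit` leading lines without reading the whole file."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return [line.rstrip("\r\n") for line in islice(f, limit)]
```

Binary formats such as Parquet and Feather cannot be previewed this way, since their contents are not line-oriented text.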
Performance Comparison
| Format | Load Speed | File Size | Best Use Case |
|---|---|---|---|
| CSV | Baseline | Largest | Small data, compatibility |
| Parquet | 10-20x faster | 50-80% smaller | Large data, analytics |
| Feather/Arrow | 25-50x faster | 20-40% smaller | Repeated access, speed |
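The guidance in this table can be condensed into a small decision helper. The sketch below is one possible reading of it, not an official rule: the function name, its parameters, and the order of the checks are assumptions, and the 100MB threshold comes from the CSV guidance earlier on this page.

```python
# "Works well for datasets under 100MB" (CSV guidance on this page).
CSV_COMFORT_LIMIT_BYTES = 100 * 1024 * 1024


def recommend_format(size_bytes, repeated_access=False, needs_interchange=False):
    """Suggest 'csv', 'parquet', or 'feather' for a dataset (hypothetical)."""
    if needs_interchange:
        return "csv"      # human-readable, opens in any tool
    if repeated_access:
        return "feather"  # fastest reload for data read many times
    if size_bytes < CSV_COMFORT_LIMIT_BYTES:
        return "csv"      # small data: simplicity wins
    return "parquet"      # large data: smaller files, faster analytics
```

For example, `recommend_format(2 * 1024**3)` suggests Parquet for a 2 GiB dataset, while passing `repeated_access=True` for the same data suggests Feather.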
What’s Next?
Database Connections
Connect to enterprise databases like Snowflake for real-time analysis.