Introducing Probably

Today we are announcing Probably, the verifiable data agent.

We are also announcing that we have raised $9 million in a seed round led by Andreessen Horowitz and Accel, with participation from Tokyo Black and Vermilion Cliffs Ventures.

Probably is a full-stack data ingestion, transformation, analysis, and visualization engine that runs on your local machine and is powered by an internal AI agent.

Verification is a feature

This data harness is the first of its kind to feature automatic verification of analysis steps and hallucination detection over LLM outputs.

Probably reasons over your data the way a great data scientist would, and treats you as its most important source of context.

It asks sharper questions, holds real hypotheses, and never reports a number it can’t verify. It is designed to treat ambiguity, missing context, and uncertainty as first-class parts of its method, rather than assuming it will be spoon-fed perfect information in a very data-imperfect world.
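
As an illustration only (this post does not describe Probably's internal verification method), one generic way to picture "never reports a number it can't verify" is to accept a model-stated figure only when an independent recomputation over the data agrees with it. The function name `verify_claim` and the tolerance are assumptions made for this sketch.

```python
import math


def verify_claim(claimed, recompute, rel_tol=1e-9):
    """Accept a model-stated number only if recomputing it from the data agrees.

    `recompute` is a zero-argument callable that derives the value directly
    from the data. Returns (is_verified, recomputed_value).
    """
    actual = recompute()
    return math.isclose(claimed, actual, rel_tol=rel_tol), actual


# Toy data standing in for a column of values.
rows = [120.0, 80.0, 100.0]


def mean_of_rows():
    return sum(rows) / len(rows)


# A claim that matches the data is accepted; one that does not is rejected.
ok, actual = verify_claim(100.0, mean_of_rows)
bad, _ = verify_claim(110.0, mean_of_rows)
```

In this sketch a rejected claim would simply not be reported. How a real system responds to a mismatch is a separate design decision.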

Performance & Privacy

Probably runs locally, powered by DuckDB and a highly optimized compute runtime, so every interaction is blazing fast, even at billion-row scale.

Privacy and security are first-class features.

The data remains on your machine, and the LLMs only see metadata and statistics to inform the analysis plans.

Once a plan is sent back to your machine, the engine validates and executes it locally.
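
The flow described above (only metadata and statistics leave the machine, and returned plans are validated before they run) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Probably's implementation: Python's built-in `sqlite3` stands in for the local engine so the example needs no third-party packages, the plan is modeled as a small dictionary, and the names `summarize_table`, `validate_plan`, and `execute_plan` are hypothetical.

```python
import sqlite3

# Assumption: sqlite3 stands in for the local engine so the sketch runs with
# only the standard library. All function names here are hypothetical.

NUMERIC_TYPES = {"INTEGER", "REAL"}
ALLOWED_AGGREGATES = {"sum", "avg", "min", "max", "count"}


def summarize_table(conn, table):
    """Build the metadata a remote model would see: schema and stats, no rows."""
    row_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    summary = {"table": table, "row_count": row_count, "columns": []}
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    for _, name, declared_type, *_ in columns:
        nulls = conn.execute(
            f'SELECT COUNT(*) FROM "{table}" WHERE "{name}" IS NULL'
        ).fetchone()[0]
        column = {"name": name, "type": declared_type, "nulls": nulls}
        if declared_type.upper() in NUMERIC_TYPES:
            # Range statistics only for numeric columns, so text values never leak.
            column["min"], column["max"] = conn.execute(
                f'SELECT MIN("{name}"), MAX("{name}") FROM "{table}"'
            ).fetchone()
        summary["columns"].append(column)
    return summary


def validate_plan(plan, summary):
    """Reject plans that name unknown tables, columns, or operations."""
    known_columns = {c["name"] for c in summary["columns"]}
    if plan.get("table") != summary["table"]:
        raise ValueError("plan references an unknown table")
    if plan.get("aggregate") not in ALLOWED_AGGREGATES:
        raise ValueError("plan uses an unsupported aggregate")
    if plan.get("column") not in known_columns:
        raise ValueError("plan references an unknown column")
    return plan


def execute_plan(conn, plan):
    """Run an already validated plan against the local data."""
    sql = f'SELECT {plan["aggregate"]}("{plan["column"]}") FROM "{plan["table"]}"'
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5), (3, None)]
)

summary = summarize_table(conn, "orders")  # the only thing sent off the machine
plan = {"table": "orders", "aggregate": "sum", "column": "amount"}  # as if returned
result = execute_plan(conn, validate_plan(plan, summary))
```

Because the plan is checked against the locally computed schema before any SQL is built, a plan that names a column the table does not have is rejected before it ever reaches the data.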

Plot Anything

Probably features a built-from-scratch GPU-accelerated plot engine capable of rendering more data points than any engine we are aware of.

If you are running it on a WebGPU-capable machine, you can pan and zoom at any granularity over 10M data points at 60 FPS.

And the best part is, you just ask for the plot you want, or any refinements.
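
For a sense of scale, the 10M-point, 60 FPS figure above works out to the numbers below. This is back-of-the-envelope arithmetic, not a description of the plot engine: the storage layout (two 32-bit floats per point) and the idea that every point is redrawn on every frame are assumptions made for the calculation.

```python
POINTS = 10_000_000
FPS = 60
BYTES_PER_POINT = 2 * 4  # assumption: x and y each stored as a 32-bit float

# Time available to produce one frame at the target frame rate.
frame_budget_ms = 1000 / FPS

# Size of the raw point buffer under the assumed layout, in megabytes.
buffer_mb = POINTS * BYTES_PER_POINT / 1_000_000

# Points processed per second if every point is redrawn on every frame.
points_per_second = POINTS * FPS

print(f"frame budget: {frame_budget_ms:.2f} ms")
print(f"point buffer: {buffer_mb:.0f} MB")
print(f"throughput:   {points_per_second:,} points/s")
```

Under those assumptions, each frame has roughly 16.67 ms to handle an 80 MB point buffer, which is the kind of workload that benefits from GPU acceleration.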

Tabular is first class

Probably understands how to reason about tables from files and databases alike. It can query them, join them, transform them, make new ones, clean them, merge them.

You can look at a spreadsheet next to a Snowflake table next to a dump of JSON API logs.

Gone are the days of struggling to wrangle and reason across disparate data sources.

Just make a data set, add all of your sources, and tell the agent what you want it to do.

Just the Beginning

Probably was developed in close partnership with individuals and enterprises who were willing to take a bet that AI might offer a far better way to do data.

In the beginning, LLMs could not even write SQL or create a good-looking plot over messy data to save their vectorized lives.

But now Probably can convert geometry data from ClickHouse into a scatter plot and visualize rainfall as a color-coded world map on a Cartesian plane.

Or find the best join paths through a Snowflake warehouse.

Today we are launching Probably into public preview. The version prefix is still 0.1.

There are still bugs and many hard research problems ahead, but the hill climb has made us confident that the approach is sound.

Thank you to all of our early partners, customers, investors, and especially our incredible team.

Building such a product would have been even more impossible than it already was without your support.