Syntax & Structure

Learn PXL’s simple but powerful syntax for creating data transformations and text analysis expressions.

PXL is a focused expression language designed for:

  • Data filtering with complex conditions
  • Text analysis using AI-powered functions
  • Data distribution analysis with quantiles
  • Function chaining for multi-step transformations

Basic expressions look like this:

filter(revenue > 1000)
extract("email addresses" from customer_notes)
score(review_text from "bad" to "excellent")
classify(feedback into ("positive", "negative", "neutral"))
ntile(sales_amount, 4)
word_count(description)

Use the -> operator to chain function outputs:

filter(customer_type is "premium") ->
extract("product mentions" from feedback_text)

filter(order_date after "2024-01-01") ->
score(satisfaction_survey from "dissatisfied" to "satisfied")
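
One way to picture the -> operator is as left-to-right function composition: each step receives the rows produced by the step before it. The Python sketch below models only that idea. The helpers chain, keep_premium, and add_word_count are hypothetical names introduced for illustration, and PXL's actual execution model is not described on this page.

```python
from functools import reduce

def chain(*steps):
    """Compose steps left to right, like step1 -> step2 -> step3."""
    def run(rows):
        return reduce(lambda acc, step: step(acc), steps, rows)
    return run

# Hypothetical stand-ins for PXL steps (assumptions, not PXL internals).
def keep_premium(rows):
    # Plays the role of: filter(customer_type is "premium")
    return [r for r in rows if r["customer_type"] == "premium"]

def add_word_count(rows):
    # Plays the role of: word_count(feedback_text)
    return [{**r, "word_count": len(r["feedback_text"].split())} for r in rows]

rows = [
    {"customer_type": "premium", "feedback_text": "love the new phone"},
    {"customer_type": "basic", "feedback_text": "too expensive"},
]

pipeline = chain(keep_premium, add_word_count)
result = pipeline(rows)
print(result)
```

The second step only ever sees rows that survived the first, which is why a filter usually comes before a text analysis step in a chain.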

Reference dataset columns by name:

customer_id
order_date
total_revenue
customer_type
product_name

The $ symbol refers to the current dataset:

$ // Represents the current dataset

Strings must be enclosed in double quotes:

"premium"
"2024-01-01"
"iPhone"
"email addresses"

Numbers support integers, decimals, and scientific notation:

42
123.45
-5
1.2e6
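
Python accepts the same four literal forms, so it offers a quick way to check what value a literal such as 1.2e6 denotes. This sketch assumes PXL reads scientific notation in the conventional way (mantissa times a power of ten); the page does not state how PXL stores numbers internally.

```python
# The four numeric literals shown above, parsed as Python floats.
literals = ["42", "123.45", "-5", "1.2e6"]
values = [float(text) for text in literals]
print(values)  # [42.0, 123.45, -5.0, 1200000.0]
```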

Dates use ISO format:

"2024-01-01" // Date only
"2024-01-01T12:30:00" // Date and time

Boolean and null literals:

true
false
TRUE // Case insensitive
FALSE
null

Operators for building conditions:

// Numeric comparisons
revenue > 1000
age >= 18
score < 50
rating <= 5
price == 99.99
quantity != 0
// String equality
status is "active"
category is not "discontinued"
// Pattern matching
email like "%.com"
product not like "old%"
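
The like patterns above resemble SQL's LIKE, where % stands for any run of characters. The Python sketch below models that reading by translating a pattern into a regular expression. It is an assumption that PXL follows SQL here. The sketch treats matching as case-sensitive and does not model any single-character wildcard, since neither point is stated on this page.

```python
import re

def like(value: str, pattern: str) -> bool:
    """Return True if value matches pattern, where % matches any run of characters."""
    # Escape the literal pieces so characters such as "." match themselves.
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    regex = ".*".join(literal_parts)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

print(like("user@example.com", "%.com"))  # True
print(like("user@example.org", "%.com"))  # False
print(like("old-stock", "old%"))          # True
print(like("bold", "old%"))               # False
```

Under this reading, not like is simply the negation of the same test.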
// Membership testing
region in ("US", "Canada", "Mexico")
status not in ("cancelled", "refunded")
priority in (1, 2, 3)
// Text containment
description contains "premium"
// Date/time comparisons
created_date after "2024-01-01"
expiry_date before "2024-12-31"
event_date between "2024-01-01" and "2024-03-31"
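
The date operators can be pictured as ordinary comparisons on parsed ISO dates. The Python sketch below uses date.fromisoformat from the standard library. The functions after, before, between, and is_iso_date are hypothetical helper names, and treating between as inclusive at both ends is an assumption, since this page does not say how PXL handles the endpoints.

```python
from datetime import date

def is_iso_date(text: str) -> bool:
    """Return True if text parses as an ISO date such as 2024-01-15."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False

def after(value: str, bound: str) -> bool:
    return date.fromisoformat(value) > date.fromisoformat(bound)

def before(value: str, bound: str) -> bool:
    return date.fromisoformat(value) < date.fromisoformat(bound)

def between(value: str, low: str, high: str) -> bool:
    # Assumption: both endpoints are included in the range.
    return date.fromisoformat(low) <= date.fromisoformat(value) <= date.fromisoformat(high)

print(after("2024-02-15", "2024-01-01"))                  # True
print(before("2025-01-01", "2024-12-31"))                 # False
print(between("2024-03-31", "2024-01-01", "2024-03-31"))  # True
print(is_iso_date("01/15/2024"))                          # False
```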
// Logical AND
revenue > 1000 and customer_type is "premium"
// Logical OR
status is "active" or last_login after "2024-01-01"
// Complex combinations with parentheses
age >= 18 and (country is "US" or country is "CA")
filter(active == true)
filter(revenue > 10000)
filter(customer_tier is "gold")
filter(
revenue > 5000 and
customer_type is "enterprise" and
(region is "US" or region is "EU")
)
filter(
status is "active" or
(status is "trial" and signup_date after "2024-01-01")
)
filter(email is not null)
filter(phone_number is null)
filter(middle_name is not "")
extract("description" from column_name)
// Examples
extract("email addresses" from contact_info)
extract("dollar amounts" from transaction_notes)
extract("phone numbers" from customer_data)
// Simple categories
classify(text_column into ("category1", "category2", "category3"))
// Categories with descriptions
classify(feedback_text into (
"positive": "happy, satisfied, great experience",
"negative": "frustrated, disappointed, problems",
"neutral": "factual, no strong emotion"
))
score(text_column from "negative_pole" to "positive_pole")
// Examples
score(reviews from "terrible" to "amazing")
score(feedback from "angry" to "delighted")
word_count(text_column)
// Examples
word_count(description)
word_count(customer_comments)
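
A rough mental model for word_count is counting whitespace-separated tokens. The Python sketch below does exactly that. How PXL treats punctuation, hyphenated words, or null values is not specified on this page, so the results may differ from PXL's in those cases.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of a word)."""
    return len(text.split())

print(word_count("Fast  shipping, great price"))  # 4
print(word_count(""))                             # 0
```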
ntile(numeric_column, number_of_tiles)
// Examples
ntile(revenue, 4) // Quartiles
ntile(age, 10) // Deciles
ntile(score, 100) // Percentiles
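
To make the tile numbers concrete, the Python sketch below assigns each value to a tile the way SQL's NTILE does: sort ascending, then split the rows into groups that are as equal in size as possible, giving any leftover rows to the lowest-numbered tiles. It is an assumption that PXL's ntile follows the same convention. Tie handling and the treatment of uneven splits are not specified on this page.

```python
def ntile(values, n):
    """Return a tile number from 1 to n for each value, in the original order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    base, extra = divmod(len(values), n)
    tiles = [0] * len(values)
    position = 0
    for tile in range(1, n + 1):
        # The first `extra` tiles each receive one additional row.
        size = base + (1 if tile <= extra else 0)
        for i in order[position:position + size]:
            tiles[i] = tile
        position += size
    return tiles

revenue = [10, 40, 20, 30, 50, 60, 70, 80]
print(ntile(revenue, 4))          # [1, 2, 1, 2, 3, 3, 4, 4]
print(ntile([5, 1, 3, 2, 4], 2))  # [2, 1, 1, 1, 2]
```

With four tiles, tile 1 holds the lowest quarter of the values and tile 4 the highest, which is the sense in which ntile(revenue, 4) yields quartiles.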
// Boolean logic grouping
filter((age >= 18 and age <= 65) or status is "exempt")
// Function parameters
filter(region in ("US", "CA", "MX"))
// List definitions
classify(text into ("positive", "negative", "neutral"))
Operator precedence, from highest to lowest:

  1. Parentheses: ()
  2. Comparison operators: >, <, >=, <=, ==, !=, is, like
  3. Set operators: in, contains
  4. Boolean AND: and
  5. Boolean OR: or
// These are equivalent:
filter(age > 18 and status is "active" or type is "vip")
filter(((age > 18) and (status is "active")) or (type is "vip"))
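
Python's and and or have the same relative precedence as the list above (and binds more tightly than or), so a short truth table in Python can confirm the equivalence and show where the other grouping would give a different answer. This checks the boolean logic only; it does not run PXL.

```python
from itertools import product

# Every combination of truth values for three conditions a, b, c.
rows = list(product([False, True], repeat=3))

# "a and b or c" groups as "(a and b) or c" for every combination.
same = all((a and b or c) == ((a and b) or c) for a, b, c in rows)

# Combinations where the other grouping, "a and (b or c)", disagrees.
differs = [(a, b, c) for a, b, c in rows if ((a and b) or c) != (a and (b or c))]

print(same)     # True
print(differs)  # [(False, False, True), (False, True, True)]
```

In the terms of the example above, a VIP who is not over 18 passes the filter as written but would be excluded if the condition were grouped as age > 18 and (status is "active" or type is "vip"). Add parentheses whenever that second meaning is intended.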

PXL supports single-line comments:

// Filter for active premium customers
filter(status is "active" and tier is "premium")
extract("product names" from reviews) // Extract mentioned products

A chain can also start from a named dataset:

data -> filter(active == true) -> ntile(revenue, 4)

customer_data ->
filter(signup_date after "2024-01-01") ->
classify(feedback into ("satisfied", "neutral", "unsatisfied")) ->
filter(classify_result is "satisfied")
  • Keywords are case-insensitive: AND, and, And are equivalent
  • Column names are case-sensitive: CustomerID ≠ customerid
  • String literals are case-sensitive: "Premium" ≠ "premium"

Whitespace is generally ignored:

// These are equivalent
filter(revenue>1000and status is"active")
filter(revenue > 1000 and status is "active")
  • Function names are required and cannot be omitted
  • Parentheses are required for function calls
  • Double quotes are required for string literals
  • Operators must be properly spaced in some contexts
// Date range filtering
filter(order_date between "2024-01-01" and "2024-12-31")
// Multi-condition filtering
filter(amount > 100 and currency is "USD" and status is "paid")
// Category filtering
filter(product_category in ("electronics", "books", "clothing"))
// Sentiment analysis workflow
filter(review_date after "2024-01-01") ->
score(review_text from "disappointed" to "thrilled")
// Content classification
filter(content_type is "support_ticket") ->
classify(message_text into ("technical", "billing", "general"))
// Revenue quartile analysis
filter(customer_type is "enterprise") ->
ntile(annual_revenue, 4)
// Text length analysis
filter(has_description == true) ->
word_count(product_description)
// ❌ Incorrect - string value is not quoted
filter(status is active)
// ✅ Correct
filter(status is "active")

// ❌ Incorrect - date is not in ISO format
filter(date after "01/15/2024")
// ✅ Correct
filter(date after "2024-01-15")

// ❌ Incorrect - missing parentheses around the condition
filter revenue > 1000
// ✅ Correct
filter(revenue > 1000)

// ❌ Incorrect - missing closing quote
filter(status is "active)
// ❌ Incorrect - missing closing parenthesis
filter(revenue > 1000
// ✅ Correct
filter(status is "active")
filter(revenue > 1000)

Function Reference

Learn the details of each PXL function with examples and best practices.

Advanced Techniques

Master complex patterns and optimization strategies for sophisticated analysis.