Skip to content

Built-in Functions

PXL provides six core functions for data transformation and AI-powered text analysis.

Filter data based on conditions using comparison operators.

Syntax:

filter(condition)

Numeric Comparisons:

filter(revenue > 1000)
filter(age >= 18 and age <= 65)
filter(score != 0)
filter(rating == 5)

String Comparisons:

filter(status is "active")
filter(name is not "")
filter(product like "iPhone")
filter(description not like "discontinued")

Sequence Matching:

filter(region in ("US", "Canada", "Mexico"))
filter(status not in ("cancelled", "refunded"))
filter(priority in (1, 2, 3))

Temporal Filtering:

filter(order_date after "2024-01-01")
filter(created_at before "2024-12-31")
filter(delivery_date between "2024-01-01" and "2024-03-31")

Boolean Logic:

filter(revenue > 1000 and customer_type is "premium")
filter(status is "active" or last_login after "2024-01-01")
filter(age >= 18 and (country is "US" or country is "CA"))

Create distribution tiles/quantiles for numerical data.

Syntax:

ntile(column_name, number_of_tiles)

Examples:

ntile(revenue, 4) // Create quartiles
ntile(age, 10) // Create deciles
ntile(score, 100) // Create percentiles

Extract specific information from text using natural language descriptions.

Syntax:

extract("description of what to extract" from column_name)

Examples:

extract("email addresses" from customer_notes)
extract("product names" from review_text)
extract("dollar amounts" from transaction_descriptions)
extract("phone numbers" from contact_info)
extract("dates mentioned" from support_tickets)

Classify text into predefined categories.

Syntax:

classify(column_name into ("category1", "category2", "category3"))
// Or with descriptions:
classify(column_name into (
"category1": "description of what fits this category",
"category2": "description of what fits this category"
))

Examples:

classify(feedback_text into (
"positive": "happy, satisfied, great experience",
"negative": "frustrated, disappointed, poor service",
"neutral": "factual, informational, no strong emotion"
))
classify(support_tickets into (
"billing",
"technical",
"feature_request",
"complaint"
))
classify(product_reviews into ("electronics", "clothing", "books", "home"))

Create sentiment scores between two opposite concepts.

Syntax:

score(column_name from "negative_concept" to "positive_concept")

Examples:

score(review_text from "terrible" to "amazing")
score(employee_feedback from "dissatisfied" to "highly_engaged")
score(product_reviews from "hate" to "love")
score(survey_responses from "strongly_disagree" to "strongly_agree")
score(customer_calls from "angry" to "delighted")

Count the number of words in text fields.

Syntax:

word_count(column_name)

Examples:

word_count(product_description)
word_count(customer_comments)
word_count(support_ticket_text)
word_count(review_content)

Functions can be chained together using the -> operator:

filter(customer_type is "premium") ->
extract("product mentions" from feedback_text)
filter(review_date after "2024-06-01") ->
score(review_text from "disappointed" to "delighted")
filter(order_amount > 500) ->
classify(support_tickets into ("billing", "shipping", "technical"))
filter(status is "active") ->
ntile(engagement_score, 5)
  • >, <, >=, <= - Numerical comparisons
  • ==, != - Equality comparisons
  • is, is not - String equality
  • like, not like - String pattern matching
  • in - Check if value is in a list
  • not in - Check if value is not in a list
  • contains - Check if text contains substring
  • before - Date/time before specified value
  • after - Date/time after specified value
  • between - Date/time within range
  • and - Both conditions must be true
  • or - Either condition can be true
  • Strings: Must be quoted with double quotes "text"
  • Numbers: Integers and decimals 1000, 99.99, -5
  • Dates: ISO format "2024-01-01" or with time "2024-01-01T10:30:00"
  • Lists: Comma-separated values in parentheses ("value1", "value2", "value3")
  • Null: The literal null
  • Column names: Reference dataset columns like customer_id, order_date
  • Special identifier: $ refers to the current dataset
  1. Apply filters early to reduce data volume
  2. Use specific descriptions in extract() for better accuracy
  3. Provide clear category descriptions in classify() for better results
  4. Choose appropriate score poles that represent true opposites
// ❌ Incorrect - missing quotes around strings
filter(status is premium)
// ✅ Correct - strings must be quoted
filter(status is "premium")
// ❌ Incorrect - wrong date format
filter(date after "01/01/2024")
// ✅ Correct - use ISO date format
filter(date after "2024-01-01")
// ❌ Incorrect - missing parentheses in function call
filter revenue > 1000
// ✅ Correct - function calls need parentheses
filter(revenue > 1000)

extract(): Be specific about what you want to extract

// ❌ Too vague
extract("information" from text)
// ✅ Specific and clear
extract("email addresses" from customer_notes)

classify(): Provide clear, distinct categories

// ❌ Overlapping categories
classify(text into ("good", "positive", "great"))
// ✅ Distinct categories
classify(text into ("positive", "negative", "neutral"))

score(): Use true opposites as poles

// ❌ Not opposites
score(text from "good" to "great")
// ✅ True opposites
score(text from "terrible" to "excellent")

Advanced Techniques

Learn advanced patterns, complex filtering, and function chaining strategies.

Syntax Reference

Understand PXL’s syntax rules, operators, and language structure.