Built-in Functions

PXL provides six core functions for data transformation and AI-powered text analysis.

Data Functions

filter()

Filter data based on conditions using comparison operators.

Syntax:

filter(condition)

Numeric Comparisons:

filter(revenue > 1000)
filter(age >= 18 and age <= 65)
filter(score != 0)
filter(rating == 5)

String Comparisons:

filter(status is "active")
filter(name is not "")
filter(product like "iPhone")
filter(description not like "discontinued")

Sequence Matching:

filter(region in ("US", "Canada", "Mexico"))
filter(status not in ("cancelled", "refunded"))
filter(priority in (1, 2, 3))

Temporal Filtering:

filter(order_date after "2024-01-01")
filter(created_at before "2024-12-31")
filter(delivery_date between "2024-01-01" and "2024-03-31")

Boolean Logic:

filter(revenue > 1000 and customer_type is "premium")
filter(status is "active" or last_login after "2024-01-01")
filter(age >= 18 and (country is "US" or country is "CA"))

ntile()

Create distribution tiles/quantiles for numerical data.

Syntax:

ntile(column_name, number_of_tiles)

Examples:

ntile(revenue, 4)    // Create quartiles
ntile(age, 10)       // Create deciles
ntile(score, 100)    // Create percentiles

Text Analysis Functions

extract()

Extract specific information from text using natural language descriptions.

Syntax:

extract("description of what to extract" from column_name)

Examples:

extract("email addresses" from customer_notes)
extract("product names" from review_text)
extract("dollar amounts" from transaction_descriptions)
extract("phone numbers" from contact_info)
extract("dates mentioned" from support_tickets)

classify()

Classify text into predefined categories.

Syntax:

classify(column_name into ("category1", "category2", "category3"))

// Or with descriptions:
classify(column_name into (
  "category1": "description of what fits this category",
  "category2": "description of what fits this category"
))

Examples:

classify(feedback_text into (
  "positive": "happy, satisfied, great experience",
  "negative": "frustrated, disappointed, poor service",
  "neutral": "factual, informational, no strong emotion"
))

classify(support_tickets into (
  "billing",
  "technical",
  "feature_request",
  "complaint"
))

classify(product_reviews into ("electronics", "clothing", "books", "home"))

score()

Create sentiment scores between two opposite concepts.

Syntax:

score(column_name from "negative_concept" to "positive_concept")

Examples:

score(review_text from "terrible" to "amazing")
score(employee_feedback from "dissatisfied" to "highly_engaged")
score(product_reviews from "hate" to "love")
score(survey_responses from "strongly_disagree" to "strongly_agree")
score(customer_calls from "angry" to "delighted")

word_count()

Count the number of words in text fields.

Syntax:

word_count(column_name)

Examples:

word_count(product_description)
word_count(customer_comments)
word_count(support_ticket_text)
word_count(review_content)

Function Chaining

Functions can be chained together using the -> operator:

filter(customer_type is "premium") ->
extract("product mentions" from feedback_text)

filter(review_date after "2024-06-01") ->
score(review_text from "disappointed" to "delighted")

filter(order_amount > 500) ->
classify(support_tickets into ("billing", "shipping", "technical"))

filter(status is "active") ->
ntile(engagement_score, 5)

Operators Reference

Comparison Operators

>, <, >=, <= - Numerical comparisons
==, != - Equality comparisons
is, is not - String equality
like, not like - String pattern matching

Set Operators

in - Check if value is in a list
not in - Check if value is not in a list
contains - Check if text contains substring

Temporal Operators

before - Date/time before specified value
after - Date/time after specified value
between - Date/time within range

Boolean Operators

and - Both conditions must be true
or - Either condition can be true

Data Types and Literals

Supported Literals

Strings: Must be quoted with double quotes "text"
Numbers: Integers and decimals 1000, 99.99, -5
Dates: ISO format "2024-01-01" or with time "2024-01-01T10:30:00"
Lists: Comma-separated values in parentheses ("value1", "value2", "value3")
Null: The literal null

Identifiers

Column names: Reference dataset columns like customer_id, order_date
Special identifier: $ refers to the current dataset

Function Performance

Optimization Tips

Apply filters early to reduce data volume
Use specific descriptions in extract() for better accuracy
Provide clear category descriptions in classify() for better results
Choose appropriate score poles that represent true opposites

Error Prevention

Common Syntax Errors

// ❌ Incorrect - missing quotes around strings
filter(status is premium)

// ✅ Correct - strings must be quoted
filter(status is "premium")

// ❌ Incorrect - wrong date format
filter(date after "01/01/2024")

// ✅ Correct - use ISO date format
filter(date after "2024-01-01")

// ❌ Incorrect - missing parentheses in function call
filter revenue > 1000

// ✅ Correct - function calls need parentheses
filter(revenue > 1000)

Function-Specific Guidelines

extract(): Be specific about what you want to extract

// ❌ Too vague
extract("information" from text)

// ✅ Specific and clear
extract("email addresses" from customer_notes)

classify(): Provide clear, distinct categories

// ❌ Overlapping categories
classify(text into ("good", "positive", "great"))

// ✅ Distinct categories
classify(text into ("positive", "negative", "neutral"))

score(): Use true opposites as poles

// ❌ Not opposites
score(text from "good" to "great")

// ✅ True opposites
score(text from "terrible" to "excellent")

What’s Next?

Advanced Techniques

Learn advanced patterns, complex filtering, and function chaining strategies.

Advanced Guide

Syntax Reference

Understand PXL’s syntax rules, operators, and language structure.

Syntax Guide