Offline Jobs
Offline Jobs are long-running, asynchronous processes that operate over large datasets or perform heavy computation that isn’t suitable for synchronous API requests. These jobs typically take minutes or even hours to complete depending on data volume and job complexity.
Note: Offline Jobs endpoints are available only upon request. If you are interested in using them, please contact us to request access.
Endpoints
The offline job flow is composed of four main endpoints:
- Initialize – This endpoint sets up the job, provisions an S3 input prefix, and issues temporary STS credentials so you can securely upload input data.
- Execute – After your data has been uploaded, this endpoint triggers the execution of the offline job using the provided inputs and configuration.
- Get Status – This endpoint allows you to query the current state of the job, including whether it is queued, running, completed, or failed. Note that offline jobs are retained in our system for only 90 days after initialization; after this window you can no longer view their status.
- Refresh Credentials – If the temporary credentials issued during initialization have expired, either before the input upload or when retrieving output files, this endpoint can be used to obtain new short-lived STS credentials for continued access.
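The 90-day retention window mentioned under Get Status can be tracked on the client side. The following is a minimal sketch, assuming you record the time at which you called the initialize endpoint yourself; the function names are hypothetical, and whether the service treats the boundary as inclusive is also an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention window stated in the Get Status description above.
RETENTION = timedelta(days=90)


def status_visible_until(initialized_at: datetime) -> datetime:
    """Return the last moment at which the job's status should still be viewable."""
    return initialized_at + RETENTION


def is_status_available(initialized_at: datetime, now: Optional[datetime] = None) -> bool:
    """Check whether a job initialized at `initialized_at` is still inside the window."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumption: the boundary itself is treated as still inside the window.
    return now <= status_visible_until(initialized_at)
```

Storing the initialization timestamp alongside the job_id makes it easy to tell an expired job apart from a mistyped job_id.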
Creating an Offline Job
Option 1: Initialize and Copy Script
This option includes a script that calls the initialize endpoint on your behalf, displays the full response, and uses the returned temporary credentials to copy data from your source bucket into the Foursquare input bucket.
✅ Prerequisites
Please ensure the following are installed before running the script. The script also calls curl and jq, so both must be available on your PATH.
1. AWS CLI
Verify installation:
aws --version
2. DuckDB CLI
On macOS (recommended):
brew install duckdb
Or download from:
https://duckdb.org/docs/installation/cli
Credentials Setup
Configure an AWS CLI profile with permission to read your S3 bucket:
aws configure --profile my-customer-profile
Alternatively, you can set your source credentials directly in the script by assigning the SRC_ACCESS_KEY, SRC_SECRET_KEY, and SRC_SESSION_TOKEN variables.
Run Script
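- Copy the script below into a file titled run_initialize_and_copy.sh. Make sure to choose the version of the script that corresponds to your job’s input format (only Parquet and CSV are supported at this time).
- Run chmod +x run_initialize_and_copy.sh in your terminal.
- Run the script as shown below. Pass the raw token without a "Bearer " prefix, because the script adds the prefix itself, and end the source URI with a trailing slash, because the script appends a *.parquet or *.csv pattern to it.
FSQ_BEARER_TOKEN='<token>' ./run_initialize_and_copy.sh s3://your-bucket/input/ my-aws-profile
Parquet version: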
#!/usr/bin/env bash
set -euo pipefail
if [ "$#" -lt 2 ]; then
echo "Usage: $0 <SRC_URI> <SRC_PROFILE>"
exit 1
fi
SRC_URI="$1" # Customer's input prefix
SRC_PROFILE="$2" # AWS profile for reading customer's bucket
# -------------------------------
# REQUIRE BEARER TOKEN IN ENV VAR
# -------------------------------
if [ -z "${FSQ_BEARER_TOKEN:-}" ]; then
echo "ERROR: Missing bearer token."
echo "Please set it using:"
echo " export FSQ_BEARER_TOKEN=\"your-token\""
exit 1
fi
# -------------------------------
# 1. CALL INITIALIZE ENDPOINT
# -------------------------------
echo "Calling Foursquare Initialize endpoint..."
INIT_RESPONSE=$(curl -s --location --request POST 'https://places-api.foursquare.com/offline-jobs/initialize' \
--header "Authorization: Bearer ${FSQ_BEARER_TOKEN}" \
--header 'X-Places-Api-Version: 2025-06-17')
echo ""
echo "=== Initialize Endpoint Response ==="
echo "$INIT_RESPONSE"
echo ""
# Extract values
DST_ACCESS_KEY=$(echo "$INIT_RESPONSE" | jq -r '.access_key_id')
DST_SECRET_KEY=$(echo "$INIT_RESPONSE" | jq -r '.secret_access_key')
DST_SESSION_TOKEN=$(echo "$INIT_RESPONSE" | jq -r '.session_token')
JOB_ID=$(echo "$INIT_RESPONSE" | jq -r '.job_id')
INPUT_URI=$(echo "$INIT_RESPONSE" | jq -r '.input_uri')
OUTPUT_URI=$(echo "$INIT_RESPONSE" | jq -r '.output_uri')
if [ -z "$DST_ACCESS_KEY" ] || [ "$DST_ACCESS_KEY" = "null" ]; then
echo "ERROR: Initialize endpoint did not return valid STS credentials."
exit 1
fi
echo "Job ID: $JOB_ID"
echo "Input URI: $INPUT_URI"
echo "Output URI: $OUTPUT_URI"
echo ""
# -------------------------------
# 2. FETCH SOURCE CREDS FROM PROFILE
# -------------------------------
echo "Fetching AWS source credentials from profile '$SRC_PROFILE'..."
SRC_ACCESS_KEY=$(aws configure get aws_access_key_id --profile "$SRC_PROFILE")
SRC_SECRET_KEY=$(aws configure get aws_secret_access_key --profile "$SRC_PROFILE")
SRC_SESSION_TOKEN=$(aws configure get aws_session_token --profile "$SRC_PROFILE")
if [ -z "$SRC_ACCESS_KEY" ] || [ -z "$SRC_SECRET_KEY" ]; then
echo "ERROR: Missing source AWS credentials for profile '$SRC_PROFILE'"
exit 1
fi
# -------------------------------
# 3. RUN DUCKDB COPY
# -------------------------------
echo "Running DuckDB S3→S3 copy as multiple ~128MB files..."
duckdb <<EOF
SET s3_region='us-east-1';
-- Source creds (customer bucket)
SET s3_access_key_id='${SRC_ACCESS_KEY}';
SET s3_secret_access_key='${SRC_SECRET_KEY}';
SET s3_session_token='${SRC_SESSION_TOKEN}';
INSTALL httpfs;
LOAD httpfs;
INSTALL aws;
LOAD aws;
-- Destination creds (from initialize endpoint)
CREATE SECRET fsq_dst (
TYPE S3,
KEY_ID '${DST_ACCESS_KEY}',
SECRET '${DST_SECRET_KEY}',
SESSION_TOKEN '${DST_SESSION_TOKEN}',
REGION 'us-east-1',
SCOPE '${INPUT_URI}'
);
-- Source secret
CREATE SECRET src (
TYPE S3,
KEY_ID '${SRC_ACCESS_KEY}',
SECRET '${SRC_SECRET_KEY}',
SESSION_TOKEN '${SRC_SESSION_TOKEN}',
REGION 'us-east-1',
SCOPE '${SRC_URI}'
);
-- Write multiple files, each ~128MB
COPY (
SELECT *
FROM read_parquet('${SRC_URI}*.parquet')
) TO '${INPUT_URI}' (
FORMAT PARQUET,
PER_THREAD_OUTPUT FALSE,
FILE_SIZE_BYTES 134217728 -- 128 MB (128 * 1024 * 1024)
);
EOF
echo ""
echo "Copy completed!"
echo "Your job_id is: $JOB_ID"
echo "Your input files are located under: ${INPUT_URI}/"
echo "Your job output will later appear at: ${OUTPUT_URI}"
CSV version:
#!/usr/bin/env bash
set -euo pipefail
if [ "$#" -lt 2 ]; then
echo "Usage: $0 <SRC_URI> <SRC_PROFILE>"
exit 1
fi
SRC_URI="$1" # Customer's input prefix
SRC_PROFILE="$2" # AWS profile for reading customer's bucket
# -------------------------------
# REQUIRE BEARER TOKEN IN ENV VAR
# -------------------------------
if [ -z "${FSQ_BEARER_TOKEN:-}" ]; then
echo "ERROR: Missing bearer token."
echo "Please set it using:"
echo " export FSQ_BEARER_TOKEN=\"your-token\""
exit 1
fi
# -------------------------------
# 1. CALL INITIALIZE ENDPOINT
# -------------------------------
echo "Calling Foursquare Initialize endpoint..."
INIT_RESPONSE=$(curl -s --location --request POST 'https://places-api.foursquare.com/offline-jobs/initialize' \
--header "Authorization: Bearer ${FSQ_BEARER_TOKEN}" \
--header 'X-Places-Api-Version: 2025-06-17')
echo ""
echo "=== Initialize Endpoint Response ==="
echo "$INIT_RESPONSE"
echo ""
# Extract values
DST_ACCESS_KEY=$(echo "$INIT_RESPONSE" | jq -r '.access_key_id')
DST_SECRET_KEY=$(echo "$INIT_RESPONSE" | jq -r '.secret_access_key')
DST_SESSION_TOKEN=$(echo "$INIT_RESPONSE" | jq -r '.session_token')
JOB_ID=$(echo "$INIT_RESPONSE" | jq -r '.job_id')
INPUT_URI=$(echo "$INIT_RESPONSE" | jq -r '.input_uri')
OUTPUT_URI=$(echo "$INIT_RESPONSE" | jq -r '.output_uri')
if [ -z "$DST_ACCESS_KEY" ] || [ "$DST_ACCESS_KEY" = "null" ]; then
echo "ERROR: Initialize endpoint did not return valid STS credentials."
exit 1
fi
echo "Job ID: $JOB_ID"
echo "Input URI: $INPUT_URI"
echo "Output URI: $OUTPUT_URI"
echo ""
# -------------------------------
# 2. FETCH SOURCE CREDS FROM PROFILE
# -------------------------------
echo "Fetching AWS source credentials from profile '$SRC_PROFILE'..."
SRC_ACCESS_KEY=$(aws configure get aws_access_key_id --profile "$SRC_PROFILE")
SRC_SECRET_KEY=$(aws configure get aws_secret_access_key --profile "$SRC_PROFILE")
SRC_SESSION_TOKEN=$(aws configure get aws_session_token --profile "$SRC_PROFILE")
if [ -z "$SRC_ACCESS_KEY" ] || [ -z "$SRC_SECRET_KEY" ]; then
echo "ERROR: Missing source AWS credentials for profile '$SRC_PROFILE'"
exit 1
fi
# -------------------------------
# 3. RUN DUCKDB COPY
# -------------------------------
echo "Running DuckDB S3→S3 copy as multiple ~128MB files..."
duckdb <<EOF
SET s3_region='us-east-1';
-- Source creds (customer bucket)
SET s3_access_key_id='${SRC_ACCESS_KEY}';
SET s3_secret_access_key='${SRC_SECRET_KEY}';
SET s3_session_token='${SRC_SESSION_TOKEN}';
INSTALL httpfs;
LOAD httpfs;
INSTALL aws;
LOAD aws;
-- Destination creds (from initialize endpoint)
CREATE SECRET fsq_dst (
TYPE S3,
KEY_ID '${DST_ACCESS_KEY}',
SECRET '${DST_SECRET_KEY}',
SESSION_TOKEN '${DST_SESSION_TOKEN}',
REGION 'us-east-1',
SCOPE '${INPUT_URI}'
);
-- Source secret
CREATE SECRET src (
TYPE S3,
KEY_ID '${SRC_ACCESS_KEY}',
SECRET '${SRC_SECRET_KEY}',
SESSION_TOKEN '${SRC_SESSION_TOKEN}',
REGION 'us-east-1',
SCOPE '${SRC_URI}'
);
-- Write multiple files, each ~128MB
COPY (
SELECT *
FROM read_csv('${SRC_URI}*.csv', AUTO_DETECT = TRUE, UNION_BY_NAME = TRUE)
) TO '${INPUT_URI}' (
FORMAT CSV,
HEADER TRUE,
PER_THREAD_OUTPUT FALSE,
FILE_SIZE_BYTES 134217728 -- 128 MB (128 * 1024 * 1024)
);
EOF
echo ""
echo "Copy completed!"
echo "Your job_id is: $JOB_ID"
echo "Your input files are located under: ${INPUT_URI}/"
echo "Your job output will later appear at: ${OUTPUT_URI}"
Option 2: Manual
- Call the initialize endpoint to perform the initial job setup. Save the data in the response; you will need it to upload your dataset, trigger execution, and check the job status.
- Upload the desired input data using the input_uri, access_key_id, secret_access_key, and session_token fields from step 1. See Writing Offline Job Input Files to S3 Using STS Credentials for examples. Alternatively, you may wish to copy the data directly from a Spark job or something similar.
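The first manual step can be sketched in Python as follows. The URL, headers, and response field names are taken from the script in Option 1; the InitializedJob class and both helper functions are hypothetical names introduced for illustration, and the sketch assumes the response is a flat JSON object containing exactly the fields that script extracts.

```python
import json
import urllib.request
from dataclasses import dataclass, fields

# URL and API version header as used by the Option 1 script.
INITIALIZE_URL = "https://places-api.foursquare.com/offline-jobs/initialize"
API_VERSION = "2025-06-17"


@dataclass(frozen=True)
class InitializedJob:
    """Hypothetical container for the values needed in the later steps."""
    job_id: str
    input_uri: str
    output_uri: str
    access_key_id: str
    secret_access_key: str
    session_token: str


def build_initialize_request(token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the initialize endpoint."""
    return urllib.request.Request(
        INITIALIZE_URL,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # raw token, prefix added here
            "X-Places-Api-Version": API_VERSION,
        },
    )


def parse_initialize_response(body: str) -> InitializedJob:
    """Parse the JSON body and fail loudly if any expected field is absent."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("initialize response is not a JSON object")
    names = [f.name for f in fields(InitializedJob)]
    missing = [name for name in names if not data.get(name)]
    if missing:
        raise ValueError("initialize response is missing: " + ", ".join(missing))
    return InitializedJob(**{name: data[name] for name in names})
```

To send the request, pass the result of build_initialize_request to urllib.request.urlopen and hand the decoded body to parse_initialize_response. Persist the parsed values somewhere safe, since the job_id is needed for every later call.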
Writing Offline Job Input Files to S3 Using STS Credentials
If your input data is already in an internal S3 bucket, you can use the provided shell script to copy your data over to the Foursquare-owned S3 bucket. This script will also handle partitioning your data for optimal usage in the offline job. It uses two different sets of AWS credentials:
- Source credentials → loaded from your AWS CLI profile
- Destination credentials → temporary STS credentials provided by the initialize endpoint and set as environment variables
- First, follow the Prerequisites and Credentials Setup sections from Option 1.
- You must also export the Foursquare-provided STS credentials as environment variables before running the script:
export DST_AWS_ACCESS_KEY="YOUR_TEMP_ACCESS_KEY"
export DST_AWS_SECRET_KEY="YOUR_TEMP_SECRET_KEY"
export DST_AWS_SESSION_TOKEN="YOUR_TEMP_SESSION_TOKEN"
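If you saved the initialize response as JSON, the export lines above can be generated rather than typed by hand. This is a minimal sketch: the response field names come from the Option 1 script, the environment variable names are the ones copy.sh reads, and the export_lines helper itself is a hypothetical name.

```python
import shlex
from typing import Dict, List

# Maps initialize-response fields to the environment variables read by copy.sh.
ENV_BY_FIELD = {
    "access_key_id": "DST_AWS_ACCESS_KEY",
    "secret_access_key": "DST_AWS_SECRET_KEY",
    "session_token": "DST_AWS_SESSION_TOKEN",
}


def export_lines(init_response: Dict[str, str]) -> List[str]:
    """Return shell `export` statements for the temporary destination credentials."""
    return [
        # shlex.quote protects values containing shell metacharacters.
        f"export {env}={shlex.quote(init_response[field])}"
        for field, env in ENV_BY_FIELD.items()
    ]
```

Printing the returned lines and passing them to eval in your shell sets the variables for the current session. Avoid writing them to a file that outlives the credentials.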
Run the script
- Copy the script below into a file titled copy.sh. Make sure to choose the version of the script that corresponds to your job’s input format (only Parquet and CSV are supported at this time).
- Run chmod +x copy.sh in your terminal.
- Run the script via:
./copy.sh s3://your-bucket/input/ s3://fsq-offline-jobs/customer123/input/jobId123 my-aws-profile
Parquet version:
#!/usr/bin/env bash
set -euo pipefail
if [ "$#" -lt 3 ]; then
echo "Usage: $0 <SRC_URI> <DST_URI> <SRC_PROFILE>"
exit 1
fi
SRC_URI="$1" # e.g. s3://customer-bucket/prefix/
DST_URI="$2" # the input_uri returned by the initialize endpoint
SRC_PROFILE="$3" # profile for reading the customer's bucket
echo "Fetching AWS credentials..."
# ------- SOURCE CREDS FROM PROFILE -------
SRC_ACCESS_KEY=$(aws configure get aws_access_key_id --profile "$SRC_PROFILE")
SRC_SECRET_KEY=$(aws configure get aws_secret_access_key --profile "$SRC_PROFILE")
SRC_SESSION_TOKEN=$(aws configure get aws_session_token --profile "$SRC_PROFILE")
if [ -z "$SRC_ACCESS_KEY" ] || [ -z "$SRC_SECRET_KEY" ]; then
echo "Missing source AWS credentials for profile '$SRC_PROFILE'"
exit 1
fi
# ------- DESTINATION CREDS FROM ENV VARS -------
DST_ACCESS_KEY="${DST_AWS_ACCESS_KEY:-}"
DST_SECRET_KEY="${DST_AWS_SECRET_KEY:-}"
DST_SESSION_TOKEN="${DST_AWS_SESSION_TOKEN:-}"
if [ -z "$DST_ACCESS_KEY" ] || [ -z "$DST_SECRET_KEY" ]; then
echo "Missing destination AWS credentials."
echo "Expected environment variables: DST_AWS_ACCESS_KEY, DST_AWS_SECRET_KEY, DST_AWS_SESSION_TOKEN"
exit 1
fi
echo "Running DuckDB S3→S3 copy..."
duckdb <<EOF
SET s3_region='us-east-1';
-- Source creds (from profile)
SET s3_access_key_id='${SRC_ACCESS_KEY}';
SET s3_secret_access_key='${SRC_SECRET_KEY}';
SET s3_session_token='${SRC_SESSION_TOKEN}';
INSTALL httpfs;
LOAD httpfs;
INSTALL aws;
LOAD aws;
-- DST creds (DuckDB secret)
CREATE SECRET fsq_dst (
TYPE S3,
KEY_ID '${DST_ACCESS_KEY}',
SECRET '${DST_SECRET_KEY}',
SESSION_TOKEN '${DST_SESSION_TOKEN}',
REGION 'us-east-1',
SCOPE '${DST_URI}'
);
CREATE SECRET src (
TYPE S3,
KEY_ID '${SRC_ACCESS_KEY}',
SECRET '${SRC_SECRET_KEY}',
SESSION_TOKEN '${SRC_SESSION_TOKEN}',
REGION 'us-east-1',
SCOPE '${SRC_URI}'
);
COPY (
SELECT * FROM read_parquet('${SRC_URI}*.parquet')
) TO '${DST_URI}' (
FORMAT PARQUET,
PER_THREAD_OUTPUT FALSE,
FILE_SIZE_BYTES 134217728 -- 128 MB (128 * 1024 * 1024)
);
EOF
echo "Copy completed!"
echo "Your input files are located under: ${DST_URI}"
PySpark alternative (Parquet):
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("OfflineJobExample").getOrCreate()
access_key_id = "<access_key_id>"
secret_access_key = "<secret_access_key>"
session_token = "<session_token>"
input_path = "s3a://4sq-offline-jobs/input/abcd1234/" # <inputUri>
hadoop_conf = spark._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", access_key_id)
hadoop_conf.set("fs.s3a.secret.key", secret_access_key)
hadoop_conf.set("fs.s3a.session.token", session_token)
hadoop_conf.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
df = <Your logic to create a parquet dataframe>
df.write.parquet(input_path)
CSV version of copy.sh:
#!/usr/bin/env bash
set -euo pipefail
if [ "$#" -lt 3 ]; then
echo "Usage: $0 <SRC_URI> <DST_URI> <SRC_PROFILE>"
exit 1
fi
SRC_URI="$1" # e.g. s3://customer-bucket/prefix/
DST_URI="$2" # the input_uri returned by the initialize endpoint
SRC_PROFILE="$3" # profile for reading the customer's bucket
echo "Fetching AWS credentials..."
# ------- SOURCE CREDS FROM PROFILE -------
SRC_ACCESS_KEY=$(aws configure get aws_access_key_id --profile "$SRC_PROFILE")
SRC_SECRET_KEY=$(aws configure get aws_secret_access_key --profile "$SRC_PROFILE")
SRC_SESSION_TOKEN=$(aws configure get aws_session_token --profile "$SRC_PROFILE")
if [ -z "$SRC_ACCESS_KEY" ] || [ -z "$SRC_SECRET_KEY" ]; then
echo "Missing source AWS credentials for profile '$SRC_PROFILE'"
exit 1
fi
# ------- DESTINATION CREDS FROM ENV VARS -------
DST_ACCESS_KEY="${DST_AWS_ACCESS_KEY:-}"
DST_SECRET_KEY="${DST_AWS_SECRET_KEY:-}"
DST_SESSION_TOKEN="${DST_AWS_SESSION_TOKEN:-}"
if [ -z "$DST_ACCESS_KEY" ] || [ -z "$DST_SECRET_KEY" ]; then
echo "Missing destination AWS credentials."
echo "Expected environment variables: DST_AWS_ACCESS_KEY, DST_AWS_SECRET_KEY, DST_AWS_SESSION_TOKEN"
exit 1
fi
echo "Running DuckDB S3→S3 copy..."
duckdb <<EOF
SET s3_region='us-east-1';
-- Source creds (from profile)
SET s3_access_key_id='${SRC_ACCESS_KEY}';
SET s3_secret_access_key='${SRC_SECRET_KEY}';
SET s3_session_token='${SRC_SESSION_TOKEN}';
INSTALL httpfs;
LOAD httpfs;
INSTALL aws;
LOAD aws;
-- DST creds (DuckDB secret)
CREATE SECRET fsq_dst (
TYPE S3,
KEY_ID '${DST_ACCESS_KEY}',
SECRET '${DST_SECRET_KEY}',
SESSION_TOKEN '${DST_SESSION_TOKEN}',
REGION 'us-east-1',
SCOPE '${DST_URI}'
);
CREATE SECRET src (
TYPE S3,
KEY_ID '${SRC_ACCESS_KEY}',
SECRET '${SRC_SECRET_KEY}',
SESSION_TOKEN '${SRC_SESSION_TOKEN}',
REGION 'us-east-1',
SCOPE '${SRC_URI}'
);
COPY (
SELECT * FROM read_csv('${SRC_URI}*.csv', AUTO_DETECT = TRUE, UNION_BY_NAME = TRUE)
) TO '${DST_URI}' (
FORMAT CSV,
HEADER TRUE,
PER_THREAD_OUTPUT FALSE,
FILE_SIZE_BYTES 134217728 -- 128 MB
);
EOF
echo "Copy completed!"
echo "Your input files are located under: ${DST_URI}"
PySpark alternative (CSV):
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("OfflineJobExample").getOrCreate()
access_key_id = "<access_key_id>"
secret_access_key = "<secret_access_key>"
session_token = "<session_token>"
input_path = "s3a://4sq-offline-jobs/input/abcd1234/" # <inputUri>
hadoop_conf = spark._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", access_key_id)
hadoop_conf.set("fs.s3a.secret.key", secret_access_key)
hadoop_conf.set("fs.s3a.session.token", session_token)
hadoop_conf.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
df = <Your logic to create a dataframe>
(df.write
.mode("overwrite")
.option("header", True)
.option("quoteAll", True)
.csv(input_path))
Executing an Offline Job
Using the job_id returned from the initialize endpoint, call the execute endpoint to start the job. If the job starts successfully, its status should change to running shortly after you call the execute endpoint.
Checking the Status of an Offline Job
Call the Get Status endpoint using the job_id provided by the initialize endpoint. If you did not record the job_id, you can call the Get Status endpoint with no arguments to view all of your jobs.
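Because jobs can run for minutes or hours, status checks are usually done in a polling loop. The sketch below leaves the HTTP call itself to a caller-supplied fetch_status function, since the Get Status request format is not shown here. It assumes the returned status strings match the states named in the Endpoints section (queued, running, completed, failed); wait_for_job and its parameters are hypothetical.

```python
import time
from typing import Callable

# Assumption: these strings match what the Get Status endpoint actually returns.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[], str],
    poll_seconds: float = 60.0,
    max_polls: int = 240,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `fetch_status` until the job reaches a terminal state, then return it."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        # Still queued or running: wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError(f"job did not finish after {max_polls} status checks")
```

Injecting sleep keeps the loop easy to test, and the max_polls cap prevents a stuck job from blocking the caller indefinitely.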
Reading the Output Files
- Once the status of your job is completed, you can get new AWS credentials by calling the refresh credentials endpoint.
- Note: These jobs typically take around 45 minutes.
- Using these credentials, you can view the results:
export AWS_ACCESS_KEY_ID="<accessKeyId>" && \
export AWS_SECRET_ACCESS_KEY="<secretAccessKey>" && \
export AWS_SESSION_TOKEN="<sessionToken>" && \
aws s3 cp <output dir> ./output/ --recursive
Scala (Spark):
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
.appName("ReadOfflineJobOutput")
.getOrCreate()
// Temporary credentials from API response
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "<accessKeyId>")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "<secretAccessKey>")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.session.token", "<sessionToken>")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
val outputUri = "s3a://4sq-offline-jobs/output/abcd1234/"
// Read the output as a DataFrame
val resultDf = spark.read.parquet(outputUri)
// Example: show or save locally
resultDf.show(10)
PySpark:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("ReadOfflineJobOutput").getOrCreate()
hadoop_conf = spark._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", "<accessKeyId>")
hadoop_conf.set("fs.s3a.secret.key", "<secretAccessKey>")
hadoop_conf.set("fs.s3a.session.token", "<sessionToken>")
hadoop_conf.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
output_uri = "s3a://4sq-offline-jobs/output/abcd1234/"
df = spark.read.parquet(output_uri)
df.show(10)
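The Spark examples above use the s3a:// scheme, while the copy.sh example passes an s3:// URI. If the URIs you received use s3://, a small conversion step avoids editing them by hand. This is a sketch under that assumption; to_s3a is a hypothetical helper, not part of any Foursquare or Spark API.

```python
def to_s3a(uri: str) -> str:
    """Rewrite an s3:// URI to the s3a:// scheme expected by Hadoop's S3A connector."""
    scheme, sep, rest = uri.partition("://")
    if not sep or scheme not in ("s3", "s3a"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Bucket and key are left untouched; only the scheme changes.
    return "s3a://" + rest
```

For example, the result of to_s3a applied to the output_uri can be passed directly to spark.read.parquet.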