Offline Jobs Onboarding Guide

Offline Jobs

Offline Jobs are long-running, asynchronous processes that operate over large datasets or perform heavy computation that isn’t suitable for synchronous API requests. These jobs typically take minutes or even hours to complete depending on data volume and job complexity.

Note: Offline Jobs endpoints are only available upon request. If you are interested in using these endpoints, please contact us to request access.


Endpoints

The offline job flow is composed of four main endpoints:

  1. Initialize – This endpoint sets up the job, provisions an S3 input prefix, and issues temporary STS credentials so you can securely upload input data.
  2. Execute – After your data has been uploaded, this endpoint triggers the execution of the offline job using the provided inputs and configuration.
  3. Get Status – This endpoint allows you to query the current state of the job, including whether it is queued, running, completed, or failed. Note that offline jobs are only retained in our system for 90 days after initialization; after this window, you will no longer be able to view the status.
  4. Refresh Credentials – If the temporary credentials issued during initialization have expired, either before the input upload or when retrieving output files, this endpoint can be used to obtain new short-lived STS credentials for continued access.
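
To make the 90-day retention window concrete, here is a minimal Python sketch that computes the last moment a job's status should still be queryable. The function names are hypothetical, and treating the boundary itself as still available is an assumption; the guide only states the 90-day figure.

```python
from datetime import datetime, timedelta, timezone

# Retention window stated in this guide: 90 days after initialization.
RETENTION_DAYS = 90


def status_visible_until(initialized_at: datetime) -> datetime:
    """Return the end of the retention window for a job initialized at the given time."""
    return initialized_at + timedelta(days=RETENTION_DAYS)


def is_status_available(initialized_at: datetime, now: datetime) -> bool:
    """True while `now` is still inside the retention window (boundary included by assumption)."""
    return now <= status_visible_until(initialized_at)


initialized = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(status_visible_until(initialized).date())  # prints 2025-04-01
```

Recording the initialization time alongside the job_id makes it easy to tell in advance when a job will drop out of the Get Status results.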

Creating an Offline Job

Option 1: Initialize and Copy Script

This option includes a script that calls the initialize endpoint on your behalf, displays the full response, and uses the returned temporary credentials to copy data from your source bucket into the Foursquare input bucket.

✅ Prerequisites

Please ensure the following are installed before running the script:

1. AWS CLI

Verify installation:

aws --version

2. DuckDB CLI

On macOS (recommended):

brew install duckdb

Or download from:
https://duckdb.org/docs/installation/cli

3. jq

The Option 1 script uses jq to parse the initialize response. Verify installation:

jq --version

Credentials Setup

Configure an AWS CLI profile with permission to read your S3 bucket:

aws configure --profile my-customer-profile

Alternatively, you can set your source credentials directly in the script by assigning the SRC_ACCESS_KEY, SRC_SECRET_KEY, and SRC_SESSION_TOKEN variables instead of reading them from a profile.

Run Script

  1. Copy the script below into a file titled run_initialize_and_copy.sh. Make sure to choose the version of the script that corresponds to your job’s input format (only Parquet and CSV are supported at this time).
  2. Run chmod +x run_initialize_and_copy.sh in your terminal.
  3. Run the script with FSQ_BEARER_TOKEN='<token>' ./run_initialize_and_copy.sh s3://your-bucket/input/ my-aws-profile. Pass the raw token only; the script adds the "Bearer " prefix itself.

Parquet version:

#!/usr/bin/env bash
set -euo pipefail

if [ "$#" -lt 2 ]; then
    echo "Usage: $0 <SRC_URI> <SRC_PROFILE>"
    exit 1
fi

SRC_URI="$1"          # Customer's input prefix
SRC_PROFILE="$2"      # AWS profile for reading customer's bucket

# -------------------------------
# REQUIRE BEARER TOKEN IN ENV VAR
# -------------------------------
if [ -z "${FSQ_BEARER_TOKEN:-}" ]; then
    echo "ERROR: Missing bearer token."
    echo "Please set it using:"
    echo "  export FSQ_BEARER_TOKEN=\"your-token\""
    exit 1
fi

# -------------------------------
# 1. CALL INITIALIZE ENDPOINT
# -------------------------------
echo "Calling Foursquare Initialize endpoint..."
INIT_RESPONSE=$(curl -s --location --request POST 'https://places-api.foursquare.com/offline-jobs/initialize' \
  --header "Authorization: Bearer ${FSQ_BEARER_TOKEN}" \
  --header 'X-Places-Api-Version: 2025-06-17')

echo ""
echo "=== Initialize Endpoint Response ==="
echo "$INIT_RESPONSE"
echo ""

# Extract values
DST_ACCESS_KEY=$(echo "$INIT_RESPONSE" | jq -r '.access_key_id')
DST_SECRET_KEY=$(echo "$INIT_RESPONSE" | jq -r '.secret_access_key')
DST_SESSION_TOKEN=$(echo "$INIT_RESPONSE" | jq -r '.session_token')
JOB_ID=$(echo "$INIT_RESPONSE" | jq -r '.job_id')
INPUT_URI=$(echo "$INIT_RESPONSE" | jq -r '.input_uri')
OUTPUT_URI=$(echo "$INIT_RESPONSE" | jq -r '.output_uri')

if [ -z "$DST_ACCESS_KEY" ] || [ "$DST_ACCESS_KEY" = "null" ]; then
    echo "ERROR: Initialize endpoint did not return valid STS credentials."
    exit 1
fi

echo "Job ID:       $JOB_ID"
echo "Input URI:    $INPUT_URI"
echo "Output URI:   $OUTPUT_URI"
echo ""

# -------------------------------
# 2. FETCH SOURCE CREDS FROM PROFILE
# -------------------------------
echo "Fetching AWS source credentials from profile '$SRC_PROFILE'..."

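# "aws configure get" exits non-zero when a key is unset. "|| true" keeps set -e from
# aborting here, so the check below can report the problem (the session token is optional).
SRC_ACCESS_KEY=$(aws configure get aws_access_key_id --profile "$SRC_PROFILE" || true)
SRC_SECRET_KEY=$(aws configure get aws_secret_access_key --profile "$SRC_PROFILE" || true)
SRC_SESSION_TOKEN=$(aws configure get aws_session_token --profile "$SRC_PROFILE" || true)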

if [ -z "$SRC_ACCESS_KEY" ] || [ -z "$SRC_SECRET_KEY" ]; then
    echo "ERROR: Missing source AWS credentials for profile '$SRC_PROFILE'"
    exit 1
fi

# -------------------------------
# 3. RUN DUCKDB COPY
# -------------------------------
echo "Running DuckDB S3→S3 copy as multiple ~128MB files..."

duckdb <<EOF
SET s3_region='us-east-1';

-- Source creds (customer bucket)
SET s3_access_key_id='${SRC_ACCESS_KEY}';
SET s3_secret_access_key='${SRC_SECRET_KEY}';
SET s3_session_token='${SRC_SESSION_TOKEN}';

INSTALL httpfs;
LOAD httpfs;
INSTALL aws;
LOAD aws;

-- Destination creds (from initialize endpoint)
CREATE SECRET fsq_dst (
    TYPE S3,
    KEY_ID '${DST_ACCESS_KEY}',
    SECRET '${DST_SECRET_KEY}',
    SESSION_TOKEN '${DST_SESSION_TOKEN}',
    REGION 'us-east-1',
    SCOPE '${INPUT_URI}'
);

-- Source secret
CREATE SECRET src (
    TYPE S3,
    KEY_ID '${SRC_ACCESS_KEY}',
    SECRET '${SRC_SECRET_KEY}',
    SESSION_TOKEN '${SRC_SESSION_TOKEN}',
    REGION 'us-east-1',
    SCOPE '${SRC_URI}'
);

-- Write multiple files, each ~128MB
COPY (
    SELECT *
    FROM read_parquet('${SRC_URI}*.parquet')
) TO '${INPUT_URI}' (
    FORMAT PARQUET,
    PER_THREAD_OUTPUT FALSE,
    FILE_SIZE_BYTES 134217728   -- 128 MB (128 * 1024 * 1024)
);
EOF

echo ""
echo "Copy completed!"
echo "Your job_id is: $JOB_ID"
echo "Your input files are located under:  ${INPUT_URI}/"
echo "Your job output will later appear at: ${OUTPUT_URI}"

CSV version:

#!/usr/bin/env bash
set -euo pipefail

if [ "$#" -lt 2 ]; then
    echo "Usage: $0 <SRC_URI> <SRC_PROFILE>"
    exit 1
fi

SRC_URI="$1"          # Customer's input prefix
SRC_PROFILE="$2"      # AWS profile for reading customer's bucket

# -------------------------------
# REQUIRE BEARER TOKEN IN ENV VAR
# -------------------------------
if [ -z "${FSQ_BEARER_TOKEN:-}" ]; then
    echo "ERROR: Missing bearer token."
    echo "Please set it using:"
    echo "  export FSQ_BEARER_TOKEN=\"your-token\""
    exit 1
fi

# -------------------------------
# 1. CALL INITIALIZE ENDPOINT
# -------------------------------
echo "Calling Foursquare Initialize endpoint..."
INIT_RESPONSE=$(curl -s --location --request POST 'https://places-api.foursquare.com/offline-jobs/initialize' \
  --header "Authorization: Bearer ${FSQ_BEARER_TOKEN}" \
  --header 'X-Places-Api-Version: 2025-06-17')

echo ""
echo "=== Initialize Endpoint Response ==="
echo "$INIT_RESPONSE"
echo ""

# Extract values
DST_ACCESS_KEY=$(echo "$INIT_RESPONSE" | jq -r '.access_key_id')
DST_SECRET_KEY=$(echo "$INIT_RESPONSE" | jq -r '.secret_access_key')
DST_SESSION_TOKEN=$(echo "$INIT_RESPONSE" | jq -r '.session_token')
JOB_ID=$(echo "$INIT_RESPONSE" | jq -r '.job_id')
INPUT_URI=$(echo "$INIT_RESPONSE" | jq -r '.input_uri')
OUTPUT_URI=$(echo "$INIT_RESPONSE" | jq -r '.output_uri')

if [ -z "$DST_ACCESS_KEY" ] || [ "$DST_ACCESS_KEY" = "null" ]; then
    echo "ERROR: Initialize endpoint did not return valid STS credentials."
    exit 1
fi

echo "Job ID:       $JOB_ID"
echo "Input URI:    $INPUT_URI"
echo "Output URI:   $OUTPUT_URI"
echo ""

# -------------------------------
# 2. FETCH SOURCE CREDS FROM PROFILE
# -------------------------------
echo "Fetching AWS source credentials from profile '$SRC_PROFILE'..."

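# "aws configure get" exits non-zero when a key is unset. "|| true" keeps set -e from
# aborting here, so the check below can report the problem (the session token is optional).
SRC_ACCESS_KEY=$(aws configure get aws_access_key_id --profile "$SRC_PROFILE" || true)
SRC_SECRET_KEY=$(aws configure get aws_secret_access_key --profile "$SRC_PROFILE" || true)
SRC_SESSION_TOKEN=$(aws configure get aws_session_token --profile "$SRC_PROFILE" || true)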

if [ -z "$SRC_ACCESS_KEY" ] || [ -z "$SRC_SECRET_KEY" ]; then
    echo "ERROR: Missing source AWS credentials for profile '$SRC_PROFILE'"
    exit 1
fi

# -------------------------------
# 3. RUN DUCKDB COPY
# -------------------------------
echo "Running DuckDB S3→S3 copy as multiple ~128MB files..."

duckdb <<EOF
SET s3_region='us-east-1';

-- Source creds (customer bucket)
SET s3_access_key_id='${SRC_ACCESS_KEY}';
SET s3_secret_access_key='${SRC_SECRET_KEY}';
SET s3_session_token='${SRC_SESSION_TOKEN}';

INSTALL httpfs;
LOAD httpfs;
INSTALL aws;
LOAD aws;

-- Destination creds (from initialize endpoint)
CREATE SECRET fsq_dst (
    TYPE S3,
    KEY_ID '${DST_ACCESS_KEY}',
    SECRET '${DST_SECRET_KEY}',
    SESSION_TOKEN '${DST_SESSION_TOKEN}',
    REGION 'us-east-1',
    SCOPE '${INPUT_URI}'
);

-- Source secret
CREATE SECRET src (
    TYPE S3,
    KEY_ID '${SRC_ACCESS_KEY}',
    SECRET '${SRC_SECRET_KEY}',
    SESSION_TOKEN '${SRC_SESSION_TOKEN}',
    REGION 'us-east-1',
    SCOPE '${SRC_URI}'
);

-- Write multiple files, each ~128MB
COPY (
    SELECT *
    FROM read_csv('${SRC_URI}*.csv', AUTO_DETECT = TRUE, UNION_BY_NAME = TRUE)
) TO '${INPUT_URI}' (
    FORMAT CSV,
    HEADER TRUE,
    PER_THREAD_OUTPUT FALSE,
    FILE_SIZE_BYTES 134217728   -- 128 MB (128 * 1024 * 1024)
);
EOF

echo ""
echo "Copy completed!"
echo "Your job_id is: $JOB_ID"
echo "Your input files are located under:  ${INPUT_URI}/"
echo "Your job output will later appear at: ${OUTPUT_URI}"
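
Both versions of the script build their source pattern by appending *.parquet or *.csv directly to SRC_URI, so a prefix passed without a trailing slash (for example s3://your-bucket/input) would yield s3://your-bucket/input*.parquet rather than a pattern inside the prefix. The following Python sketch shows one way to normalize the prefix before calling the script; source_glob is a hypothetical helper, not part of the provided scripts.

```python
def source_glob(src_uri: str, fmt: str) -> str:
    """Build the glob pattern the copy step reads from, e.g. s3://bucket/input/*.parquet."""
    if fmt not in ("parquet", "csv"):
        # Only Parquet and CSV inputs are supported, per this guide.
        raise ValueError(f"unsupported input format: {fmt!r}")
    if not src_uri.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got: {src_uri!r}")
    # Ensure exactly one trailing slash so the wildcard applies inside the prefix.
    prefix = src_uri.rstrip("/") + "/"
    return f"{prefix}*.{fmt}"


print(source_glob("s3://your-bucket/input", "parquet"))
print(source_glob("s3://your-bucket/input/", "csv"))
```

With or without the trailing slash, the helper returns a pattern that matches only files directly under the prefix.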

Option 2: Manual

  1. Call the initialize endpoint to perform the initial job setup. Make sure you save the data in the response, as it is needed to upload your dataset, trigger execution, and check the job status.
  2. Upload the desired input data using the input_uri, access_key_id, secret_access_key, and session_token fields from step 1. See Writing Offline Job Input Files to S3 Using STS Credentials for examples. Alternatively, you may wish to copy the data directly in a Spark job or something similar.
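
As a rough illustration of step 1, the sketch below builds the initialize request in Python using the same URL and headers as the Option 1 script. The helper name build_initialize_request is hypothetical, and stripping an accidental "Bearer " prefix is a convenience added here, not documented API behavior. The block only constructs the request; the commented lines show how it could be sent.

```python
import urllib.request

# URL and version header copied from the Option 1 script in this guide.
INITIALIZE_URL = "https://places-api.foursquare.com/offline-jobs/initialize"
API_VERSION = "2025-06-17"


def build_initialize_request(token: str) -> urllib.request.Request:
    """Return a POST request for the initialize endpoint with the auth headers set."""
    token = token.strip()
    # Accept either the raw token or one already prefixed with "Bearer ".
    if token.lower().startswith("bearer "):
        token = token[len("bearer "):].strip()
    if not token:
        raise ValueError("missing bearer token")
    return urllib.request.Request(
        INITIALIZE_URL,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "X-Places-Api-Version": API_VERSION,
        },
    )


request = build_initialize_request("example-token")
print(request.get_method(), request.full_url)

# To send it and keep the response (needs network access and a valid token):
#   import json
#   with urllib.request.urlopen(request) as resp:
#       init_response = json.load(resp)
```

Saving the parsed response to a file right away is a simple way to keep the job_id, URIs, and credentials together for the later steps.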

Writing Offline Job Input Files to S3 Using STS Credentials

If your input data is already in an internal S3 bucket, you can use the provided shell script to copy your data over to the Foursquare-owned S3 bucket. This script will also handle partitioning your data for optimal usage in the offline job. It uses two different sets of AWS credentials:

  • Source credentials → loaded from your AWS CLI profile
  • Destination credentials → temporary STS credentials provided by the initialize endpoint and set as environment variables

  1. First, follow the Prerequisites and Credentials Setup sections from Option 1.
  2. You must also export the Foursquare-provided STS credentials as environment variables before running the script:
export DST_AWS_ACCESS_KEY="YOUR_TEMP_ACCESS_KEY"
export DST_AWS_SECRET_KEY="YOUR_TEMP_SECRET_KEY"
export DST_AWS_SESSION_TOKEN="YOUR_TEMP_SESSION_TOKEN"
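
If you saved the initialize response as JSON, the export lines above can be generated rather than typed by hand. This is a minimal sketch, assuming the response contains the access_key_id, secret_access_key, and session_token fields that the Option 1 script reads with jq; export_lines is a hypothetical helper and the sample values are made up.

```python
import json
import shlex
from typing import List

# Response field -> environment variable expected by copy.sh.
FIELD_TO_ENV = {
    "access_key_id": "DST_AWS_ACCESS_KEY",
    "secret_access_key": "DST_AWS_SECRET_KEY",
    "session_token": "DST_AWS_SESSION_TOKEN",
}


def export_lines(init_response_json: str) -> List[str]:
    """Turn a saved initialize response into shell export statements."""
    response = json.loads(init_response_json)
    lines = []
    for field, env_name in FIELD_TO_ENV.items():
        value = response.get(field)
        if not value:
            raise KeyError(f"initialize response is missing {field!r}")
        # shlex.quote keeps values with special characters safe for the shell.
        lines.append(f"export {env_name}={shlex.quote(value)}")
    return lines


sample = json.dumps({
    "access_key_id": "EXAMPLEKEYID",
    "secret_access_key": "example/secret+key",
    "session_token": "example-session-token",
})
for line in export_lines(sample):
    print(line)
```

The printed lines can be pasted into the terminal session where copy.sh will run.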

Run the script

  1. Copy the script below into a file titled copy.sh. Make sure to choose the version of the script that corresponds to your job’s input format (only Parquet and CSV are supported at this time).
  2. Run chmod +x copy.sh in your terminal.
  3. Run the script via ./copy.sh s3://your-bucket/input/ s3://fsq-offline-jobs/customer123/input/jobId123 my-aws-profile

Parquet version (shell script):

#!/usr/bin/env bash
set -euo pipefail

if [ "$#" -lt 3 ]; then
    echo "Usage: $0 <SRC_URI> <DST_URI> <SRC_PROFILE>"
    exit 1
fi

SRC_URI="$1"       # e.g. s3://customer-bucket/prefix/
DST_URI="$2"       # e.g. s3://fsq-offline-jobs/customer123/input/jobId123 (input_uri from initialize)
SRC_PROFILE="$3"   # profile for reading the customer's bucket

echo "Fetching AWS credentials..."

# ------- SOURCE CREDS FROM PROFILE -------
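# "aws configure get" exits non-zero when a key is unset. "|| true" keeps set -e from
# aborting here, so the check below can report the problem (the session token is optional).
SRC_ACCESS_KEY=$(aws configure get aws_access_key_id --profile "$SRC_PROFILE" || true)
SRC_SECRET_KEY=$(aws configure get aws_secret_access_key --profile "$SRC_PROFILE" || true)
SRC_SESSION_TOKEN=$(aws configure get aws_session_token --profile "$SRC_PROFILE" || true)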

if [ -z "$SRC_ACCESS_KEY" ] || [ -z "$SRC_SECRET_KEY" ]; then
    echo "Missing source AWS credentials for profile '$SRC_PROFILE'"
    exit 1
fi

# ------- DESTINATION CREDS FROM ENV VARS -------
DST_ACCESS_KEY="${DST_AWS_ACCESS_KEY:-}"
DST_SECRET_KEY="${DST_AWS_SECRET_KEY:-}"
DST_SESSION_TOKEN="${DST_AWS_SESSION_TOKEN:-}"

if [ -z "$DST_ACCESS_KEY" ] || [ -z "$DST_SECRET_KEY" ]; then
    echo "Missing destination AWS credentials."
    echo "Expected environment variables: DST_AWS_ACCESS_KEY, DST_AWS_SECRET_KEY, DST_AWS_SESSION_TOKEN"
    exit 1
fi

echo "Running DuckDB S3→S3 copy..."

duckdb <<EOF
SET s3_region='us-east-1';

-- Source creds (from profile)
SET s3_access_key_id='${SRC_ACCESS_KEY}';
SET s3_secret_access_key='${SRC_SECRET_KEY}';
SET s3_session_token='${SRC_SESSION_TOKEN}';

INSTALL httpfs;
LOAD httpfs;
INSTALL aws;
LOAD aws;

-- DST creds (DuckDB secret)
CREATE SECRET fsq_dst (
    TYPE S3,
    KEY_ID '${DST_ACCESS_KEY}',
    SECRET '${DST_SECRET_KEY}',
    SESSION_TOKEN '${DST_SESSION_TOKEN}',
    REGION 'us-east-1',
    SCOPE '${DST_URI}'
);

CREATE SECRET src (
    TYPE S3,
    KEY_ID '${SRC_ACCESS_KEY}',
    SECRET '${SRC_SECRET_KEY}',
    SESSION_TOKEN '${SRC_SESSION_TOKEN}',
    REGION 'us-east-1',
    SCOPE '${SRC_URI}'
);

COPY (
    SELECT * FROM read_parquet('${SRC_URI}*.parquet')
) TO '${DST_URI}' (
    FORMAT PARQUET,
    PER_THREAD_OUTPUT FALSE,
    FILE_SIZE_BYTES 134217728   -- 128 MB (128 * 1024 * 1024)
);
EOF

echo "Copy completed!"
echo "Your input files are located under: ${DST_URI}"

Parquet version (PySpark):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("OfflineJobExample").getOrCreate()

access_key_id = "<access_key_id>"
secret_access_key = "<secret_access_key>"
session_token = "<session_token>"
input_path = "s3a://4sq-offline-jobs/input/abcd1234/" # <inputUri> 

hadoop_conf = spark._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", access_key_id)
hadoop_conf.set("fs.s3a.secret.key", secret_access_key)
hadoop_conf.set("fs.s3a.session.token", session_token)
hadoop_conf.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")


# Placeholder: replace with your own logic to build the DataFrame.
df = spark.read.parquet("<path to your source data>")

df.write.parquet(input_path)

CSV version (shell script):

#!/usr/bin/env bash
set -euo pipefail

if [ "$#" -lt 3 ]; then
    echo "Usage: $0 <SRC_URI> <DST_URI> <SRC_PROFILE>"
    exit 1
fi

SRC_URI="$1"       # e.g. s3://customer-bucket/prefix/
DST_URI="$2"       # e.g. s3://fsq-offline-jobs/customer123/input/jobId123 (input_uri from initialize)
SRC_PROFILE="$3"   # profile for reading the customer's bucket

echo "Fetching AWS credentials..."

# ------- SOURCE CREDS FROM PROFILE -------
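# "aws configure get" exits non-zero when a key is unset. "|| true" keeps set -e from
# aborting here, so the check below can report the problem (the session token is optional).
SRC_ACCESS_KEY=$(aws configure get aws_access_key_id --profile "$SRC_PROFILE" || true)
SRC_SECRET_KEY=$(aws configure get aws_secret_access_key --profile "$SRC_PROFILE" || true)
SRC_SESSION_TOKEN=$(aws configure get aws_session_token --profile "$SRC_PROFILE" || true)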

if [ -z "$SRC_ACCESS_KEY" ] || [ -z "$SRC_SECRET_KEY" ]; then
    echo "Missing source AWS credentials for profile '$SRC_PROFILE'"
    exit 1
fi

# ------- DESTINATION CREDS FROM ENV VARS -------
DST_ACCESS_KEY="${DST_AWS_ACCESS_KEY:-}"
DST_SECRET_KEY="${DST_AWS_SECRET_KEY:-}"
DST_SESSION_TOKEN="${DST_AWS_SESSION_TOKEN:-}"

if [ -z "$DST_ACCESS_KEY" ] || [ -z "$DST_SECRET_KEY" ]; then
    echo "Missing destination AWS credentials."
    echo "Expected environment variables: DST_AWS_ACCESS_KEY, DST_AWS_SECRET_KEY, DST_AWS_SESSION_TOKEN"
    exit 1
fi

echo "Running DuckDB S3→S3 copy..."

duckdb <<EOF
SET s3_region='us-east-1';

-- Source creds (from profile)
SET s3_access_key_id='${SRC_ACCESS_KEY}';
SET s3_secret_access_key='${SRC_SECRET_KEY}';
SET s3_session_token='${SRC_SESSION_TOKEN}';

INSTALL httpfs;
LOAD httpfs;
INSTALL aws;
LOAD aws;

-- DST creds (DuckDB secret)
CREATE SECRET fsq_dst (
    TYPE S3,
    KEY_ID '${DST_ACCESS_KEY}',
    SECRET '${DST_SECRET_KEY}',
    SESSION_TOKEN '${DST_SESSION_TOKEN}',
    REGION 'us-east-1',
    SCOPE '${DST_URI}'
);

CREATE SECRET src (
    TYPE S3,
    KEY_ID '${SRC_ACCESS_KEY}',
    SECRET '${SRC_SECRET_KEY}',
    SESSION_TOKEN '${SRC_SESSION_TOKEN}',
    REGION 'us-east-1',
    SCOPE '${SRC_URI}'
);

COPY (
    SELECT * FROM read_csv('${SRC_URI}*.csv', AUTO_DETECT = TRUE, UNION_BY_NAME = TRUE)
) TO '${DST_URI}' (
    FORMAT CSV,
    HEADER TRUE,
    PER_THREAD_OUTPUT FALSE,
    FILE_SIZE_BYTES 134217728   -- 128 MB
);
EOF

echo "Copy completed!"
echo "Your input files are located under: ${DST_URI}"

CSV version (PySpark):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("OfflineJobExample").getOrCreate()

access_key_id = "<access_key_id>"
secret_access_key = "<secret_access_key>"
session_token = "<session_token>"
input_path = "s3a://4sq-offline-jobs/input/abcd1234/" # <inputUri> 

hadoop_conf = spark._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", access_key_id)
hadoop_conf.set("fs.s3a.secret.key", secret_access_key)
hadoop_conf.set("fs.s3a.session.token", session_token)
hadoop_conf.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")


# Placeholder: replace with your own logic to build the DataFrame.
df = spark.read.csv("<path to your source data>", header=True)

(df.write
  .mode("overwrite")
  .option("header", True)
  .option("quoteAll", True)
  .csv(input_path))

Executing an Offline Job

Using the job_id returned from the initialize endpoint, call the execute endpoint to start the job. If the job started successfully, its status should change to running shortly after the call.

Checking the Status of an Offline Job

Call the Get Status endpoint using the job_id provided by the initialize endpoint. If you did not keep track of the job_id, you can call the Get Status endpoint with no arguments to view all of your jobs.
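
Because jobs can run for minutes or hours, status checks are usually done in a polling loop. The sketch below is one possible shape for such a loop in Python. It assumes a caller-supplied get_status function (hypothetical; it would wrap your own call to the Get Status endpoint) that returns one of the state names listed in the Endpoints section; the exact status strings returned by the API are an assumption here.

```python
import time
from typing import Callable

# Assumed status strings, taken from the states named in the Endpoints section.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    get_status: Callable[[str], str],
    job_id: str,
    poll_seconds: float = 60.0,
    timeout_seconds: float = 6 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status(job_id) until the job completes or fails, then return that status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout_seconds} seconds")
        sleep(poll_seconds)


# Demonstration with a fake status source instead of a real API call.
fake_states = iter(["queued", "running", "completed"])
final = wait_for_job(lambda _job_id: next(fake_states), "job-123", poll_seconds=0)
print(final)  # prints completed
```

A poll interval of a minute or more is a reasonable default given typical job durations; the timeout guards against waiting forever on a job that never reaches a terminal state.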

Reading the Output Files

  1. Once your job has completed, you can get new AWS credentials by calling the refresh credentials endpoint.
    1. Note: These jobs typically take around 45 minutes.
  2. Using these credentials, you can view the results:
export AWS_ACCESS_KEY_ID="<accessKeyId>" && \
export AWS_SECRET_ACCESS_KEY="<secretAccessKey>" && \
export AWS_SESSION_TOKEN="<sessionToken>" && \
aws s3 cp <output dir> ./output/ --recursive

Spark (Scala):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ReadOfflineJobOutput")
  .getOrCreate()

// Temporary credentials from API response
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "<accessKeyId>")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "<secretAccessKey>")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.session.token", "<sessionToken>")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")

val outputUri = "s3a://4sq-offline-jobs/output/abcd1234/"

// Read the output as a DataFrame
val resultDf = spark.read.parquet(outputUri)

// Example: show or save locally
resultDf.show(10)

PySpark:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ReadOfflineJobOutput").getOrCreate()

hadoop_conf = spark._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", "<accessKeyId>")
hadoop_conf.set("fs.s3a.secret.key", "<secretAccessKey>")
hadoop_conf.set("fs.s3a.session.token", "<sessionToken>")
hadoop_conf.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")

output_uri = "s3a://4sq-offline-jobs/output/abcd1234/"

df = spark.read.parquet(output_uri)
df.show(10)
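
The Spark examples in this guide address the buckets with the s3a:// scheme. If the input_uri and output_uri in your initialize response use s3:// instead (an assumption; check your actual response), a small helper like the hypothetical to_s3a below can convert them before they are passed to Spark.

```python
def to_s3a(uri: str) -> str:
    """Rewrite an s3:// URI to the s3a:// scheme used by the Hadoop S3A connector."""
    if uri.startswith("s3a://"):
        return uri  # already in the form Spark expects
    if uri.startswith("s3://"):
        return "s3a://" + uri[len("s3://"):]
    raise ValueError(f"not an S3 URI: {uri!r}")


print(to_s3a("s3://4sq-offline-jobs/output/abcd1234/"))
# prints s3a://4sq-offline-jobs/output/abcd1234/
```

Only the scheme changes; the bucket and key prefix are passed through untouched.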

Available Offline Jobs