Factual offers an easy pathway to integration via AWS S3. This document outlines the data format that Factual expects, and the process for submitting this data.
Transfer
Bucket & Permissioning
Factual’s integration requires that you host a bucket on AWS S3 and upload the data to it. The bucket must grant Factual permission to read the data. The following is an example bucket policy, in which $BUCKET stands for the name of your bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowFactualFileOperations",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::315898705177:root"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::$BUCKET/*"
    },
    {
      "Sid": "AllowFactualListOperations",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::315898705177:root"
      },
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::$BUCKET"
    }
  ]
}
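As a minimal sketch, the example policy can be generated for a specific bucket name instead of being edited by hand. The function names `factual_bucket_policy` and `apply_policy` are hypothetical, and the statements simply mirror the example policy above. Attaching the policy assumes that boto3 is installed and that AWS credentials are configured.

```python
import json

# Principal copied from the example policy above.
FACTUAL_PRINCIPAL = "arn:aws:iam::315898705177:root"


def factual_bucket_policy(bucket: str) -> str:
    """Return the example bucket policy as a JSON string for `bucket`."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowFactualFileOperations",
                "Effect": "Allow",
                "Principal": {"AWS": FACTUAL_PRINCIPAL},
                "Action": "s3:GetObject",
                # Object-level statement: applies to keys inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "AllowFactualListOperations",
                "Effect": "Allow",
                "Principal": {"AWS": FACTUAL_PRINCIPAL},
                "Action": "s3:*",
                # Bucket-level statement: applies to the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }
    return json.dumps(policy, indent=2)


def apply_policy(bucket: str) -> None:
    """Attach the policy with boto3 (assumed installed and configured)."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("s3").put_bucket_policy(
        Bucket=bucket, Policy=factual_bucket_policy(bucket)
    )
```

Calling `print(factual_bucket_policy("example-intake-bucket"))` prints a policy that can also be pasted into the bucket's permissions settings in the AWS console.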
File Format
Data should adhere to one of the formats described in our Accepted Formats document.
Data must be submitted as gzipped text, and input file names must end in “.gz”.
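One way to meet the gzip requirement is sketched below, using only the Python standard library. The helper name `gzip_file` is hypothetical. It writes a compressed copy next to the source file and appends ".gz" to the name.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src: str) -> Path:
    """Compress `src` into a sibling file named `src` + '.gz'."""
    source = Path(src)
    target = source.with_name(source.name + ".gz")
    with source.open("rb") as raw, gzip.open(target, "wb") as packed:
        # Copy in chunks so large files are not loaded into memory at once.
        shutil.copyfileobj(raw, packed)
    return target
```

For example, `gzip_file("part-00000")` would produce `part-00000.gz` in the same directory and leave the original file in place.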
Schedule and File Path
Data may be submitted either hourly or daily. If hourly, the files should be uploaded to paths of the following form, where $YYYY, $MM, $DD, and $HH are the zero-padded year, month, day, and hour:
s3://$BUCKET/$YYYY/$MM/$DD/$HH/$FILE_1
s3://$BUCKET/$YYYY/$MM/$DD/$HH/$FILE_2
If daily, the files should be uploaded to paths of the following form:
s3://$BUCKET/$YYYY/$MM/$DD/$FILE_1
s3://$BUCKET/$YYYY/$MM/$DD/$FILE_2
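The two layouts above can be derived from a timestamp, as in the following sketch. The function name `intake_prefix` is hypothetical. This document does not state which time zone the date components should use, so that choice is left to the caller.

```python
from datetime import datetime


def intake_prefix(bucket: str, when: datetime, hourly: bool) -> str:
    """Build the hourly or daily upload prefix for `when`."""
    # Zero-padded components, matching $YYYY/$MM/$DD[/$HH] above.
    layout = "%Y/%m/%d/%H" if hourly else "%Y/%m/%d"
    return f"s3://{bucket}/{when.strftime(layout)}/"
```

With `datetime(2015, 10, 7, 19)` and `hourly=True`, this yields `s3://example-intake-bucket/2015/10/07/19/` for the bucket `example-intake-bucket`. With `hourly=False`, the hour component is dropped.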
In either case, once all data files have been uploaded to a path, write an empty file named _SUCCESS next to them to indicate that the upload to that path is complete. The following is an example of the expected S3 directory structure for hourly submissions:
s3://example-intake-bucket/2015/10/07/19/part-00000.gz
s3://example-intake-bucket/2015/10/07/19/part-00001.gz
s3://example-intake-bucket/2015/10/07/19/part-00002.gz
s3://example-intake-bucket/2015/10/07/19/part-00003.gz
s3://example-intake-bucket/2015/10/07/19/part-00004.gz
s3://example-intake-bucket/2015/10/07/19/_SUCCESS
s3://example-intake-bucket/2015/10/07/20/part-00000.gz
s3://example-intake-bucket/2015/10/07/20/part-00001.gz
s3://example-intake-bucket/2015/10/07/20/part-00002.gz
s3://example-intake-bucket/2015/10/07/20/part-00003.gz
s3://example-intake-bucket/2015/10/07/20/part-00004.gz
s3://example-intake-bucket/2015/10/07/20/_SUCCESS
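Because _SUCCESS signals that a path is complete, it is safest to write it last. The sketch below shows that ordering. The name `upload_batch` and the `upload` callback are hypothetical; the callback stands in for whatever S3 client is used, and a `local_path` of `None` means "write an empty object".

```python
import os
from typing import Callable, Iterable, List, Optional

# upload(key, local_path): local_path=None means "write an empty object".
Uploader = Callable[[str, Optional[str]], None]


def upload_batch(
    prefix: str, local_files: Iterable[str], upload: Uploader
) -> List[str]:
    """Upload every data file under `prefix`, then the _SUCCESS marker."""
    prefix = prefix.rstrip("/")
    keys = []
    for path in local_files:
        key = f"{prefix}/{os.path.basename(path)}"
        # If this raises, the function stops before _SUCCESS is written.
        upload(key, path)
        keys.append(key)
    marker = f"{prefix}/_SUCCESS"
    upload(marker, None)  # empty marker, written only after all data files
    keys.append(marker)
    return keys
```

With boto3, the callback could wrap `upload_file(path, bucket, key)` for data files and `put_object(Bucket=bucket, Key=key, Body=b"")` for the marker. Here `prefix` is the key portion of the path, such as `2015/10/07/19`, without the `s3://$BUCKET/` part.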
Please note that individual data files should not exceed 1 GB in size.
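A simple pre-upload check for the size limit might look like the following sketch. The name `oversized_files` is hypothetical. Two assumptions are made that this document does not confirm: "1 GB" is read as 10**9 bytes, the stricter interpretation, and the limit is checked against the file as it will be uploaded, that is, the gzipped file.

```python
import os
from typing import Iterable, List

# Assumption: "1 GB" is read as 10**9 bytes (stricter than 2**30).
MAX_BYTES = 10**9


def oversized_files(paths: Iterable[str], limit: int = MAX_BYTES) -> List[str]:
    """Return the paths whose on-disk size exceeds `limit` bytes."""
    return [p for p in paths if os.path.getsize(p) > limit]
```

Any file this returns would need to be split into smaller parts before upload.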