Place Match

Place Match lets you match large volumes of places against Foursquare’s POI dataset. It returns standardized Place IDs, enriched attributes, and confidence-based match results.

Input

Supported File Types

  • CSV
  • Parquet

For best results, replace all null values with empty strings before submitting your file.
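If you prepare the input in Python, a small helper can apply this rule to each record before the file is written. This is a minimal sketch, not part of any Foursquare SDK: `sanitize_record` is a hypothetical name, and treating float NaN as null (the way pandas commonly marks missing values) is our assumption about your data pipeline.

```python
import math
from typing import Any, Dict, Optional


def sanitize_record(record: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Return a copy of `record` with nulls replaced by empty strings.

    Hypothetical helper: both None and float NaN (an assumption, since
    pandas often uses NaN for missing values) are treated as null.
    """
    clean: Dict[str, Any] = {}
    for key, value in record.items():
        is_nan = isinstance(value, float) and math.isnan(value)
        clean[key] = "" if value is None or is_nan else value
    return clean


# Usage: address2 and latitude are missing in this sample record.
print(sanitize_record({"query_id": "q1", "address2": None, "latitude": float("nan")}))
# {'query_id': 'q1', 'address2': '', 'latitude': ''}
```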

Input File Format

root
 |-- query_id: string (nullable = false)
 |-- name: string (nullable = false)
 |-- address: string (nullable = false)
 |-- address2: string (nullable = true)
 |-- city: string (nullable = false)
 |-- state: string (nullable = false)
 |-- country: string (nullable = false)
 |-- latitude: double (nullable = true)
 |-- longitude: double (nullable = true)
 |-- zip: string (nullable = true)
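As an illustration of this schema, the following sketch writes records to CSV using exactly these column names, in the same order. The helper name `write_input_csv` is hypothetical, and this page does not say whether Place Match requires this exact column order; matching the schema order is simply the safe choice. Python's `csv` module writes `None` as an empty string, which also satisfies the null-handling advice above.

```python
import csv
import io
from typing import Iterable, Mapping, TextIO

# Column names and order copied from the input schema above.
INPUT_COLUMNS = [
    "query_id", "name", "address", "address2", "city",
    "state", "country", "latitude", "longitude", "zip",
]


def write_input_csv(records: Iterable[Mapping[str, object]], out: TextIO) -> None:
    """Write records as CSV with a header row (hypothetical helper).

    Missing optional columns become empty cells (restval=""), None values
    are written as empty strings by the csv module, and keys that are not
    in the schema raise ValueError (extrasaction="raise").
    """
    writer = csv.DictWriter(out, fieldnames=INPUT_COLUMNS, restval="", extrasaction="raise")
    writer.writeheader()
    for record in records:
        writer.writerow(record)


# Usage: a record without address2, latitude, longitude, or zip.
buf = io.StringIO()
write_input_csv([{"query_id": "q1", "name": "Cafe", "address": "1 Main St",
                  "city": "Springfield", "state": "IL", "country": "US"}], buf)
print(buf.getvalue())
```

When writing to a real file, open it with `newline=""`, as the `csv` module documentation recommends, for example `open("places.csv", "w", newline="")`.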

Details

Field Name | Description | Nullable?
query_id | Unique identifier for each record in the input query data. | false
name | Place name for each record in the input query data. | false
address | Primary address line for each record in the input query data. | false
address2 | Secondary address line for each record in the input query data. | true
city | City for each record in the input query data. | false
state | State for each record in the input query data. | false
country | Country for each record in the input query data. | false
latitude | Latitude for each record in the input query data. | true
longitude | Longitude for each record in the input query data. | true
zip | ZIP/postal code for each record in the input query data. | true
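Before uploading, you may want to check each record against the non-nullable fields in this table. The hypothetical `validate_input_record` helper treats blank strings in required fields as missing. That is our assumption, since the table only says these fields are not nullable. It also adds a latitude/longitude range check that is not part of the documented contract.

```python
from typing import Any, List, Mapping

# Fields marked "false" in the Nullable? column above.
REQUIRED_FIELDS = ("query_id", "name", "address", "city", "state", "country")


def validate_input_record(record: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Assumption: a blank string is treated like a missing value here.
        if value is None or str(value).strip() == "":
            problems.append(f"missing required field: {field}")

    # Extra sanity check, not a documented rule: coordinates, when present,
    # must fall within the valid latitude/longitude ranges.
    lat, lon = record.get("latitude"), record.get("longitude")
    if lat not in (None, "") and not -90.0 <= float(lat) <= 90.0:
        problems.append("latitude out of range")
    if lon not in (None, "") and not -180.0 <= float(lon) <= 180.0:
        problems.append("longitude out of range")
    return problems
```

For example, a record with an empty `name` and a latitude of 123.0 yields two problems, while a record with every required field filled and no coordinates passes.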

Valid Countries

You must specify the country codes of places you'd like matched when executing a Batch Place Match job. Valid countries include:

Country | Country Code
United Arab Emirates | AE
Argentina | AR
Austria | AT
Australia | AU
Belgium | BE
Brazil | BR
Canada | CA
Switzerland | CH
Chile | CL
China | CN
Colombia | CO
Czech Republic | CZ
Germany | DE
Denmark | DK
Spain | ES
Finland | FI
France | FR
United Kingdom | GB
Croatia | HR
Hungary | HU
Indonesia | ID
Ireland | IE
Israel | IL
India | IN
Italy | IT
Japan | JP
South Korea | KR
Mexico | MX
Malaysia | MY
Netherlands | NL
Norway | NO
New Zealand | NZ
Philippines | PH
Poland | PL
Portugal | PT
Russia | RU
Sweden | SE
Singapore | SG
Slovakia | SK
Thailand | TH
Taiwan | TW
United States | US
South Africa | ZA
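Because a job only accepts the countries listed above, it can help to check the country values in your input up front. This is a minimal sketch: `SUPPORTED_COUNTRIES` and `unsupported_countries` are hypothetical names, and the case normalization is our own choice. We don't know whether the service accepts lower-case codes, so submitting the upper-case codes exactly as listed is the safe option.

```python
from typing import Iterable, List

# Codes copied from the Valid Countries table (they are ISO 3166-1 alpha-2 codes).
SUPPORTED_COUNTRIES = frozenset({
    "AE", "AR", "AT", "AU", "BE", "BR", "CA", "CH", "CL", "CN", "CO",
    "CZ", "DE", "DK", "ES", "FI", "FR", "GB", "HR", "HU", "ID", "IE",
    "IL", "IN", "IT", "JP", "KR", "MX", "MY", "NL", "NO", "NZ", "PH",
    "PL", "PT", "RU", "SE", "SG", "SK", "TH", "TW", "US", "ZA",
})


def unsupported_countries(codes: Iterable[str]) -> List[str]:
    """Return the distinct codes not in SUPPORTED_COUNTRIES, sorted.

    Codes are stripped and upper-cased before comparison.
    """
    normalized = {code.strip().upper() for code in codes}
    return sorted(normalized - SUPPORTED_COUNTRIES)


print(unsupported_countries(["US", "gb", "XX"]))  # ['XX']
```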

Output

Output is always generated in Parquet format.

Output File Format

root
 |-- target_id: string (nullable = true)
 |-- query_id: string (nullable = true)
 |-- sim_score: float (nullable = true)
 |-- sim_rank: integer (nullable = true)
 |-- query_name: string (nullable = true)
 |-- query_address: string (nullable = true)
 |-- query_address2: string (nullable = true)
 |-- query_city: string (nullable = true)
 |-- query_state: string (nullable = true)
 |-- query_zip: string (nullable = true)
 |-- query_latitude: double (nullable = true)
 |-- query_longitude: double (nullable = true)
 |-- target_name: string (nullable = true)
 |-- target_address: string (nullable = true)
 |-- target_address2: string (nullable = true)
 |-- target_city: string (nullable = true)
 |-- target_state: string (nullable = true)
 |-- target_zip: string (nullable = true)
 |-- target_latitude: double (nullable = true)
 |-- target_longitude: double (nullable = true)
 |-- bucket: string (nullable = true)
 |-- country: string (nullable = true)
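Each output row pairs `query_*` columns with matching `target_*` columns, plus the shared columns `sim_score`, `sim_rank`, `bucket`, and `country`. Once rows are loaded as Python dicts (with any Parquet reader you prefer), a hypothetical helper like `split_output_row` can separate the two sides for side-by-side review. This is an illustration of the column naming only, not a Foursquare API.

```python
from typing import Any, Dict, Mapping, Tuple

Row = Dict[str, Any]


def split_output_row(row: Mapping[str, Any]) -> Tuple[Row, Row, Row]:
    """Split one output row into (query, target, shared) dicts.

    query_* and target_* columns lose their prefix, so query_id and target_id
    both end up under the key "id" on their own side. Every other column
    (sim_score, sim_rank, bucket, country) goes into `shared`.
    """
    query: Row = {}
    target: Row = {}
    shared: Row = {}
    for column, value in row.items():
        if column.startswith("query_"):
            query[column[len("query_"):]] = value
        elif column.startswith("target_"):
            target[column[len("target_"):]] = value
        else:
            shared[column] = value
    return query, target, shared
```

With the sides separated, comparing fields such as `query["name"]` and `target["name"]` becomes a simple dictionary lookup.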

Details

Field Name | Description | Nullable?
target_id | Unique identifier for the candidate matched record in the target FSQ data. | true
query_id | Unique identifier for each record in the input query data. | true
sim_score | Similarity score indicating how likely the query and target places in the pair are to be a match, i.e., to refer to the same real-world place. Scores range from 0 to 1, where 1 means the two places are most similar and 0 means they are most dissimilar. Use it for more granular analysis or custom bucketing as needed. | true
sim_rank | Rank of the query and target pair in each row, based on its similarity score. By default this is 1 in every output row, because the pipeline returns only the top-ranked (i.e., highest sim_score) pair for each query. | true
query_name | Place name in the input data. | true
query_address | Primary address line for the input query (e.g., street number and name). | true
query_address2 | Secondary address line for the input query (e.g., apartment number, suite). | true
query_city | City for the input query. | true
query_state | State for the input query. | true
query_zip | ZIP/postal code for the input query. | true
query_latitude | Latitude coordinate for the input query's location. | true
query_longitude | Longitude coordinate for the input query's location. | true
target_name | Place name in the target FSQ data. | true
target_address | Primary address line for the place in the target FSQ data (e.g., street number and name). | true
target_address2 | Secondary address line for the place in the target FSQ data (e.g., apartment number, suite). | true
target_city | City for the place in the target FSQ data. | true
target_state | State for the place in the target FSQ data. | true
target_zip | ZIP/postal code for the place in the target FSQ data. | true
target_latitude | Latitude coordinate for the place in the target FSQ data. | true
target_longitude | Longitude coordinate for the place in the target FSQ data. | true
bucket | Match bucket for the pair of query and target places, capturing our confidence in the match. Takes one of the values listed after this table. | true
country | Country where the input query place is located. | true

Bucket values:
  • definite match: The two places are matched with the highest confidence.
  • likely match: The two places are likely to be a match.
  • need more info: More information is needed to reliably determine the match status.
  • not a match: We did not identify a match for the query.
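A common next step is to keep only confident matches, optionally applying a stricter `sim_score` cutoff of your own. In this sketch, `accepted_matches` is a hypothetical helper. It assumes the `bucket` column contains exactly the label strings listed above, and any `min_score` value is a threshold you would choose for your use case, not a documented default.

```python
from typing import Any, Iterable, List, Mapping, Optional

# Assumption: the bucket column holds exactly these label strings.
ACCEPTED_BUCKETS = frozenset({"definite match", "likely match"})


def accepted_matches(
    rows: Iterable[Mapping[str, Any]], min_score: Optional[float] = None
) -> List[Mapping[str, Any]]:
    """Keep rows in an accepted bucket; if min_score is given, also require sim_score >= min_score."""
    kept = []
    for row in rows:
        if row.get("bucket") not in ACCEPTED_BUCKETS:
            continue
        score = row.get("sim_score")
        # Rows without a score are dropped whenever a cutoff is requested.
        if min_score is not None and (score is None or score < min_score):
            continue
        kept.append(row)
    return kept
```

Keeping the bucket filter and the score cutoff separate lets you start from the documented buckets and tighten the cutoff only after inspecting your own results.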