Place Match allows users to match large numbers of places to Foursquare’s POI dataset, returning standardized Place IDs, enriched attributes, and confidence-based match results.
Input
Supported File Types
- CSV
- Parquet
For best results, all null values should be sanitized to empty strings.
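As a minimal sketch of the null sanitization recommended above, the following uses only Python's standard library to write CSV input. The helper names `sanitize_row` and `write_csv`, and the set of tokens treated as null, are assumptions for illustration and are not part of Place Match.

```python
import csv
import io

# Assumption: these case-insensitive tokens represent nulls in the source data.
NULL_TOKENS = {"null", "none", "nan", "n/a"}


def sanitize_row(row):
    """Return a copy of row with None and null-like strings replaced by ""."""
    clean = {}
    for key, value in row.items():
        if value is None:
            clean[key] = ""
            continue
        text = str(value).strip()
        clean[key] = "" if text.lower() in NULL_TOKENS else text
    return clean


def write_csv(rows, fieldnames):
    """Serialize sanitized rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(sanitize_row(row))
    return buffer.getvalue()
```

Adjust `NULL_TOKENS` to whatever placeholders your source system emits; a legitimate value that happens to equal one of these tokens would also be blanked.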
Input File Format
root
|-- query_id: string (nullable = false)
|-- name: string (nullable = false)
|-- address: string (nullable = false)
|-- address2: string (nullable = true)
|-- city: string (nullable = false)
|-- state: string (nullable = false)
|-- country: string (nullable = false)
|-- latitude: double (nullable = true)
|-- longitude: double (nullable = true)
|-- zip: string (nullable = true)
Details
| Field Name | Description | Nullable? |
|---|---|---|
| query_id | Unique identifier for records in the input query data. | false |
| name | The name for records in the input query data. | false |
| address | The address for records in the input query data. | false |
| address2 | The second address line for records in the input query data. | true |
| city | The city for records in the input query data. | false |
| state | The state for records in the input query data. | false |
| country | The country for records in the input query data. | false |
| latitude | The latitude for records in the input query data. | true |
| longitude | The longitude for records in the input query data. | true |
| zip | The zip or postal code for records in the input query data. | true |
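The nullability rules in the table above can be checked before submitting a file. The sketch below is one possible pre-flight check in plain Python; the function names are hypothetical, and the coordinate range check (latitude within ±90, longitude within ±180) is an added assumption based on standard geographic bounds rather than a documented Place Match rule.

```python
from collections import Counter

# Fields marked nullable = false in the input schema.
REQUIRED_FIELDS = ("query_id", "name", "address", "city", "state", "country")


def validate_record(record):
    """Return a list of problems found in one input record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            problems.append(f"missing required field: {field}")
    # Coordinates are optional, but must be numeric and in range when given.
    for field, limit in (("latitude", 90.0), ("longitude", 180.0)):
        value = record.get(field)
        if value is None or value == "":
            continue
        try:
            number = float(value)
        except (TypeError, ValueError):
            problems.append(f"{field} is not a number: {value!r}")
            continue
        if not -limit <= number <= limit:
            problems.append(f"{field} out of range: {number}")
    return problems


def find_duplicate_ids(records):
    """Return query_id values that occur more than once, in first-seen order."""
    counts = Counter(record.get("query_id") for record in records)
    return [query_id for query_id, count in counts.items() if count > 1]
```

Running `validate_record` over every row and `find_duplicate_ids` over the whole file surfaces missing required values and non-unique `query_id` values before a job is started.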
Valid Countries
When executing a Batch Place Match job, you must specify the country codes of the places you'd like matched. Valid countries include:
| Country | Country Code |
|---|---|
| United Arab Emirates | AE |
| Argentina | AR |
| Austria | AT |
| Australia | AU |
| Belgium | BE |
| Brazil | BR |
| Canada | CA |
| Switzerland | CH |
| Chile | CL |
| China | CN |
| Colombia | CO |
| Czech Republic | CZ |
| Germany | DE |
| Denmark | DK |
| Spain | ES |
| Finland | FI |
| France | FR |
| United Kingdom | GB |
| Croatia | HR |
| Hungary | HU |
| Indonesia | ID |
| Ireland | IE |
| Israel | IL |
| India | IN |
| Italy | IT |
| Japan | JP |
| South Korea | KR |
| Mexico | MX |
| Malaysia | MY |
| Netherlands | NL |
| Norway | NO |
| New Zealand | NZ |
| Philippines | PH |
| Poland | PL |
| Portugal | PT |
| Russia | RU |
| Sweden | SE |
| Singapore | SG |
| Slovakia | SK |
| Thailand | TH |
| Taiwan | TW |
| United States | US |
| South Africa | ZA |
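To catch unsupported countries before a job runs, the codes in the table above can be held in a set and each record checked against it. This is a sketch only: the function name `split_by_country_support` is hypothetical, and upper-casing the `country` value before comparison is an assumption about how you may want to normalize your own data.

```python
# Country codes copied from the table of valid countries.
SUPPORTED_COUNTRY_CODES = frozenset({
    "AE", "AR", "AT", "AU", "BE", "BR", "CA", "CH", "CL", "CN", "CO", "CZ",
    "DE", "DK", "ES", "FI", "FR", "GB", "HR", "HU", "ID", "IE", "IL", "IN",
    "IT", "JP", "KR", "MX", "MY", "NL", "NO", "NZ", "PH", "PL", "PT", "RU",
    "SE", "SG", "SK", "TH", "TW", "US", "ZA",
})


def split_by_country_support(records):
    """Split records into (supported, unsupported) lists by country code."""
    supported, unsupported = [], []
    for record in records:
        # Assumption: normalize whitespace and case before the lookup.
        code = str(record.get("country", "")).strip().upper()
        if code in SUPPORTED_COUNTRY_CODES:
            supported.append(record)
        else:
            unsupported.append(record)
    return supported, unsupported
```

The unsupported list can then be reviewed or dropped, and the distinct codes in the supported list give the set of country codes to specify for the job.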
Output
Output is always generated as Parquet.
Output File Format
root
|-- target_id: string (nullable = true)
|-- query_id: string (nullable = true)
|-- sim_score: float (nullable = true)
|-- sim_rank: integer (nullable = true)
|-- query_name: string (nullable = true)
|-- query_address: string (nullable = true)
|-- query_address2: string (nullable = true)
|-- query_city: string (nullable = true)
|-- query_state: string (nullable = true)
|-- query_zip: string (nullable = true)
|-- query_latitude: double (nullable = true)
|-- query_longitude: double (nullable = true)
|-- target_name: string (nullable = true)
|-- target_address: string (nullable = true)
|-- target_address2: string (nullable = true)
|-- target_city: string (nullable = true)
|-- target_state: string (nullable = true)
|-- target_zip: string (nullable = true)
|-- target_latitude: double (nullable = true)
|-- target_longitude: double (nullable = true)
|-- bucket: string (nullable = true)
|-- country: string (nullable = true)
Details
| Field Name | Description | Nullable? |
|---|---|---|
| target_id | Unique identifier for the potential matched record in the target FSQ data. | true |
| query_id | Unique identifier for records in the input query data. | true |
| sim_score | Similarity score capturing how likely the query and target places in the pair are to be a match, i.e. to refer to the same place in the real world. Scores range from 0 to 1, where 1 means the two places are most similar and 0 means they are most dissimilar. Can be used for more granular analysis or custom bucketing as needed. | true |
| sim_rank | Ranking corresponding to the similarity score for the pair of query and target places in each row. This will, by default, be 1 across all output rows since the pipeline returns only the top-ranked (i.e. highest sim_score) pair for every query. | true |
| query_name | Place name in the input data. | true |
| query_address | Primary address line for the input query (e.g., street number and name). | true |
| query_address2 | Secondary address line for the input query (e.g., apartment number, suite). | true |
| query_city | City for the input query. | true |
| query_state | State for the input query. | true |
| query_zip | Zip/Postal code for the input query. | true |
| query_latitude | Latitude coordinate for the input query's location. | true |
| query_longitude | Longitude coordinate for the input query's location. | true |
| target_name | Place name in the target FSQ data. | true |
| target_address | Primary address line for the place in the target FSQ data (e.g., street number and name). | true |
| target_address2 | Secondary address line for the place in the target FSQ data (e.g., apartment number, suite). | true |
| target_city | City for the place in the target FSQ data. | true |
| target_state | State for the place in the target FSQ data. | true |
| target_zip | Zip/Postal code for the place in the target FSQ data. | true |
| target_latitude | Latitude coordinate for the place in the target FSQ data. | true |
| target_longitude | Longitude coordinate for the place in the target FSQ data. | true |
| bucket | Match bucket for the pair of query and target places, capturing our confidence in matching the input query. | true |
| country | Country where the query input data to be matched is located. | true |
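The `sim_score` field can drive custom bucketing on top of the delivered output. The sketch below assumes the Parquet output has already been loaded into a list of dictionaries (one per row) by a reader of your choice. The function name, the `score_band` label, the threshold values, and the treatment of a null `target_id` as "no match" are all illustrative assumptions; they are not the values or semantics of the `bucket` column.

```python
def label_by_score(rows, high=0.9, low=0.5):
    """Return copies of the output rows with a custom 'score_band' label.

    The thresholds are hypothetical defaults; tune them against your own data.
    """
    labeled = []
    for row in rows:
        score = row.get("sim_score")
        # Assumption: a null target_id or sim_score means no candidate was found.
        if row.get("target_id") is None or score is None:
            band = "no_match"
        elif score >= high:
            band = "high"
        elif score >= low:
            band = "medium"
        else:
            band = "low"
        labeled.append({**row, "score_band": band})
    return labeled
```

Because the pipeline returns only the top-ranked pair per query by default, each `query_id` receives a single label here; the original rows are left unmodified.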
