CSV / TSV
The Studio platform can import CSV (Comma-Separated Values) and TSV (Tab-Separated Values) files, and can export CSV files.
Loader | Characteristic |
---|---|
File Extensions | '.csv' , '.tsv' , '.dsv' |
MIME types | 'text/csv' , 'text/tab-separated-values' , 'text/dsv' |
File Format | [CSV](https://en.wikipedia.org/wiki/comma-separated_values), [TSV](https://en.wikipedia.org/wiki/tab-separated_values) |
Format Notes
For consistent results, CSV / TSV files should contain:
- a header row (header row auto detection is performed but may not be correct in all cases).
- one or more columns, where each row represents one geospatial feature (there should be some geospatial information).
- values of a single data type within each column.
Studio will auto detect column types:
- if geospatial columns are found, layers and filters will automatically be created.
- tables without geospatial columns can also be opened and used for analytics, including creation of custom columns, aggregation, and joins with other datasets.
Example:
id,point_latitude,point_longitude,value,start_time
a,31.2384,-127.30948,5,2019-08-01 12:00
b,31.2311,-127.30231,11,2019-08-01 12:05
c,31.2334,-127.30238,9,2019-08-01 11:55
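Before uploading, a file can be sanity-checked locally against the guidelines above. The following is a minimal sketch using Python's standard `csv` module; the `check_csv` helper and the specific checks it performs (unique header names, same number of fields on every row) are illustrative assumptions, not part of Studio.

```python
import csv
import io

# The example rows from above, held in memory instead of a file.
EXAMPLE = """id,point_latitude,point_longitude,value,start_time
a,31.2384,-127.30948,5,2019-08-01 12:00
b,31.2311,-127.30231,11,2019-08-01 12:05
c,31.2334,-127.30238,9,2019-08-01 11:55
"""


def check_csv(text):
    """Return a list of problems; an empty list means no problem was found.

    Hypothetical helper: it treats the first row as the header.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    problems = []
    if len(set(header)) != len(header):
        problems.append("duplicate column names in header")
    # Data rows start on line 2 of the file.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {number}: expected {len(header)} fields, got {len(row)}"
            )
    return problems


problems = check_csv(EXAMPLE)
print(problems or "no problems found")
```

A check like this only catches structural issues; it says nothing about how Studio will interpret the column types.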
CSV Column Type Detection
Because CSV files do not contain a schema (a specification of the type of data in each column), the Studio platform attempts to detect column data types by parsing a sample of data in each column, using the following rules:
Type | Column values that will trigger type deduction |
---|---|
boolean | True , False |
date | 2019-01-01 |
integer | 1 , 2 , 3 |
real | -74.158 , 40.832 |
string | hello , world |
timestamp | 2018-09-01 00:00 , 1570306147 , 1570306147000 |
geometry | Geometries (e.g. Polygons, Lines, Points) can be embedded into CSV as WKT strings: POLYGON ((-74.158 40.835, -74.148 40.830, -74.151 40.832, -74.158 40.835)) , or as GeoJSON geometries: {"type":"Polygon","coordinates":[[[-74.158,40.835],[-74.157,40.839],[-74.148,40.830],[-74.150,40.833],[-74.151,40.832],[-74.158,40.835]]]} |
struct | Must be formatted as JSON objects in double quotes. For example, {"category1": 1, "category2": 2} must be formatted as "{""category1"": 1, ""category2"": 2}" . |
Note: Make sure to clean up values such as N/A , Null , and \N . If a column contains mixed types, Studio will treat it as string to be safe.
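To see why a stray placeholder value changes the detected type, the rules can be approximated in a few lines. This is a rough sketch, not Studio's actual detection logic: the `classify` and `infer_column_type` functions are hypothetical, only the scalar types from the table are covered, and numeric epoch values are classified as integer because the source does not say how Studio tells them apart from plain integers.

```python
import re
from datetime import datetime

INTEGER = re.compile(r"[+-]?[0-9]+")
REAL = re.compile(r"[+-]?[0-9]+\.[0-9]+")


def classify(value):
    """Classify a single cell using patterns taken from the table above."""
    if value in ("True", "False"):
        return "boolean"
    if INTEGER.fullmatch(value):
        return "integer"
    if REAL.fullmatch(value):
        return "real"
    # Only the two textual formats shown in the table are tried here.
    for fmt, name in (("%Y-%m-%d", "date"), ("%Y-%m-%d %H:%M", "timestamp")):
        try:
            datetime.strptime(value, fmt)
            return name
        except ValueError:
            pass
    return "string"


def infer_column_type(values):
    """Return the shared type of all values, or "string" for mixed columns."""
    types = {classify(v) for v in values}
    return types.pop() if len(types) == 1 else "string"


print(infer_column_type(["5", "11", "9"]))   # all integers
print(infer_column_type(["5", "N/A", "9"]))  # the placeholder forces a fallback
```

In this sketch a single N/A among integers is enough to make the whole column a string, which is the behavior the note warns about.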
Geometry in CSV files
CSV files with geometry columns in the following formats will be recognized by Studio's tools:
- WKT (Well-Known Text)
- GeoJSON geometry
WKT example:
id,geometry
1,"POLYGON((0 0,10 0,10 10,0 10,0 0),(5 5,7 5,7 7,5 7, 5 5))"
GeoJSON geometry example:
id,geometry
1,"{""type"":""Polygon"",""coordinates"":[[[-74.158491,40.835947],[-74.157914,40.83902]]]}"
A CSV GeoJSON column should contain only the geometry part of a GeoJSON Feature, which includes type and coordinates. It must be a JSON-formatted string, with quotes ("). CSV needs to be correctly quoted:
- double quotes around a field are interpreted as enclosing quotes and are removed: "abc" => abc
- two double quotes inside a quoted string are interpreted as a single double quote: """a""" => "a"
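Writing the doubled quotes by hand is error-prone, so it is usually easier to let a CSV library do the escaping. The sketch below uses Python's standard `csv` and `json` modules to write a GeoJSON geometry and a struct value, then reads the geometry back; the column name `properties` and the sample values are illustrative assumptions.

```python
import csv
import io
import json

# Only the geometry part of a GeoJSON Feature: type and coordinates.
geometry = {
    "type": "Polygon",
    "coordinates": [[[-74.158, 40.835], [-74.148, 40.830],
                     [-74.151, 40.832], [-74.158, 40.835]]],
}
struct = {"category1": 1, "category2": 2}

# csv.writer wraps fields containing commas or quotes in double quotes
# and doubles every embedded double quote.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["id", "geometry", "properties"])
writer.writerow([1, json.dumps(geometry), json.dumps(struct)])
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back reverses the quoting, so the JSON parses again.
rows = list(csv.reader(io.StringIO(csv_text, newline="")))
restored = json.loads(rows[1][1])
```

The same round trip applies to any field that contains commas or quotes, not only to JSON values.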