Back to articles list June 23, 2016 - 8 minutes read Your First Steps With the Geography Data Type Maria Alcaraz Former Freelance Database Developer, Mother of 4 children Tags: PostgreSQL Geographical applications are everywhere: GPS and sat nav systems, maps, get-a-taxi apps, real estate portals, etc. Behind each of them is a spatial database storing geographical data, and supporting spatial queries. In this article, we will introduce PostGIS, the main open-source spatial database manager. PostGIS is a spatial database extension for the PostgreSQL relational database. It adds support for geographic objects, allowing location queries to be run in SQL. PostGIS adds two main data types to PostgreSQL: geography and geometry. Both allow the storage of points in a table, as well as other more complex shapes like lines (a line is defined by two points), multipoint lines (defined by N points), polygons (defined by a closed multipoint line), and points with a specific altitude (defined by a third coordinate). This extender also offers a set of spatial functions for distance calculation, area calculation, intersection, and inclusion, among many others. All these new data types and functions can be used in combination with regular relational data in SQL, increasing the power of queries. In this article, we will cover the first steps of using PostGIS by exploring features like the geography data type and some spatial functions such as distance, area and intersection. To illustrate as we go, we'll include some SQL examples. Let's begin by explaining the basic building block in a spatial database: a pair of coordinates representing a single point on the earth's surface. The GEOGRAPHY Data Type The first feature of PostGIS we'll learn is the geography data type. We'll start by using it to represent a point defined by two coordinates: latitude and longitude. As our example in this article, we will use the table artwork, which stores a list of the world's most important works of art and where they are being displayed. This table will have the following schema: CREATE TABLE artwork ( artwork_name TEXT, category VARCHAR(20), artist_name VARCHAR (50), showed_at_latitude FLOAT, showed_at_longitude FLOAT, where_is GEOGRAPHY ); First, we will insert some relational data for a few art masterworks. Note that the latitude and longitude fields are still float numbers and are being stored as regular relational data, not spatial. INSERT INTO artwork VALUES ('Giaconda','painting','Leonardo Da Vinci', 48.860547, 2.338513,NULL); INSERT INTO artwork VALUES ('David','sculpture','Michelangelo Buonarroti', 43.776709, 11.258887,NULL); INSERT INTO artwork VALUES ('Sunflowers','painting','Vincent Van Gogh', 48.149966, 11.570856,NULL); INSERT INTO artwork VALUES ('Guernica',' painting','Pablo Picasso', 40.407561, -3.694042,NULL); In order to populate the where_is column, we need to call the PostGIS ST_POINT function. This returns a geography data point, as we see in the following SQL statement: UPDATE artwork SET where_is = ST_POINT(showed_at_latitude, showed_at_longitude); After this update, we have our first geography data points in the database. If we try to see its values with a SELECT, we will see only hexadecimal values: SELECT * FROM artwork; "Giaconda"; "painting"; "Leonardo Da Vinci"; 48.860547; 2.338513; "0101000020E6100000E0D57267266E4840B22AC24D46B50240" "David"; "sculpture"; "Michelangelo Buonarroti"; 43.776709; 11.258887; "0101000020E610000022C154336BE3454017B83CD68C842640" "Sunflowers"; "painting"; "Vincent Van Gogh"; 48.149966; 11.570856; "0101000020E610000087C1FC15321348406473D53C47242740" "Guernica"; "painting"; "Pablo Picasso"; 40.407561; -3.694042; "0101000020E6100000021077F52A344440496760E4658D0DC0" We have two items to clarify here. The first is that when we used the geography data type, we omitted to mention the category. PostGIS assumed we meant the default category, which is POINT. However – and as we've already mentioned – we can store many different categories using the geography data type. The second item relates to the values in the where_is column. These values are stored internally in PostGIS using the WKB (Well Known Binary) format, and we can't readily understand what they represent in this format. We need some software to interpret their meaning; for example, if we use QGIS we can show the four geography points on a map. Have a look at the results in the following screenshot: Well-Known Text (WKT) The previous paragraph opens the door to a central concept in the spatial database standard. Every spatial element (like a point, line or polygon) can be expressed or defined using two different formats: Well Known Binary (WKB) Well Known Text (WKT) Let's take a minute here to talk about the Well-Known Text format. As we saw in the previous example, every geography element in PostGIS is represented internally as a string of hexadecimal digits in WKB format. Obviously, this is really difficult for humans to understand. The WKT is a clearer, human-friendly way to represent any geometry/geography element. Some examples of WKTs are: 'POINT (123 456) ' 'LINESTRING(-71.160281 42.258729,-71.160837 42.259113,-71.161144 42.25932) ' 'POLYGON((-3 40,-3 41,4 41,4 40,-3 40))' Moreover, a function called ST_AsText() can obtain the WKT of any geometry or geography spatial element, as we can see in the following example: SELECT ST_AsText('01030000000100000005000000000000000000000000000000000000000000000000000000000000000000F03F000000000000F03F000000000000F03F000000000000F03F000000000000000000000000000000000000000000000000'); st_astext "POLYGON((0 0,0 1,1 1,1 0,0 0))" There is also a function called ST_GeoFromText(). This is the inverse function; given a WKT format, it returns the associated geography. The Simplest Spatial Function: Distance In this article, we're only covering geography data types that are based on latitude and longitude. However, PostGIS allows us to represent and store other spatial elements (such as a geographic point based on latitude, longitude, and altitude) and the geometry data type (which allows the representation of two or three coordinates' spatial points in a simple way). The basis for the PostGIS geometry data type is a plane. The shortest path between two points on a plane is a straight line. That means calculations for geometries (areas, distances, lengths, intersections, etc.) can be done using Cartesian mathematics and straight line vectors. On the other hand, the basis for the PostGIS geography data type is a sphere. The shortest path between two points on the sphere is a great circle arc. That means that calculations on geographies (areas, distances, lengths, intersections, etc.) must be based on a sphere, using more complicated mathematics. Moreover, the earth is not a perfect sphere, so calculations must consider the world as a spheroidal shape.... Too complicated! Let's continue with using PostGIS. Let's go back to the distance function for geographical points. The following SQL query calculates the distance between our four masterworks, showing all possible combinations. SELECT artwork1.artwork_name, artwork2.artwork_name, ST_DISTANCE(artwork1.where_is, artwork2.where_is) FROM artwork artwork1, artwork artwork2 Artwork1 Artwork2 Distance (meters) "Giaconda" "Giaconda" 0 "Giaconda" "David" 1135068.18984272 "Giaconda" "Sunflowers" 1024039.70872467 "Giaconda" "Guernica" 1153026.40558596 "David" "Giaconda" 1135068.18984272 "David" "David" 0 "David" "Sunflowers" 478502.586379199 "David" "Guernica" 1695158.89068659 "Sunflowers" "Giaconda" 1024039.70872467 "Sunflowers" "David" 478502.586379199 "Sunflowers" "Sunflowers" 0 "Sunflowers" "Guernica" 1893294.8267 "Guernica" "Giaconda" 1153026.40558596 "Guernica" "David" 1695158.89068659 "Guernica" "Sunflowers" 1893294.8267 "Guernica" "Guernica" 0 By definition, distance results in PostGIS are expressed as meters. Anyone who prefers to see results in feet should convert from meters to feet, using the following simple math expression: SELECT artwork1.artwork_name, artwork2.artwork_name, ST_DISTANCE(artwork1.where_is, artwork2.where_is) * 3.2808399 FROM artwork artwork1, artwork artwork2 Artwork1 Artwork2 Distance (meters) "Giaconda" "Giaconda" 0 "Giaconda" "David" 2915185.37043353 "Giaconda" "Sunflowers" 2251773.82097004 We Don't Just Store Points: Lines, Polygons, Multipoints The geography data type has a modifier that specifies what kind of element we will store in the column. In our first example, we didn't specify the modifier, so by default we stored a POINT. However we can store LINES, POLYGONS, and even complex spatial forms like MULTIPOINTS (an array of points). For example, if we want to store a polygon to represent the perimeter of a museum, we can create the following table: CREATE TABLE museum ( museum_name VARCHAR(20), country VARCHAR(20), perimeter GEOGRAPHY(POLYGON) ); To create a polygon value, we will use the PostGIS function ST_GeoFromText. As we've already mentioned, this returns a geometry value when it is given a WKT representation. The WKT for a polygon has the following form: 'POLYGON((-3 40,-3 41,4 41,4 40,-3 40))' Note that the first point and the last point are the same; to define a polygon, it is mandatory to use a closed multiline. The next SQL inserts define museum records with its perimeters: INSERT INTO museum VALUES ('Munich Pinakotek','Deuschtland',ST_GeogFromText('POLYGON((11 48,11 49,12 49,12 48,11 48))')); INSERT INTO museum VALUES ('Accademia Gallery','Italy',ST_GeogFromText('POLYGON((11 43,11 44,12 44,12 43,11 43))')); INSERT INTO museum VALUES ('Musee du Louvre','France',ST_GeogFromText('POLYGON((2 48,2 49,3 49,3 48,2 48))')); INSERT INTO museum VALUES ('Museo Reina Sofia','Spain',ST_GeogFromText('POLYGON((-4 40,-4 41,-3 41,-3 40,-4 40))')); Note: The perimeters are not strictly real, but they include the museum area. Calculating the Area of Complex Shapes Now that we have a table with a polygon defining the perimeter of each museum, we can use the ST_AREA function to obtain the area of every museum. We'll do this by using a really simple query. The interesting point here is the simplicity. Think how complex it could be to calculate the area of an irregular shape. Here we are doing it with a nice short query: SELECT museum_name, st_area(perimeter) FROM museum ORDER BY 2 DESC museum_name st_area (in square meters) "Museo Reina Sofia" 9412789265.14431 "Accademia Gallery" 8985388059.43758 "Munich Pinakothek" 8217590765.89231 "Musee du Louvre" 8217590765.89229 Note: Museum perimeters are estimations, so area values are approximated. Getting to Grips With Join Conditions If we review the schema of the museum and artwork tables, there are no foreign keys that allow a join. How we can relate a piece of art with the museum where it is being displayed? It there any spatial condition to join both tables ? Let's take a look to the following SQL join: SELECT artwork_name, museum_name FROM museum, artwork WHERE ST_INTERSECTS(where_is, perimeter) Artwork_name Museum_name "Sunflowers" "Munich Pinakothek" "David" "Accademia Gallery" "Giaconda" "Musee du Louvre" "Guernica" "Museo Reina Sofia" Try It Yourself: an Exercise for the Reader Spatial databases do have more complex elements, functions and algorithms, but the basic concepts we've discussed are present behind every spatial element. One element not covered here is the third dimension of a point. Remember, spatial databases also allow us to represent the altitude of a given point. Let's imagine a museum where two artworks are at the same latitudinal and longitudinal coordinates but on different floors. Tags: PostgreSQL You may also like Pivot Tables in PostgreSQL Using the Crosstab Function Some years ago, when PostgreSQL version 8.3 was released, a new extension called tablefunc was introduced. This extension provides a really interesting set of functions. One of them is the crosstab function, which is used for pivot table creation. That's what we'll cover in this article. Read more SQL Course of the Month – PostGIS PostGIS enables efficient processing of large geographic data sets. Find out why you should take our interactive course. Read more Why Use SQL Over Excel If you're using Excel as a data analyst, you're missing something valuable. In this article, I explain why the use of SQL over Excel is the right choice. Read more SQL Technologies Are More Common Than You Think Discover the impressive - and still growing - list of technologies that support SQL! Read more SQL Window Functions vs. SQL Aggregate Functions: Similarities and Differences What’s the difference between SQL window functions vs. SQL aggregate functions? We summarize their similarities and differences and explain when to use each one. Read more How to Find All Employees Under Each Manager in SQL Learn to use SQL’s recursive queries and find all employees under each manager in a company. Read more Top 7 Online Courses for Data Engineers These 7 online courses for data engineers are designed to set you up for one of the most coveted career paths in today’s world. Read more Subscribe to our newsletter Join our weekly newsletter to be notified about the latest posts.