Geographical applications are everywhere: GPS and sat nav systems, maps, get-a-taxi apps, real estate portals, etc. Behind each of them is a spatial database storing geographical data, and supporting spatial queries. In this article, we will introduce PostGIS, the main open-source spatial database manager.

PostGIS is a spatial database extension for the PostgreSQL relational database. It adds support for geographic objects, allowing location queries to be run in SQL.

PostGIS adds two main data types to PostgreSQL: geography and geometry. Both allow the storage of points in a table, as well as other more complex shapes like lines (a line is defined by two points), multipoint lines (defined by N points), polygons (defined by a closed multipoint line), and points with a specific altitude (defined by a third coordinate). This extender also offers a set of spatial functions for distance calculation, area calculation, intersection, and inclusion, among many others. All these new data types and functions can be used in combination with regular relational data in SQL, increasing the power of queries.

In this article, we will cover the first steps of using PostGIS by exploring features like the geography data type and some spatial functions such as distance, area and intersection. To illustrate as we go, we'll include some SQL examples.

Let's begin by explaining the basic building block in a spatial database: a pair of coordinates representing a single point on the earth's surface.

### The *GEOGRAPHY* Data Type

The first feature of PostGIS we'll learn is the geography data type. We'll start by using it to represent a point defined by two coordinates: latitude and longitude. As our example in this article, we will use the table

, which stores a list of the world's most important works of art and where they are being displayed. This table will have the following schema:**artwork**

CREATE TABLE artwork ( artwork_name TEXT, category VARCHAR(20), artist_name VARCHAR (50), showed_at_latitude FLOAT, showed_at_longitude FLOAT, where_is GEOGRAPHY );

First, we will insert some relational data for a few art masterworks. Note that the latitude and longitude fields are still float numbers and are being stored as regular relational data, not spatial.

INSERT INTO artwork VALUES ('Giaconda','painting','Leonardo Da Vinci', 48.860547, 2.338513,NULL); INSERT INTO artwork VALUES ('David','sculpture','Michelangelo Buonarroti', 43.776709, 11.258887,NULL); INSERT INTO artwork VALUES ('Sunflowers','painting','Vincent Van Gogh', 48.149966, 11.570856,NULL); INSERT INTO artwork VALUES ('Guernica',' painting','Pablo Picasso', 40.407561, -3.694042,NULL);

In order to populate the `where_is`

column, we need to call the PostGIS **ST_POINT** function. This returns a *geography* data point, as we see in the following SQL statement:

UPDATE artwork SET where_is = ST_POINT(showed_at_latitude, showed_at_longitude);

After this update, we have our first *geography* data points in the database. If we try to see its values with a SELECT, we will see only hexadecimal values:

SELECT * FROM artwork;

"Giaconda"; "painting"; "Leonardo Da Vinci"; 48.860547; 2.338513; "0101000020E6100000E0D57267266E4840B22AC24D46B50240" "David"; "sculpture"; "Michelangelo Buonarroti"; 43.776709; 11.258887; "0101000020E610000022C154336BE3454017B83CD68C842640" "Sunflowers"; "painting"; "Vincent Van Gogh"; 48.149966; 11.570856; "0101000020E610000087C1FC15321348406473D53C47242740" "Guernica"; "painting"; "Pablo Picasso"; 40.407561; -3.694042; "0101000020E6100000021077F52A344440496760E4658D0DC0"

We have two items to clarify here. The first is that when we used the *geography* data type, we omitted to mention the category. PostGIS assumed we meant the default category, which is POINT. However – and as we've already mentioned – we can store many different categories using the *geography* data type.

The second item relates to the values in the `where_is`

column. These values are stored internally in PostGIS using the WKB (Well Known Binary) format, and we can't readily understand what they represent in this format. We need some software to interpret their meaning; for example, if we use QGIS we can show the four geography points on a map. Have a look at the results in the following screenshot:

### Well-Known Text (WKT)

The previous paragraph opens the door to a central concept in the spatial database standard. Every spatial element (like a point, line or polygon) can be expressed or defined using two different formats:

- Well Known Binary (WKB)
- Well Known Text (WKT)

Let's take a minute here to talk about the Well-Known Text format. As we saw in the previous example, every geography element in PostGIS is represented internally as a string of hexadecimal digits in WKB format. Obviously, this is really difficult for humans to understand. The WKT is a clearer, human-friendly way to represent any *geometry/geography* element. Some examples of WKTs are:

'POINT (123 456) ' 'LINESTRING(-71.160281 42.258729,-71.160837 42.259113,-71.161144 42.25932) ' 'POLYGON((-3 40,-3 41,4 41,4 40,-3 40))'

Moreover, a function called ** ST_AsText()** can obtain the WKT of any

*geometry*or

*geography*spatial element, as we can see in the following example:

SELECT ST_AsText('01030000000100000005000000000000000000000000000000000000000000000000000000000000000000F03F000000000000F03F000000000000F03F000000000000F03F000000000000000000000000000000000000000000000000'); st_astext "POLYGON((0 0,0 1,1 1,1 0,0 0))"

There is also a function called

. This is the inverse function; given a WKT format, it returns the associated geography.
**ST_GeoFromText()**

### The Simplest Spatial Function: Distance

In this article, we're only covering geography data types that are based on latitude and longitude. However, PostGIS allows us to represent and store other spatial elements (such as a geographic point based on latitude, longitude, and altitude) and the *geometry* data type (which allows the representation of two or three coordinates' spatial points in a simple way).

The basis for the PostGIS *geometry* data type is a plane. The shortest path between two points on a plane is a straight line. That means calculations for geometries (areas, distances, lengths, intersections, etc.) can be done using Cartesian mathematics and straight line vectors.

On the other hand, the basis for the PostGIS *geography* data type is a sphere. The shortest path between two points on the sphere is a great circle arc. That means that calculations on geographies (areas, distances, lengths, intersections, etc.) must be based on a sphere, using more complicated mathematics. Moreover, the earth is not a perfect sphere, so calculations must consider the world as a spheroidal shape.... Too complicated! Let's continue with using PostGIS.

Let's go back to the **distance** function for geographical points. The following SQL query calculates the distance between our four masterworks, showing all possible combinations.

SELECT artwork1.artwork_name, artwork2.artwork_name, ST_DISTANCE(artwork1.where_is, artwork2.where_is) FROM artwork artwork1, artwork artwork2

"Giaconda" "Giaconda" 0 "Giaconda" "David" 1135068.18984272 "Giaconda" "Sunflowers" 1024039.70872467 "Giaconda" "Guernica" 1153026.40558596 "David" "Giaconda" 1135068.18984272 "David" "David" 0 "David" "Sunflowers" 478502.586379199 "David" "Guernica" 1695158.89068659 "Sunflowers" "Giaconda" 1024039.70872467 "Sunflowers" "David" 478502.586379199 "Sunflowers" "Sunflowers" 0 "Sunflowers" "Guernica" 1893294.8267 "Guernica" "Giaconda" 1153026.40558596 "Guernica" "David" 1695158.89068659 "Guernica" "Sunflowers" 1893294.8267 "Guernica" "Guernica" 0Artwork1 Artwork2 Distance (meters)

By definition, distance results in PostGIS are expressed as meters. Anyone who prefers to see results in feet should convert from meters to feet, using the following simple math expression:

SELECT artwork1.artwork_name, artwork2.artwork_name, ST_DISTANCE(artwork1.where_is, artwork2.where_is) * 3.2808399 FROM artwork artwork1, artwork artwork2

"Giaconda" "Giaconda" 0 "Giaconda" "David" 2915185.37043353 "Giaconda" "Sunflowers" 2251773.82097004Artwork1 Artwork2 Distance (meters)

### We Don't Just Store Points: Lines, Polygons, Multipoints

The *geography* data type has a modifier that specifies what kind of element we will store in the column. In our first example, we didn't specify the modifier, so by default we stored a POINT. However we can store LINES, POLYGONS, and even complex spatial forms like MULTIPOINTS (an array of points). For example, if we want to store a polygon to represent the perimeter of a museum, we can create the following table:

CREATE TABLE museum ( museum_name VARCHAR(20), country VARCHAR(20), perimeter GEOGRAPHY(POLYGON) );

To create a polygon value, we will use the PostGIS function ** ST_GeoFromText**. As we've already mentioned, this returns a geometry value when it is given a WKT representation. The WKT for a polygon has the following form:

'POLYGON((-3 40,-3 41,4 41,4 40,-3 40))'

Note that the first point and the last point are the same; to define a polygon, it is mandatory to use a closed multiline. The next SQL inserts define museum records with its perimeters:

INSERT INTO museum VALUES ('Munich Pinakotek','Deuschtland',ST_GeogFromText('POLYGON((11 48,11 49,12 49,12 48,11 48))')); INSERT INTO museum VALUES ('Accademia Gallery','Italy',ST_GeogFromText('POLYGON((11 43,11 44,12 44,12 43,11 43))')); INSERT INTO museum VALUES ('Musee du Louvre','France',ST_GeogFromText('POLYGON((2 48,2 49,3 49,3 48,2 48))')); INSERT INTO museum VALUES ('Museo Reina Sofia','Spain',ST_GeogFromText('POLYGON((-4 40,-4 41,-3 41,-3 40,-4 40))'));

*Note: The perimeters are not strictly real, but they include the museum area.*

### Calculating the Area of Complex Shapes

Now that we have a table with a polygon defining the perimeter of each museum, we can use the **ST_AREA** function to obtain the area of every museum. We'll do this by using a really simple query.

The interesting point here is the simplicity. Think how complex it could be to calculate the area of an irregular shape. Here we are doing it with a nice short query:

SELECT museum_name, st_area(perimeter) FROM museum ORDER BY 2 DESC

"Museo Reina Sofia" 9412789265.14431 "Accademia Gallery" 8985388059.43758 "Munich Pinakothek" 8217590765.89231 "Musee du Louvre" 8217590765.89229museum_name st_area (in square meters)

*Note: Museum perimeters are estimations, so area values are approximated.*

### Getting to Grips With Join Conditions

If we review the schema of the museum and artwork tables, there are no foreign keys that allow a join. How we can relate a piece of art with the museum where it is being displayed? It there any spatial condition to join both tables ? Let's take a look to the following SQL join:

SELECT artwork_name, museum_name FROM museum, artwork WHERE ST_INTERSECTS(where_is, perimeter)

"Sunflowers" "Munich Pinakothek" "David" "Accademia Gallery" "Giaconda" "Musee du Louvre" "Guernica" "Museo Reina Sofia"Artwork_name Museum_name

### Try It Yourself: an Exercise for the Reader

Spatial databases do have more complex elements, functions and algorithms, but the basic concepts we've discussed are present behind every spatial element. One element not covered here is the third dimension of a point. Remember, spatial databases also allow us to represent the altitude of a given point. Let's imagine a museum where two artworks are at the same latitudinal and longitudinal coordinates but on different floors.