Back to articles list Articles Cookbook
Updated: 23rd Jun 2016 8 minutes read

Your First Steps With the Geography Data Type

Geographical applications are everywhere: GPS and sat nav systems, maps, get-a-taxi apps, real estate portals, etc. Behind each of them is a spatial database storing geographical data, and supporting spatial queries. In this article, we will introduce PostGIS, the main open-source spatial database manager.

PostGIS is a spatial database extension for the PostgreSQL relational database. It adds support for geographic objects, allowing location queries to be run in SQL.

PostGIS adds two main data types to PostgreSQL: geography and geometry. Both allow the storage of points in a table, as well as other more complex shapes like lines (a line is defined by two points), multipoint lines (defined by N points), polygons (defined by a closed multipoint line), and points with a specific altitude (defined by a third coordinate). This extender also offers a set of spatial functions for distance calculation, area calculation, intersection, and inclusion, among many others. All these new data types and functions can be used in combination with regular relational data in SQL, increasing the power of queries.

If you like hands-on learning, check out our interactive PostGIS course. It contains over 140 hands-on exercises to learn and practice geographical queries in PostGIS.

In this article, we will cover the first steps of using PostGIS by exploring features like the geography data type and some spatial functions such as distance, area and intersection. To illustrate as we go, we'll include some SQL examples.

Let's begin by explaining the basic building block in a spatial database: a pair of coordinates representing a single point on the earth's surface.

The GEOGRAPHY Data Type

The first feature of PostGIS we'll learn is the geography data type. We'll start by using it to represent a point defined by two coordinates: latitude and longitude. As our example in this article, we will use the table artwork, which stores a list of the world's most important works of art and where they are being displayed. This table will have the following schema:

artwork table

CREATE TABLE artwork (
artwork_name 			TEXT,
category 			VARCHAR(20),
artist_name 			VARCHAR (50),
showed_at_latitude 		FLOAT,
showed_at_longitude		FLOAT,
where_is			GEOGRAPHY );

First, we will insert some relational data for a few art masterworks. Note that the latitude and longitude fields are still float numbers and are being stored as regular relational data, not spatial.

INSERT INTO artwork VALUES ('Giaconda','painting','Leonardo Da Vinci', 48.860547, 2.338513,NULL);
INSERT INTO artwork VALUES ('David','sculpture','Michelangelo Buonarroti', 43.776709, 11.258887,NULL);
INSERT INTO artwork VALUES ('Sunflowers','painting','Vincent Van Gogh', 48.149966, 11.570856,NULL);
INSERT INTO artwork VALUES ('Guernica',' painting','Pablo Picasso', 40.407561, -3.694042,NULL);

In order to populate the where_is column, we need to call the PostGIS ST_POINT function. This returns a geography data point, as we see in the following SQL statement:

UPDATE artwork SET where_is = ST_POINT(showed_at_latitude, showed_at_longitude);

After this update, we have our first geography data points in the database. If we try to see its values with a SELECT, we will see only hexadecimal values:

SELECT * FROM artwork;

"Giaconda"; 	"painting"; 	"Leonardo Da Vinci"; 	48.860547; 2.338513; 	"0101000020E6100000E0D57267266E4840B22AC24D46B50240"
"David"; 	"sculpture"; 	"Michelangelo Buonarroti"; 	43.776709; 11.258887; 	"0101000020E610000022C154336BE3454017B83CD68C842640"
"Sunflowers"; 	"painting"; 	"Vincent Van Gogh"; 	48.149966; 11.570856; 	"0101000020E610000087C1FC15321348406473D53C47242740"
"Guernica"; 	"painting"; 	"Pablo Picasso"; 	40.407561; -3.694042; 	"0101000020E6100000021077F52A344440496760E4658D0DC0"

We have two items to clarify here. The first is that when we used the geography data type, we omitted to mention the category. PostGIS assumed we meant the default category, which is POINT. However – and as we've already mentioned – we can store many different categories using the geography data type.

The second item relates to the values in the where_is column. These values are stored internally in PostGIS using the WKB (Well Known Binary) format, and we can't readily understand what they represent in this format. We need some software to interpret their meaning; for example, if we use QGIS we can show the four geography points on a map. Have a look at the results in the following screenshot:

QGIS - Europe

Well-Known Text (WKT)

The previous paragraph opens the door to a central concept in the spatial database standard. Every spatial element (like a point, line or polygon) can be expressed or defined using two different formats:

  • Well Known Binary (WKB)
  • Well Known Text (WKT)

Let's take a minute here to talk about the Well-Known Text format. As we saw in the previous example, every geography element in PostGIS is represented internally as a string of hexadecimal digits in WKB format. Obviously, this is really difficult for humans to understand. The WKT is a clearer, human-friendly way to represent any geometry/geography element. Some examples of WKTs are:

'POINT (123 456) '
'LINESTRING(-71.160281 42.258729,-71.160837 42.259113,-71.161144 42.25932) '
'POLYGON((-3 40,-3 41,4 41,4 40,-3 40))'

Moreover, a function called ST_AsText() can obtain the WKT of any geometry or geography spatial element, as we can see in the following example:

SELECT ST_AsText('01030000000100000005000000000000000000000000000000000000000000000000000000000000000000F03F000000000000F03F000000000000F03F000000000000F03F000000000000000000000000000000000000000000000000');
st_astext
"POLYGON((0 0,0 1,1 1,1 0,0 0))"

There is also a function called ST_GeoFromText(). This is the inverse function; given a WKT format, it returns the associated geography.

The Simplest Spatial Function: Distance

In this article, we're only covering geography data types that are based on latitude and longitude. However, PostGIS allows us to represent and store other spatial elements (such as a geographic point based on latitude, longitude, and altitude) and the geometry data type (which allows the representation of two or three coordinates' spatial points in a simple way).

The basis for the PostGIS geometry data type is a plane. The shortest path between two points on a plane is a straight line. That means calculations for geometries (areas, distances, lengths, intersections, etc.) can be done using Cartesian mathematics and straight line vectors.

On the other hand, the basis for the PostGIS geography data type is a sphere. The shortest path between two points on the sphere is a great circle arc. That means that calculations on geographies (areas, distances, lengths, intersections, etc.) must be based on a sphere, using more complicated mathematics. Moreover, the earth is not a perfect sphere, so calculations must consider the world as a spheroidal shape.... Too complicated! Let's continue with using PostGIS.

Let's go back to the distance function for geographical points. The following SQL query calculates the distance between our four masterworks, showing all possible combinations.

SELECT 	artwork1.artwork_name,
		artwork2.artwork_name,
		ST_DISTANCE(artwork1.where_is, artwork2.where_is)
FROM 	artwork artwork1,
		artwork artwork2

Artwork1	Artwork2	Distance (meters)
"Giaconda"	"Giaconda"	0
"Giaconda"	"David"		1135068.18984272
"Giaconda"	"Sunflowers"	1024039.70872467
"Giaconda"	"Guernica"	1153026.40558596
"David"		"Giaconda"	1135068.18984272
"David"		"David"		0
"David"		"Sunflowers"	478502.586379199
"David"		"Guernica"	1695158.89068659
"Sunflowers"	"Giaconda"	1024039.70872467
"Sunflowers"	"David"		478502.586379199
"Sunflowers"	"Sunflowers"	0
"Sunflowers"	"Guernica"	1893294.8267
"Guernica"	"Giaconda"	1153026.40558596
"Guernica"	"David"		1695158.89068659
"Guernica"	"Sunflowers"	1893294.8267
"Guernica"	"Guernica"	0

By definition, distance results in PostGIS are expressed as meters. Anyone who prefers to see results in feet should convert from meters to feet, using the following simple math expression:

SELECT 	artwork1.artwork_name,
		artwork2.artwork_name,
		ST_DISTANCE(artwork1.where_is, artwork2.where_is) * 3.2808399
FROM 		artwork artwork1,
		artwork artwork2

Artwork1	Artwork2	Distance (meters)
"Giaconda"	"Giaconda"	0
"Giaconda"	"David"		2915185.37043353
"Giaconda"	"Sunflowers"	2251773.82097004

We Don't Just Store Points: Lines, Polygons, Multipoints

The geography data type has a modifier that specifies what kind of element we will store in the column. In our first example, we didn't specify the modifier, so by default we stored a POINT. However we can store LINES, POLYGONS, and even complex spatial forms like MULTIPOINTS (an array of points). For example, if we want to store a polygon to represent the perimeter of a museum, we can create the following table:

museum table

CREATE TABLE museum (
	museum_name 	VARCHAR(20),
	country 		VARCHAR(20),
	perimeter		GEOGRAPHY(POLYGON)
);

To create a polygon value, we will use the PostGIS function ST_GeoFromText. As we've already mentioned, this returns a geometry value when it is given a WKT representation. The WKT for a polygon has the following form:

'POLYGON((-3 40,-3 41,4 41,4 40,-3 40))'

Note that the first point and the last point are the same; to define a polygon, it is mandatory to use a closed multiline. The next SQL inserts define museum records with its perimeters:

INSERT INTO museum VALUES ('Munich Pinakotek','Deuschtland',ST_GeogFromText('POLYGON((11 48,11 49,12 49,12 48,11 48))'));
INSERT INTO museum VALUES ('Accademia Gallery','Italy',ST_GeogFromText('POLYGON((11 43,11 44,12 44,12 43,11 43))'));
INSERT INTO museum VALUES ('Musee du Louvre','France',ST_GeogFromText('POLYGON((2 48,2 49,3 49,3 48,2 48))'));
INSERT INTO museum VALUES ('Museo Reina Sofia','Spain',ST_GeogFromText('POLYGON((-4 40,-4 41,-3 41,-3 40,-4 40))'));

Note: The perimeters are not strictly real, but they include the museum area.

Calculating the Area of Complex Shapes

Now that we have a table with a polygon defining the perimeter of each museum, we can use the ST_AREA function to obtain the area of every museum. We'll do this by using a really simple query.

The interesting point here is the simplicity. Think how complex it could be to calculate the area of an irregular shape. Here we are doing it with a nice short query:

SELECT	museum_name,
		st_area(perimeter)
FROM museum
ORDER BY 2 DESC

museum_name		st_area (in square meters)
"Museo Reina Sofia"		9412789265.14431
"Accademia Gallery"		8985388059.43758
"Munich Pinakothek"		8217590765.89231
"Musee du Louvre"		8217590765.89229

Note: Museum perimeters are estimations, so area values are approximated.

Getting to Grips With Join Conditions

If we review the schema of the museum and artwork tables, there are no foreign keys that allow a join. How we can relate a piece of art with the museum where it is being displayed? It there any spatial condition to join both tables ? Let's take a look to the following SQL join:

SELECT 	artwork_name,
		museum_name
FROM 	museum,
		artwork
WHERE 	ST_INTERSECTS(where_is, perimeter)

Artwork_name		Museum_name
"Sunflowers"		"Munich Pinakothek"
"David"			"Accademia Gallery"
"Giaconda"		"Musee du Louvre"
"Guernica"		"Museo Reina Sofia"

Try It Yourself: Out Interactive PostGIS Course

Spatial databases do have more complex elements, functions and algorithms, but the basic concepts we've discussed are present behind every spatial element. If you want to learn how to work with maps in SQL, try out our PostGIS interactive course!