The phrase “80% of data is geographic” is one of those commonly cited facts that anyone who works with GIS data knows well. The phrase is almost always presented without any reference to its origins and is repeated over and over again. It is frequently cited to underscore the vast untapped potential of GIS data. Embraced as the perfect angle for convincing agencies and companies to adopt geospatial technologies, it has been cited by GIS powerhouses like Esri in every which way on their pages: 80% of the world’s data includes some kind of spatial aspect (here), 80% of data has a location component (here), 80% of data possesses a geographic reference (here), 80% of transactional data has a location component (here).

The Urban Legend of GIS Data?

So where did this phrase originate and what evidence is it based on? A recent post on Spatially Adjusted about the phrase made me want to dig deeper. Answering that question isn’t simple given the folkloric status the phrase has reached. Several possible origins have been offered up. Some point to “An introduction to GIS: linking maps to databases,” authored by Carl Franklin and Paula Hane (1), as the originating source (examples: gis.stackexchange.com and Spatial Sustain). That article took a look at the emerging field of what the authors called Geographic Information Management (GIM) and the “impact of computerization of maps on access to business and government information that may be geographically referenced.”

That article in turn referenced a 1990 report from the Ohio Geographically Referenced Information Program (OGRIP) that I have not been able to locate. However, the phrase “80% of data collected, stored, and maintained by local governments includes some reference to geography” has been used repeatedly in reports issued by the agency since. Draft white papers from 2004 and 2011 on the “Ohio Location Based Response System” both repeat that phrase. Since neither paper cites any of the “studies” referenced, the trail dies there.

However, the earliest source I could trace is an article written in 1987. In “Analytic Mapping and Geographic Databases”, Issue 87, published in 1992 and edited by Robert S. Biggs and G. David Garson, the authors make the statement, “Computer mapping is particularly important in government, and hence is salient to social scientists who study government policies.  It is estimated that 80% of the informational needs of local government policymakers are related to geographic location.”

Biggs and Garson cite an article written by Robert E. Williams in 1987 entitled “Selling a geographical information system to government policy makers.”  At the time of the publication, Williams was the Director of the Alachua County Regional Information Center.  The article was published in “Papers from the 1987 Annual Conference of the Urban and Regional Information Systems Association” by URISA.

While a copy of the article is not available online, the abstract of the article is available from Esri’s online bibliography:

One of the most important elements of a GIS installation is obtaining approval and support of the policy makers who will fund the project. Because of the encompassing nature of a GIS, there is no one or major user of the system; therefore, it is imperative that a champion for the project is identified. (A champion is used here to refer to the individual or department that will take charge of the project and be responsible for all aspects of this implementation.) That champion must then gain the support of all users and sell the concept of the system to the policy makers. A case study of the selling of a GIS called GEOMAX at Alachua County will be used to show how a comprehensive GIS was effectively sold to three separate policy making bodies through an effective realtime demonstration. The demonstration was tailored to meeting the concerns of the policy makers and not to the technical features of the system.

I was able to procure a scanned copy of the article from Wendy Nelson, the Executive Director of URISA. The statement appears in the second paragraph on page 151 of the publication:

Automated mapping is probably an easier sell because, again, the policymakers are cognizant of the need for improved mapping capabilities. It has been estimated that approximately 80% of the informational needs of a local government policymaker is related to a geographical location. This information is usually supplied by a map rendering, e.g., maps showing the location of a parcel of land being considered for a rezoning petition.

The article by Williams lists no sources and gives no indication of where the number comes from. However, a little digging into GEOMAX reveals that the program was developed in 1985 by two academics at the University of Florida. John Alexander, a professor of urban and regional planning, and Paul Zwick, a research scientist, were behind the effort to digitize maps at Alachua County, so perhaps the knowledge of where the phrase originates lies with them?

Testing if 80% of Data is Geographic

A team of German researchers has produced a couple of articles that attempt to test this statement. At Agile 2011, Stefan Hahmann and Dirk Burghardt, a professor of Cartographic Communication at the Dresden University of Technology (Technische Universität Dresden) in Germany, along with Beatrix Weber, presented a research framework for testing the validity of the phrase in a paper entitled ““80% of All Information is Geospatially Referenced”??? Towards a Research Framework: Using the Semantic Web for (In)Validating this Famous Geo Assertion.” The paper noted that while the phrase is repeated even in academic articles, none of them has provided a methodology to substantiate it.

In a 2012 article in press with the International Journal of Geographical Information Science, “How much information is geospatially referenced? Networks and cognition,” the same researchers tested the claim that 80% of information has a spatial component by looking at German Wikipedia articles. The article took two approaches to analyzing the statement scientifically. The first treated German Wikipedia as a network, with articles as nodes and the links within the articles as “edges of a directed graph.” The second approach was cognitive: articles were categorized as having a “direct geospatial reference”, “indirect geospatial reference” or “no geospatial reference.” The network approach found that 78% of articles were either tagged with geographic coordinates or linked to an article with tagged coordinates. The cognitive approach put the figure lower, at 57%. The article is in German but there is a summary in English near the beginning.
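To make the network approach more concrete, here is a minimal sketch in Python of the kind of calculation described above. It is not the researchers' code: the function name, the toy article titles, and the reading of "linked to" as an outgoing link from the article are all my own assumptions for illustration.

```python
# A rough sketch of the network measure, NOT the researchers' implementation.
# Assumptions: each article maps to the list of articles it links to
# (outgoing edges of a directed graph), and `geotagged` holds the articles
# that carry coordinates. All titles and numbers below are made-up toy data.

def geo_share(links, geotagged):
    """Return the fraction of articles that are geotagged themselves
    or link directly to at least one geotagged article."""
    if not links:
        return 0.0
    hits = sum(
        1
        for article, targets in links.items()
        if article in geotagged or any(t in geotagged for t in targets)
    )
    return hits / len(links)


# Hypothetical six-article "wiki"
links = {
    "Dresden": ["Elbe", "Saxony"],
    "Elbe": ["Dresden"],
    "Saxony": ["Dresden"],
    "Cartography": ["Dresden"],
    "Graph theory": ["Mathematics"],
    "Mathematics": [],
}
geotagged = {"Dresden", "Elbe"}

share = geo_share(links, geotagged)
print(f"{share:.0%} of articles are geotagged or one link away from a geotagged article")
```

In this toy example four of the six articles qualify. Note how much the result depends on what counts as a "reference": only two articles carry coordinates themselves, and the other two are pulled in by a single link.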

The end of Hahmann, Burghardt, and Weber’s article hints at the real value of the phrase, citing a quote on Twitter by John Fagan, Head of Software Engineering, Axon Active AG. Agile & Lean and formerly of Bing Maps and Multimap: “that geo quote keeps us all in our jobs. Best not go poking around to see if it’s true.”

