## Introduction to Statistics and Geographic Data

## Measurement

**Accuracy and Precision**

Accuracy is the degree of closeness of a measurement to the true value. An example of high accuracy is a GPS providing coordinates within 0.1 cm of the actual location. Precision is the reproducibility of a measurement. Systematic error produces high precision but low accuracy. An example of high precision and low accuracy is measuring in GIS with the wrong projection: repeated measurements agree with each other, but the measured distance is wrong. Similarly, a GPS may systematically return the same coordinates for a location every time, but those coordinates are for a point 5 feet west of that location. Random error negatively impacts precision.
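The distinction can be made concrete with a small sketch. It assumes bias (mean error against a known true value) as a stand-in for accuracy and the sample standard deviation as a stand-in for precision; the function name `bias_and_spread` and all readings are made up for illustration.

```python
import statistics

def bias_and_spread(readings, true_value):
    """Return (bias, spread) for a set of repeated readings.

    bias   = mean of (reading - true value); a small magnitude means high accuracy
    spread = sample standard deviation; a small value means high precision
    """
    errors = [r - true_value for r in readings]
    return statistics.mean(errors), statistics.stdev(readings)

true_easting = 1000.0  # hypothetical true easting, in feet

# Precise but inaccurate: tightly clustered, about 5 ft west of the truth
shifted = [994.9, 995.1, 995.0, 995.2, 994.8]
# Accurate on average but imprecise: centred on the truth, widely scattered
scattered = [1001.2, 998.9, 1000.4, 999.1, 1000.4]

for name, readings in [("shifted", shifted), ("scattered", scattered)]:
    bias, spread = bias_and_spread(readings, true_easting)
    print(f"{name}: bias = {bias:+.2f} ft, spread = {spread:.2f} ft")
```

The shifted readings show a large bias with a small spread (systematic error), while the scattered readings show almost no bias with a larger spread (random error).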

**Validity**

Validity asks whether the unit of measure or the type of measurement is valid for what you are applying it to. It is a measure of the appropriateness of assumptions and relationships. For example: Is the population density of cattails a good indicator of wetland health? Another good question that tests validity is: Does elevation influence the amount of precipitation?

**Reliability**

Reliability is a measure of the consistency of data. Reliable data are generally high-precision data. Questions to ask about data reliability include: “Does the measure of the data remain reliable over time?” and “Are the measuring techniques reliable?” An example of reliable data is the Landsat dataset, because it has used consistent imagery collection techniques over the past 30 years. Another reliable dataset is the U.S. Geological Survey (USGS) water level measurements presented in the NWIS database. The USGS follows a very specific protocol when collecting and storing water level data.

**Error**

Error is the difference between the measured or provided value and the actual value. Error is always present in real data, and it can be introduced in several ways. Error can be systematic or random. Systematic errors generally have high precision and can be corrected with relative ease. Random errors can take any value and are much harder to correct.
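As a rough illustration of why systematic error is the easier kind to correct, the sketch below assumes the systematic part is a single constant offset that can be estimated from a few check points with known reference values. The helper names and all numbers are hypothetical.

```python
import statistics

def estimate_offset(measured, reference):
    """Estimate a constant systematic offset as the mean of (measured - reference)."""
    return statistics.mean([m - r for m, r in zip(measured, reference)])

def remove_offset(measured, offset):
    """Subtract the estimated systematic offset from every reading."""
    return [m - offset for m in measured]

reference = [10.0, 20.0, 30.0, 40.0]  # hypothetical known values at check points
measured = [10.6, 20.4, 30.5, 40.5]   # readings: roughly +0.5 offset plus small scatter

offset = estimate_offset(measured, reference)
corrected = remove_offset(measured, offset)

# What remains after the correction is the random part of the error
residuals = [c - r for c, r in zip(corrected, reference)]
print(f"estimated offset: {offset:.2f}")
print("residuals:", [round(x, 2) for x in residuals])
```

Subtracting the offset removes the shared shift, but the leftover residuals differ from reading to reading and cannot be removed by a single correction.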

Calculation error is incorrect arithmetic, such as not adding the right numbers together. Calculation error is commonly introduced during unit conversion, as when the Mars Climate Orbiter was lost because of a failure to convert from English to SI units. Calculation error can also be caused by faulty computer programming.
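A unit conversion error of this kind can be sketched in a few lines. The Mars Climate Orbiter mismatch is commonly described as impulse reported in pound-force seconds being read as newton-seconds; the impulse value below is hypothetical, and the function name is an illustration only.

```python
# One pound-force second expressed in newton-seconds
# (0.45359237 kg * 9.80665 m/s^2, both defined values)
LBF_S_TO_N_S = 4.4482216152605

def lbf_s_to_n_s(impulse_lbf_s):
    """Convert an impulse from pound-force seconds to newton-seconds."""
    return impulse_lbf_s * LBF_S_TO_N_S

reported = 100.0                    # hypothetical impulse, in lbf*s
converted = lbf_s_to_n_s(reported)  # the value in N*s after conversion
misread = reported                  # the same number wrongly taken as N*s

ratio = converted / misread
print(f"converted: {converted:.1f} N*s, misread: {misread:.1f} N*s, ratio: {ratio:.2f}")
```

Skipping the conversion understates the value by the same factor every time, which makes this a systematic error rather than a random one.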

Measurement error occurs when the measuring device is broken or improperly calibrated. Often, a poorly calibrated instrument will produce good relative measurements but poor absolute measurements, which leads to systematic error. If an instrument is broken, it could produce random measurements, which would be random error. Measurement error can also occur if an operator is using an instrument incorrectly.
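The difference between relative and absolute measurements can be shown with a short sketch. It assumes the miscalibration is a constant offset added to every reading; the water levels, the offset, and the helper name `relative_changes` are all hypothetical.

```python
import math

def relative_changes(readings):
    """Return the change between each pair of consecutive readings."""
    return [b - a for a, b in zip(readings, readings[1:])]

true_levels = [12.0, 12.5, 11.75, 13.0]  # hypothetical true water levels, in feet
offset = 0.25                            # hypothetical constant calibration offset

# A miscalibrated instrument reports every level too high by the same amount
measured = [level + offset for level in true_levels]

# Absolute values are all wrong by the offset
absolute_errors = [m - t for m, t in zip(measured, true_levels)]

# Changes between readings are unaffected, because the offset cancels out
same_changes = all(
    math.isclose(dm, dt)
    for dm, dt in zip(relative_changes(measured), relative_changes(true_levels))
)
print("absolute errors:", absolute_errors)
print("relative changes preserved:", same_changes)
```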

Specification error is caused by incorrect assumptions. A specification error is applying the wrong model to a set of variables. Incorrect assumptions include incorrect applications of formulas or incorrect independent variables for a model. Specification error is a result of poor validity.

Sampling error is the difference between the sample set and the population it is drawn from. It arises when the sample set is not entirely representative of the population. This error is reduced by good sampling practices (discussed later), such as random sampling and sampling an adequately large data set.
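The effect of sample size can be explored with a small simulation. This is only a sketch: the population is synthetic (normally distributed values with made-up parameters), the sampling is simple random sampling without replacement, and the function name is hypothetical.

```python
import random
import statistics

def mean_abs_sampling_error(population, sample_size, trials, rng):
    """Average absolute difference between the sample mean and the population mean."""
    pop_mean = statistics.fmean(population)
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)  # random sample, no replacement
        errors.append(abs(statistics.fmean(sample) - pop_mean))
    return statistics.fmean(errors)

rng = random.Random(42)  # fixed seed so the run is repeatable
# Synthetic population of 10,000 values (assumed mean 100, standard deviation 15)
population = [rng.gauss(100.0, 15.0) for _ in range(10_000)]

small = mean_abs_sampling_error(population, sample_size=10, trials=200, rng=rng)
large = mean_abs_sampling_error(population, sample_size=1000, trials=200, rng=rng)
print(f"mean absolute error, n=10:   {small:.2f}")
print(f"mean absolute error, n=1000: {large:.2f}")
```

The larger samples should track the population mean much more closely, although no finite random sample removes sampling error entirely.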

Random noise is random, unexplained variation in data. It is stochastic and represents the unknown components of the measurement. Electrical signals usually have random noise, as do well water levels.