dbowdalo · 99727f55
--- a/Home.md
+++ b/Home.md
+## Outline
+**GHOST** (**G**lobally **H**armonised **O**bservational **S**urface **T**reatment) is a project dedicated to harmonising global surface atmospheric observations and metadata, with the aim of improving the quality of observation/model comparisons in the atmospheric chemistry community.
+Currently, data for >100 measured gas/aerosol components from the EBAS/EEA AQ e-Reporting networks have been processed under the GHOST framework (from 1970 to the present day). Efforts are ongoing to process data from AIRBASE/CAPMON/CASTNET/EANET/EPA AQS/NAPS/SEARCH/WMO GAW. There are also plans to process meteorological measurements in the future.
+## Why Use GHOST?
+Modellers rely on observational data to evaluate their models. However, a large number of different observational networks exist, providing data in a plethora of formats and at differing levels of detail. Owing to the complexities of combining data from multiple networks, modellers often use data from only one or two networks when evaluating their models. When data from multiple networks are used, there is typically little to no detail given about the methodology used to combine data/metadata from the different networks, or about the quality assurance (QA) or station classifications employed to subset the data. Evaluation efforts from different groups, which handle observations in differing ways, are therefore often not directly comparable.
+The central concept of GHOST is to standardise the data/metadata from all major public reporting networks that provide atmospheric surface measurements. Each processed measurement is additionally associated with QA/classification flags, which correspond to a large set of documented quality control checks and metadata groupings, giving users a way to subset data in a flexible and reproducible manner. Any subset of observations used in a model evaluation effort can then be traced directly back to a documented project, and evaluation efforts from different groups can be directly compared.
+## GHOST Framework Details
+Every detail of how network data are ingested within the GHOST framework has been designed with the ultimate goal of providing the best possible quality of standardised data/metadata. Some of the major considerations and efforts made in processing the data from multiple networks are described here.
+On the most fundamental level, all relevant metadata fields are standardised (e.g. alt/abs_alt/height == altitude). All units are standardised on the fly (utilising provided temperature and pressure for conversions where necessary). 
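As a minimal sketch of the two steps described above, the snippet below maps network-specific field names onto one standard name and converts a mixing ratio to a mass concentration with the ideal gas law. The alias table, function names and station reference are hypothetical and are not taken from the GHOST code base; the direction of the unit conversion is also an assumption made for illustration.

```python
# Hypothetical alias table: network-specific field names -> one standard name.
FIELD_ALIASES = {"alt": "altitude", "abs_alt": "altitude", "height": "altitude"}

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def standardise_fields(raw):
    """Rename known aliases; leave unknown field names untouched."""
    return {FIELD_ALIASES.get(name, name): value for name, value in raw.items()}


def ppb_to_ugm3(ppb, molar_mass_g_mol, temperature_k, pressure_pa):
    """Convert a mixing ratio (ppb) to a mass concentration (ug m^-3).

    Uses the ideal gas law with the provided temperature and pressure.
    """
    air_mol_per_m3 = pressure_pa / (R * temperature_k)
    # ppb * 1e-9 (mole fraction) * g/mol * mol/m3 * 1e6 (ug per g)
    return ppb * 1e-3 * molar_mass_g_mol * air_mol_per_m3


meta = standardise_fields({"abs_alt": 120.0, "station": "XX0001"})
# 40 ppb of ozone (molar mass ~48 g/mol) at 293.15 K and 101325 Pa
o3 = ppb_to_ugm3(40.0, 48.0, 293.15, 101325.0)
print(meta, round(o3, 1))
```

At 293.15 K and 101325 Pa, 1 ppb of ozone corresponds to roughly 2 µg m⁻³, so a conversion that ignores the provided temperature and pressure can introduce a bias of several percent at high-altitude or very cold sites.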
+Measurements often come in a variety of temporal resolutions. Data at all native temporal resolutions are handled: duplicated/overlapping measurements are resolved, and the data are then gridded onto consistent hourly, daily and monthly resolutions.
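The idea can be illustrated with a small standard-library sketch that resolves duplicated timestamps and then aggregates hourly values onto a daily grid. Two choices here are assumptions made only for the example: the last reported value for a duplicated timestamp wins, and the daily value is a plain arithmetic mean. The function name `daily_means` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def daily_means(hourly):
    """hourly: iterable of (datetime, value) pairs at hourly resolution."""
    # Resolve duplicates: a later entry for the same timestamp overwrites
    # the earlier one (an assumption for this sketch).
    unique = {}
    for timestamp, value in hourly:
        unique[timestamp] = value

    # Group the de-duplicated values by calendar day.
    buckets = defaultdict(list)
    for timestamp, value in unique.items():
        buckets[timestamp.date()].append(value)

    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


result = daily_means([
    (datetime(2020, 1, 1, 0), 10.0),
    (datetime(2020, 1, 1, 0), 12.0),  # duplicate timestamp, replaces 10.0
    (datetime(2020, 1, 1, 1), 14.0),
    (datetime(2020, 1, 2, 0), 20.0),
])
print(result)
```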
+One issue that is often ignored is the assumption of static metadata. Just as observations fluctuate over time, so does the metadata describing the measurement conditions. Examples include a measurement location being moved, a new instrument being used, or the instrumental limits of detection changing. Within GHOST, all metadata fields are handled as variables that can change through time. This allows every measurement to be associated with metadata that is accurate at the exact time of measurement. If a station's measurement location moves significantly (defined as more than 11 m horizontally or vertically), a new station is created from the point of the move, for the sake of maintaining consistent time series.
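A rough sketch of both ideas follows: looking up the metadata valid at a given measurement time, and starting a new station segment when the location moves by more than 11 m. The data layout (sorted lists of tuples, positions in local metres, integer years as times) and the function names are hypothetical. Comparing each position against the previous reported position is also an assumption, since the text does not say what the movement is measured relative to.

```python
import math
from bisect import bisect_right

MOVE_THRESHOLD_M = 11.0  # threshold quoted in the text


def metadata_at(history, t):
    """history: list of (start_time, metadata), sorted by start_time."""
    starts = [start for start, _ in history]
    i = bisect_right(starts, t) - 1  # last period starting at or before t
    if i < 0:
        raise ValueError("no metadata valid at this time")
    return history[i][1]


def split_on_movement(positions):
    """positions: list of (start_time, x_m, y_m, z_m) in local metres.

    Returns one segment index per entry; the index increases whenever the
    location moves more than the threshold horizontally or vertically.
    """
    segment, segments = 0, []
    for i, (_, x, y, z) in enumerate(positions):
        if i > 0:
            _, px, py, pz = positions[i - 1]
            horizontal = math.hypot(x - px, y - py)
            vertical = abs(z - pz)
            if horizontal > MOVE_THRESHOLD_M or vertical > MOVE_THRESHOLD_M:
                segment += 1
        segments.append(segment)
    return segments


history = [(2000, {"instrument": "A"}), (2010, {"instrument": "B"})]
positions = [(2000, 0.0, 0.0, 0.0), (2005, 3.0, 4.0, 0.0), (2010, 3.0, 4.0, 15.0)]
print(metadata_at(history, 2005), split_on_movement(positions))
```

In this example the 5 m horizontal move in 2005 stays within the same segment, while the 15 m vertical move in 2010 starts a new one.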
+Networks typically provide metadata classifying measurement stations. This can be the type of location they are situated in (e.g. urban/rural etc.); the dominant type of air the station typically sees (e.g. traffic/industrial etc.); or the dominant land use around the measurement station, etc. A significant effort has been made to standardise all the different classification types provided across the different networks. 
+Although network-provided classifications can be very useful, they are fundamentally subjective in how they are defined, e.g. what one network defines as “urban” may differ from another network's definition. Although a significant effort has been made to standardise these classifications across networks, in some instances there will be slight differences between the classifications grouped together from the different networks. Globally gridded products provide an alternative method for classifying stations, with the advantage that the classes are entirely globally consistent. Multiple frequently used globally gridded products are processed within GHOST by station. These include: GSFC coastline proximity, GPW population density, ETOPO1 altitude, MODIS MCD12C1 v6 IGBP land use, NOAA-DMSP-OLS v4 average nighttime stable lights, Koppen-Geiger WorldClim Classifications, UMBC modal anthrome classification, ESDAC modal Iwahashi landform classification, etc.
+The procedures used to measure a component can impose undesirable biases, depending on the component and the circumstances of measurement. A vast effort has been undertaken within GHOST to compile a library of standardised measurement methodologies for all processed components. For each standard measurement method, the library documents the specific names of all measurement instruments of that type reported in the observational metadata, together with the associated measurement specifications per instrument (e.g. detection limits, flow rates, instrument drift). Through this exhaustive documentation effort, the measurement process can be standardised within GHOST. A case that is typically problematic to handle is when a measurement station changes the measuring instrument. However, as the GHOST framework allows metadata to vary in time, every measurement can be associated with accurate measurement specifications.
+Another tricky case to handle is the measurement of the same component with multiple different methodologies at a single station (often in the same room). Because measurements are typically referenced by a unique station reference code, usually just one set of these observations is kept (typically the set with the longest data coverage, or the one using the preferred measurement methodology). As all measurements within GHOST are associated with a standardised measurement methodology, measurements are referenced not solely by a station reference, but by a station reference code + standard measurement methodology code, allowing all of these sets of measurements to be kept. In cases where there are multiple sets of measurements using the same methodology but different instruments, an extensive algorithm is used to systematically choose which set of measurements to keep (weighting by consistency of metadata, temporal resolution and data coverage).
+All measurements are run through stringent quality control checks. Each check is associated with one or several unique QA flag codes. Any measurements found to be suspect by a check are associated with the appropriate QA flags. Each measurement can be associated with many QA flags. Quality control checks include: basic checks (negative values, zero/infinity values), violation of detection limits check, recurring values check, coarse measurement resolution checks, extreme data checks (fixed limits per species + adjusted boxplot screening + manually flagged), non-integer timezone checks, etc.
+In addition to QA flags, each measurement is associated with classification flags. The standardised network-provided metadata and the globally gridded metadata are both used to produce quick default classifications that are frequently needed (e.g. high-altitude stations == stations classified as mountain stations in the metadata + stations with a metadata measurement altitude >= 1500 m above mean sea level). As with the QA flags, each measurement can be associated with many classification flags. Other classifications include: high altitude (global terrain metadata determined), near coast (global metadata determined), urban representative (station metadata derived), urban representative (global metadata derived), etc.
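The composite reference can be pictured as a two-part key. In the sketch below, the record field names, the station reference and the method codes are hypothetical placeholders and are not actual GHOST identifiers.

```python
from collections import defaultdict


def group_series(records):
    """Group values by (station reference, standard method code)."""
    series = defaultdict(list)
    for rec in records:
        key = (rec["station_reference"], rec["method_code"])
        series[key].append(rec["value"])
    return dict(series)


records = [
    {"station_reference": "XX0001", "method_code": "UV", "value": 41.0},
    {"station_reference": "XX0001", "method_code": "CL", "value": 39.5},
    {"station_reference": "XX0001", "method_code": "UV", "value": 43.0},
]
series = group_series(records)
print(series)
```

Keying on the station reference alone would merge or discard one of the two co-located series; the two-part key keeps both.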
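A few of the simpler checks can be sketched as follows. The flag names, the function signature and the minimum run length for the recurring values check are assumptions made for illustration; they do not reproduce the actual GHOST flag codes or thresholds.

```python
import math

# Hypothetical flag names (GHOST uses its own documented flag codes).
NEGATIVE = "negative"
ZERO = "zero"
NON_FINITE = "non_finite"
BELOW_DETECTION_LIMIT = "below_detection_limit"
RECURRING = "recurring_value"


def qa_flags(values, detection_limit, min_run=3):
    """Return one set of QA flags per measurement (a set may be empty)."""
    flags = [set() for _ in values]

    # Per-value checks.
    for i, v in enumerate(values):
        if not math.isfinite(v):  # catches infinity and NaN
            flags[i].add(NON_FINITE)
            continue
        if v < 0:
            flags[i].add(NEGATIVE)
        if v == 0:
            flags[i].add(ZERO)
        if v < detection_limit:
            flags[i].add(BELOW_DETECTION_LIMIT)

    # Recurring values: flag every member of a run of identical
    # consecutive values that is at least min_run long.
    start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_run:
                for j in range(start, i):
                    flags[j].add(RECURRING)
            start = i
    return flags


values = [5.0, -1.0, 0.0, 7.0, 7.0, 7.0, float("inf")]
flags = qa_flags(values, detection_limit=1.0)
print(flags)
```

Because the flags are stored as a set per measurement, a single value can carry several flags at once, and users can decide afterwards which flags to filter on rather than having values silently removed.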
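The high-altitude example above can be written as a small rule that takes the union of the two criteria. The metadata keys, the `"mountain"` label and the station references are hypothetical; only the 1500 m threshold comes from the text.

```python
HIGH_ALTITUDE_M = 1500.0  # threshold quoted in the text


def classification_flags(station):
    """station: dict of standardised metadata for one station."""
    flags = set()
    altitude = station.get("altitude")
    is_mountain = station.get("station_type") == "mountain"
    is_high = altitude is not None and altitude >= HIGH_ALTITUDE_M
    if is_mountain or is_high:  # union of the two criteria
        flags.add("high_altitude")
    return flags


stations = {
    "XX0001": {"station_type": "urban", "altitude": 35.0},
    "XX0002": {"station_type": "mountain", "altitude": 1200.0},
    "XX0003": {"station_type": "rural", "altitude": 2100.0},
}
high = sorted(
    ref for ref, meta in stations.items()
    if "high_altitude" in classification_flags(meta)
)
print(high)
```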