5/20/13 - "The Application of SOM-Artificial Neural Networks, PCA and GIS Technologies for the Characterization of Human Health Risk to Petrochemical Pollutants in the Niger Delta Region" by Richard Olawoyin, Ph.D.

Richard Olawoyin, Ph.D., Energy and Mineral Engineering, received an ARC research grant in Fall 2011 for his project entitled "The Application of SOM-Artificial Neural Networks, PCA and GIS Technologies for the Characterization of Human Health Risk to Petrochemical Pollutants in the Niger Delta".

1.0 Introduction

The Niger Delta (Fig. 1) is located in the South-South region of Nigeria, western Africa, with a total area of 7,722.04 square miles. It lies between latitudes 4°15′N and 4°50′N and longitudes 5°25′E and 7°37′E (Powell et al., 1985). The population of the region is about 31 million (CRS, 2008). Annual mean temperature is estimated to be 80°F and annual average rainfall is 3000 mm (heavy rainfall due to proximity to the equator) (Akintola, 1982). Marine sediment buildup and fluviatile activity during the upper Cretaceous led to the formation of the Niger Delta, which is typified by widespread interconnectivity of deltaic tributaries, mangrove swamps, flood plains, creeks and coastal barrier islands. The soils in the region are of fluvial origin. The back-swamp soils are characterized by peat-covered, water-logged heavy clay, whereas clay and silty loamy soils are found in elevated areas (Rahaman, 1976). Samples were collected from five different regions in the area: Bonny, Eriemu, Odidi, Ughelli and Warri (Fig. 2).


Fig. 1: Map of study sites showing the Niger Delta Area (NDA) geopolitical boundaries.

Fig. 2: Sample locations: BN-Bonny, ER-Eriemu, OD-Odidi, OG-Ughelli, WR-Warri River.

Environmental media such as water, soil and sediments are susceptible to contamination from various sources and substances, which complicates risk assessment, decision making and management. Integrated quality assessment of these media is therefore important, using comprehensive procedures that cover toxicity and toxicity identification, bioaccumulation, biomagnification, persistence, chemistry, evidence of effects and the physical properties of the pollutants (Martín-Díaz et al., 2004; Chapman and Hollert, 2006; Chapman, 2007). Integrated interpretation of pollution levels, supported by high-performance computing and artificial intelligence, is essential for setting priorities among the mitigation and management measures needed for site restoration.

The anthropogenic impacts of industrial activities have been appraised with the artificial neural network self-organizing map (ANN-SOM) learning algorithm, which was used to identify and interpret the relationships between the measured variables and collected river sediment samples, improving the knowledge of contamination sources and potential harm to humans (Marengo et al., 2006). SOMs have also been used in image classification (Lu, 1994), speech recognition (Kohonen, 1988) and clustering of documents (Honkela et al., 1998). Comprehensive references to the SOM techniques can be found in Kohonen (1985) and Kohonen et al. (1995).

This research focused on the capability and suitability of the ANN-SOM technique for the classification, interpretation and visualization of water, soil and sediment data. This was vital for analyzing contaminant concentrations, bioaccumulation and toxicity, and for assessing the quality of the sampled materials.

1.1 Data Set

The datasets analyzed in this study include 14 physico-chemical variables (SO4, PO4, Zn, Cd, Cr, Cu, Pb, Ni, Mn, Fe, the sum of 7 carcinogenic polycyclic aromatic hydrocarbons (PAHs), the sum of 10 non-carcinogenic PAHs, the sum of total petroleum hydrocarbons (TPH), and the sum of benzene, toluene, ethylbenzene and xylene (BTEX)) and 2 toxicity parameters (pH and EC). The heavy metals were analyzed using extraction methods consistent with US EPA method 3050B. PAHs were determined using mass spectrometry detection (MSD) (González-Piñuela et al., 2006). Details on sample collection are presented in previous chapters and are also described by Viguri et al. (2007) and Olawoyin et al. (2012). The characteristics of the PAHs analyzed and their abbreviations are presented in Table 1.

Table 1: Priority PAHs - Characteristics in Sampled Media

Columns: Benzene rings | Mol. Wt. (g) | Solubility at 25°C (µg/L)

Recoverable entries (solubility): 12.5-34.0 b; 3.42 b

BTEX - Benzene, Toluene, Ethylbenzene and Xylene

Classification for carcinogens: 2A. Probable human carcinogens; 2B. Possibly carcinogenic to humans (known or limited evidence in humans, or adequate evidence in animals but insufficient evidence for humans); D. Not classifiable as carcinogenic to humans

a. Types of carcinogen from Watson and Dolislager; b. in mg/L; IARC, 1987

1.2 Self-Organizing Map (SOM)

The MathWorks software (Matlab®) and SOM Toolbox version 2 were used for the SOM analysis (Vesanto et al., 1999). The input data were the measured variables and their values for all locations. The SOM tool was applied to project the multi-dimensional input data onto a two-dimensional lattice through a training phase that preserves the topological features of the input data space. The SOM methodology arranges neurons on a 2-D grid, where each neuron is associated with a prototype weight vector that has the same dimension as the input data variables and comes to resemble them during training. The SOM used in this study was trained using the batch training algorithm as described in Kohonen (2001).
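The batch training described above can be illustrated with a minimal sketch. It is not the SOM Toolbox implementation used in the study: it assumes a rectangular rather than hexagonal lattice, a Gaussian neighbourhood whose radius shrinks linearly, and prototypes initialised by cycling through the samples. The function names `train_batch_som` and `best_matching_unit` and the toy data are hypothetical.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def best_matching_unit(weights, x):
    """Grid coordinate of the prototype closest to sample x."""
    return min(weights, key=lambda unit: squared_distance(weights[unit], x))


def train_batch_som(data, rows, cols, epochs=30, sigma_end=0.5):
    """Batch-train a rows x cols SOM; returns {(row, col): prototype vector}."""
    dim = len(data[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Assumption: initialise prototypes by cycling through the samples.
    weights = {unit: list(data[i % len(data)]) for i, unit in enumerate(grid)}
    sigma_start = max(rows, cols) / 2.0
    for epoch in range(epochs):
        # Neighbourhood radius shrinks linearly from sigma_start to sigma_end.
        frac = epoch / max(epochs - 1, 1)
        sigma = sigma_start + (sigma_end - sigma_start) * frac
        # Step 1: assign every sample to its best-matching unit (BMU).
        bmus = [best_matching_unit(weights, x) for x in data]
        # Step 2: move each prototype to the neighbourhood-weighted mean
        # of all samples, weighting by grid distance to each sample's BMU.
        updated = {}
        for unit in grid:
            num = [0.0] * dim
            den = 0.0
            for x, bmu in zip(data, bmus):
                grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                den += h
                for k in range(dim):
                    num[k] += h * x[k]
            updated[unit] = [v / den for v in num] if den > 0 else weights[unit]
        weights = updated
    return weights


# Toy data (illustrative only): two well-separated groups of samples.
samples = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
           [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]]
som = train_batch_som(samples, rows=3, cols=3)
```

Because every prototype is a weighted mean of the samples, the trained prototypes stay inside the range of the data, and the two groups end up on different map units.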

The methodology used in this study is presented in Fig. 3. The SOM provides results that are easily visualized and interpreted from the generated component planes (CPs) and maps. In this study, based on the variables measured in the samples collected from the locations, samples in the same unit show more similarity and are represented closer together on the map, while samples with different patterns are located far from each other. Each CP displays the value of one weight-vector component (one variable) across all map units, so the number of CPs equals the number of data variables. A preliminary assessment of the CPs reveals the patterns embedded in the data and how the values are spread in the input space.
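As a small illustration of how component planes relate to the trained prototypes, the sketch below slices a codebook into one plane per variable. The function name `component_planes` and the 2 × 2 codebook values are hypothetical and are not data from the study.

```python
def component_planes(weights, rows, cols, variables):
    """Split a codebook into one rows x cols plane per variable."""
    return {
        name: [[weights[(r, c)][k] for c in range(cols)] for r in range(rows)]
        for k, name in enumerate(variables)
    }


# Hypothetical 2 x 2 codebook with two variables per prototype.
codebook = {
    (0, 0): [0.1, 5.0], (0, 1): [0.2, 6.0],
    (1, 0): [0.8, 7.0], (1, 1): [0.9, 8.0],
}
planes = component_planes(codebook, rows=2, cols=2, variables=["Pb", "BaP"])
```

Each plane has the same grid layout, so a given cell refers to the same map unit in every plane.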

Fig. 3:  Methodology of the study using SOM-ANN

2.0 Results and Discussion

2.1 Samples Classification with SOM

Crude oil is a complex mixture of chemicals and other substances, primarily hydrocarbons, together with metal porphyrins and other organic substances. Sediment mineralogy and chemistry are essential for pollution assessment and evaluation. Sediments usually contain large amounts of fine-grained clay minerals, which are known to transport pollutants by adsorption and/or complex exchange at the clay-water interface. Hydrocarbon accumulations are known to be sediment dependent, and the levels of contamination can be visualized using the SOM component planes (c-planes). The SOM component planes of the input variables for the sediment samples are illustrated in Fig. 4. The component plane of each variable corresponds to the map of the sample locations in Fig. 2. Each hexagonal unit at a particular position on the different component planes has the same position on the unit map. The values of the different components are represented using different colors, with the scale shown on the right of each component map.

The unified distance matrix (U-matrix) presented in the SOM output visualizes the relative distances between the neurons. Color differences show the calculated distances between adjacent neurons. A lighter color on the U-matrix indicates that the vectors are close in the input space, while darker colors represent larger distances between vector values in the input space. The U-matrix also helps to identify clusters in the datasets. The SOM procedure using the U-matrix provides faster knowledge-based interpretation of the input dataset distributions.
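The idea behind the U-matrix can be sketched as follows. This is a simplified variant, not the SOM Toolbox routine: it assumes a rectangular grid with four neighbours per unit (the study uses hexagonal units) and stores only the mean distance from each unit to its neighbours. The function name `u_matrix` and the example codebook are hypothetical.

```python
import math


def u_matrix(weights, rows, cols):
    """Mean input-space distance from each unit to its grid neighbours."""
    umat = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            # Assumption: 4-neighbour rectangular lattice.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                neighbour = (r + dr, c + dc)
                if neighbour in weights:
                    dists.append(math.dist(weights[(r, c)], weights[neighbour]))
            umat[r][c] = sum(dists) / len(dists) if dists else 0.0
    return umat


# Hypothetical 1 x 3 codebook: two identical prototypes and one distant one.
codebook = {(0, 0): [0.0, 0.0], (0, 1): [0.0, 0.0], (0, 2): [3.0, 4.0]}
umat = u_matrix(codebook, rows=1, cols=3)
```

Large values mark the border between groups of similar prototypes, which is how clusters show up on the map.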

The c-planes of the input variables for the sediment, shown in Fig. 4, classified the different variables. Because gas flaring in the areas has minimal effect on sediment chemical composition, low values of pH, NO3 and PO4 were recorded in the sampled area. The contaminated area increases with pollutant volume and especially with hydrocarbon concentration (Rabalais et al., 1992). Towards the South Western (SW) parts of the study areas (Fig. 4), sediment contamination was determined to be minimal, as opposed to the northern areas with high concentrations of TPHs, PAHs and BTEX. PAH and BTEX concentrations were observed to be spatially identical, suggesting the possibility of a common source of contamination in the area. The maximum values of PAHs and BTEX in the sediments were at western points marked by extensive petrochemical activities; these values decrease towards the South Eastern (SE) areas but increase in the opposite direction towards all other parts. The TPH concentrations are higher in the northern portion of the map, with moderate to low values around the center. This suggests that some remedial action has been carried out in some densely populated residential areas in this region.

Fig. 4: c-planes of sediment variables classified using the SOM

2.2 SOM interpretation

The organic pollutant variables of the samples in the study were also presented to the SOM. For the soil  and the ‘log’ normalization identified the best map quality with a 100-unit map size (i.e. 20 × 5) and with TE = 0.031, QE = 0.002 for the, and a 125-unit map size (i.e. 25 × 5) with TE = 0.000, QE = 0.668 (Table 4) for. For the soil  the ‘range’ normalization gave the best map quality with a 100-unit map size (i.e. 20 × 5), TE = 0.000, QE = 0.136 (Table 5). The number of neurons (n = 100) is comparatively close to the number of samples (n = 98). Visualization of the c-planes for the soil  was helpful in the interpretation of the datasets. The soil showed similar distribution trends across the entire area for the individual carcinogenic PAHs. Elevated values of  representing high molecular weight (HMW) PAHs, which are usually adsorbed on particles or present in oil droplets, were found in soils in the entire area except for the upper North Western (NW) part. The lower values there could be due to its greater distance from the pollution sources, which lie closer to the southern parts of the study map.
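TE and QE are read here as topographic error and quantization error, the usual SOM map-quality measures; the text does not define them, so this reading is an assumption. A minimal sketch of both follows, assuming a rectangular grid where units are adjacent when their grid coordinates differ by one step. The function name `map_quality` and the one-variable codebooks are hypothetical.

```python
import math


def map_quality(weights, data):
    """Return (quantization error, topographic error) for a trained map."""
    qe_sum = 0.0
    te_count = 0
    for x in data:
        # Rank units by distance from the sample to their prototypes.
        ranked = sorted(weights, key=lambda unit: math.dist(weights[unit], x))
        first, second = ranked[0], ranked[1]
        # QE: distance between the sample and its best-matching prototype.
        qe_sum += math.dist(weights[first], x)
        # TE: count samples whose two best units are not grid neighbours.
        if abs(first[0] - second[0]) + abs(first[1] - second[1]) > 1:
            te_count += 1
    return qe_sum / len(data), te_count / len(data)


points = [[0.25], [1.75]]
# Ordered map: prototypes increase along the row.
ordered = {(0, 0): [0.0], (0, 1): [1.0], (0, 2): [2.0]}
# Folded map: same prototypes, but the last two units are swapped.
folded = {(0, 0): [0.0], (0, 1): [2.0], (0, 2): [1.0]}

qe_ordered, te_ordered = map_quality(ordered, points)
qe_folded, te_folded = map_quality(folded, points)
```

Both maps fit the samples equally well (same QE), but only the folded map has a nonzero TE, which is why the two measures are reported together when choosing a map size and normalization.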

Crude oil dispersed in water bodies contaminates the water systems, since it contains a mixture of substances, including high concentrations of organic compounds. The water then carries these pollutants in droplet (dispersed) or dissolved phases. Because of their low solubility, aliphatic hydrocarbon compounds are usually present in water in the dispersed phase, while aromatic hydrocarbons such as the PAHs are found in either phase depending on the molecular weight (MW) of the compound. BTEX are low molecular weight (LMW) aromatics and, together with naphthalene (Nap), are moderately soluble in the water phase. The HMW PAHs are mostly in the dispersed phase (OGP, 2005). The anomalous trends displayed in the SOM c-planes by water PAHs in the study areas suggest oil contamination, which could have resulted from oil spills, deck wash, sabotaged pipelines and transportation-related emissions. These trends also correlate with the pollutant concentrations and distributions in the sediment and soil samples, with the highest values in the southern part of the map, where the concentration gradient is steep. Benzo[a]pyrene (BaP), the most toxic of the carcinogenic PAHs, is present at relatively high levels throughout the southern parts, posing significant carcinogenic risks to the residents of these areas.

2.3 SOM Interpretation for the Individual Sites

Analyses of soil PAHs for each of the sampled sites were also carried out using the input data for soil  presented to the SOM. The map unit sizes for the four locations are: BN {6 × 4; 24 units; ‘range’ normalization; TE = 0.040, QE = 0.077}, ER {5 × 4; 20 units; ‘range’ normalization; TE = 0.000, QE = 0.151}, OD {7 × 4; 28 units; ‘log’ normalization; TE = 0.000, QE = 0.637}, OG {8 × 4; 32 units; ‘log’ normalization; TE = 0.000, QE = 0.674}. The datasets were trained for individual sample locations in order to gain a detailed understanding of the local trends in the datasets. The BN location is characterized by high values of soil  which corresponds to the location of the crude oil terminals towards the southern parts of the sampled area. The trend displayed by the soil at the BN location indicates concentrations decreasing from the pollutant source in the SE towards every other direction on the map. This trend is consistent with the combined soil carcinogenic PAH analyses for all locations. For the ER location, the SOM visualization using the c-planes shows the expected trends, as the majority of the petrochemical industries in the ER area are located between the center and the southern regions. The were distributed from the center towards the south of the map and these high values of in the soil of the area are imminent threats to the health and safety of the residents. The OD location is largely characterized by oil spills and waste discharges from petrochemical activities. The show a pattern of higher contamination towards the south, which is consistent with the other datasets analyzed. BaP, BkF and InP were determined to have the highest concentration values in the area, which suggests that the risk of cancer induction for residents in this area, especially those living in the southern parts, is potentially high. This trend is identical to the trend observed from the OG SOM visualization. The major difference is that for the OG location, the c-plane for the total  showed uncontaminated areas in the North East (NE) area of the map, as against the North West (NW) uncontaminated sites established at the OD location.
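The ‘range’ and ‘log’ normalizations named above can be sketched per variable as follows. The formulas are assumptions based on the common SOM Toolbox conventions: ‘range’ rescales a variable linearly to [0, 1], and ‘log’ applies ln(x − min + 1). The function names and example values are hypothetical.

```python
import math


def normalize_range(column):
    """Linearly rescale one variable to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    # A constant variable carries no contrast; map it to zeros.
    return [(v - lo) / span if span else 0.0 for v in column]


def normalize_log(column):
    """Logarithmic transform ln(x - min + 1); compresses large values."""
    lo = min(column)
    return [math.log(v - lo + 1.0) for v in column]


scaled = normalize_range([2.0, 4.0, 6.0])
logged = normalize_log([0.0, 9.0, 99.0])
```

The log transform is often preferred for skewed concentration data, where a few very high values would otherwise dominate the distance calculations.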

3.0 Summary

The physico-chemical properties (pH, TPH, BTEX, PAH, COD, SO4, PO4, NO3 and heavy metals) in the recipient environment of Bonny, Eriemu, Odidi and Ughelli (Warri) were assessed in sediments, soils and water. The SOM was used as a powerful visualization tool to identify trends in the dataset. Areas with high concentrations of pollutants were easily identified from the c-planes, which revealed vital information for the interpretation of the results. Preliminary diagnosis of the quality of sediment, soil and water at a location can be carried out effectively by using the SOM algorithm to develop the c-planes. The physical, ecotoxicological and chemical features common to the different locations sampled were easy to identify using the SOM c-planes, and the most prevalent contaminants were identified for each location, which would aid remedial planning and decision making. The SOM dataset processing showed that the majority of the sites were contaminated with carcinogenic PAHs and carcinogenic heavy metals, which are of concern because of their effects on human health. Composite risk index maps of the areas were also developed to validate the results obtained in this study (Figs. 5-8). Comprehensive remediation and mitigation plans are recommended for these areas. Also recommended are effective health care facilities that can evaluate the health of residents in the high contamination zones identified by the SOM, and urgent care for those severely affected by chronic exposure to these pollutants. These stations in the Niger Delta should therefore be classified as highest priority sites for heavy metal and carcinogenic PAH pollution when considering remediation decisions.


Fig. 5: Composite risk map of the BN Location, Niger Delta, Nigeria

Fig. 6: Composite-Risk Map of the ER Location, Niger Delta, Nigeria

Fig. 7: Composite-Risk Map of the OD Location, Niger Delta, Nigeria

Fig. 8: Composite-Risk Map of the OG Location, Niger Delta, Nigeria


Akintola FA (1982) Geology and Geomorphology, Nigeria in Maps, In: Barbour K. M et al. (Eds.), Hodder and Stoughton, London, 1982, p. 209.

Chapman P.M. (2007). Determining when contamination is pollution – weight of evidence determinations for sediments and effluents. Environ Int; 33:492–501.

Chapman P.M., Hollert H. (2006). Should the sediment quality triad become a tetrad, a pentad, or possibly even a hexad? J Soils Sediments; 6:4–8.

Congressional Research Service (CRS). (2008) Nigeria: Elections and issues for Congress Updated.RL33964. Nigeria: Current Issues. Retrieved from

González-Piñuela, C., Alonso-Salces, R.M., Andres, A., Ortiz, I., Viguri, J.R. (2006). Validated analytical strategy for the determination of polycyclic aromatic compounds in marine sediments by liquid chromatography coupled with diode-array detection and mass spectrometry. Journal of Chromatography A 1129, 189–200.

Honkela, T., S. Kaski, T. Kohonen, and K. Lagus. 1998. Self-organizing maps of very large document collections: Justification for the WEBSOM method. L. Balderjahn, R. Mathar, and M. Schader (eds.), Classification, Data Analysis, and Data Highways. 245–252. Berlin: Springer.

IARC (1987) Overall Evaluations of Carcinogenicity: an Updating of IARC Monographs Volumes 1 to 42. In. IARC Monographs on the Evaluation of Carcinogenic Risks to Humans, supplement 7. International Agency for Research on Cancer. Lyon, France.

Kohonen, T. 1985. The Self-Organizing Map. Proc. IEEE, 73:1551–1558.

Kohonen, T. 1988. The neural phonetic typewriter. Computer, 21(3):11–22.

Kohonen, T., Hynninen, J., Kangas, J., and Laaksonen, J. (1995). SOM_PAK: The self-organizing map program package, Helsinki University of Technology, pak.

Kohonen, T. (2001). Self-Organizing Maps. Berlin, Springer.

Marengo E, Gennaro MC, Robotti E, Rossanigo P, Rinaudo C, Roz-Gastaldi M. (2006). Investigation of anthropic effects connected with metal ions concentration, organic matter and grain size in Bormida river sediments. Anal Chim Acta; 560:172–83.

Martín-Díaz M.L, Blasco J, Sales D, DelValls T.A. (2004). Biomarkers as tools to assess sediment quality. Laboratory and field surveys. TrAC Trends Anal Chem;23:807–18.

OGP (2005), Fate and effects of naturally occurring substances in produced water on the Marine Environment. OGP report number 364, 36 pages

Olawoyin, R., Oyewole, S.A., Grayson, R.L. (2012). Potential risk effect from elevated levels of soil heavy metals on human health in the Niger delta, Ecotoxicol. Environ. Saf., Volume 85, 1 November 2012, Pages 120–130.

Powell CB, Ibiebele DO, Bara M, Dutkwicz B, Isoun, MO (1985) Oshika Oil Spill Environmental Impact; effect on Aquatic biology. NNPC/FMHE International Seminar on petroleum industry and the Nigerian Environment, (pp. 168 – 178.). Kaduna, Nigeria.

Rabalais, N. N., B. A. McKee, D. J. Reed, and J. C. Means. (1992). Fate and effects of produced water discharges in coastal Louisiana, Gulf of Mexico, USA, pp. 355-389. In J. P. Ray and F. R. Engelhart, Produced Water, Plenum Press, New York.

Rahaman MA (1976) Review of the basement geology of Southwestern Nigeria: In Geology of Nigeria, edited by Kogbe, CA, Lagos: Elezabetha Publishing Company

Vesanto, J., Himberg J., Alhoniemi E. and Parhankangas J. (1999). Self-organizing map in Matlab: the SOM Toolbox. Proceedings of the Matlab DSP Conference, Espoo, Finland, Comsol Oy.

Viguri J R, Irabien M J, Yusta I, Soto J, Gómez J, Rodriguez P, Martinez-Madrid M, Irabien J A, Coz A (2007). Physico-chemical and toxicological characterization of the historic estuarine sediments: a multidisciplinary approach. Environment International, 33(4): 436–444.

Watson, A.P. and Dolislager, F.D. (2007). Reevaluation of 1999 Health-Based Environmental Screening Levels (HBESLs) for Chemical Warfare Agents. ORNL/TM-2007/080.

The complete publications relating to this research work are available at the following references:

  • R. Olawoyin, A. Nieto, R. L. Grayson, F. Hardisty and S. Oyewole (2013). Application of Artificial Neural Network (ANN) – Self-Organizing Map (SOM) for the Categorization of Water, Soil and Sediment Quality in Petrochemical Regions. Expert Systems with Applications Vol. 40, Issue 9, July 2013, Pages 3634-3648
  • R. Olawoyin, (2013). Exploration of the Spatial-Composite Risk Index (CRI) for the Characterization of Toxicokinetics in Petrochemical active areas. Chemosphere (Accepted April 16, 2013)
  • R. Olawoyin, R. L. Grayson, O. T. Okareh (2012). Eco-toxicological and Epidemiological Assessment of Human Exposure to Potentially Petrogenic Polycyclic Aromatic Hydrocarbons in the Niger Delta, Nigeria. Toxicol. Environ. Health. Sci. Vol. 4(3), 173-185, 2012
  • R. Olawoyin, S. A. Oyewole, R. L. Grayson, (2012). Potential risk effect from elevated levels of soil heavy metals on human health in the Niger delta, Ecotoxicol. Environ. Saf., Vol. 85, 1 November 2012, Pages 120–130
  • R. Olawoyin, R. L. Grayson and A. Nieto (2013). Characterization of Potentially Petrochemical Toxicants Using Chemometrics: A case study. Sci. Total Environ. STOTEN-D-13-00031 (under review)
  • R. Olawoyin, R. L. Grayson, O. T. Okareh, A. Nieto (2013). Characteristic Fingerprints of Polycyclic Aromatic Hydrocarbons and Total Petroleum Hydrocarbons in the Niger Delta, Nigeria, Environmental Earth Sciences ENGE-D-12-00620R1 (Under review)