Multivariate Statistics Applied to the Identification of Compositional Control Parameters for Groundwater

The objective of the present study was to identify the most influential parameters in the composition of groundwater in the municipality of Icapuí, Ceará, Brazil, seeking correlations with the composition of the percolated aquifer formations that can be associated with the sources of these components. For this purpose, multivariate statistical techniques were applied by means of Principal Component Analysis (PCA) and Hierarchical Cluster Analysis (HCA). The PCA allowed a reduction of the physicochemical parameters and identified two components responsible for approximately 86% of the total variance in the data for both sampling periods (rainy and dry). The first component is represented by variables that indicate natural rock weathering processes, and the second comprises seasonality and pollution indicators. Samples were also grouped through HCA according to compositional similarities, which were associated with possible natural or human sources.


Introduction
Groundwater is a considerably important resource, particularly in regions such as northeastern Brazil, where there is an imbalance between water supply and demand. Rainfall variability is a natural driving factor in such regions: rainfall that is poorly distributed over time, combined with high evaporation rates, decreases the supply of water, especially in surface reservoirs.
Groundwater is the only source of water supply in nearly 40% of Brazilian municipalities, particularly small ones, where it guarantees safe water with low treatment costs (Villar 2016). In the semiarid region of the country, aquifers have great strategic importance, particularly in coastal areas such as Icapuí, state of Ceará. This municipality has groundwater as its only water supply, which makes it extremely important for the water security of local communities and for sustaining economic activities. Maia (2018) monitored the concentrations of nitrogen compounds and microbiological parameters in groundwater in the municipality of Icapuí and correlated them with cases of waterborne diseases. The results made it possible to attribute the increase in cases of diarrheal diseases, as well as the reappearance of cases of viral hepatitis, to the presence of bacteria and nitrogen compounds.
Characterization and monitoring of the chemical composition of groundwater are indispensable when determining uses that are compatible with the quality of this resource (Heibati et al. 2017; Molinari et al. 2019; Besser et al. 2019). Consistent space-time monitoring involves determining many water characteristics in several places and periods, which generates a large amount of correlated information (Bertossi et al. 2013).
Multivariate statistical analysis is an important tool for handling and interpreting data with many variables, since it allows the information contained in the original variables to be reduced to a smaller set of statistical variables with minimal loss of information (Hair et al. 1998).
Gomes and Cavalcante (2017) used Principal Component Analysis and Hierarchical Cluster Analysis to identify similarities among the variables that determine groundwater quality in the municipality of Fortaleza. They defined three components responsible for about 90% of the total variability of the data: the first is indicative of pollution, the second of alkalinity, and the third of salinity.
Thus, these techniques were used in the present investigation with the aim of identifying the variables that contribute most to the physicochemical characteristics of groundwater, seeking to detect temporal and spatial variations caused by either human or natural factors associated with seasonality. The study therefore contributes to the understanding of the relationship between the composition of percolated rocks and groundwater in the municipality of Icapuí, Ceará, Brazil.

Study Area
The municipality of Icapuí is located in the eastern extremity of the state of Ceará (Figure 1), covers about 421 km², and has approximately 20,000 inhabitants (IBGE 2021). Water supply is carried out through tubular wells administered by the publicly owned Autonomous Water and Sewage Service (SAAE) and through private wells.
The climate is Tropical Warm Semi-arid, with an average temperature of 26 to 28 °C and maximums between 30 and 31 °C. According to the historical series from 1988 to 2021, the average total annual rainfall is 894 mm, with the highest precipitation between January and June, averaging 840 mm, and only 54 mm of average rainfall between July and December (FUNCEME 2021). Thus, the first half of the year is defined as the rainy season and the second as the dry season.
According to Mente (2008), the region is inserted within the hydrogeological context of the Coastal Province, more specifically in the Potiguar sub-province, represented by the Dunas, Barreiras, Jandaíra, and Açu aquifers.
The Dunas Aquifer corresponds to aeolian deposits that occur along the whole coastline, with widths between 6 and 20 meters, and is composed, according to Maia (2018), of well-sorted sands of fine to medium grain size.
The Barreiras Aquifer is found outcropping or underlying the Dunas Aquifer and has the greatest water potential in the area, which is widely used. It is formed by fine-grained silty-clayey sandstones and red sandstones of medium to coarse sand sizes, with conglomerate levels (Sousa 2002).
The Jandaíra Aquifer is located at depths between 20 and 108 meters, with concordant upper contact with the Barreiras Aquifer and lower contact with the Açu Aquifer. It is mainly composed of limestones, marls, siltstones, and dolomites (Morais et al. 2005; Vasconcelos, Teixeira & Alves Neto 2010).
The Açu Aquifer occurs under confined conditions, caused both by the clay layers of the upper portion of the Açu Formation and by the base of the Jandaíra Aquifer, and can be detected at depths greater than 200 meters. It is composed predominantly of coarse to conglomeratic sandstones at its base, grading upward to medium sandstones in its intermediate portion and finer sandstones at the top, showing a continuous vertical increase in the amount of clay (Morais et al. 2005).

Material and Methods
Data from 50 samples of groundwater collected from 25 wells in the municipality of Icapuí were used for physicochemical analyses (Figure 1). These samples were collected over the course of two campaigns, carried out in February (rainy season) and August (dry season) of 2019, aiming to identify the influence of seasonality on the physicochemical characteristics of the water. The parameters analyzed included pH, electrical conductivity (EC), total dissolved solids (TDS), total alkalinity (TA), total hardness (TH), turbidity, major elements (bicarbonate, calcium, magnesium, chloride, sulfate, sodium, and silica), and minor elements (potassium, nitrate, fluoride, and bromide). X-ray fluorescence data were also obtained from four samples of the main aquifers in the area (Dunas, Barreiras, Jandaíra, and Açu) to identify the main oxides present and establish their proportions in each sample. This step was conducted to associate possible origins with the constituents found in the groundwater. Sampling points were selected based on access to the sampling location and use of the water obtained, considering that 80% of the wells are intended for public supply.
The multivariate statistical analyses carried out were a factor analysis, by means of PCA, and an HCA, using the SPSS Statistics software (version 17.0). PCA explains the covariance structure through a few linear combinations of the original variables, with the objective of reducing the original dimension and facilitating the interpretation of the analyses conducted (Ferreira 1996). In general, all the variability of a system determined by the variables can be explained through its principal components (Buttafuoco et al. 2017; Calazans et al. 2018). The factor analysis was initiated by transforming the original data matrix into a correlation matrix. The Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy was used to verify the adequacy of the data for factor analysis, with KMO > 0.5 considered satisfactory, in addition to Bartlett's Test of Sphericity, considered significant at p < 0.01, to test the null hypothesis that the variables analyzed were not correlated (Ferreira 1996; Hair et al. 1998). The factors were then extracted from the correlation matrix through the PCA method. Next, the factor loading matrix generated in the extraction was rotated using the varimax method, which minimizes the number of variables with high loadings on each factor, allowing the association of a variable with a single factor (Gomes, Anjos & Daltro 2020; Keita & Zhonghua 2017).
Finally, the samples were clustered based on the similarities of the characteristics analyzed, considering the attributes explained by each component in the factor analysis. To do so, the HCA technique was used, which links the samples according to their associations, producing a dendrogram in which samples that are considered similar with respect to the variables chosen are clustered together (Dabgerwal & Tripathi 2016; Tiri, Lahbari & Boudoukha 2017). The HCA was conducted through the Ward method, with the similarity measure given by the squared Euclidean distance. The groups generated were submitted to an analysis of variance (One-Way ANOVA) and presented significance levels below 5%, indicating that they formed a relatively stable set of groups (Gomes & Cavalcante 2017; Gomes, Anjos & Daltro 2020).
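The two adequacy checks described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the SPSS procedure used in the study; the function name `kmo_and_bartlett` and the synthetic data are assumptions made for the example. The KMO index compares squared correlations with squared partial correlations, and Bartlett's statistic is computed from the determinant of the correlation matrix; its p-value would be read from a chi-square distribution with the returned degrees of freedom.

```python
import numpy as np

def kmo_and_bartlett(X):
    """Return (KMO, Bartlett chi-square, degrees of freedom).

    X is an (n samples x p variables) array. Hypothetical helper,
    not part of any library.
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)           # correlation matrix
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)          # partial correlations
    off = ~np.eye(p, dtype=bool)               # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    kmo = r2 / (r2 + q2)
    # Bartlett's sphericity statistic: -(n - 1 - (2p + 5) / 6) * ln|R|
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return kmo, chi2, dof

# Synthetic data (assumption): 25 samples, 4 variables sharing one factor.
rng = np.random.default_rng(0)
common = rng.normal(size=(25, 1))
X_demo = common + 0.5 * rng.normal(size=(25, 4))
kmo, chi2, dof = kmo_and_bartlett(X_demo)
print(f"KMO = {kmo:.3f}, Bartlett chi2 = {chi2:.1f}, dof = {dof}")
```

A KMO value above 0.5 and a large Bartlett statistic relative to its degrees of freedom would support proceeding with the factor analysis.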
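As an illustration of the clustering step, the sketch below implements a naive agglomeration with Ward's criterion, in which each merge is the one that least increases the within-cluster sum of squared Euclidean distances, followed by a one-way ANOVA F statistic for the resulting groups. It is written in Python with NumPy under stated assumptions: the function names and the toy coordinates are hypothetical, and the code is not the SPSS routine used in the study.

```python
import numpy as np

def ward_clusters(X, n_clusters):
    """Merge clusters with Ward's criterion until n_clusters remain."""
    X = np.asarray(X, dtype=float)
    clusters = {i: [i] for i in range(len(X))}
    heights = []                               # cost of each merge
    while len(clusters) > n_clusters:
        keys = list(clusters)
        best = None
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                a, b = keys[i], keys[j]
                na, nb = len(clusters[a]), len(clusters[b])
                diff = X[clusters[a]].mean(axis=0) - X[clusters[b]].mean(axis=0)
                # Increase in within-cluster sum of squares if a and b merge
                cost = na * nb / (na + nb) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
        heights.append(cost)
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters.values()):
        labels[members] = label
    return labels, heights

def one_way_f(values, labels):
    """F statistic of a one-way ANOVA of values grouped by labels."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    groups = [values[labels == g] for g in np.unique(labels)]
    grand = values.mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (len(groups) - 1)) / (ss_within / (len(values) - len(groups)))

# Toy data (assumption): three well-separated groups of three samples.
X_demo = np.array([[0, 0], [0.1, 0], [0, 0.1],
                   [5, 5], [5.1, 5], [5, 5.1],
                   [10, 0], [10.1, 0], [10, 0.1]])
labels, heights = ward_clusters(X_demo, n_clusters=3)
f_stat = one_way_f(X_demo[:, 0], labels)
print(labels, round(f_stat, 1))
```

The double loop is quadratic at every step, which is acceptable for a few dozen samples; a library implementation would be preferable for larger data sets.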

Principal Component Analysis (PCA)
The PCAs were carried out based on simulations initially using 17 physicochemical variables. In the final simulation, 12 variables presented significant results in the definition of the most adequate model to explain the variation in the data in both sampling periods (rainy and dry).
For the rainy season model, the significant variables were EC, chloride, calcium, magnesium, TH, sulfate, potassium, bromide, TDS, pH, bicarbonate, and TA. In turn, the representative variables for the dry period were EC, chloride, calcium, magnesium, TH, sulfate, bromide, TDS, pH, bicarbonate, TA, and nitrate (N-NO₃⁻). The correlation between the variables selected for the rainy period model can be observed in Table 1. Nearly 77% of the correlation coefficients were above 0.5, indicating a significant correlation between variables, as indicated by Gomes and Cavalcante (2017) and Gomes, Anjos and Daltro (2020). A similar behavior was observed for the dry period (Table 2), with nearly 70% of coefficients presenting values higher than 0.5.
The correlation found among pH, bicarbonate, and alkalinity was expected, given that pH is essentially a function of dissolved carbon dioxide and the alkalinity of the water. Moreover, alkalinity is a direct consequence of the presence or absence of carbonates, bicarbonates, and hydroxides. The strong correlation (> 0.8) among EC, chloride, calcium, magnesium, TH, bromide, and TDS indicated a greater influence of these ions on the variations of conductivity in the water analyzed, as well as on the concentration of TDS. The correlation with hardness can be explained by the presence of calcium and magnesium, since these ions define water hardness.
Note: Coefficients above 0.5, indicating a significant correlation between variables, are highlighted in bold.
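The share of coefficients above 0.5 quoted above can be computed from a correlation matrix as in the following sketch, which counts each pair of variables once (12 variables give 66 distinct pairs). Taking the absolute value of the coefficients is an assumption of this example, as are the function name `share_above` and the small demonstration matrix.

```python
import numpy as np

def share_above(R, threshold=0.5):
    """Fraction of distinct variable pairs with |r| above the threshold."""
    R = np.asarray(R, dtype=float)
    upper = np.triu_indices_from(R, k=1)       # each pair counted once
    return float(np.mean(np.abs(R[upper]) > threshold))

# Hypothetical 3 x 3 correlation matrix, for illustration only.
R_demo = np.array([[1.0, 0.9, 0.2],
                   [0.9, 1.0, 0.6],
                   [0.2, 0.6, 1.0]])
share = share_above(R_demo)
print(f"{share:.0%} of the pairs exceed 0.5")
```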
The PCAs applied showed KMO rates of 0.785 and 0.680 for the samplings conducted in the rainy and dry periods, respectively, indicating that the description of data variability was satisfactory.
The simulations conducted with the 12 variables collected during the rainy and dry periods explained 86.2% and 85.8% of total variance, respectively, and presented two components. Component 1 explained 70.9% and 68.5% of data variability for the rainy and dry periods, respectively, and was represented by the variables EC, chloride, calcium, magnesium, TH, sulfate, potassium, bromide, and TDS. These characterize components associated with water mineralization processes that result from water/rock interactions throughout percolation in different substrates. Component 2 corresponded to 15.3% and 17.3% of the variance in the rainy and dry period data sets, respectively, and encompasses the variables pH, bicarbonate, alkalinity, and nitrate. These were defined as components associated with alterations in the acidity, alkalinity, and neutrality conditions of the waters analyzed. The presence of nitrate as a significant variable in samples from the dry period could be associated with a lower volume of rainfall, which reduces the dilution of this parameter promoted by rainfall recharge.
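The percentages above are consistent with one another: 70.9% + 15.3% = 86.2% for the rainy period and 68.5% + 17.3% = 85.8% for the dry period. The sketch below shows how such shares arise from the eigenvalues of a correlation matrix, whose sum equals the number of variables. It is a NumPy illustration on synthetic data, with the hypothetical function name `pca_correlation`, and does not reproduce the SPSS output of the study.

```python
import numpy as np

def pca_correlation(X, n_components=2):
    """PCA on the correlation matrix of X (n samples x p variables).

    Returns the share of variance explained by every component and the
    unrotated loadings of the first n_components.
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)       # ascending order
    order = np.argsort(eigvals)[::-1]          # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # eigenvalues sum to p
    loadings = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    return explained, loadings

# Synthetic data (assumption): two latent factors driving six variables.
rng = np.random.default_rng(1)
f1 = rng.normal(size=(25, 1))
f2 = rng.normal(size=(25, 1))
X_demo = np.hstack([f1 + 0.3 * rng.normal(size=(25, 4)),
                    f2 + 0.3 * rng.normal(size=(25, 2))])
explained, loadings = pca_correlation(X_demo)
print("Explained by the first two components:", explained[:2].sum())
```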
Table 3 shows the factorial loads and variance explained by the components after applying the orthogonal rotation through the varimax method to minimize the number of variables that present high loads in each factor, as used in other studies (Ado et al. 2019;Gomes & Cavalcante 2017;Gomes, Anjos & Daltro 2020;Keita & Zhonghua 2017).
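For readers unfamiliar with the rotation step, the following is a minimal sketch of the commonly used iterative varimax algorithm based on singular value decomposition. The function name, the stopping rule, and the demonstration loadings are assumptions of this example; SPSS may differ in details such as Kaiser normalization. Because the rotation is orthogonal, the communality of each variable (the row sum of squared loadings) is unchanged.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation; returns (rotated loadings, rotation)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    T = np.eye(k)                              # rotation matrix
    crit_old = 0.0
    for _ in range(max_iter):
        LR = L @ T
        target = LR ** 3 - LR @ np.diag(np.sum(LR ** 2, axis=0)) / p
        U, s, Vt = np.linalg.svd(L.T @ target)
        T = U @ Vt                             # nearest orthogonal matrix
        crit = s.sum()
        if crit_old != 0.0 and crit / crit_old < 1.0 + tol:
            break
        crit_old = crit
    return L @ T, T

# Hypothetical loadings: a simple two-factor structure mixed by 30 degrees.
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.85, 0.0],
                   [0.0, 0.7], [0.0, 0.9], [0.0, 0.8]])
theta = np.deg2rad(30.0)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
unrotated = simple @ mix
rotated, T = varimax(unrotated)
print(np.round(rotated, 2))
```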

Hierarchical Cluster Analysis (HCA)
The HCA allowed the compartmentalization of samples into groups according to chemical similarities. The number of clusters was defined by the first large difference between the rescaled agglomeration coefficients. The cut-off point used to define the homogeneous clusters was set at 5, allowing a better measure of similarity for the formation of groups in both periods. In total, three clusters were formed for the first component and two clusters for the second component in both seasons.
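The cut-off rule described above can be sketched as follows. The example assumes that the agglomeration coefficients are rescaled linearly so that the last merge sits at 25, which is how rescaled dendrogram distances are commonly presented; the exact rescaling applied by the software is not stated in the text, and the coefficients below are hypothetical values chosen only for illustration. Every merge that lies above the cut-off is left undone, so the number of clusters is one plus the number of such merges.

```python
import numpy as np

def clusters_at_cutoff(coefficients, cutoff=5.0, scale_max=25.0):
    """Count clusters left when the dendrogram is cut at a rescaled height.

    Assumes a linear rescaling in which the largest coefficient maps to
    scale_max (an assumption of this sketch).
    """
    c = np.asarray(coefficients, dtype=float)
    rescaled = c / c.max() * scale_max
    n_clusters = 1 + int(np.sum(rescaled > cutoff))
    return n_clusters, rescaled

# Hypothetical agglomeration coefficients for nine samples (eight merges).
coeffs = [0.2, 0.4, 0.5, 0.9, 1.3, 2.0, 9.0, 20.0]
n_clusters, rescaled = clusters_at_cutoff(coeffs)
print(n_clusters, np.round(rescaled, 2))
```

In this toy case the first large jump occurs between the sixth and seventh merges, which is the gap the cut-off at 5 falls into.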

Figure 2 Dendrograms resulting from the hierarchical cluster analysis, for the rainy period, of the variables explained by: A. Component 1; B. Component 2.

The distribution of samples in the clusters of each component was based on the variations in concentration of the elements present, mainly due to the aquifers involved, influence of seasonality, and human factors. Table 4 shows the mean concentrations of variables for the samples in each cluster.
The wells that compose cluster 1 of the first component for the rainy period have depths that vary between 10 and 685 meters and obtain water from the Dunas, Barreiras, and Açu aquifers, the last of which can only be reached by wells that are at least 200 meters deep (western portion of the area). The waters from these wells are characterized by low ionic concentrations associated with the sandy constitution of the aquifers, predominantly composed of SiO₂ (Figure 4), which has low solubility under normal temperature conditions but higher solubility under geothermal groundwater conditions (Gomes, Furtado & Souza 2018). The low salt concentrations also indicate a lower marine influence in these wells.
Cluster 2 encompasses wells with intermediate depths, varying between 36 and 130 meters. These waters originate from the Barreiras and Jandaíra aquifers, the latter of which does not outcrop and is found 100 m below the surface in the western portion of the study area and from 30 m in depth onwards in the eastern coastal zone. The waters in this cluster present higher mean ionic concentrations than those in cluster 1, indicating a greater contribution from seawater, as well as from the limestones of the Jandaíra aquifer through calcite dissolution.
The three wells that form cluster 3 have variable depths. The two shallowest are located along the coastal zone (10 and 32 meters) and draw water from the Dunas and Barreiras aquifers, respectively, while the deepest is located in the western portion of the study area (594 meters) and draws water from the Açu aquifer. The waters of this cluster have the highest mean concentrations in the component, which may be related to a marine influence, to which the unconfined Dunas and Barreiras aquifers are more susceptible, as well as to the mineralization of the waters resulting from interaction with the substrate during percolation through the Açu aquifer. According to Celligoi (1999), higher concentrations of sulfate, bromide, and chloride can be attributed both to marine aerosols and to the influence of seawater over continental water, due to the location of the wells near the coastline. High potassium and magnesium values may be associated with percolation through the Açu aquifer, given the presence of these elements in the rocks that compose its framework, identified by means of X-ray fluorescence analyses (Figure 4). These analyses also made it possible to attribute the higher concentrations of calcium in the water to the limestones of the Jandaíra aquifer, considering the predominance of this element (nearly 93%) in the composition of the rocks analyzed, associated with its high mobility (Reimann & Caritat 1998).
Anu. Inst. Geociênc., 2024;47:49797
The wells that represent cluster 1 in the second component of the rainy period are shallow to reasonably deep, with waters that originate from the Dunas, Barreiras, and Jandaíra aquifers. This cluster is characterized by acidic water that may be associated with the composition of the Dunas and Barreiras aquifers, which are rich in silica and aluminium (Figure 4), and with the sampling period. According to Gomes and Cavalcante (2017), the introduction of recent (meteoric) waters into the aquifer, with higher concentrations of CO₂, promotes an increase in groundwater acidity. Cluster 2 is composed of wells with depths that vary between 10 and 685 meters (western sector) and that obtain water from all four aquifers: Dunas, Barreiras, Jandaíra, and Açu. The waters in this cluster vary from neutral to alkaline. According to Hounslow (1995), higher pH values are usually found in waters in which Na⁺ and Ca²⁺ ions predominate or in bicarbonate-rich waters. The more alkaline character of the waters in this cluster may indicate a greater contribution from the Jandaíra aquifer to their origin.
Cluster 1 of the first component for the dry period is formed by the wells of cluster 1 plus nine wells from cluster 2 of the first component for the rainy period. Thus, these are also characterized by waters with low salinity, though an increase in the values of this cluster can be observed in comparison to the samples collected during the rainy period. This demonstrates the influence of seasonality on the variation of ionic concentrations in groundwater, especially in regions with shallow aquifers, in which a decrease in groundwater recharge during periods with less rainfall contributes to an increase in salt concentrations in the waters. The five wells that compose cluster 2 are also part of the second cluster of the first component for the rainy period sampling, presenting the same characteristics regarding the intermediate salinity of the waters. Cluster 3 was also formed by the same wells that compose the third cluster of the first component in the rainy period sampling. This cluster is represented by waters with high salt concentrations, particularly chloride. The presence of chloride in groundwater can be attributed to the dissolution of saline deposits, saline intrusions, effluent discharges from chemical industries, and domestic sewage (Celligoi 1999). The higher chloride values in the waters of this cluster are possibly associated with marine influence and with domestic sewage, given the absence of basic sanitation in the area.
Legend: EC: Electrical conductivity; TA: Total alkalinity; TH: Total hardness; TDS: Total dissolved solids.
Cluster 1 of the second component in the dry period is composed, again, of the wells of cluster 1 plus two wells from cluster 2 of the second component in the rainy period, represented by shallow wells with acidic waters. Nitrate was a significant parameter in the samples collected during the dry period, probably because of the lower dilution that takes place under this condition in comparison to the rainy period. The absence of basic sanitation in the area, with the consequent use of septic tanks, contributes to an increase in nitrate concentrations in the water (Sadler et al. 2016; Maia 2018). In cluster 1, nitrate concentrations vary between 3.6 and 21.4 mg/L as N-NO₃⁻, with six of the thirteen samples (46%) exceeding the maximum value of 10 mg/L as N-NO₃⁻ recommended by the Ministry of Health for drinking water in Brazil. This limit is approximately equivalent to the World Health Organization (WHO) guideline of 50 mg/L as NO₃⁻, or 11.3 mg/L as N-NO₃⁻. The guideline value was defined as a protective measure against childhood methemoglobinemia (WHO 2011), though other negative health outcomes, including cancer, are associated with ingesting water with high levels of nitrate (Sadler et al. 2016; Maia 2018).
Cluster 2 of the second component in the dry period is composed of wells that are also part of cluster 2 of the second component in the rainy period. This cluster encompasses wells deeper than those of cluster 1, reaching 685 meters, that obtain water from the Dunas, Barreiras, and Açu aquifers and have pH varying between neutral and alkaline. Nitrate values were lower here than in cluster 1, probably due to the greater depths of the wells, which make the water less susceptible to surface contamination, as also observed by Maia (2018).
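The equivalence between the nitrate limits expressed as NO₃⁻ and as N-NO₃⁻ follows from the mass fraction of nitrogen in the nitrate ion. The short Python sketch below reproduces the conversion and the share of samples above the Brazilian limit; the atomic masses are standard rounded values, and the function names are hypothetical.

```python
# Standard atomic masses (g/mol), rounded to three decimals.
N_MASS = 14.007
O_MASS = 15.999
NO3_MASS = N_MASS + 3 * O_MASS                 # about 62.004 g/mol

def no3_to_n(mg_per_l_no3):
    """Convert a nitrate concentration (as NO3-) to nitrate-nitrogen (N-NO3-)."""
    return mg_per_l_no3 * N_MASS / NO3_MASS

def n_to_no3(mg_per_l_n):
    """Convert nitrate-nitrogen (N-NO3-) to nitrate (as NO3-)."""
    return mg_per_l_n * NO3_MASS / N_MASS

who_as_n = no3_to_n(50.0)                      # WHO guideline expressed as N
brazil_as_no3 = n_to_no3(10.0)                 # Brazilian limit expressed as NO3-
share_above_limit = 6 / 13                     # samples above 10 mg/L in cluster 1

print(round(who_as_n, 1), round(brazil_as_no3, 1), round(share_above_limit * 100))
```

Expressed as nitrogen, the 50 mg/L guideline corresponds to about 11.3 mg/L, which is why the 10 mg/L limit is described as approximately equivalent and slightly more restrictive.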

Conclusions
The factor analysis by means of PCA allowed the selection of the parameters that are most significant for the composition of groundwater in the municipality of Icapuí, Ceará. Compositional variations were represented by two components: one associated with mineralization processes, controlled by electrical conductivity, chloride, calcium, magnesium, total hardness, sulfate, potassium, bromide, and total dissolved solids, and another related to seasonality and water pollution, controlled by pH, bicarbonate, total alkalinity, and nitrate.
The HCA clustered water samples with similar chemical characteristics for each factor. This made it possible to establish possible correlations between the elements present and their potential origins. These, in turn, were attributed mainly to natural sources generated through geochemical processes caused by the interaction of groundwater with the percolated aquifers.
Multivariate statistical analysis proved to be an important tool in studies that include monitoring physicochemical parameters in groundwater. It reduces the number of parameters for analysis without compromising the quality of the information, making the process more efficient from a logistical and economic perspective, especially in regions that have groundwater as their main source of water supply and therefore require constant monitoring of compositional alterations.

Acknowledgments
The authors would like to thank the Environmental Geochemistry and Electron Microscopy Laboratories of the Geology Department of the Federal University of Ceará, for their support in carrying out the physicochemical and X-ray fluorescence analyses.

Figure 1 Distribution of the wells sampled and aquifers in the study area.

Figure 3 Dendrograms resulting from the hierarchical cluster analysis, for the dry period, of the variables explained by: A. Component 1; B. Component 2.

Figure 4 Main oxides present in the aquifers of the study area.

Table 1
Correlation matrix of the variables from samples collected in the rainy period (February/2019).

Table 2
Correlation matrix of the variables from samples collected in the dry period (August/2019).

Table 3
Factorial loads and variance explained in the principal component analyses, after rotation through the varimax method.

Table 4
Mean values of the parameters of samples collected during the rainy and dry periods, distributed in the hierarchical clusters.