Spatial distribution of the relative risk of Zika virus disease in Colombia during the 2015–2016 epidemic from a Bayesian approach

Abstract Objective To determine the spatial distribution of the risk of Zika virus disease in each region of Colombia during the 2015–2016 epidemic. Methods An ecological study was designed to estimate the risks for each Colombian region using first‐order neighbors, covariate effects, and three adjacent periods of time (beginning, development, and end of the epidemic) to analyze the spatial distribution of the disease based on a Bayesian hierarchical model. Results Spatial distribution of the estimated risks of Zika virus disease showed that it increased in a strip that crosses the central area of the country from west to east. Analysis of the three time periods showed greater risk of the disease in the central and southern zones—Arauca and Santander—where the increase in risk was four times higher during the peak phase compared with the initial phase of the outbreak. Conclusion In the identified high‐risk areas, integrated surveillance systems for Zika virus disease and its complications must be strengthened to provide up‐to‐date and accurate epidemiological information. This information would allow those involved in policy and decision making to identify new outbreaks and risk clusters, enabling more focused and accurate measures to target at‐risk populations.

virus infection were identified, including more than 19 000 among pregnant women. The incidence reported during the epidemic phase was of 377.7 cases per 100 000 inhabitants. 3 Several studies have been carried out to investigate the causes and behavior of Zika virus disease in Colombia. Pacheco et al. 4 estimated the risk of the disease in the country and found that most provinces had an accumulated incidence of between 0.1 and 129.7 per 100 000 habitants. Rojas et al. 5  When mapping the geographical distribution of a disease, the aim is to discover spatial patterns that help explain behavior and enable hypotheses about its etiology. The present study used Bayesian smoothing methods to estimate risk to spatially review the geographical structures of disease behavior. A wide range of models in disease mapping have been developed to offer appropriate relative risk estimates. Taking into account area information, these models provide smoothed risk maps and improve the estimates.
One of the most important studies in risk estimation is that of Besag et al. 6 Risks are estimated using a model that captures the risk structure through incorporation of information from neighboring areas. The aim is to identify any spatial relationship that reveals the behavior of the disease, how it is distributed, and whether different factors explain this behavior. Besag et al. 6 proposed a Bayesian hierarchical model that models risk, incorporating two random factors: one that explains the spatial dependence between neighboring areas and one that explains the residual effects. Some extensions and contributions to the Besag model are the works of Bernardinelli et al., 7 Best et al., 8 and Lawson. 9,10 The aim of the present study was to determine the spatial distribution of the risk of Zika virus disease in each region of Colombia during the 2015-2016 epidemic using a Bayesian hierarchical model.

| MATERIALS AND METHODS
An ecological study was designed. The units of analysis were the administrative departments of Colombia, which is politically organized into 32 departments and four districts: Bogotá (DC), Barranquilla, Santa Marta, and Cartagena.

| Case definition
Cases of Zika virus disease were those reported in SIVIGILA as con-

| Spatial distribution of disease risk
In the present study, an autoregressive approach to spatial mapping of the disease based on the model of Besag et al. 6 was applied and included estimation of risk by time period (initial, peak, and endemic).
The study by Besag, better known as the BYM convolution model, is frequently used in epidemiological literature (Mollié,11 Ferrándiz et al., 12 Barceló et al. 13 ). The main idea behind the model is to assume that there are risk factors that encompass more than one area of study and, consequently, the relative risks are spatially dependent. In the second level of hierarchy, the logarithm of relative risk is modeled according to spatially random effects, correlated and uncorrelated: where α 0 identifies the risk as one in the study area; u i is the random component with spatial dependence; and v i is uncorrelated heterogeneity. As mentioned previously, dependence is incorporated in this model based on the idea that observations from within the geographical areas will be closer to each other than observations from more distant geographical areas, thus achieving a softened estimate of the risk. Introduction of this spatial correlation structure in the model provides additional information and allows us to obtain more stable estimates of relative risks than the estimated maximum likelihood.
The relationships between risk and some bioclimatic variables have been studied; for example, altitude data taken from the Shuttle Radar Topography Mission 14 every 3-arc seconds (i.e. 90 m); average annual temperature identified with BIO01 data 15 every 10-arc minutes (i.e. 18 km); and precipitation with BIO16 data 15 every 10-arc minutes (i.e. 18 km).
In the present study the model was implemented through R statistical software version 3.5.1 16 and WinBUGS 14. 17 The QGIS 2.18.11 program 18 was used to prepare the maps.
The project was endorsed as a risk-free investigation by the Ethics Committees of North University and the Pan American Health Organization.

| RESULTS
The main input for the implementation of the Bayesian model is the number of cases of Zika virus disease in each of the areas studied during the three time periods included in the analysis. There were 40 741 cases of Zika virus disease in total: 909 in the first phase, 37 094 in the second phase, and 2738 in the third phase. The distribution of cases for the areas studied is given in Table 1.   disease, the decrease is low, indicating only a small variation in the risk between phase 2 and phase 3.
The effect of rainfall as a covariate on the risk of Zika virus disease was also analyzed. The peak period of rainfall coincided with the peak period of the disease. Results showed that areas with highest rainfall coincided with areas that had the greatest estimated risk of Zika virus disease (Fig. 4).

| DISCUSSION
To our knowledge, this is the first study to explore the risk of contracting Zika virus disease in Colombia using spatial models based on a Bayesian approach. In spatial epidemiology, it is common to use the maximum likelihood estimator of the morbidity and mortality rates to estimate risk. 5,19,20 These rates depend on the behavior of the expected values and work well when this value is high. The present study used smoothing methods that incorporate information on neighboring areas and covariates that allow more stable risk estimates to be obtained and that show a tendency on the map.
Our results show a spatial grouping of high risk for Zika virus disease in the departments located between the equator and the 3rd parallel north (3 degrees north of the equator), where there is a high level of rainfall during the year that might explain the space-time dynamics of the disease. Correlations between Zika virus cases and environmental factors such as daily rainfall, humidity, and average temperature have been described in studies from Brazil and China. 21,22 Inclusion of these covariates and geo-referencing would allow an We hope that these results are a starting point to continue studying the effect of covariates (for example Aedes aegypti pupal count, water storage containers, biological or chemical control methods, among others) on the estimation of risks in small areas. In a future study we can assess the impact of these covariates simultaneously to investigate their joint effect on the estimation of risk in each of the areas.
In conclusion, Zika virus disease is a public health emergency that, despite not having a high burden of mortality, generates neurological complications in adults and malformations in children that require prevention and control measures from healthcare authorities. Given this, risk estimation and categorization by cluster are key tools that allow the detection of high-risk areas beyond geographical boundaries.
Although the disease's epidemic status has ended in Colombia, the results of our study identify high-risk areas where integrated surveillance systems for Zika virus disease and its complications must be strengthened in order to provide up-to-date and accurate epidemiological information. This information will identify the appearance of new outbreaks and guide the response of health services.

AUTHOR CONTRIBUTIONS
KF-L, EN-L, JA-R contributed to initial study design and planning.
KF-L, HL-S, AS-C, RT-M, and MM-R contributed to data analysis; MO-M and FP-A contributed to data acquisition. All authors contributed to manuscript writing, revision, and approval of the final manuscript.