Corrosion Science 90 (2015) 33–45
http://dx.doi.org/10.1016/j.corsci.2014.09.012
0010-938X/© 2014 Elsevier Ltd. All rights reserved.

Bayesian analysis of external corrosion data of non-piggable underground pipelines

F. Caleyo a,⇑, A. Valor a, L. Alfonso b, J. Vidal c, E. Perez-Baruch d, J.M. Hallen a

a Departamento de Ingeniería Metalúrgica, ESIQIE, Instituto Politécnico Nacional, Zacatenco, México D.F. 07738, Mexico
b Universidad Autónoma de la Ciudad de México, Campus S.L. Tezonco, Iztapalapa, México D.F. 09790, Mexico
c Facultad de Física, Universidad de La Habana, San Lázaro y L, Vedado, La Habana 10400, Cuba
d Gerencia de Transporte y Distribución de Hidrocarburos, Pemex-PEP, Villahermosa, Tabasco 86038, Mexico

⇑ Corresponding author. Tel.: +52 55 57296000x54205; fax: +52 55 57296000x55270. E-mail address: [email protected] (F. Caleyo).

Article history: Received 12 May 2014; Accepted 6 September 2014; Available online 2 October 2014

Keywords: A. Steel; B. Modelling studies; C. Pitting corrosion

Abstract

A new Bayesian methodology for the analysis of external corrosion data of non-piggable underground pipelines has been developed. It allows for the estimation of the statistical distributions of the density and size of external corrosion defects from corrosion data samples taken at excavation sites along the inspected pipeline and can incorporate the detection and measurement errors associated with field inspections. Corrosion data obtained from field inspections of an upstream pipeline and from an in-line inspection of a transportation pipeline are used to illustrate and validate the proposed methodology.

1. Introduction

Bayesian Data Analysis (BDA) has been used in the last decade with varying degrees of success in the assessment of corrosion data for upstream and transportation pipeline systems [1–25]. Previous applications of BDA include identification of risk factors in corroding pipeline systems [1–3], characterization of corrosion defect depth growth [4] and estimation of corrosion rate in operating pipelines [5–12], determination of the sample size required to estimate extreme pit depth in pipelines [13,14], degradation quantification through External Corrosion Direct Assessment (ECDA) [15–19], calibration of in-line inspection (ILI) tools [20,21], identification of failure type in corroded pipelines [22], updating of long-term corrosion estimates of corrosion-fatigue degradation [23,24], and modelling of high pH stress corrosion cracking in underground pipelines [25]. Other structural reliability fields have also profited from the application of Bayesian corrosion data analysis [26].

The main advantage of BDA with regard to corrosion data analysis is that, from a prior belief in the parameters that describe the distributions of corrosion defect size and density, and a relatively small amount of field data, reasonably accurate predictions can be made about the actual distributions of these corrosion parameters. This unique feature is of great interest in the evaluation of the damage caused by external corrosion in underground, non-piggable pipelines, for which the prediction of the size and spatial distribution of active pits remains a very complex task, commonly carried out using small corrosion data samples that feed statistical models such as Extreme Value Statistics [27–29].

The application of BDA in the evaluation of degradation caused by corrosion in non-piggable, underground pipelines has been commonly incorporated into ECDA frameworks [15–19]. The role of BDA in this synergy has traditionally been the estimation of the probability of detection (POD) of the inspection tools, and the estimation of the density (defects per unit length) and depth of active corrosion defects.
The main drawbacks of these approaches, which continue to limit the wider application of BDA to corrosion analysis, are the relative complexity of the mathematical frameworks employed and the lack of a thorough description of the implementation details of these schemes (see, for example, [14,15]). There is also a lack of BDA tools for the analysis of field-gathered corrosion data obtained through (random) sampling of non-piggable, underground pipelines.

In this paper, a new BDA methodology is proposed, illustrated, and validated for the assessment of external corrosion data obtained from field sampling inspections of non-piggable, underground upstream pipelines. The goal of this methodology is the estimation of the statistical distributions of the density and size of external corrosion defects from a relatively small number of corrosion data randomly taken at excavation sites along the pipeline. The results of a previous field study of external corrosion in different upstream pipeline systems in Southern Mexico [30] are used to suggest the prior, likelihood, and predictive models of the Bayesian analysis. Bayes' rule is used to determine the posterior distributions of the parameters defining the distributions of the density,
depth and length of the corrosion defects in the pipeline. The predictive distributions of these corrosion descriptors for the unobserved defects in the entire pipeline are obtained by averaging out the uncertainty in the estimated parameters. The proposed methodology has been validated using corrosion data obtained through ILI and also from data obtained by field inspection of corroding pipelines operating in the same region.

2. Theoretical foundations

2.1. Bayes' theorem

Bayes' theorem lies at the core of any BDA¹ [31]. In it, the strength of belief in parameter values $\theta$ before any data are observed (prior distribution $p(\theta)$) is combined with the joint probability that the observed data ($X$) follow the chosen model with parameter values $\theta$ (sampling distribution or likelihood function $L(X \mid \theta)$) to produce the strength of belief in parameter values $\theta$ when the observed data $X$ have been taken into account (posterior distribution $P_o(\theta \mid X)$). It is important to underline that $\theta$ and $X$ represent a vector of parameter values $\{\theta_i\}$ and a vector of measured data points $\{x_j\}$, respectively. In its continuous form, Bayes' theorem is written as [31]

$$P_o(\theta \mid X) = \frac{L(X \mid \theta)\, p(\theta)}{\int_{\Theta} L(X \mid \theta)\, p(\theta)\, d\theta} \tag{1}$$

where $\Theta$ represents the space for $\theta$, while the marginal likelihood $\int_{\Theta} L(X \mid \theta)\, p(\theta)\, d\theta$ (also known as evidence) denotes the probability that the data follow the chosen model under marginalization over all parameter values. If the evidence is regarded as a normalization factor, Eq. (1) can be written as [31]

$$P_o(\theta \mid X) \propto L(X \mid \theta)\, p(\theta) \tag{1a}$$

Therefore, $P_o(\theta \mid X)$ can be computed using expression (1a) and then normalized under the requirement that it is a probability density function (pdf).
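The compute-then-normalize route of expression (1a) can be sketched numerically. The snippet below is a minimal illustration only, not the authors' implementation: it assumes a one-parameter exponential model, a flat prior, a uniform grid, and made-up observations, none of which come from the paper.

```python
import math

def grid_posterior(data, grid, likelihood, prior):
    """Evaluate L(X|theta) * p(theta) on a uniform grid (Eq. 1a), then normalize."""
    step = grid[1] - grid[0]
    unnormalized = [likelihood(data, theta) * prior(theta) for theta in grid]
    evidence = sum(unnormalized) * step  # Riemann-sum stand-in for the integral in Eq. (1)
    return [u / evidence for u in unnormalized]

def exponential_likelihood(data, rate):
    """Joint likelihood of i.i.d. exponential observations (illustrative model only)."""
    return math.prod(rate * math.exp(-rate * x) for x in data)

def flat_prior(rate):
    """Uniform prior over the grid (illustrative choice)."""
    return 1.0

data = [0.8, 1.3, 0.4, 2.1, 0.9]             # made-up observations
grid = [0.01 * (i + 1) for i in range(500)]  # rate values 0.01 ... 5.00
step = grid[1] - grid[0]
posterior = grid_posterior(data, grid, exponential_likelihood, flat_prior)
mode = grid[max(range(len(grid)), key=posterior.__getitem__)]
```

With a flat prior, the posterior mode of this toy model should sit next to the maximum-likelihood estimate (number of observations divided by their sum), which gives a quick sanity check on the normalization step.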
Working with expression (1a) and normalizing afterwards, the approach used throughout this work, considerably reduces the computational workload associated with BDA. If the prior distribution of a given parameter $\theta_i$ is described by the vector of parameters $\alpha$, then $\{\alpha_k\}$ are said to be the hyperparameters of $\theta_i$.² For the general case where hyperparameters are considered, expression (1a) is written as

$$P_o(\theta \mid X, \alpha) \propto L(X \mid \theta)\, p(\theta \mid \alpha) \tag{1b}$$

2.2. Bayesian prediction

Once the posterior distribution of $\theta$ is estimated using expression (1b), it is possible to predict the probability of new unobserved data values, conditional on the observed data $X$ and hyperparameters $\alpha$. If the data are assumed to have a distribution $M$ (note that $M$ is also used to construct the sampling distribution or likelihood function), then the predictive distribution $P_p(\hat{x} \mid X, \alpha)$ of the unobserved data points can be obtained by averaging out the uncertainty in $\theta$. This is achieved by marginalizing $P_o(\theta \mid X, \alpha)$ over $\theta$ [31]:

$$P_p(\hat{x} \mid X, \alpha) = \int_{\Theta} M_X(x \mid \theta)\, P_o(\theta \mid X, \alpha)\, d\theta \tag{2}$$

¹ Bayesian Data Analysis is used in the text to encompass other common terms such as Bayesian inference, Bayesian updating, Bayesian probability and Bayesian statistics [31].
² For example, if the failure rate ($\lambda$) of an exponentially distributed variable has a Gamma prior distribution with scale $\sigma$ and shape $\zeta$ parameters, then the hyperparameters of $\theta = \{\lambda\}$ are $\alpha = \{\sigma, \zeta\}$.

3. BDA framework

3.1. Variables of interest

In order to apply BDA to corrosion data, the generic formulation used in the preceding section must be translated into a practical, corrosion-specific formulation involving the corrosion variables, distributions, and parameters to be investigated. The variables of interest in this study are the depth ($d$), length ($\ell$), and density ($n$) of the corrosion defects, the latter in defects per excavation site.
The models for the data ($M$), sampling ($L$), and prior ($p$) distributions of these variables were proposed from their empirical distributions, which were obtained in an extensive field survey conducted in Southern Mexico over a 7-yr period from 2005 to 2012 [30]. During the field work, corrosion data were gathered at randomly selected ditch sites in five gathering/upstream pipeline systems, totalling 964 km and 620 pipelines. In each one of the 16,636 excavated sites, the depth, length, and number (per site) of the observed external corrosion-caused metal losses were recorded; these variables were obtained for a total of 13,286 external corrosion defects. The field reports also included the trench length and the age, coating type and condition, diameter, wall thickness (pwt), steel grade, and operating pressure of the inspected pipeline [30].

The empirical distributions of the depth and length of the observed external corrosion defects were found to be best described by the Generalized Extreme Value (GEV) distribution, whose pdf is given by the expression [32]:

$$f_{GEV}(x) = \begin{cases} \dfrac{1}{\sigma} \exp\left\{ -\left[ 1 + \zeta \left( \dfrac{x - \mu}{\sigma} \right) \right]^{-1/\zeta} \right\} \left[ 1 + \zeta \left( \dfrac{x - \mu}{\sigma} \right) \right]^{-1 - 1/\zeta}, & \zeta \neq 0 \\[2ex] \dfrac{1}{\sigma} \exp\left( -\dfrac{x - \mu}{\sigma} \right) \exp\left[ -\exp\left( -\dfrac{x - \mu}{\sigma} \right) \right], & \zeta = 0 \end{cases} \tag{3}$$

where $\zeta$, $\sigma$, and $\mu$ are the shape, scale, and location parameters, respectively. The location ($\mu$), scale ($\sigma$), and shape ($\zeta$) of the GEV distributions fitted to the measured vectors of data points for the depth, $D = \{d_i\}$, and length, $\Lambda = \{\ell_i\}$, of the observed defects are given in Table 1.

The number $n$ of defects per (2.44 m-long) excavation site was fitted to a Negative Binomial (NegBin) distribution with parameters $p$ and $\eta$. The probability mass function (pmf) of $n$, a nonnegative integer, is [32]

$$f_{NB}(n) = \binom{\eta + n - 1}{\eta - 1}\, p^{\eta} (1 - p)^{n} \tag{3a}$$

It is worth noting that, in this study, $\eta$ is a positive real-valued number. This kind of generalization is known as a Gamma–Poisson mixture or Pólya process [33].
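For readers who want to evaluate Eqs. (3) and (3a) directly, a minimal sketch follows. It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, and the binomial coefficient with real-valued $\eta$ is computed through the log-Gamma function, which is one standard way to handle the Gamma–Poisson generalization.

```python
import math

def gev_pdf(x, mu, sigma, zeta):
    """GEV pdf of Eq. (3) with location mu, scale sigma, and shape zeta."""
    z = (x - mu) / sigma
    if abs(zeta) < 1e-12:  # zeta = 0 branch (Gumbel form)
        return math.exp(-z - math.exp(-z)) / sigma
    t = 1.0 + zeta * z
    if t <= 0.0:  # outside the support of the distribution
        return 0.0
    return math.exp(-t ** (-1.0 / zeta)) * t ** (-1.0 - 1.0 / zeta) / sigma

def negbin_pmf(n, eta, p):
    """NegBin pmf of Eq. (3a); eta may be any positive real number."""
    # binom(eta + n - 1, eta - 1) = Gamma(eta + n) / (Gamma(eta) * n!)
    log_coef = math.lgamma(eta + n) - math.lgamma(eta) - math.lgamma(n + 1)
    return math.exp(log_coef + eta * math.log(p) + n * math.log1p(-p))

# Evaluate both models with the Table 1 parameters.
depth_density_at_20 = gev_pdf(20.0, 18.5, 8.86, 0.078)  # depth of 20 %pwt
prob_no_defect = negbin_pmf(0, 0.208, 0.210)            # zero defects at a site
```

Working in log space for the NegBin coefficient avoids overflow for large counts and needs no special case for non-integer $\eta$.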
The reasons behind the choice of this form of the NegBin are given and justified in a separate paper under preparation. The parameters of the NegBin distribution fitted to the measured vector of data points of defect density, $N = \{n_i\}$, are also given in Table 1.

Table 1. Parameters of the statistical distributions fitted to the empirical corrosion data obtained in the field survey described in Ref. [30].

Variable       Units      Distribution   μ (a)   σ (a)   ζ       η (per site)   p
Depth (d)      %pwt (b)   GEV (c)        18.5    8.86    0.078
Length (ℓ)     mm         GEV (c)        95.0    195     0.753
Density (n)    per site   NegBin (d)                             0.208          0.210

(a) Given in the same units as the described variable. (b) Percent of the pipe wall thickness. (c) Generalized Extreme Value distribution. (d) Negative Binomial distribution.

3.2. Posterior distributions

Although a certain degree of physical dependence is to be expected between the depth, length, and density of corrosion defects [34], the mathematical burden associated with considering such dependence in a BDA could render it intractable. A key point in making the present BDA as conceptually simple and easy to implement as possible is that these variables can be treated as statistically independent. This approach is not new and has been used by other authors [35,36]. The independence assumption can also be made for the parameters of the corrosion data distributions without incurring significant errors. Under this assumption, the Bayesian analysis can be carried out separately for each one of the variables considered in Table 1; therefore, three different versions of expression (1b) are required:

$$P_o(\theta_d \mid D, \alpha_d) \propto L(D \mid \mu_d, \sigma_d, \zeta_d)\, p(\mu_d \mid \alpha_{d\mu})\, p(\sigma_d \mid \alpha_{d\sigma})\, p(\zeta_d \mid \alpha_{d\zeta}) \tag{4}$$

$$P_o(\theta_\ell \mid \Lambda, \alpha_\ell) \propto L(\Lambda \mid \mu_\ell, \sigma_\ell, \zeta_\ell)\, p(\mu_\ell \mid \alpha_{\ell\mu})\, p(\sigma_\ell \mid \alpha_{\ell\sigma})\, p(\zeta_\ell \mid \alpha_{\ell\zeta}) \tag{5}$$

$$P_o(\theta_n \mid N, \alpha_n) \propto L(N \mid \eta_n, p_n)\, p(\eta_n \mid \alpha_{n\eta})\, p(p_n \mid \alpha_{np}) \tag{6}$$

In these expressions, subscripts indicate the variable of interest, while sub-subscripts indicate the parameter whose prior distribution is specified by the vector of hyperparameters $\alpha$.³ Note that, although not necessary, this notation scheme has also been used for $n$ for the sake of notation homogeneity.

3.3. Joint sampling and prior distributions

From the information given in Table 1, and from the assumption of variable and parameter independence, it was also possible to propose the joint sampling distribution of each variable. If the number of excavated ditches in the inspected pipeline is $n_D$, the number of defects found at the $k$th ditch is $n_k$, and the observed defects in these ditches sum up to $n_T$, these functions can be written as [31]

$$L(D \mid \theta_d) = \prod_{i=1}^{n_T} f_D(d_i \mid \mu_d, \sigma_d, \zeta_d) \tag{7}$$

$$L(\Lambda \mid \theta_\ell) = \prod_{i=1}^{n_T} f_\Lambda(\ell_i \mid \mu_\ell, \sigma_\ell, \zeta_\ell) \tag{8}$$

$$L(N \mid \theta_n) = \prod_{k=1}^{n_D} f_N(n_k \mid \eta_n, p_n) \tag{9}$$

with

$$n_T = \sum_{k=1}^{n_D} n_k \tag{9a}$$

where $f_D(\cdot)$, $f_\Lambda(\cdot)$, and $f_N(\cdot)$ represent the pdf-s of the depth (GEV), length (GEV), and density (NegBin) of the corrosion defects, respectively; $d_i$ and $\ell_i$ ($i = 1 \ldots n_T$) are the measured values of the depth and length of the corrosion defects, and $n_k$ ($k = 1 \ldots n_D$) are the measured defect counts per ditch.

On the other hand, the prior distributions to be used in expressions (4)–(6) can be specified from experience accumulated in previous studies of external corrosion in underground pipelines [37–40]. Uniform and Normal priors are considered, with parameters determined from the expected interval $[\min(\theta_i), \max(\theta_i)]$ of each parameter, as shown in Fig. 1(a).
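A minimal sketch of how prior hyperparameters could be derived from an expected interval is given below. The function names are hypothetical. The Normal case assumes that the interval endpoints are taken as the 2.5% and 97.5% quantiles of a Normal distribution centred at the interval midpoint, or alternatively that a coefficient of variation (COV) is supplied; the Uniform case uses the standard mean and variance of a uniform distribution on the interval.

```python
import math
from statistics import NormalDist

def uniform_hyperparameters(lower, upper):
    """Mean and variance of a Uniform prior on [lower, upper]."""
    return 0.5 * (lower + upper), (upper - lower) ** 2 / 12.0

def normal_hyperparameters(lower, upper, coverage=0.95):
    """Mean and variance of a Normal prior whose central `coverage` mass spans [lower, upper]."""
    mean = 0.5 * (lower + upper)
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)  # about 1.96 for 95% coverage
    sd = (upper - lower) / (2.0 * z)
    return mean, sd ** 2

def normal_hyperparameters_from_cov(mean, cov=0.5):
    """Mean and variance of a Normal prior specified through its coefficient of variation."""
    return mean, (cov * abs(mean)) ** 2

# Example: an expected interval of (-0.5, 0.5) for a GEV shape parameter.
shape_mean, shape_var = normal_hyperparameters(-0.5, 0.5)
# Example: the depth location parameter of Table 1 with a COV of 0.5.
loc_mean, loc_var = normal_hyperparameters_from_cov(18.5, 0.5)
```

The COV route only needs a central value, which is convenient when a previous fit such as Table 1 is available but no interval has been published.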
As has been mentioned, the mean and variance of these priors must be treated as hyperparameters of the parameter $\theta_i$ of the distribution of the variable of interest $x$. The mean and variance of both types of priors can be determined in a quite straightforward manner, as shown in Fig. 1(a). In the case of the Normal prior, its variance can be computed by equating $\min(\theta_i)$ and $\max(\theta_i)$, respectively, to the 2.5% and 97.5% quantiles of a Normal distribution centred at the midpoint of the interval, or from a reasonable coefficient of variation (COV) value, for example 0.5.

³ For example, $\alpha_{d\zeta}$ refers to the vector of hyperparameters specifying the prior distribution for the shape ($\zeta$) of the depth ($d$) distribution.

The interval for each one of the hyperparameters of a given $\theta_i$ was estimated from previously published data. For example, Refs. [37,39] provide experimental information about the time evolution of the mean values of the location, scale, and shape parameters of the GEV distributions estimated for the depth of external corrosion defects in underground pipelines. Fig. 1(b) shows this evolution for the mean of the location parameter for a generic soil class (All), together with its expected interval, the latter determined as the difference between the values of this parameter for the most (clay) and least (sandy-clay-loam) corrosive soil classes [37].

In the case where little or no information is available about one or some of the parameters in expressions (7)–(9), the expected interval of the parameter(s) can be proposed based on physical or mathematical reasoning. For example, the shape parameter of a GEV distribution can be initially assumed to be within the (−0.5, 0.5) range to allow for both right- and left-skewed distributions of the variable of interest [32]. In such a case, an adaptive calculation method should be used to produce the right-hand side of expression (1b).
Such a method should be capable, if necessary, of searching for the solution both within and outside the interval originally proposed. It should also be capable of dynamically changing the resolution used to explore the parameter space. An iterative version of such an adaptive computation approach is used in this study (see Section 4).

3.4. Predictive distributions

The probability distributions of the unobserved depth, length, and density of the corrosion defects are determined by averaging (marginalizing) the corresponding predicted posteriors across all parameter values in the interval $\Theta_X$ where these distributions exist for the variable of interest [31]:

$$P_p(\hat{d} \mid D, \alpha_d) = \int_{\Theta_D} f_D(d \mid \theta_d)\, P_o(\theta_d \mid D, \alpha_d)\, d\theta_d; \quad \theta_d = \{\mu_d, \sigma_d, \zeta_d\} \tag{10}$$

$$P_p(\hat{\ell} \mid \Lambda, \alpha_\ell) = \int_{\Theta_\Lambda} f_\Lambda(\ell \mid \theta_\ell)\, P_o(\theta_\ell \mid \Lambda, \alpha_\ell)\, d\theta_\ell; \quad \theta_\ell = \{\mu_\ell, \sigma_\ell, \zeta_\ell\} \tag{11}$$

$$P_p(\hat{n} \mid N, \alpha_n) = \int_{\Theta_N} f_N(n \mid \theta_n)\, P_o(\theta_n \mid N, \alpha_n)\, d\theta_n; \quad \theta_n = \{p, \eta\} \tag{12}$$

The distributions obtained by means of these expressions constitute the final and most important outcome of the outlined Bayesian methodology for the analysis of external corrosion data in underground non-piggable pipelines. As mentioned earlier, they allow analysts to perform reliability and risk analyses in these pipelines from a relatively small amount of field-gathered corrosion data.

Fig. 1. (a) Uniform (red) and Normal (blue) priors for parameter $\theta_i$; the hyperparameters (mean and variance) are estimated from the expected interval of $\theta_i$. (b) Time evolution of the mean value of the location parameter of the GEV distribution for corrosion pit depth in underground pipelines [37–39]. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

3.5. Probability of detection and measurement errors

Whatever the field-inspection method (from laser scanners to pit gauges [41]), it is important to consider the uncertainty in defect detection and sizing during the inspection.
Commonly, these two aspects are accounted for via the probability of detection (POD) and the measurement errors (MEs) of the inspection tool, respectively [26,41,42]. The impact of the POD and MEs on the gathered corrosion data can be incorporated in the proposed BDA framework by considering their effect on the sampling distributions. The approach used by Yuan et al. [26] is used in this study for incorporating the POD and MEs in Eqs. (7)–(9). As in many other studies [15–19,25], these two uncertainties will only be considered here for the depth and density of the corrosion defects.

Measurement errors are, by nature, random, additive, and independent and identically distributed [41,42]. In the absence of bias (accuracy errors), and assuming that sizing uncertainty is independent of the magnitude of what is measured,⁴ MEs can be described using a Normal distribution with mean zero and variance $\sigma_{ME}^2$. Empirical values of $\sigma_{ME}$ for typical field inspection tools can be found elsewhere [41]. Symbolically, the pdf of this distribution is written as $f_{ME}(\varepsilon) = \mathrm{Normal}(0, \sigma_{ME}^2)$, where $\varepsilon$ is the measurement error. Based on these facts, the ME-affected version of the model distribution for defect depth to be used, after proper normalization, in the likelihood distribution, Eq. (7), can be written as [26]

$$f_D^{ME}(d_m \mid \theta_d) = \int_0^{\infty} f_D(d \mid \theta_d)\, f_{ME}(d_m - d;\, \sigma_{ME}^2)\, dd \tag{13}$$

where $d_m$ denotes the measured depth and $d$ the true depth.

⁴ More accurately expressed, it is assumed that, once a corrosion defect has been detected, its depth and length are measured with an error that is independent of their values.

On the other hand, the POD is commonly expressed as a function of the corrosion defect depth, $POD(d)$ [26]; therefore, the POD-affected version of the model distribution for defect depth to be used, after proper normalization, in the likelihood distribution for defect depth becomes

$$f_D^{POD}(d \mid \theta_d) = \frac{POD(d)\, f_D(d \mid \theta_d)}{\langle POD \mid \theta_d \rangle} \tag{14}$$

In this expression, the denominator represents the unconditional detection probability or average POD, which is
$$\langle POD \mid \theta_d \rangle = \int_0^{\infty} POD(d)\, f_D(d \mid \theta_d)\, dd \tag{14a}$$

The combined effect of the POD and MEs can be considered by merging Eqs. (13) and (14) into [26]

$$f_D^{POD,ME}(d_m \mid \theta_d) = \frac{\int_0^{\infty} f_D(d \mid \theta_d)\, POD(d)\, f_{ME}(d_m - d;\, \sigma_{ME}^2)\, dd}{\langle POD \mid \theta_d \rangle} \tag{15}$$

where $d_m$ is the measured depth, $d$ the true depth, and $\sigma_{ME}^2$ the variance of the measurement error. On the other hand, the POD-affected version of the model distribution for the number of defects per ditch to be used, after proper normalization, in the sampling distribution for the detected defect density, Eq. (9), can be approximated by

$$f_N^{POD}(n \mid \theta_n) \approx \sum_{\nu = n}^{\infty} f_N(\nu \mid \theta_n) \binom{\nu}{n} \langle POD \mid \hat{\theta}_d \rangle^{n} \left( 1 - \langle POD \mid \hat{\theta}_d \rangle \right)^{\nu - n} \tag{16}$$

where the average POD $\langle POD \mid \hat{\theta}_d \rangle$ is determined just once using Eq. (14a) with $f_D(d \mid \theta_d)$ substituted by the predictive function for the defect depth $P_p(\hat{d} \mid D, \alpha_d)$. This approximation of $f_N^{POD}(n \mid \theta_n)$ was adopted in order to avoid the complexities associated with considering the dependence of the POD on the defect depth each time that a set of parameters $\theta_d$ is considered, for example in Eq. (9). Instead, this dependence is considered only once in Eq. (15) for the distribution of corrosion depths predicted for the inspected pipeline. This approximation also implies that the predictive function has already been estimated for $d$, which is not a limiting drawback in practice.

4. Implementation

Given the assumptions made earlier about the independence of the variables of interest, the implementation of the proposed BDA was based on the solution of three separate, relatively simple problems: one for depth (3 parameters, 6 hyperparameters), another for length (likewise), and the last for density (2 parameters, 4 hyperparameters). This makes possible the use of the so-called GRID method for numerically approximating the prior, posterior, and predictive distributions [31].

4.1. Grids for the depth, length, and density of the corrosion defects

In the GRID method, the prior distributions are defined in an $n_\theta$-dimensional ($n_\theta$-D) fine grid of their $\theta_i$ values, where $n_\theta$ is the number of parameters, so that $i = 1 \ldots n_\theta$.
Consequently, for each prior, an $n_\theta$-D grid of finite discrete values of the $n_\theta$ parameters is created within the intervals where there is a non-zero belief in $\theta_i$. According to Section 3.3, the non-zero belief intervals are determined by the hyperparameters of each $\theta_i$ (as shown, for example, in Fig. 1(a)).

Over the grid, each prior distribution denotes a marginal probability mass function along the corresponding axis of the $n_\theta$-D coordinate system, with $\theta_i$ values defined by the step $\Delta\theta_i$ used in the grid. Therefore, each point of the grid has an associated value of the resulting joint pmf (for example, $p(\mu_d)\,p(\sigma_d)\,p(\zeta_d)$ for depth), and discrete versions of Eqs. (4)–(12) can be used to carry out the BDA described by them. Fig. 2 exemplifies the discretization process for the prior distributions of the depth variable, $n_\theta = 3$. In it, a coarse grid is used for the sake of clarity, and the discrete marginal and joint (pmf) priors are shown. This grid is also suitable for the length variable, after the appropriate parameters are chosen, $n_\theta$ being equal to three in this latter case as well. In the case of the density, $n_\theta = 2$, so that a 2-D grid is required to approximate its prior distribution.⁵

Fig. 2. Grid approximation for the prior of the parameters for the depth (length) variable. At each grid point, the joint probability mass function (pmf) is computed from the corresponding marginal masses.

4.2. Sampling, posterior, and predictive distributions over the grid

Once the grid is defined and the priors are constructed over it, the sampling, posterior, and predictive distributions of the parameters and variables of interest can also be computed at each point of the grid using the discrete version of Eqs. (4)–(12). For example, following Eq.
(12), the probability of a new unobserved corrosion defect density value $\hat{n}$, given the measured density values $N$, is computed over the grid $\Theta_N$ as the probability of that value happening for each discrete value $\theta_n = \{\eta_n, p_n\}$, weighted by the discrete posterior believability of each $\theta_n$:

$$P_P^{*}(\hat{n} \mid N) = \int_{\Theta_N} f_N(n \mid \theta_n)\, P_o(\theta_n \mid N)\, d\theta_n \approx \sum_{\Theta_N} f_N^{*}(n \mid \theta_n)\, P_o^{*}(\theta_n \mid N)\, \Delta\theta_n \tag{17}$$

where $P_P^{*}$, $f_N^{*}$, and $P_o^{*}$ are the pmf counterparts of the corresponding pdf-s.

Expressions similar to (17) are used to obtain the predictive distributions of the depth and length of the corrosion defects over their respective grids. After each predictive is computed following this procedure, it should be normalized under the requirement that it is a pmf:

$$\sum_{\text{all } \hat{x}} P_P^{*}(\hat{x} \mid X)\, \Delta\hat{x} = 1 \tag{18}$$

where $\Delta\hat{x}$ is the step used to obtain the discrete predictive of variable $x$.

⁵ Although not explicitly shown in Fig. 2, parameters $\mu_d$, $\sigma_d$, and $\zeta_d$ are defined in the finite, discrete intervals where they have nonzero belief; however, the notation used so far for them is kept hereafter for the sake of simplicity. This also applies to the rest of the parameters used in this work.

4.3. Iterative adaptive computational approach

The computation of the prior, sampling, and posterior distributions over the grid is carried out using an iterative adaptive method. It allows searching for the solution both within and (when required) outside the proposed parameter interval, while dynamically changing the resolution used to scan the parameter space. This method is illustrated in Fig. 3 using an arbitrarily chosen 1-D parameter space. Initially, the proposed interval for the parameter(s) of interest is coarsely discretized and the prior, sampling, and posterior distributions are computed over this coarse grid.
The non-parametric mean (${}_1\bar{\theta}$; the pre-symbol subscript indicates the calculation step) and standard deviation (sd(${}_1\theta$)) of the estimated posterior (${}_1P_o(\theta)$) of the parameter(s) are used to define a subsequent interval, which is centred at ${}_1\bar{\theta}$ and extends for several (user-defined) sd(${}_1\theta$)-s around it. In Fig. 3, just for the sake of illustration, the interval discretization step ($\delta\theta$) was defined using the estimated sd($\theta$). The new grid to be used in each subsequent calculation step was selected using a ±3-sd($\theta$) interval around $\bar{\theta}$. The selection and discretization process of each new parameter interval from one calculation step to the next is represented in Fig. 3 using grey-shaded quadrilaterals.

In Fig. 3(a) this process is shown for the straightforward case in which the (unknown) solution lies within the interval where the parameter is initially thought to have non-zero belief. The resolution of each new grid ($\sim 1/\delta\theta$) increases from one calculation step to the next. Therefore, the calculations are made over finer and finer grids; this helps increase the accuracy of the estimated posteriors.

Also, in order to make these computations adaptive, the selection of the next (($i$+1)th) scan interval is not restricted to be within the bounds used in the current ($i$th) computation step. In this way, if the solution for $P_o(\theta)$ lies outside the proposed parameter interval, as shown in Fig. 3(b), the method is able to search for and find the solution outside this interval. This is particularly useful in those situations where the prior distributions are not accurate enough due to a bad model choice and/or relatively high ignorance about the parameters.

The iterative scan process, illustrated in Fig. 3 (for only 3 steps), can be carried out until the results do not change considerably from one calculation step to the next.
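The iterative scan described above can be sketched for a 1-D parameter space as follows. This is a simplified illustration and not the Mathematica implementation used in the paper: it assumes an exponential likelihood as a stand-in for the GEV and NegBin models, a flat prior, made-up observations, a small positive lower bound for the rate parameter, and a relative tolerance on the posterior mean as the stopping rule. All names are hypothetical.

```python
import math

def log_likelihood(data, rate):
    """Log-likelihood of i.i.d. exponential data (illustrative stand-in model)."""
    return len(data) * math.log(rate) - rate * sum(data)

def grid_posterior(data, lower, upper, steps):
    """Discrete posterior (pmf) at the cell midpoints of a uniform grid, flat prior."""
    width = (upper - lower) / steps
    grid = [lower + (k + 0.5) * width for k in range(steps)]
    logs = [log_likelihood(data, theta) for theta in grid]
    peak = max(logs)  # subtract the peak before exponentiating to avoid underflow
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)
    return grid, [w / total for w in weights]

def adaptive_scan(data, lower, upper, steps=25, n_sd=4.0, max_iter=25, tol=1e-4):
    """Re-centre the grid at the posterior mean, +/- n_sd posterior sd, until stable."""
    previous_mean = None
    for iteration in range(1, max_iter + 1):
        interval = (lower, upper)
        grid, pmf = grid_posterior(data, lower, upper, steps)
        mean = sum(t * p for t, p in zip(grid, pmf))
        sd = math.sqrt(sum(p * (t - mean) ** 2 for t, p in zip(grid, pmf)))
        if previous_mean is not None and abs(mean - previous_mean) <= tol * abs(mean):
            break
        previous_mean = mean
        # The next interval may extend beyond the current bounds (adaptive step).
        lower = max(1e-9, mean - n_sd * sd)  # assumed floor: the rate must stay positive
        upper = mean + n_sd * sd
    return mean, sd, interval, iteration

data = [0.8, 1.3, 0.4, 2.1, 0.9]  # made-up observations
# The starting interval [3, 6] deliberately excludes the bulk of the posterior.
post_mean, post_sd, (final_lower, final_upper), n_iter = adaptive_scan(data, 3.0, 6.0)
```

In this toy case the grid drifts from the initial interval towards the region where the posterior mass actually lies. When the solution is far outside the initial interval and the posterior is narrow, the drift can be slow, so the iteration cap matters.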
Besides this relative convergence criterion, which is necessary but not sufficient, visual inspection of the estimated posterior distributions is highly recommended in order to check for inconsistencies in the solution, such as biased or truncated, out-of-sound-range, and too-narrow estimated posteriors. The experience gained in this work suggests that the use of 20–25 interval discretization steps, ±4-sd($\theta$)-wide future intervals, and 10–25 iteration steps ensures accurate estimates of the corrosion data distributions of external corrosion defects in underground upstream pipelines.

The software Mathematica 9.0 [43] was used to implement the iterative adaptive method just described and also to perform the computation of the predictive distributions and the assessment of the goodness of fit of the estimated predictive distributions with respect to the empirical data.

Fig. 3. Iterative adaptive method used to compute the posterior distribution over a 1-D parameter space when the solution (white star) lies (a) within the proposed parameter interval and (b) outside this interval. The pre-symbol subscripts indicate the calculation step.

⁶ As defined in Mathematica 9.0, using the KolmogorovSmirnovTest function with Method → "MonteCarlo" [43].

5. Illustration and validation

The ILI-collected (external) corrosion data of a transportation pipeline (PA) and the field-collected data of a gathering pipeline (PB) were used to validate the proposed BDA framework. Table 2 shows the information relevant to this study for both pipelines. The in-line inspection was performed using a magnetic flux leakage (MFL) tool. Meanwhile, defect depth field data were gathered by means of a pit depth dial gauge with a bridging bar, and defect lengths were determined by taking the difference between the absolute positions along the pipeline of the start and end points of the defect.
The measurement errors were typical of the tools employed in each case [44]. In what follows, the observed corrosion data for pipelines PA and PB are referred to as empirical ILI and field data, respectively. Likewise, the defect depths are expressed in percent of the pipe wall thickness (%pwt) and the defect lengths are given in mm everywhere in the text.

Three aspects were evaluated during the validation process: (i) the ability to correctly reproduce the empirical ILI (in PA) and field (in PB) defect depth, length, and density distributions from a single field sampling inspection dataset; (ii) the statistical behaviour of the estimations for a large number of inspection data sets; and (iii) the number of inspection sites (per km of pipeline) and observed defects required to obtain accurate estimations.

In order to carry out the first of these evaluations, the external corrosion defects located and sized by ILI in PA and by field inspection in PB were randomly sampled based on their location in their respective pipeline. This helped simulate a typical field sampling corrosion data collection process. To do this, 15 (2.44-m long) ditches were randomly selected along each pipeline. The number and size (depth and length) of the external corrosion defects observed in each simulated ditch were considered as the field observations to be used as input to the proposed BDA framework.

For the sake of simplicity, the POD function (see Eq. (14)) is taken throughout this work as a constant equal to one. On the other hand, in the proposed exercises it is considered that the defect sizing is performed with an ideal instrument with negligible measurement errors. Thus, the empirical external corrosion data in pipelines PA and PB are considered to be the true distributions of defect depth, length, and counts, which will be estimated from limited samples using the BDA scheme.
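The simulated ditch sampling can be sketched as below. This is only an illustration of the idea: the defect list is synthetic (uniformly placed defects with placeholder depths and lengths, not the ILI or field data), the pipeline is divided into contiguous 2.44-m slots, and the function and variable names are hypothetical. The pipeline length, defect total, and 15 ditches per km are the PA figures quoted in the text.

```python
import random

DITCH_LENGTH_M = 2.44  # excavation length used in the text

def sample_ditches(defects, pipeline_length_m, n_ditches, rng):
    """Pick n_ditches non-overlapping ditch slots at random and collect their defects.

    defects: iterable of (position_m, depth_pct_pwt, length_mm) tuples.
    Returns (counts per ditch, depths of sampled defects, lengths of sampled defects).
    """
    n_slots = int(pipeline_length_m // DITCH_LENGTH_M)
    chosen_slots = rng.sample(range(n_slots), n_ditches)
    by_slot = {}
    for position_m, depth, length in defects:
        slot = int(position_m // DITCH_LENGTH_M)
        by_slot.setdefault(slot, []).append((depth, length))
    counts, depths, lengths = [], [], []
    for slot in chosen_slots:
        found = by_slot.get(slot, [])
        counts.append(len(found))
        depths.extend(depth for depth, _ in found)
        lengths.extend(length for _, length in found)
    return counts, depths, lengths

rng = random.Random(2015)     # fixed seed so the sketch is repeatable
pipeline_length_m = 19_000.0  # PA is 19 km long
# Synthetic stand-in for the located defects; values are placeholders only.
defects = [
    (rng.uniform(0.0, pipeline_length_m), rng.uniform(10.0, 40.0), rng.uniform(20.0, 400.0))
    for _ in range(2327)
]
N, D, LAM = sample_ditches(defects, pipeline_length_m, 15 * 19, rng)  # 15 ditches per km
```

The three returned lists play the roles of the input vectors N, D, and Λ of the framework. Repeating the call with different seeds mimics repeated sampling campaigns.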
The predictive distributions obtained for the depth, length and density of defects in each pipeline were compared with the corresponding empirical distributions in the entire pipelines in order to assess the quality of the estimations, using Monte Carlo⁶ Kolmogorov–Smirnov (K–S) tests for the predictions related to $d$ and $\ell$, and Pearson $\chi^2$ (P–$\chi^2$) tests for those related to $n$. This exercise serves for both the validation of the proposed BDA framework and the illustration of its application.

The second of the validation steps was carried out by repeating the sampling-prediction process described in the preceding paragraph 1000 times. This made it possible to obtain the predictive distributions and also the distribution of the P-values of the 1000 K–S and P–$\chi^2$ goodness-of-fit tests. The confidence associated with the BDA estimations was computed as the percentage of simulations for which no reasons were found to reject the null hypothesis that the obtained predictive distributions correctly describe the empirical data in the entire pipeline. The non-rejection criterion in each simulation was that the P-value from the Monte Carlo K–S (for $d$ and $\ell$) or P–$\chi^2$ (for $n$) test was equal to or greater than 0.05.

Finally, the paramount question about the amount of data required to obtain good estimates of the corrosion data in the entire pipeline from a field sample through the application of the proposed BDA framework was addressed. To do this, the preceding exercise was carried out in both pipelines for different numbers of excavation sites. Varying the number of inspection ditches produced a different number of observed corrosion defects according to the empirical defect density in each pipeline (Table 2).
For each pair of numbers of excavation sites and observed defects, the errors associated with the predictions made for the parameters of the depth, length and density distributions were quantified through the mean squared error (MSE) of the estimated parameters θ̂ with respect to those of the empirical distributions (θ):

\[
D(\hat{\theta}_x) = \sum_i \left\{ \left( \langle \hat{\theta}_{x_i} \rangle - \langle \theta_{x_i} \rangle \right)^2 + \frac{1}{n_S} \sum_{s=1}^{n_S} \left( \hat{\theta}_{x_i} - \langle \theta_{x_i} \rangle \right)^2 \right\} \qquad (19)
\]

where the angular brackets refer to the average values of the corresponding parameters, x refers to either the depth (d), length (ℓ), or density (n) of the corrosion defects, n_S denotes the number of predictions (simulations), and i indexes the parameters for each variable x.

Table 2
Basic information of the studied pipelines together with the statistical description of the empirical external corrosion data collected in each one of them.

| Pipeline (inspection) | Description | Observed variable | Non-parametric moments (a) | Fitted distribution (f) |
|---|---|---|---|---|
| PA (ILI, 2011) | pwt (b): 11.9 mm; OD (c): 914 mm; Coating: coal tar; CP (d): impressed current; Age: 25 years; Length: 19 km; Defects (e): 2327 | d (%pwt (g)) | 17.6, 9.06, 2.04 | GEV(13.2, 4.10, 0.338) (h) |
| | | ℓ (mm) | 147, 213, 2.24 | GEV(37.5, 34.6, 1.08) (h) |
| | | n (1/site) | 0.198, 0.662, 5.40 | NegBin(0.148, 0.427) (i) |
| PB (Field, 2009) | pwt (b): 7.92 mm; OD (c): 152 mm; Coating: coal tar; CP (d): sacrificial anode; Age: 27 years; Length: 2.66 km; Defects (e): 518 | d (%pwt (g)) | 17.0, 4.85, 1.19 | GEV(14.9, 3.40, 0.096) (h) |
| | | ℓ (mm) | 149, 206, 2.22 | GEV(291, 383, 1.100) (h) |
| | | n (1/site) | 1.40, 2.28, 2.89 | NegBin(0.512, 0.268) (i) |

(a) Skewness (third value listed).
(b) Pipe wall thickness.
(c) Nominal pipe (outside) diameter.
(d) Cathodic protection type.
(e) Total number of defects in the inspected pipeline.
(f) In the form GEV(μ, σ, ζ) or NegBin(η, p).
(g) Percent of the pipe wall thickness.
(h) Generalized Extreme Value.
(i) Negative Binomial.

5.1.
Estimations from a single field sampling data set

Although not statistically significant, the use of a single corrosion dataset helps illustrate the application of the proposed BDA framework and also serves as the starting point for the validation process described earlier. The grey-shaded histograms in Fig. 4 describe the simulated empirical data for the depth (Fig. 4(a)), length (Fig. 4(b)), and density (Fig. 4(c)) of 55 corrosion defects found in 285 simulated excavation sites in pipeline PA. Also shown in these figures are the prior beliefs for the corresponding distributions (Table 1). As has been mentioned, these prior beliefs were obtained through random sampling of the empirical data in a large number of pipelines operating in the studied region. For this exercise, the number of excavation sites was selected large enough (15 per km) to avoid statistical errors due to poor sampling. The suitability of this number is justified later, during the third stage of the validation process.

The simulated data were used as input (D = {d_i}, Λ = {ℓ_i}, and N = {n_i}) to the proposed BDA framework in order to construct the sampling distributions of d, ℓ, and n, respectively, based on Eqs. (7)–(9a) with a GEV distribution for d and ℓ (f_D(·), f_Λ(·)), and a NegBin distribution for n (f_N(·)). The prior distributions (p(θ_i)) of their parameters were defined using Normal distributions with means equal to the values shown in Table 1 and standard deviations determined by the assumption that the COV of the parameters was 50% in all cases. For example, this choice led to hyperparameters {α^d_μ (%pwt), α^d_σ (%pwt), α^d_ζ} = {(18.5, 9.25), (8.86, 4.43), (0.078, 0.039)} in the case of the depth variable. Also, each prior distribution was constrained to its physically meaningful bounds; for example, p(μ_d) = 0 if μ_d < 0 or μ_d > 100 %pwt.

For each variable, its discrete prior distributions were constructed using a 25-step grid. The iterative adaptive process illustrated in Fig.
3 was applied for the determination of the posterior distributions of each parameter with 25-step, ±3-sd(θ)-wide future (next-iteration-step) grids. As an example, Fig. 5 shows the posteriors obtained for the location, scale and shape parameters of the GEV model for defect length in PA, from the simulated empirical input dataset. Similarly, Fig. 6 shows the posteriors obtained for the size and probability parameters of the NegBin model for defect density in PA, from the same dataset.

It is important to underline that the posteriors shown in Figs. 5 and 6 differ from their respective priors. The Gaussian shape of the prior distributions is fairly well preserved, but the COV (50% originally) was drastically reduced from the prior to the posterior beliefs. In the case of parameter p, the solution was found outside the initially proposed non-zero-belief interval. It is also worth noting the increase in resolution from the initial to the final grids. For example, the location parameter for defect length was initially defined over a 10 mm/step grid (~250-mm wide, 25-step discrete interval); in contrast, the solution was found over a finer grid with a 1.4 mm/step resolution (Fig. 5(a)).

The estimated posteriors were later used to determine the predictive distributions of the depth, length, and density of the corrosion defects in the studied pipelines. The results of these estimations for pipeline PA are included in Fig. 4. In the case of pipeline PB, the obtained predictives are shown in Fig. 7 together with the prior beliefs, the simulated empirical data, and the GEV (d and ℓ) and NegBin (n) models fitted to the empirical data for the entire pipeline. Two aspects are of interest when the results shown in Figs. 4 and 7 are analyzed.
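The grid-based posterior with adaptive refinement can be sketched for the two NegBin parameters as follows. This is an illustration under stated assumptions, not the procedure of Fig. 3 itself: the prior means passed in are placeholders (Table 1 is not reproduced here), the initial grid is taken as the prior mean ± 3 prior standard deviations clipped to the physical bounds, and a floor of one grid step is imposed on the refinement width so that the next grid cannot collapse. The function name `grid_posterior` is hypothetical, and SciPy's `nbinom(n, p)` is assumed to match the NegBin(size, probability) parametrization.

```python
import numpy as np
from scipy import stats


def grid_posterior(counts, prior_mean, bounds, n_steps=25, n_iter=5, cov=0.5):
    """Discrete posterior of NegBin (size, probability) on a refined grid."""
    counts = np.asarray(counts)
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_sd = cov * prior_mean                      # COV of 50% by default
    b_lo = np.array([b[0] for b in bounds], dtype=float)
    b_hi = np.array([b[1] for b in bounds], dtype=float)

    # Initial grid: prior mean +/- 3 sd, clipped to the physical bounds.
    lo = np.maximum(prior_mean - 3 * prior_sd, b_lo)
    hi = np.minimum(prior_mean + 3 * prior_sd, b_hi)

    for _ in range(n_iter):
        g = np.linspace(lo[0], hi[0], n_steps)
        p = np.linspace(lo[1], hi[1], n_steps)
        G, P = np.meshgrid(g, p, indexing="ij")

        # Independent Normal priors evaluated on the current grid.
        log_prior = (stats.norm.logpdf(G, prior_mean[0], prior_sd[0])
                     + stats.norm.logpdf(P, prior_mean[1], prior_sd[1]))
        # Log-likelihood of all ditch counts at every grid node.
        log_like = stats.nbinom.logpmf(counts[:, None, None], G, P).sum(axis=0)

        log_post = log_prior + log_like
        post = np.exp(log_post - log_post.max())
        post /= post.sum()

        # Marginal means and standard deviations of the two parameters.
        marg_g, marg_p = post.sum(axis=1), post.sum(axis=0)
        mean = np.array([np.sum(g * marg_g), np.sum(p * marg_p)])
        sd = np.sqrt([np.sum((g - mean[0]) ** 2 * marg_g),
                      np.sum((p - mean[1]) ** 2 * marg_p)])

        # Next grid: mean +/- 3 sd, never narrower than one current step.
        step = np.array([g[1] - g[0], p[1] - p[0]])
        width = np.maximum(sd, step)
        lo = np.maximum(mean - 3 * width, b_lo)
        hi = np.minimum(mean + 3 * width, b_hi)

    return (g, p), post, mean, sd
```

The same loop would apply to the three GEV parameters with a three-dimensional grid and the GEV log-density in place of the NegBin log-pmf.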
The first is the ability of the proposed BDA framework to correctly find the predictive distributions that fit the sampling (simulated empirical) data, even when the prior belief about the distributions of the involved parameters is far from the actual distributions. Table 3 shows the results of the Monte Carlo K–S (for d and ℓ) and P–χ² (for n) goodness of fit tests conducted by taking the predictive distributions shown in Figs. 4 and 7 as the distributions from which the respective simulated empirical and (all defects) empirical datasets in PA and PB are drawn (see Footnote 6). As can be seen, in the case of simulated empirical datasets vs. predictives, the K–S- and P–χ²-derived P-values are higher than 0.15 in all cases. This quantitatively confirms that the predictives correctly describe the simulated empirical data. Although obvious in nature, this confirmation points to the appropriateness of the structure and assumptions of the proposed BDA framework. Though not sufficient, it is a necessary condition for the framework to be fully validated.

Fig. 4. Prior belief, simulated empirical, ILI-derived, and predictive distributions for the (a) depth, (b) length, and (c) density of external corrosion defects in pipeline PA.

Fig. 5. Posterior distributions for the (a) location, (b) scale, and (c) shape parameters of the GEV model for defect length in pipeline PA.

The second and most important point to underline is that the framework is capable of fairly reproducing the empirical defect depth, length, and density distributions from a single field sampling inspection dataset. As shown in Table 3 for the case of all-empirical datasets vs. predictives, this second point is supported by K–S- and P–χ²-derived P-values higher than 0.05 in all cases.
This is critical to the practical applicability of the methodology since, in real-life situations, the one-dataset scenario is usually the only possible one. Furthermore, it is statistically expected, although not guaranteed, that the quality of these results will be reproduced from one dataset to another.

Two key requirements must be met to increase the level of confidence in obtaining good estimates through the application of the proposed BDA framework. First, the numbers of excavation sites and observed defects have to be large enough to avoid sampling errors due to insufficient sample size. Second, the corrosion conditions along the pipeline must be relatively homogeneous in order to avoid sampling errors due to heterogeneity. The first of these requirements is further investigated in the next section, while the second, although obvious, must be ensured by the operator or analysts using all the accumulated knowledge about the pipeline under analysis. For example, if there are variations in soil corrosivity, cathodic protection conditions, and/or coating conditions along the pipeline, then its segmentation into homogeneous sections is mandatory for the proper application of the proposed Bayesian corrosion data analysis.

Fig. 6. Prior and posterior distributions for parameters (a) η and (b) p of the NegBin model for defect density in pipeline PA.

Fig. 7. Prior belief, simulated empirical, ILI-derived, and predictive distributions for the (a) depth, (b) length, and (c) density of defects in pipeline PB.

5.2. Statistical performance of the estimations

The process just described was carried out 1000 times for each variable and the confidence associated with the BDA estimations in each pipeline was determined as described earlier. Table 4 shows the results of these simulations.
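The summary quantities reported for the repeated runs can be computed as in the sketch below: the confidence as the percentage of runs whose P-value is at least 0.05, the relative error of the mean estimate with respect to a reference value, and the coefficient of variation of the estimates. The function name `summarize_runs` and the array layout (one row per run, one column per parameter) are assumptions made for illustration.

```python
import numpy as np


def summarize_runs(p_values, estimates, reference, alpha=0.05):
    """Return confidence, relative error of the mean, and COV, all in %."""
    p_values = np.asarray(p_values, dtype=float)
    estimates = np.asarray(estimates, dtype=float)    # shape (n_runs, n_params)
    reference = np.asarray(reference, dtype=float)    # shape (n_params,)

    # Share of runs in which the null hypothesis is not rejected.
    confidence = 100.0 * np.mean(p_values >= alpha)

    # Relative error of the mean estimate with respect to the reference.
    mean_est = estimates.mean(axis=0)
    rel_error = 100.0 * np.abs(mean_est - reference) / np.abs(reference)

    # Coefficient of variation of the estimates across runs.
    cov = 100.0 * estimates.std(axis=0, ddof=1) / np.abs(mean_est)
    return confidence, rel_error, cov
```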
Together with the confidence associated with the BDA estimations, this table shows the relative error of the mean of the resulting distribution of each parameter with respect to the values shown in the last column of Table 2. The COV of each of the obtained distributions is also presented in Table 4.

From the practical point of view, the results in Table 4 are satisfactory given that the confidence associated with the BDA estimations is, in all cases, equal to or greater than 80%, while the relative error of the mean value and the COV of the estimations are relatively low. The worst case is associated, in both pipelines, with the estimations of the distribution of the defect lengths. This can be related to the fact that, while depth evolution is almost completely determined by the corrosion mechanism, the observed lengths (in many cases exceeding 1 m, see Fig. 7(b)) may result from other processes such as pit coalescence and extended damage to the pipeline coating. Meanwhile, among the parameters of the GEV distribution, the shape is the one that shows the largest estimation errors in most cases.

As will be shown later, the quality of the estimations can be further increased if the numbers of excavation sites and of observed corrosion defects increase with respect to the values shown in the second column of Table 4. On the other hand, decreasing the amount of observed corrosion data below these values will produce estimations with poor confidence and large statistical (mainly sampling) errors, which render the estimations useless in practical terms.

Table 3
Goodness of fit (P-values) of the simulated and all-empirical data to the predictive distributions (a).

| Pipeline | Simulated empirical vs. predictives, d: K–S | Simulated empirical vs. predictives, ℓ: K–S | Simulated empirical vs. predictives, n: P–χ² | All-empirical vs. predictives, d: K–S | All-empirical vs. predictives, ℓ: K–S | All-empirical vs. predictives, n: P–χ² |
|---|---|---|---|---|---|---|
| PA | 0.17 | 0.44 | 0.75 | 0.11 | 0.44 | 0.29 |
| PB | 0.54 | 0.71 | 0.92 | 0.14 | 0.19 | 0.06 |

(a) K–S stands for the Kolmogorov–Smirnov test, while P–χ² stands for Pearson's chi-squared test.

Table 4
Confidence (co, in %) of the BDA estimations, relative error of the mean (re, in %), and coefficient of variation (cv, in %) of the distributions of the estimated parameters for each variable.

| Pipeline | Ditches (defects) (a) | d: co | μ_d re(cv) | σ_d re(cv) | ζ_d re(cv) | ℓ: co | μ_ℓ re(cv) | σ_ℓ re(cv) | ζ_ℓ re(cv) | n: co | η re(cv) | p re(cv) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PA | 15(56) | 88 | 2(6) | 2(20) | 5(22) | 87 | 14(18) | 20(28) | 2(14) | 89 | 4(15) | 3(12) |
| PB | 15(71) | 80 | 1(5) | 3(17) | 15(85) | 90 | 10(18) | 11(13) | 18(30) | 90 | 8(12) | 10(15) |

(a) Ditches per km and average total number of defects observed in these ditches.

5.3. Number of inspection sites and number of observed defects for accurate estimations

In order to find the amount of data required to obtain good estimates with a relatively high confidence, the previously described exercise was carried out for different numbers of excavation sites per km, ranging from 1 to 50. For each pair of number of excavation sites and average number of corrosion defects, the value of the MSE was computed for the parameters θ_x of the depth, length, and density of the corrosion defects using Eq. (19) for n_S = 1000 simulations. At the same time, the confidence in the estimations was computed as in the preceding subsection.

Fig. 8 shows the evolution of the resulting (relative) MSE and confidence with the numbers of excavation sites and corrosion defects for pipelines PA (Fig. 8(a)) and PB (Fig. 8(b)), respectively. Because the terms summed in Eq. (19) have different units, given the different units of variable x (depth, length or density) and index i (from one parameter to the next), the MSE values in this figure were normalized with respect to the value obtained for 50 excavation sites.
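The MSE computation and its normalization can be sketched as follows. The sketch reads Eq. (19) as the sum over parameters of a squared-bias term plus the average squared deviation of the individual estimates from the reference value; this reading of the equation, the function names `mse_eq19` and `relative_mse`, and the array layout (one row per simulation, one column per parameter) are assumptions made for illustration.

```python
import numpy as np


def mse_eq19(estimates, reference):
    """MSE summed over the parameters of one variable (d, length, or n).

    estimates: array of shape (n_S, n_params), one row per simulation.
    reference: array of shape (n_params,), the empirical parameter values.
    """
    estimates = np.asarray(estimates, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # First term: squared difference between the mean estimate and reference.
    bias_sq = (estimates.mean(axis=0) - reference) ** 2
    # Second term: average squared deviation of each estimate from reference.
    spread = np.mean((estimates - reference) ** 2, axis=0)
    return float(np.sum(bias_sq + spread))


def relative_mse(mse_values):
    """Normalize a sequence of MSE values by its last entry.

    The last entry is assumed to correspond to the largest number of
    excavation sites considered (50 per km in the text).
    """
    mse_values = np.asarray(mse_values, dtype=float)
    return mse_values / mse_values[-1]
```

Normalizing in this way makes curves for variables with different units comparable on a single axis, at the cost of hiding the absolute error level.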
Similarly, the optimum number of excavation sites and the total number of defects for obtaining as good estimates as possible are defined in this exercise as the respective threshold values beyond which there is no point in increasing these numbers to improve the quality of the estimations. Therefore, the beginning of a plateau in the plots of the computed relative average MSE against the number of ditches and defects (thick-line curves in Fig. 8) indicates the minimum values of these numbers needed to obtain as good estimates as possible of the involved parameters. At the same time, the numbers of ditches and defects required for reaching a confidence (thin-line curves in Fig. 8) equal to or greater than 80% can also be used as threshold values to define the minimum amount of field data needed to attain statistically dependable estimations.

The results shown in Fig. 8 indicate that the optimum number of excavation sites per km to obtain as good estimates as possible is 15. The optimum total number of corrosion defects detected and sized during the field inspection to achieve the same goal is 60. Both thresholds must be reached or (if possible) exceeded during field work aimed at evaluating external corrosion in non-piggable underground pipelines. As Fig. 8 also shows, the total number of observed corrosion defects will change from one pipeline to another, as the density of corrosion defects strongly depends on the corrosivity conditions and degree of protection of each pipeline. Therefore, notably for pipelines with low defect density, the number of excavation sites per km must be equal to or greater than 15 and must, at the same time, ensure that the total number of observed external corrosion defects in the inspected pipeline equals or exceeds 60.

Fig. 8. Influence of the number of excavation sites and observed corrosion defects upon the relative MSE (thick lines) and confidence (thin lines) of the BDA estimations in (a) pipeline PA and (b) pipeline PB.

6. Conclusions

A new Bayesian methodology has been proposed, illustrated, and validated for the analysis of external corrosion data of non-piggable underground pipelines. It helps estimate, with relatively high confidence and reduced estimation errors, the statistical distribution of the density and size of external corrosion defects from a relatively small number of corrosion data samples taken at excavation sites along these pipelines. Beyond being conceptually simple and easy to implement, the proposed methodology offers three important practical advantages. Firstly, it automatically adjusts to the amount and nature of the available data; secondly, it can incorporate the uncertainty in detection and measurement errors associated with field measurements; and lastly, it is ready to be incorporated into ECDA methodologies.

The application of the developed Bayesian methodology to corrosion data from field and in-line inspections of upstream and transportation pipelines helped answer the important question of how much data is required to obtain good estimates of the corrosion data in the entire pipeline when performing field sampling inspections. The results of the conducted Monte Carlo simulations indicate that the number of excavation sites per km to obtain as good estimates as possible must be equal to or greater than 15 and that the total number of corrosion defects detected and sized during the field inspection to achieve the same goal must be equal to or greater than 60. These figures will ensure a confidence equal to or greater than 80% for the estimated distributions of the size and density of external corrosion defects in non-piggable underground pipelines.

Acknowledgments

Part of this study was done during a stay of J.
Vidal at the National Polytechnic Institute (ESIQIE-IPN) of Mexico under the research Project CIDIM 425101840. The authors are grateful to Petróleos Mexicanos (Pemex) for permission to publish these results. The comments provided by the reviewers are deeply appreciated.

Appendix A

Glossary of acronyms and symbols as they appear in the text

BDA: Bayesian Data Analysis
ECDA: External Corrosion Direct Assessment
ILI: In-Line Inspection
MFL: Magnetic Flux Leakage
POD: Probability of Detection
pdf: Probability density function
pmf: Probability mass function
pwt: Pipe wall thickness
%pwt: Percent of the pipe wall thickness
GEV: Generalized Extreme Value (distribution)
NegBin: Negative Binomial (distribution)
COV: Coefficient of Variation
ME: Measurement error
PA: Pipeline A, subjected to ILI
PB: Pipeline B, subjected to field inspection
K–S: Kolmogorov–Smirnov goodness of fit test
P–χ²: Pearson's chi-squared (χ²) goodness of fit test
MSE: Mean squared error
θ_i: Distribution (prior or posterior) parameter values
θ: Vector of distribution parameter values {θ_i}
Θ: Space of vectors θ
x_j: Measured/observed data values
X: Vector of observed data points {x_j}
p(θ): Prior distribution
L(X|θ): Likelihood function, also called sampling distribution
Po(θ|X): Posterior distribution
x̂: Unobserved (predicted) value for variable x
α_k: Hyperparameters of the distribution for parameter θ_i
α: Vector of hyperparameters {α_k}
Pp(x̂|X, α): Predictive distribution for unobserved data points
[…]: Mean of the prior distribution for parameter θ_i, for variable x
[…]: Variance of the prior distribution for parameter θ_i, for variable x
M: Assumed model or distribution for the observed data
d: Corrosion defect depth
ℓ: Corrosion defect length
n: Corrosion defect density (number of defects per excavation site)
μ: Location parameter of the GEV distribution
σ: Scale parameter of the GEV distribution
ζ: Shape parameter of the GEV distribution
d_i: Measured defect depth values
D: Vector of measured defect depths {d_i}
ℓ_i: Measured defect length values
Λ: Vector of measured defect lengths {ℓ_i}
n_i: Measured defect count values
N: Vector of observed number of defects per excavation site {n_i}
p: Probability parameter of the NegBin distribution
η: Size parameter of the NegBin distribution
f_GEV(x): pdf of the GEV distribution
f_NB(n): pmf of the NegBin distribution
f_D(·): Model pdf of defect depth distribution
f_Λ(·): Model pdf of defect length distribution
f_N(·): Model pmf of defect density distribution
Θ_D: Space of vector of parameters θ_d for the defect depth distribution
Θ_Λ: Space of vector of parameters θ_ℓ for the defect length distribution
Θ_N: Space of vector of parameters θ_n for the defect counts distribution
Po(θ_d|D, α_d): Posterior joint pdf for parameters θ_d of the (GEV) depth distribution
L(D|μ_d, σ_d, ζ_d): Likelihood function for observed depth data vector D
p(μ_d|α^d_μ): Prior distribution for the location μ_d of the (GEV) depth distribution
p(σ_d|α^d_σ): Prior distribution for the scale σ_d of the (GEV) depth distribution
p(ζ_d|α^d_ζ): Prior distribution for the shape ζ_d of the (GEV) depth distribution
Po(θ_ℓ|Λ, α_ℓ): Posterior joint distribution of parameters θ_ℓ of the (GEV) length distribution
L(Λ|μ_ℓ, σ_ℓ, ζ_ℓ): Likelihood function for observed length data vector Λ
p(μ_ℓ|α^ℓ_μ): Prior distribution for the location μ_ℓ of the (GEV) length distribution
p(σ_ℓ|α^ℓ_σ): Prior distribution for the scale σ_ℓ of the (GEV) length distribution
p(ζ_ℓ|α^ℓ_ζ): Prior distribution for the shape ζ_ℓ of the (GEV) length distribution
Po(θ_n|N, α_n): Posterior joint distribution of parameters θ_n of the (NegBin) defect density distribution
L(N|η_n, p_n): Likelihood function for observed defect density data vector N
p(η_n|α^n_η): Prior distribution for parameter η_n of the (NegBin) defect density distribution
p(p_n|α^n_p): Prior distribution for parameter p_n of the (NegBin) defect density distribution
n_D: Number of excavated ditches in the inspected pipeline
n_T: Total number of defects observed in the inspected pipeline
n_θ: Number of parameters of the model
n_S: Number of simulations
Pp(d̂|D, α_d): Predictive pdf of the unobserved depths
Pp(ℓ̂|Λ, α_ℓ): Predictive pdf of the unobserved lengths
Pp(n̂|N, α_n): Predictive pdf of the unobserved defect densities
[…]: Mean of the Normal distribution describing the MEs
[…]: Variance of the Normal distribution describing the MEs
[…]: pdf of the Normal distribution describing the MEs
POD(d): Probability of detection function for defect depth
f^ME_D(d|θ_d): ME-affected model distribution for defect depth
f^POD_D(d|θ_d): POD-affected model distribution for defect depth
f^POD,ME_D(d|θ_d): ME- and POD-affected model distribution for defect depth
f^POD_N(n|θ_n): POD-affected model distribution for defect density
P*_P(n̂|N): Discrete predictive pmf of the unobserved defect density
P*_o(θ_n|N): Discrete posterior pmf of parameters θ_n
P*_P(x̂|X): Discrete predictive pmf of an unobserved defect variable x
f*_X(x|θ_x): Discrete model pmf for a defect variable x
P*_o(θ_x|X): Discrete posterior joint pmf of model parameters θ_x
ⁱθ̄: Non-parametric mean of the estimated ⁱPo(θ) at the i-th calculation step
sd(ⁱθ): Non-parametric standard deviation of the estimated ⁱPo(θ) at the i-th calculation step
D(θ̂_x): Mean squared error function of the estimated parameters θ̂_x

References

[1] G.
Ogutcu, Pipeline risk assessment by Bayesian belief network, in: Proc. of the 6th ASME Int. Pipeline Conf. (IPC 2006), Calgary, Canada, September 2006, Paper IPC2006-10088, vol. 3, PART B, 2007, pp. 931–935. [2] F. Ayello, T. Alfano, D. Hill, N. Sridhar, A Bayesian network based pipeline risk management, in: Proc. of the NACE Int. Corros. Conf. Series, Corrosion 2012, Salt Lake City, UT, U.S., March 2012, Paper 92147, vol. 1, 2012, pp. 579–592. [3] A. Ainouche, Future integrity management strategy of a gas pipeline using Bayesian risk analysis, in: Proc. of the Int. Gas Union 23rd World Gas Conf. 2006, Amsterdam, Netherlands, June 2006, vol. 2, 2006, pp. 756–769. [4] S. Zhang, W. Zhou, H. Qin, Inverse Gaussian process-based corrosion growth model for energy pipelines considering the sizing error in inspection data, Corros. Sci. 73 (2013) 309–320. [5] J.L. Alamilla, E. Sosa, Stochastic modelling of corrosion damage propagation in active sites from field inspection data, Corros. Sci. 50 (2008) 1811–1819. [6] M.A. Maes, M.H. Faber, M.R. Dann, Hierarchical modeling of pipeline defect growth subject to ILI uncertainty, in: Proc. of the Int. Conf. on Offshore Mechanics and Arctic Eng. – OMAE2009, Honolulu, HI, U.S., May 2009, Paper 80425OMAE, 2009, pp. 375–384. [7] J.L. Alamilla, D. Campos, E. Sosa, Estimation of corrosion damages by Bayesian stochastic models, Struct. Infrastruct. Eng. 8 (2012) 411–423. [8] H. Qin, S. Zhang, W. Zhou, Inverse Gaussian process-based corrosion growth modeling and its application in the reliability analysis for energy pipelines, Front. Struct. Civ. Eng. 7 (2013) 276–287. [9] S. Zhang, W. Zhou, System reliability of corroding pipelines considering stochastic process-based models for defect growth and internal pressure, Int. J. Press. Ves. Pip. 111–112 (2013) 120–130. [10] M.D. Pandey, D. Lu, Estimation of parameters of degradation growth rate distribution from noisy measurement data, Struct. Saf. 43 (2013) 60–69. [11] S. Zhang, W. 
Zhou, Probabilistic characterisation of metal-loss corrosion growth on underground pipelines based on geometric Brownian motion process, Struct. Infrastruct. (2014), http://dx.doi.org/10.1080/15732479.2013.875045. [12] M. Al-Amin, W. Zhou, S. Zhang, S. Kariyawasam, H. Wang, Hierarchical Bayesian corrosion growth model based on in-line inspection data, J. Press. Ves. Technol. – Trans. ASME 136 (2014) (Paper 041401). [13] M. Khalifa, F. Khan, M. Haddara, Bayesian sample size determination for inspection of general corrosion of process components, J. Loss Prev. Process Ind. 25 (2012) 218–223. [14] M. Khalifa, F. Khan, M. Haddara, Inspection sampling of pitting corrosion, Insight 55 (2013) 290–296. [15] A. Francis, M. McCallum, M.T. Van Os, P. Van Mastrigt, A new probabilistic methodology for undertaking external corrosion direct assessment, in: Proc. of the 6th ASME Int. Pipeline Conf. (IPC 2006), Calgary, Canada, September 2006, Paper IPC2006-10092, vol. 3, PART B, 2007, pp. 937–950. [16] A. Francis, M. McCallum, C. Jandu, Pipeline life extension and integrity management based on optimized use of above ground survey data and inline inspection results, Strength Mater – Engl. Tr. 41 (2009) 478–492. [17] M.T. Van Os, Direct assessment-1: software module hones system-wide practices, Oil Gas J. 104 (37) (2006) 56–62. [18] M.T. Van Os, Conclusion: ECDA tunes Gasunie corrosion predictions, Oil Gas J. 104 (38) (2006) 59–63. [19] M. Van Burgel, M. De Wacht, M.T. Van Os, A complete and integrated approach for the assessment of non-piggable pipelines, in: Proc. of the International Gas Union Research Conference 2011, IGRC 2011, Seoul, South Korea; October 2011, vol. 3, 2011, pp. 1689–1703. [20] M. Dann, M.A. Maes, Spatial hierarchical POD model for in-line inspection data, in: Proc. of the 11th Int. Conf. on Applications of Statistics and Probability in Civil Engineering 2011, ICASP, Zurich, Switzerland, August 2011, Paper 88343, 2011, pp. 2274–2282. [21] M. Al-Amin, W.
Zhou, S. Zhang, S. Kariyawasam, H. Wang, Bayesian model for calibration of ILI tools, in: Proc. of the 9th ASME Int. Pipeline Conf. (IPC 2012), Calgary, Canada, September 2012, Paper IPC2012-90491, vol. 2, 2012, pp. 201–208. [22] T. Breton, J.C. Sanchez-Gheno, J.L. Alamilla, J. Alvarez-Ramirez, Identification of failure type in corroded pipelines: a Bayesian probabilistic approach, J. Hazard. Mater. 179 (2010) 628–634. [23] M.A. Maes, M. Dann, M.M. Salama, Influence of grade on the reliability of corroding pipelines, Reliab. Eng. Syst. Safe. 93 (2008) 447–455. [24] M. Chookah, M. Nuhi, M. Modarres, A probabilistic physics-of-failure model for prognostic health management of structures subject to pitting and corrosion-fatigue, Reliab. Eng. Syst. Safe. 96 (2011) 1601–1610. [25] S. Yain, F. Ayello, J.A. Beavers, S. Sridhar, Probabilistic model for stress corrosion cracking of underground pipelines using Bayesian networks, in: Proc. of the NACE Int. Corros. Conf. Series, Corrosion 2013, Orlando, FL, U.S., March 2013, Paper 2616, 2013, pp. 579–592. [26] X.X. Yuan, D. Mao, M.D. Pandey, A Bayesian approach to modeling and predicting pitting flaws in steam generator tubes, Reliab. Eng. Syst. Safe. 94 (2009) 1838–1847. [27] M. Kowaka, H. Tsuge, M. Akashi, K. Masamura, H. Ishimoto, Introduction to the Life Prediction of Plant Materials, Allerton Press Inc., New York, 1994. [28] D. Rivas, F. Caleyo, A. Valor, J.M. Hallen, Extreme value analysis applied to pitting corrosion experiments in low carbon steel: comparison of block maxima and peak over threshold approaches, Corros. Sci. 50 (2008) 3193–3204. [29] A. Jarrah, M. Bigerelle, G. Guillemot, D. Najjar, A. Iost, J.-M. Nianga, A generic statistical methodology to predict the maximum pit depth of a localized corrosion process, Corros. Sci. 53 (2011) 2453–2467. [30] A. Valor, F. Caleyo, L. Alfonso, J. Vidal, J.M.
Hallen, Statistical analysis of pitting corrosion field data and their use for realistic reliability estimations in non-piggable pipeline systems, Corrosion (2014), http://dx.doi.org/10.5006/1195.
[31] J.K. Kruschke, Doing Bayesian Data Analysis, Acad. Press, Burlington, MA, 2011. [32] E. Castillo, A.S. Hadi, N. Balakrishnan, J.M. Sarabia, Extreme value and related models with applications in engineering and science, Wiley-Interscience, New York, 2004. [33] M.T. Boswell, G.P. Patil, Chance mechanisms generating the negative binomial distributions, in: G.P. Patil (Ed.), Random Counts in Models and Structures, vol. 1, Pennsylvania State University Press, University Park, PA, 1970, pp. 1–22. [34] S.X. Li, S.R. Yu, H.L. Zeng, J.H. Li, R. Liang, Predicting corrosion remaining life of underground pipelines with a mechanically-based probabilistic model, J. Petrol. Sci. Eng. 65 (2009) 162–166. [35] S.X. Li, H.L. Zeng, S.R. Yu, X. Zhai, S.P. Chen, R. Liang, L.
Yu, A method of probabilistic analysis for steel pipeline with correlated corrosion defects, Corros. Sci. 51 (2009) 3050–3056. [36] F.A.V. Bazán, A.T. Beck, Stochastic process corrosion growth models for pipeline reliability, Corros. Sci. 74 (2013) 50–58. [37] J.C. Velazquez, F. Caleyo, A. Valor, J.M. Hallen, Predictive model for pitting corrosion in buried oil and gas pipelines, Corrosion 65 (2009) 332–342. [38] J.C. Velázquez, F. Caleyo, A. Valor, J.M. Hallen, Technical note: field study – pitting corrosion of underground pipelines related to local soil and pipe characteristics, Corrosion 66 (2010) 0160011–0160015. [39] F. Caleyo, J.C. Velázquez, A. Valor, J.M. Hallen, Probability distribution of pitting corrosion depth and rate in underground pipelines: a Monte Carlo study, Corros. Sci. 51 (2009) 1925–1934. [40] F. Caleyo, J.C. Velázquez, A. Valor, J.M. Hallen, Markov chain modelling of pitting corrosion in underground pipelines, Corros. Sci. 51 (2009) 2197–2207. [41] F. Caleyo, L. Alfonso, J.M. Hallen, J.L. González, E. Pérez-Baruch, Method proposed for calibrating MFL, UT ILI tools, Oil Gas J. 102 (34) (2004) 76–88. [42] W.A. Fuller, Measurement Error Models, Wiley-Interscience, New York, 2006. [43] Wolfram Research Inc, Mathematica, Version 9.0, Champaign, IL, 2013. [44] F. Caleyo, L. Alfonso, J.H. Espina-Hernandez, J.M. Hallen, Criteria for performance assessment and calibration of in-line inspections of oil and gas pipelines, Meas. Sci. Technol. 18 (7) (2007) 1788–1799. 