Evaluation and comparison of benchmark QSAR models to predict a relevant REACH endpoint: the Bioconcentration Factor (BCF)

The bioconcentration factor (BCF) is an important bioaccumulation hazard assessment metric in many regulatory contexts. Its assessment is required by the REACH regulation (Registration, Evaluation, Authorization and Restriction of Chemicals) and by CLP (Classification, Labeling and Packaging). We challenged nine well-known and widely used BCF QSAR models against 851 compounds stored in an ad hoc database. The goodness of the regression analysis was assessed using the determination coefficient (R²) and the Root Mean Square Error (RMSE). To assess classification, Cooper's statistics and the Matthews Correlation Coefficient (MCC) were calculated for all the thresholds relevant for regulatory purposes (i.e. 100 L/kg for Chemical Safety Assessment; 500 L/kg for Classification and Labeling; 2000 and 5000 L/kg for Persistent, Bioaccumulative and Toxic (PBT) and very Persistent, very Bioaccumulative (vPvB) assessment), with particular attention to the models' ability to control the occurrence of false negatives. As a first step, statistical analysis was performed for the predictions of the entire dataset; R² > 0.70 was obtained using the CORAL, T.E.S.T. and EPISuite Arnot–Gobas models. As classifiers, the ACD and log P-based equations were the best in terms of sensitivity, ranging from 0.75 to 0.94. External compound predictions were carried out for the models that had their own training sets. The CORAL model returned the best performance (R²ext = 0.59), followed by the EPISuite Meylan model (R²ext = 0.58). The latter also gave the highest sensitivity on external compounds, with values from 0.55 to 0.85 depending on the threshold. Statistics were also compiled for compounds falling into the models' Applicability Domain (AD), which gave better performance. In this respect, VEGA CAESAR was the best model in terms of regression (R² = 0.94) and classification (average sensitivity > 0.80). This model also showed the best regression (R² = 0.85) and sensitivity (average > 0.70) for new compounds in the AD but not present in the training set. However, no single optimal model exists, so a case-by-case assessment would be wise. Even so, integrating the wealth of information from multiple models remains the winning approach.
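The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the BCF values are invented, R² is assumed to be 1 - SS_res/SS_tot computed on log10 BCF, and a compound is assumed to count as positive when its BCF is at or above the threshold. The abstract does not state any of these conventions.

```python
import math

# Regulatory BCF thresholds in L/kg, as listed in the abstract.
THRESHOLDS = (100, 500, 2000, 5000)


def r2_and_rmse(observed, predicted):
    """Return (R2, RMSE); R2 is taken as 1 - SS_res / SS_tot (an assumption)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)


def classification_stats(observed, predicted, threshold):
    """Cooper's statistics and MCC, with BCF >= threshold as the positive class."""
    tp = fp = tn = fn = 0
    for o, p in zip(observed, predicted):
        obs_pos, pred_pos = o >= threshold, p >= threshold
        if obs_pos and pred_pos:
            tp += 1
        elif obs_pos:
            fn += 1  # false negative: a bioaccumulative compound is missed
        elif pred_pos:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (empty class) are reported as NaN.
        return num / den if den else float("nan")

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical experimental and predicted BCF values in L/kg (not from the paper).
observed_bcf = [50, 150, 800, 3000, 6000, 20]
predicted_bcf = [80, 90, 1200, 1800, 7000, 30]

# Regression statistics on log10 BCF (assumed scale).
log_obs = [math.log10(v) for v in observed_bcf]
log_pred = [math.log10(v) for v in predicted_bcf]
r2, rmse = r2_and_rmse(log_obs, log_pred)
print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f} log units")

# Classification statistics at each regulatory threshold.
for t in THRESHOLDS:
    s = classification_stats(observed_bcf, predicted_bcf, t)
    print(f"{t} L/kg: sensitivity = {s['sensitivity']:.2f}, "
          f"specificity = {s['specificity']:.2f}, MCC = {s['mcc']:.2f}")
```

Sensitivity is the statistic most directly tied to false negatives, which is why the abstract singles it out: a low value at a given threshold means that compounds exceeding that threshold are being predicted as below it.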

Gissi A; Lombardo A; Roncaglioni A; Gadaleta D; Mangiatordi GF; Nicolotti O; Benfenati E.

2015-01-01


Year

2015

Appears in types:

1.1 Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/66236
