Revistes Publicacions URV: SORT - Statistics and Operations Research Transactions> 2021

Joint outlier detection and variable selection using discrete optimization

  • Identifying data

    Identifier: RP:4689
    Authors:
    Abdallah, Maher; Canu, Stephane; Jammal, Mahdi
    Abstract:
    In regression, the quality of estimators is known to be very sensitive to the presence of spurious variables and outliers. Unfortunately, both are frequent in real data. To handle outlier proneness and achieve variable selection, we propose a robust method that rejects discordant observations outright while selecting the relevant variables. A natural way to define the corresponding optimization problem is to use the ℓ0 norm and recast it as a mixed integer optimization problem. To retrieve its global solution more efficiently, we suggest the use of additional constraints as well as a clever initialization. To this end, an efficient and scalable non-convex proximal alternating algorithm is introduced. An empirical comparison between the ℓ0 norm approach and its ℓ1 relaxation is presented as well. Results on both synthetic and real data sets show that the mixed integer programming approach and its discrete first-order warm start provide high-quality solutions.
  • Other:

    Author as listed in the article: Abdallah, Maher; Canu, Stephane; Jammal, Mahdi
    Keywords: Robust optimization
    Journal publication year: 2021
    Publication type: Peer-reviewed; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article
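
Illustrative sketch (not taken from the article): the abstract describes an ℓ0-based joint outlier detection and variable selection problem, warm-started by a discrete first-order method. One common way to write such a problem, assumed here for illustration only, is a mean-shift model: minimize 0.5 * ||y - X beta - gamma||^2 subject to at most k_var nonzeros in beta (selected variables) and at most k_out nonzeros in gamma (rejected observations). The Python code below alternates an exact hard-thresholding update of gamma with a projected gradient step on beta. The names hard_threshold, alternating_l0, k_var and k_out are hypothetical, and the authors' actual algorithm and formulation may differ.

```python
import numpy as np


def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    if k > 0:
        idx = np.argsort(np.abs(v))[-k:]
        out[idx] = v[idx]
    return out


def alternating_l0(X, y, k_var, k_out, n_iter=500):
    """Heuristic for: min 0.5*||y - X beta - gamma||^2
    subject to ||beta||_0 <= k_var and ||gamma||_0 <= k_out.

    Assumed mean-shift formulation: a nonzero gamma[i] flags
    observation i as an outlier. Returns a local solution only.
    """
    n, p = X.shape
    # Step size 1/L, where L is the largest eigenvalue of X^T X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    gamma = np.zeros(n)
    for _ in range(n_iter):
        # Exact minimizer over gamma for fixed beta:
        # absorb the k_out largest residuals.
        gamma = hard_threshold(y - X @ beta, k_out)
        # Gradient step on beta, then projection onto k_var-sparse vectors.
        grad = X.T @ (X @ beta + gamma - y)
        beta = hard_threshold(beta - step * grad, k_var)
    return beta, gamma
```

A point obtained this way is only a local solution; in the spirit of the abstract, it could serve as a warm start for a mixed integer solver, which is what would certify global optimality.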