Feature selection in an interactive search-based PLA design approach
DOI:
https://doi.org/10.5753/jserd.2025.4697
Keywords:
Interactive search-based Software Engineering, Machine Learning, Feature Selection
Abstract
The Product Line Architecture (PLA) is one of the most important artifacts of a Software Product Line (SPL). PLA design can be formulated as an interactive optimization problem with many conflicting factors. Incorporating the Decision Maker's (DM) preferences during the search process may help the algorithms find solutions better suited to the DM's profile. Interactive approaches allow the DM to evaluate solutions, guiding the optimization according to their preferences. However, this introduces human fatigue, caused by an excessive number of interactions and of solutions to evaluate. A common strategy to prevent this problem is to limit the number of interactions and the number of solutions the DM evaluates. Machine Learning (ML) models have also been used to learn how to evaluate solutions according to the DM's profile and to replace the DM after some interactions. Feature selection plays an essential role in this setting, since non-relevant and/or redundant features used to train the ML model can reduce the accuracy and comprehensibility of the hypotheses induced by ML algorithms. This study aims to enhance the use of an ML model in an interactive search-based PLA design approach by addressing two critical challenges: mitigating DM fatigue through feature selection and improving computational efficiency, particularly during the testing phase. We applied four feature selectors and, according to the results, reduced the number of features by 30% and the time spent on testing by 25%, while achieving an accuracy of 99%.
License
Copyright (c) 2025 Caio Vieira Arasaki, Lucas Wolschick, Willian Marques Freire, Aline Maria Malachini Miotto Amaral

This work is licensed under a Creative Commons Attribution 4.0 International License.

