• A. G. Kryvokhata Zaporizhzhia National University
  • O. V. Kudin Zaporizhzhia National University
  • V. I. Gorbenko Zaporizhzhia National University
Keywords: Acoustic Data Classification, Convolutional Neural Network, Ensemble Learning, IoT


This paper discusses machine learning models and methods used in IoT (Internet of Things) systems, in particular the development of methods for classifying sound data of various kinds, such as speech, music and environmental sounds. A sound classification subsystem could be implemented in various Smart City and Smart Farming systems. Developing IoT software involves several challenges, including limited computational resources and relatively small RAM. An automated sound classification system can be decomposed into four parts: audio representation, feature extraction, the machine learning algorithm and accuracy estimation. This paper deals with the machine learning algorithm. We use a convolutional neural network for sound classification and the Snapshot method for ensemble learning. A genetic algorithm is a typical strategy for the structural optimization of both neural networks and ensembles. Various solution representations and crossover functions are studied in order to find the optimal configuration of genetic operators. The objective of the paper is to develop an optimal classifier for an embedded sound classification system. The solution representation (genotype) is the set of neural network hyper-parameters, which includes the number and type of neural network layers, the type of activation functions, the initial values of the weights, the learning rate, etc. Both the Snapshot ensemble method and a combination of different neural networks are used for ensemble learning. The key idea of this paper is to optimize both the neural networks and the ensemble construction method with genetic algorithms. We compare different genetic operators in order to obtain an optimal solution for an IoT system.
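
The genotype and the comparison of crossover operators described above can be illustrated with a minimal sketch. It is not the authors' implementation: the gene names, the candidate values in SEARCH_SPACE, the population settings and the toy_fitness function are all assumptions made for illustration. In a real system the fitness would be the validation accuracy of a trained convolutional network or Snapshot ensemble, possibly penalized by model size.

```python
import random

# Hypothetical search space: the paper names the kinds of hyper-parameters
# in the genotype but not their concrete values, so these are assumptions.
SEARCH_SPACE = {
    "n_conv_layers": [1, 2, 3, 4],
    "n_filters": [8, 16, 32, 64],
    "activation": ["relu", "tanh", "elu"],
    "learning_rate": [1e-1, 1e-2, 1e-3],
    "n_snapshots": [1, 3, 5],  # ensemble size for the Snapshot method
}
GENES = list(SEARCH_SPACE)


def random_genotype(rng):
    """Draw one value per gene to form a candidate configuration."""
    return {g: rng.choice(SEARCH_SPACE[g]) for g in GENES}


def one_point_crossover(a, b, rng):
    """Take genes before a random cut from parent a, the rest from parent b."""
    cut = rng.randrange(1, len(GENES))
    return {g: (a[g] if i < cut else b[g]) for i, g in enumerate(GENES)}


def uniform_crossover(a, b, rng):
    """Pick each gene independently from either parent with probability 0.5."""
    return {g: (a[g] if rng.random() < 0.5 else b[g]) for g in GENES}


def mutate(genotype, rng, rate=0.2):
    """Resample each gene from the search space with probability `rate`."""
    return {
        g: (rng.choice(SEARCH_SPACE[g]) if rng.random() < rate else v)
        for g, v in genotype.items()
    }


def toy_fitness(g):
    """Stand-in score (assumption): rewards one learning rate, penalizes size.

    A real fitness would train the network and return validation accuracy.
    """
    size_penalty = g["n_conv_layers"] * g["n_filters"] * g["n_snapshots"] / 1280
    lr_bonus = 1.0 if g["learning_rate"] == 1e-2 else 0.0
    return lr_bonus - size_penalty


def evolve(fitness, crossover, rng, pop_size=12, generations=10):
    """Elitist GA: keep the better half, refill with mutated offspring."""
    population = [random_genotype(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    # Compare the two crossover operators under the same random seed.
    for op in (one_point_crossover, uniform_crossover):
        best = evolve(toy_fitness, op, random.Random(42))
        print(op.__name__, best, round(toy_fitness(best), 3))
```

Keeping the genotype as a flat mapping from gene name to value lets the same crossover and mutation operators be reused when genes are added, for example a gene that selects between a Snapshot ensemble and a combination of different networks.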


How to Cite
Kryvokhata, A. G., Kudin, O. V., & Gorbenko, V. I. (2020). DESIGN SOUND CLASSIFICATION IOT SYSTEM WITH GENETIC ALGORITHMS. Bulletin of Zaporizhzhia National University. Physical and Mathematical Sciences, (2), 69-74. Retrieved from