A.V. Skatkov1, A.A. Bryukhovetskiy1, D.V. Moiseev1, Iu.E. Shishkin1,2
1 Federal State Educational Institution of Higher Education «Sevastopol State University», Russian Federation, Sevastopol, Universitetskaya St., 33
2 Institute of Natural and Technical Systems, Russian Federation, Sevastopol, Lenin St., 28
Monitoring industrial pollution of the environment involves filtering, processing, and accumulating multidimensional, heterogeneous, dynamically updated data. Processing such data with traditional methods is difficult because the data are semistructured. Clustering operations, which have proven effective in such cases, therefore look promising. This raises the problem of detecting anomalous values in such data on the basis of clustering.
Since known methods for identifying anomalous values are designed for data in vector or matrix form, there is a need for methods that detect anomalies in data presented in cluster form. To solve this problem, the Kullback measure is used, which is known as an information characteristic for data represented by distribution series. Here, the Kullback measure is proposed as a tool for numerically computing a metric over dynamically changing clusters and their number. To put the measure into operation, a complete graph is used in which the Kullback information measure serves as the distance between the classes in question.
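The idea of a complete graph with Kullback distances between classes can be illustrated with a minimal sketch. It rests on assumptions that are not stated in this abstract: each cluster is summarized by a histogram over shared bins, the measure is taken to be Kullback's symmetric divergence J(p, q) = Σ (p_i − q_i) ln(p_i / q_i), and one pseudo-count per bin is added to avoid empty bins. The function names, bin edges, and sample values are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def distribution(values, edges):
    """Smoothed empirical distribution of values over the bins given by edges.

    Assumption: one pseudo-count per bin keeps every probability positive.
    Values outside [edges[0], edges[-1]) are ignored.
    """
    counts = [1] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def kullback_divergence(p, q):
    """Symmetric Kullback divergence J(p, q) between two discrete distributions."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def complete_graph(clusters, edges):
    """Return {(a, b): J(a, b)} for every unordered pair of cluster names."""
    dists = {name: distribution(vals, edges) for name, vals in clusters.items()}
    return {
        (a, b): kullback_divergence(dists[a], dists[b])
        for a, b in combinations(sorted(dists), 2)
    }


# Hypothetical one-dimensional clusters and bin edges.
clusters = {
    "A": [0.1, 0.2, 0.25, 0.3],
    "B": [0.3, 0.35, 0.4, 0.6],
    "C": [0.8, 0.85, 0.9, 0.95],
}
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
graph = complete_graph(clusters, edges)

for pair, distance in graph.items():
    print(pair, round(distance, 3))
```

In this sketch, an edge whose weight is large relative to the others marks a cluster whose distribution differs strongly from the rest. Recomputing the graph as new observations arrive would track how the clusters and their number change over time.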
An example implementation of the proposed approach is given, with numerical modeling and a graphical illustration of the dynamic process of cluster formation and of cluster cardinality. An algorithm is proposed for dynamically correcting the structure of the classes and their number, obtaining current data, and presenting the results in the form of the described graphs and distance characteristics. On this basis, an adaptive procedure for decision-making under uncertainty is formed.
Keywords: analysis, anomalies, big data, clustering, Kullback measure, monitoring, environment, forecasting, ecosystems.