DATA SCIENCE: APPROACH AND PRACTICAL STEPS WITH EXCEL
Abstract
Data science work proceeds through several stages: identifying the problem, understanding the current business, collecting data, processing data, and making decisions based on the resulting insights. Researchers who carry out this work are often referred to as data scientists, and they rely on software applications to support it. One such application is Excel. Excel can handle only a limited amount of data: a worksheet holds at most 1,048,576 rows and 16,384 columns, so larger datasets cannot be managed in it. For the first steps towards becoming a data scientist, however, Excel is a good choice, because its features make data science activities easier to carry out. These features are already powerful and include error detection, removal of duplicate data, correction of error values, outlier detection, handling of missing data, and data validation. This study discusses the functions of these features in an effort to promote data science among beginner data scientists.
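To make the cleaning tasks named above concrete, the following is a minimal sketch in plain Python rather than in Excel itself. It is not the study's procedure: the function names, the sample data, and the choice of Tukey's interquartile-range rule for outliers are all assumptions made for illustration. Only the worksheet limits are taken from the abstract.

```python
import statistics

# Worksheet limits as stated in the abstract.
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384


def fits_in_excel(n_rows, n_cols):
    """Return True if a table of this size fits on one worksheet."""
    return n_rows <= EXCEL_MAX_ROWS and n_cols <= EXCEL_MAX_COLS


def remove_duplicates(rows):
    """Drop repeated rows, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def missing_positions(rows):
    """Return (row_index, col_index) pairs for empty or None cells."""
    return [
        (i, j)
        for i, row in enumerate(rows)
        for j, cell in enumerate(row)
        if cell is None or cell == ""
    ]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical sample data: [id, score]
sample = [
    [1, 10], [2, 12], [2, 12], [3, None], [4, 11], [5, 13],
    [6, 12], [7, 11], [8, 10], [9, 13], [10, 95],
]

cleaned = remove_duplicates(sample)    # the repeated [2, 12] row is dropped
gaps = missing_positions(cleaned)      # the cell holding None is located
scores = [row[1] for row in cleaned if row[1] is not None]
outliers = iqr_outliers(scores)        # the score 95 is flagged

print(fits_in_excel(len(cleaned), 2), gaps, outliers)
```

In Excel, the corresponding tasks would typically be done interactively, for example with the Remove Duplicates command and Data Validation rules; the sketch only mirrors the underlying logic.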