dc.contributor.author | Iparraguirre Villanueva, Orlando | |
dc.contributor.author | Sierra Liñan, Fernando | |
dc.contributor.author | Herrera Salazar, Jose Luis | |
dc.contributor.author | Beltozar Clemente, Saul | |
dc.contributor.author | Pucuhuayla Revatta, Félix | |
dc.contributor.author | Zapata Paulini, Joselyn | |
dc.contributor.author | Cabanillas Carbonell, Michael | |
dc.date.accessioned | 2023-10-20T12:09:20Z | |
dc.date.available | 2023-10-20T12:09:20Z | |
dc.date.issued | 2023-01-25 | |
dc.identifier.citation | Iparraguirre, O., Sierra, F., Herrera, J. L., Beltozar, S., Pucuhuayla, F., Zapata, J., & Cabanillas, M. (2023). Search and classify topics in a corpus of text using the latent dirichlet allocation model. Indonesian Journal of Electrical Engineering and Computer Science, 30(1), 246-256. http://doi.org/10.11591/ijeecs.v30.i1.pp246-256 | es_PE |
dc.identifier.uri | https://hdl.handle.net/11537/34685 | |
dc.description.abstract | This work aims to discover topics in a text corpus and to classify the most
relevant terms for each discovered topic. The process was carried out
in four steps: first, document extraction and data processing; second, labeling
and training on the data; third, labeling of unseen data; and fourth,
evaluation of model performance. For processing, a total of 10,322
curriculum vitae (CV) documents related to data science were collected from the web
during 2018-2022. The latent Dirichlet allocation (LDA) model was used to
analyze and structure the topics. Processing yielded 12 topics, which made it
possible to rank the most relevant terms and identify each candidate's skills.
This work concludes that candidates interested in data science must have
technical skills, including mastery of structured query language (SQL),
programming languages such as R, Python, and Java, and data management,
among other technology-related tools. | es_PE |
dc.format | application/pdf | es_PE |
dc.language.iso | eng | es_PE |
dc.publisher | Institute of Advanced Engineering and Science | es_PE |
dc.rights | info:eu-repo/semantics/openAccess | es_PE |
dc.rights | Attribution-NonCommercial-ShareAlike 3.0 United States | * |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/3.0/us/ | * |
dc.source | Universidad Privada del Norte | es_PE |
dc.source | Repositorio Institucional - UPN | es_PE |
dc.subject | Discovering | es_PE |
dc.subject | Latent Dirichlet allocation | es_PE |
dc.subject | Text corpus | es_PE |
dc.title | Search and classify topics in a corpus of text using the latent dirichlet allocation model | es_PE |
dc.type | info:eu-repo/semantics/article | es_PE |
dc.publisher.country | PE | es_PE |
dc.identifier.journal | Indonesian Journal of Electrical Engineering and Computer Science | es_PE |
dc.subject.ocde | https://purl.org/pe-repo/ocde/ford#2.02.04 | es_PE |
dc.description.sede | Los Olivos | es_PE |
dc.identifier.doi | http://doi.org/10.11591/ijeecs.v30.i1.pp246-256 | |