
dc.contributor.author: Iparraguirre Villanueva, Orlando
dc.contributor.author: Sierra Liñan, Fernando
dc.contributor.author: Herrera Salazar, Jose Luis
dc.contributor.author: Beltozar Clemente, Saul
dc.contributor.author: Pucuhuayla Revatta, Félix
dc.contributor.author: Zapata Paulini, Joselyn
dc.contributor.author: Cabanillas Carbonell, Michael
dc.date.accessioned: 2023-10-20T12:09:20Z
dc.date.available: 2023-10-20T12:09:20Z
dc.date.issued: 2023-01-25
dc.identifier.citation: Iparraguirre, O., Sierra, F., Herrera, J. L., Beltozar, S., Pucuhuayla, F., Zapata, J., & Cabanillas, M. (2023). Search and classify topics in a corpus of text using the latent dirichlet allocation model. Indonesian Journal of Electrical Engineering and Computer Science, 30(1), 246-256. http://doi.org/10.11591/ijeecs.v30.i1.pp246-256 [es_PE]
dc.identifier.other: . [es_PE]
dc.identifier.uri: https://hdl.handle.net/11537/34685
dc.description.abstract: This work aims at discovering topics in a text corpus and classifying the most relevant terms for each of the discovered topics. The process was performed in four steps: first, document extraction and data processing; second, labeling and training of the data; third, labeling of the unseen data; and fourth, evaluation of the model performance. For processing, a total of 10,322 "curriculum" documents related to data science were collected from the web during 2018-2022. The latent Dirichlet allocation (LDA) model was used for the analysis and structure of the subjects. After processing, 12 themes were generated, which allowed ranking the most relevant terms to identify the skills of each of the candidates. This work concludes that candidates interested in data science must have skills in the following topics: first, they must be technical, they must have mastery of structured query language, mastery of programming languages such as R, Python, Java, and data management, among other tools associated with the technology. [es_PE]
dc.format: application/pdf [es_PE]
dc.language.iso: spa [es_PE]
dc.publisher: Institute of Advanced Engineering and Science [es_PE]
dc.rights: info:eu-repo/semantics/openAccess [es_PE]
dc.rights.uri: https://creativecommons.org/licenses/by-nc-sa/4.0/ [*]
dc.source: Universidad Privada del Norte [es_PE]
dc.source: Repositorio Institucional - UPN [es_PE]
dc.subject: Discovering [es_PE]
dc.subject: Latent Dirichlet allocation [es_PE]
dc.subject: Text corpus [es_PE]
dc.title: Search and classify topics in a corpus of text using the latent dirichlet allocation model [es_PE]
dc.type: info:eu-repo/semantics/article [es_PE]
dc.publisher.country: PE [es_PE]
dc.identifier.journal: Indonesian Journal of Electrical Engineering and Computer Science [es_PE]
dc.subject.ocde: https://purl.org/pe-repo/ocde/ford#2.02.04 [es_PE]
dc.description.sede: Los Olivos [es_PE]
dc.identifier.doi: http://doi.org/10.11591/ijeecs.v30.i1.pp246-256



Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess