Krathu-500: Post-Comments Thai Corpus
Preprint
Posted on 2021-12-17, 02:12, authored by Pittawat Taveekitworachai and Jonathan H. Chan

Krathu-500 contains the titles and bodies of 574 Pantip posts, together with all comments on each post, for a total of 63,293 comments. The corpus captures Thai as it is used in real-life situations, in conversational form and across a variety of contexts and post types. It can serve as a resource for improving machine learning techniques that deal with the Thai language. A smaller, sentiment-labeled version of the comments dataset, containing 6,306 records, is also provided. This labeled subset is human-annotated with three labels: negative, neutral, and positive. The project also includes an open-source repository that allows anyone interested to modify and build on top of the current source code and dataset.
Email Address of Submitting Author: pittawat.ta@mail.kmutt.ac.th
ORCID of Submitting Author: 0000-0002-6824-2634
Submitting Author's Institution: King Mongkut's University of Technology Thonburi
Submitting Author's Country: Thailand