Krathu-500: Post-Comments Thai Corpus
Preprint posted on 17.12.2021, 02:12 by Pittawat Taveekitworachai, Jonathan H. Chan
The Krathu-500 corpus contains 574 Pantip posts, each with its title, post body, and all of its comments. The corpus has 63,293 comments in total. It captures Thai as used in real-life situations, in conversational form, across a variety of contexts and types. The corpus can serve as a resource for improving machine learning techniques that deal with the Thai language. A smaller, sentiment-labeled version of the comments dataset, with 6,306 records, is also provided. The labeled corpus is a human-annotated dataset with three labels: negative, neutral, and positive. The project also includes an open-source repository that allows anyone interested to modify and build on the current source code and dataset.