TechRxiv

Krathu-500: Post-Comments Thai Corpus

Krathu-500 contains 574 Pantip posts, each with its title, post body, and all of its comments, for a total of 63,293 comments. The corpus captures Thai as it is used in real-life situations, across varied contexts and types, in conversational form. It is intended as a resource for improving machine learning techniques that deal with the Thai language. A smaller, sentiment-labeled version of the comments dataset is also provided, with 6,306 records. This labeled corpus is human-annotated with three labels: negative, neutral, and positive. The project also includes an open-source repository so that anyone interested can modify and build on the current source code and dataset.

Email Address of Submitting Author

pittawat.ta@mail.kmutt.ac.th

ORCID of Submitting Author

0000-0002-6824-2634

Submitting Author's Institution

King Mongkut's University of Technology Thonburi

Submitting Author's Country

  • Thailand
