Krathu-500: Post-Comments Thai Corpus

The Krathu-500 corpus contains the title, body, and all comments of 574 Pantip posts, for a total of 63,293 comments. The corpus captures Thai as it is used in real-life situations, in conversational form and across varied contexts and types. It is intended as a resource for improving machine learning techniques that deal with the Thai language. A smaller, sentiment-labeled version of the comments dataset is also provided, with 6,306 records. The labeled corpus is human-annotated with three labels: negative, neutral, and positive. The project also includes an open-source repository so that anyone interested can modify and build on the current source code and dataset.
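As a rough illustration of the structure described above, the following Python sketch models a post with its comments and a sentiment-labeled comment record, and computes simple counts. The class names, field names, label strings, and helper functions are assumptions made for illustration only; the actual file format and column names of the released dataset are not specified here, and the sample records are placeholders rather than corpus data.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# Assumed label strings; the description only states that there are
# three labels for negative, neutral, and positive comments.
LABELS = ("negative", "neutral", "positive")


@dataclass
class Post:
    """Hypothetical container for one post: title, body, and its comments."""
    title: str
    body: str
    comments: List[str] = field(default_factory=list)


@dataclass
class LabeledComment:
    """Hypothetical container for one human-annotated comment."""
    text: str
    label: str

    def __post_init__(self) -> None:
        # Reject anything outside the three assumed label strings.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


def total_comments(posts: Iterable[Post]) -> int:
    """Count comments across all posts."""
    return sum(len(post.comments) for post in posts)


def label_distribution(records: Iterable[LabeledComment]) -> Dict[str, int]:
    """Count records per label, reporting zero for labels that never occur."""
    counts = Counter(record.label for record in records)
    return {label: counts.get(label, 0) for label in LABELS}


if __name__ == "__main__":
    # Placeholder records, not taken from the corpus.
    posts = [
        Post("title A", "body A", ["comment 1", "comment 2"]),
        Post("title B", "body B", ["comment 3"]),
    ]
    labeled = [
        LabeledComment("comment 1", "positive"),
        LabeledComment("comment 2", "neutral"),
        LabeledComment("comment 3", "positive"),
    ]
    print(total_comments(posts))
    print(label_distribution(labeled))
```

Checking the label distribution first is a common step before training a sentiment classifier, since a strongly imbalanced label set usually affects how a model is evaluated.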



Submitting Author's Institution

King Mongkut's University of Technology Thonburi

Submitting Author's Country

  • Thailand
