In our recent blog post “New words and language processing by computer”, we presented the newest Chinese buzzwords selected by Chilin. In this post, we explore how Chilin’s LiVaC system collects and filters new Chinese buzzwords.
The LiVaC system has collected over 2 million words used in Chinese newspapers over the last 25 years. It regularly collects news from Pan-Chinese communities such as Beijing, Hong Kong, Macau, Shanghai, Singapore, and Taiwan. LiVaC then segments the continuous character strings into words, using statistical and other means based on its own dictionaries, and ranks the words by frequency of occurrence. We select our new buzzwords from the most frequent of these words, because salience is reflected by frequency of occurrence. Finally, an annual roster is established for each community, and since 2004 a Pan-Chinese Roster has also been published annually.
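The segment, count, and rank steps described above can be sketched in a few lines of Python. This is only an illustration and not LiVaC's actual method: it swaps in greedy forward maximum matching against a toy dictionary in place of LiVaC's statistical segmentation, and the names `segment`, `rank_new_words`, `dictionary`, `known_words`, and the sample headlines are all hypothetical.

```python
from collections import Counter


def segment(text, dictionary, max_len=4):
    """Split an unspaced string into words by greedy forward maximum matching.

    At each position the longest dictionary entry (up to max_len characters)
    is taken; if nothing matches, a single character is emitted.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def rank_new_words(articles, dictionary, known_words, top_n=10):
    """Count segmented words and rank those not seen in earlier years."""
    counts = Counter()
    for text in articles:
        counts.update(segment(text, dictionary))
    # Keep multi-character words that are absent from the older vocabulary.
    new_counts = Counter(
        {w: c for w, c in counts.items() if len(w) > 1 and w not in known_words}
    )
    return new_counts.most_common(top_n)


# Toy data (assumed for illustration only).
dictionary = {"居家令", "雲會議", "政府", "宣布", "延長", "公司", "改用"}
known_words = {"政府", "宣布", "延長", "公司", "改用"}
articles = ["政府宣布居家令", "政府延長居家令", "公司改用雲會議"]

print(rank_new_words(articles, dictionary, known_words))
```

Under these assumptions, a per-community roster would come from running `rank_new_words` on each community's articles separately, and a combined roster from running it on all of them together.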
The Buzzword Rosters are worth analysing, as they reflect the life and times of the different Chinese communities in the previous year. They mark important events in the social, economic, and political development of each society. In addition, a series of rosters over time can act as a time capsule, tracing the most memorable developments in each society or in the Pan-Chinese communities as a whole. This can be illustrated by comparing the Pan-Chinese Buzzwords of the first Roster in 2004 with those of the most recent one in 2020.
2004 (upper) and 2020 (lower)
In 2004, 「健宮」(“Jiangong”) and “IgG、IgM” relate to the geographical location and composition of SARS. In 2020, by contrast, 「居家令」(“Stay-at-Home Order”) and 「逆行者」(“Heroes in harm’s way”) relate to two consequences of the Coronavirus. These buzzwords mark two of the most significant epidemics in history. Furthermore, the 2020 buzzwords 「直播帶貨」(“Livestream Sales”) and 「雲會議」(“Cloud-based Conference Call”) reflect two further developments that had no counterpart during SARS in 2004, which suggests that the Coronavirus has had a more consequential impact on the world than SARS.
These cognitively salient terms can also help us explore and analyse the rosters, and the relationships among the different communities, across different timeframes. We will discuss various practical applications in future posts.
For more analysis of these two years, please see the press releases of 2004 and 2020. For the rosters between 2005 and 2019, please see our Pan-Chinese New Word Roster. We also welcome any feedback, as well as enquiries about using our Chinese text processing programs, via our contact page.
Abel & Yuki @Chilin