Resource Name | Reference | Link (if available) | Language | Size | Source | Syntactic Representation | UD-based
ATDT | Albogamy and Ramsay 2017 | -- | Arabic | -- | Twitter | dependencies | yes
Hi-En-CS | Bhat et al. 2018 | https://github.com/UniversalDependencies/UD_Hindi_English-HIENCS | Hindi-English (code switching) | 1898 tweets | Twitter | dependencies | yes
TwitterAAE (TAAE) | Blodgett et al. 2018 | http://slanglab.cs.umass.edu/TwitterAAE/ | African American English, Mainstream American English | 500 tweets | Twitter | dependencies | yes
TWITTIRÒ-UD (TWRO) | Cignarella et al. 2019 | https://github.com/UniversalDependencies/UD_Italian-TWITTIRO/tree/master | Italian | 1424 tweets | Twitter | dependencies | yes
Denoised Web Treebank (DWT) | Daiber et al. 2016 | https://jodaiber.github.io/DenoisedWebTreebank/ | English | 500 tweets | Twitter | dependencies | no
W2.0 | Foster et al. 2011 | upon request | English | 1000 sentences | Twitter, discussion forums | constituents | --
Foreebank (Frb) | Kaljahi et al. 2015 | http://nclt.computing.dcu.ie/mt/confidentmt.html | English, French | 1000 sentences/language | technical support forums | constituents | --
Tweebank (Twb) | Kong et al. 2014 | https://github.com/ikekonglp/TweeboParser/tree/master/Tweebank/Train_Test_Splited | English | 929 tweets | Twitter | dependencies | no
Tweebank v2 (Twb2) | Liu et al. 2018 | https://github.com/Oneplus/Tweebank | English | 3550 tweets | Twitter | dependencies | yes
TDT | Luotolahti et al. 2015 | -- | Finnish | 15136 sentences | Common Crawl dataset, other web-crawled data | dependencies | yes
ExtremeUGC (xUGC) | Martinez-Alonso et al. 2016 | upon request | French | 974 sentences | cooking-related web questions, online game chat sessions | dependencies | yes
Estonian Web Treebank (EtWT) | Muischnek et al. 2014 | https://github.com/UniversalDependencies/UD_Estonian-EWT/tree/dev | Estonian | 5863 sentences | web-crawled data | dependencies | yes
ITU Web Treebank (ITU) | Pamay et al. 2015 | -- | Turkish | -- | -- | dependencies | no
WeSearch Data Collection (WDC) | Read et al. 2012 | http://moin.delph-in.net/wiki/WeSearch | English | -- | user forums, product review sites, blogs, and Wikipedia | constituents | --
tweeDe | Rehbein et al. 2019 | https://www.cl.uni-heidelberg.de/~rehbein/resources.mhtml | German | 519 tweets | Twitter | dependencies | yes
PoSTWITA-UD (Pst) | Sanguinetti et al. 2018 | https://github.com/UniversalDependencies/UD_Italian-PoSTWITA | Italian | 6712 tweets | Twitter | dependencies | yes
French Social Media Bank (FSMB) | Seddah et al. 2012 | http://pauillac.inria.fr/~seddah/index.php?n=Main.Datasets | French | 1700 sentences | Facebook, Twitter, discussion forums | constituents | --
Narabizi (NBZ) | Seddah et al. 2020 | -- | North-African Arabizi | 1300 sentences | newspaper fora | dependencies | yes
English Web Treebank (EWT) | Silveira et al. 2014 | https://github.com/UniversalDependencies/UD_English-EWT | English | 16,622 sentences | weblogs, reviews, newsgroups, email, question-answers | dependencies and constituents | yes
LAS-DisFo (LDF) | Taulé et al. 2015 | -- | Spanish, Latin American Spanish | 2,846 sentences | discussion forums | constituents | --
MoNoise (MNo) | van der Goot and van Noord 2018 | https://bitbucket.org/robvanderg/normpar/src/master/ | English | 632 tweets | Twitter | dependencies | yes
STB | Wang et al. 2017 | https://github.com/wanghm92/Sing_Par/tree/master/ACL17_dataset | Singaporean English | 1200 sentences | discussion forums | dependencies | yes
Chinese Weibo Treebank (CWT) | Wang et al. 2014 | https://github.com/hankcs/multi-criteria-cws/tree/master/data/other/wtb | Chinese | 1000 tweets and posts | Twitter, Sina Weibo | dependencies | no
GUM | Zeldes 2017 | https://corpling.uis.georgetown.edu/gum/ | English | 148 texts / 6856 sentences | Wikinews/interviews, Wikivoyage, wikiHow, Wikipedia bios, Reddit, Creative Commons fiction, academic papers | dependencies and constituents | yes
HSE | n.a. | https://github.com/UniversalDependencies/UD_Belarusian-HSE/tree/dev | Belarusian | 25231 sentences | fiction, news, Telegram channels, Wikipedia | dependencies | yes
OOD | n.a. | https://github.com/UniversalDependencies/UD_Finnish-OOD/tree/master | Finnish | 2122 sentences | hospital patient records, discussion forums, tweets, general web crawls, poetry | dependencies | yes
TwittIrish (TwIr) | n.a. | https://github.com/UniversalDependencies/UD_Irish-TwittIrish/tree/master | Irish | 866 tweets | Twitter | dependencies | yes
Cadhan (Cdh) | n.a. | https://github.com/UniversalDependencies/UD_Manx-Cadhan | Manx | 2319 sentences | Wikipedia, news stories from Manx Radio, blog posts, translations of literature, other | dependencies | yes
Taiga | n.a. | https://github.com/UniversalDependencies/UD_Russian-Taiga | Russian | 17870 sentences (including non-segmented tweets) | UGC part: vk.com, Instagram, Facebook, Twitter, YouTube comments, questions & answers (otvet.mail.ru), reviews (reviews.yandex.ru) | dependencies | yes
IU | n.a. | https://github.com/UniversalDependencies/UD_Ukrainian-IU | Ukrainian | 7060 sentences | fiction, news, opinion articles, Wikipedia, legal documents, letters, posts, and comments | dependencies | yes