This query has been published by Jura1.
runtime > 5 min
SQL
# Find missing given names
#
# Uses items linking P27, but not to P735
# The "given name" is the first part of the label. This can be a given name, but not necessarily.
#
# For project, see https://www.wikidata.org/wiki/Wikidata:WikiProject_Names
#
# general database scheme https://upload.wikimedia.org/wikipedia/commons/f/f7/MediaWiki_1.24.1_database_schema.svg
# doesn't include Wikidata tables :(
#
use wikidatawiki_p;

SELECT
    SUBSTRING_INDEX(CONCAT(term_text, ' '), ' ', 1) AS label,
    COUNT(*) As freq,
    ROUND(COUNT(*)*2.5,-1) As freqEst,
    term_language As lang,
    CONCAT(
        '#[[Special:Search/', SUBSTRING_INDEX(CONCAT(term_text, ' '), ' ', 1),
        '|', SUBSTRING_INDEX(CONCAT(term_text, ' '), ' ', 1),
        ']]: ', COUNT(*)
    ) As simplelist,
    CURRENT_DATE
FROM pagelinks As pl1, wb_terms, wb_entity_per_page
LEFT JOIN (
    SELECT pl_from
    FROM pagelinks
    WHERE pl_title = 'P735'
    AND pl_namespace = 120
) As allitemswithP735 ON epp_page_id = allitemswithP735.pl_from
WHERE allitemswithP735.pl_from IS NULL
AND pl1.pl_title = 'P27'
AND pl1.pl_namespace = 120 # 0 for items, 120 for properties
AND pl1.pl_from = epp_page_id
AND epp_entity_type = 'item'
AND term_entity_id = epp_entity_id
AND term_type = 'label'
AND term_entity_type = 'item'
AND term_language = 'en' # Big wikis: "en", "cs", "da", "de", "es", "et", "fi", "fr", "hu", "it", "nl", "pl", "pt", "sk", "sv", "tr"

#Filter some false positives:
#Two-letter names/abbr.: DJ, Li, Wu, Yu, El, Oh, Ma
AND term_text not RLike '^.. '

# Asian family names, etc
# AND term_text not RLike '^(Kim|Lee) '
AND term_text Not Like 'Kim %'
AND term_text Not Like 'Lee %'
AND term_text Not Like 'Ahn %'
AND term_text Not Like 'Chan %'
AND term_text Not Like 'Chen %'
AND term_text Not Like 'Chung %'
AND term_text Not Like 'Cho %'
AND term_text Not Like 'Choi %'
AND term_text Not Like 'Han %'
AND term_text Not Like 'Hang %'
AND term_text Not Like 'Huang %'
AND term_text Not Like 'Hong %'
AND term_text Not Like 'Hwang %'
AND term_text Not Like 'Jang %'
AND term_text Not Like 'Jin %'
AND term_text Not Like 'Jung %'
AND term_text Not Like 'Kang %'
AND term_text Not Like 'Len %'
AND term_text Not Like 'Lim %'
AND term_text Not Like 'Lin %'
AND term_text Not Like 'Liu %'
AND term_text Not Like 'Moon %'
AND term_text Not Like 'Park %'
AND term_text Not Like 'Rui %'
AND term_text Not Like 'Seo %'
AND term_text Not Like 'Shin %'
AND term_text Not Like 'Song %'
AND term_text Not Like 'Sun %'
AND term_text Not Like 'Wang %'
AND term_text Not Like 'Yang %'
AND term_text Not Like 'Yoo %'
AND term_text Not Like 'Yoon %'
AND term_text Not Like 'Zhang %'
AND term_text Not Like 'Zhao %'
AND term_text Not Like 'Zhou %'

# prefix
AND term_text Not Like 'The %'
AND term_text Not Like 'Big %'
AND term_text Not Like 'Sir %'
AND term_text Not Like 'Mr. %'
AND term_text Not Like 'Ibn %'
AND term_text Not Like 'Abu %'
#nl
AND term_text Not Like 'Master %'
AND term_text Not Like 'Maître %'
AND term_text Not Like 'Prince %'
AND term_text Not Like 'Princess %'
AND term_text Not Like 'Junior %'
AND term_text Not Like 'King %'
AND term_text Not Like 'Lady %'
AND term_text Not Like 'Little %'
AND term_text Not Like 'Saint %'
AND term_text Not Like 'Emperador %' # es

GROUP BY SUBSTRING_INDEX(CONCAT(term_text, ' '), ' ', 1)

# filter less frequent ones:
HAVING COUNT(*)>74
All SQL code is licensed under CC0 License.