Estimating the vocabulary sizes of English learners around the world

Myq Larson

2019-06-20 Thu 10:30

myq@my.vocabularysize.com

Vocabulary

carr_1934.png

— Carr (1934) Vocabulary Density in High School Latin, p. 323

meara_1980.png

— Meara (1980) Vocabulary Acquisition: a neglected aspect of language learning, p. 221

Vocabulary Size

What does it mean to know a word?

Users

Learners

Find your vocabulary size

Teachers

Teacher account

Vocabulary Size Norms

So far:

  • 542,600 public results
  • 222 locations
    • 1 from Cook Islands
      • population \(\approx 18,000\)
  • teacher sessions
    • 1,200 accounts
    • 3,500 sessions
    • 42,000 results

English speakers

vs_eng.png

Korean speakers

vs_kor.png

German speakers

vs_deu.png

Spanish speakers

vs_spa.png

Indonesian speakers

vs_ind.png

Vietnamese speakers

vs_vie.png

Arabic speakers

vs_ara.png

Japanese speakers

vs_jpn.png

Chinese speakers

vs_han.png

Russian speakers

vs_rus.png

Technology Stack

Languages

  • PHP custom MVC framework
  • MySQL with custom ORM
    • \(\approx 5\) GB of results
  • JavaScript
    • heavy reliance on jQuery

Hosting

  • shared hosting account
  • 92 MB RAM limit (!)
  • \(\approx 1\) GB storage (mostly logs)
  • Apache server

CDN

  • poor man's CDN
    • mostly dynamic content: my.vocabularysize.com
    • static content: vocabularysize.net
    • probably not necessary now with HTTP/2
  • Cloudflare for static content
    • bandwidth per month: \(\approx 3\) GB
    • requests: \(\approx5\times10^{5}\)
    • visitors: \(\approx2.5\times10^{4}\)

Traffic

analytics.png

Cost

shared hosting                   \(\approx$7\) month\(^{-1}\)
domains (\(\approx 10\))         \(\approx$100\) year\(^{-1}\)
Let's Encrypt SSL certificates   \($0\)
Cloudflare CDN                   \($0\)
total                            \(\approx$200\) year\(^{-1}\)
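
As a rough check on the total, using only the rounded figures listed here (the exact hosting and domain prices are not given):

\[ 12 \times $7 + $100 = $184 \text{ year}^{-1} \]

which is in line with the rounded total of \(\approx$200\) year\(^{-1}\).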

Challenges

Hackers Gonna Hack

— Battat (2016) Solving the "English Vocabulary Size" Test

Solution

Form data:

identifier:  639887
question_id: 57
_nonce:      fdf15ddaf33a5402d54fc3a6c220f0a985f9e503
selected:    1
time_diff:   1579
submitted:   1
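
The slide does not say how `_nonce` is generated or checked. As a sketch only, assume the server derives it as an HMAC-SHA1 over the session identifier and question id (the 40 hex characters above match the length of a SHA-1 digest, but the actual scheme is a guess), and that `time_diff` is the decision time in milliseconds. A server-side plausibility check might then look something like the Node.js code below. The names `SECRET`, `makeNonce`, and `isPlausibleSubmission`, and the 300 ms threshold, are hypothetical. The site itself is written in PHP.

```javascript
const crypto = require('node:crypto');

// Hypothetical server-side secret; in practice it would come from configuration.
const SECRET = 'replace-with-a-long-random-secret';

// Assumed nonce scheme: HMAC-SHA1 over "identifier:question_id" (40 hex chars).
function makeNonce(identifier, questionId, secret = SECRET) {
  return crypto
    .createHmac('sha1', secret)
    .update(`${identifier}:${questionId}`)
    .digest('hex');
}

// Accept a submission only if the nonce matches the question that was served
// and the reported decision time is not implausibly short.
function isPlausibleSubmission(form, minDecisionMs = 300) {
  const expected = Buffer.from(makeNonce(form.identifier, form.question_id), 'hex');
  const received = Buffer.from(String(form._nonce), 'hex');
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (received.length !== expected.length) return false;
  if (!crypto.timingSafeEqual(received, expected)) return false;
  const timeDiff = Number(form.time_diff);
  return Number.isFinite(timeDiff) && timeDiff >= minDecisionMs;
}

// Same fields as the form data above, with a nonce issued by this sketch.
const form = {
  identifier: 639887,
  question_id: 57,
  _nonce: makeNonce(639887, 57),
  selected: 1,
  time_diff: 1579,
  submitted: 1,
};

console.log(isPlausibleSubmission(form));                         // true
console.log(isPlausibleSubmission({ ...form, question_id: 58 })); // false: nonce no longer matches
console.log(isPlausibleSubmission({ ...form, time_diff: 5 }));    // false: answered too quickly
```

Binding the nonce to the question means a script cannot replay one valid token against other question ids, and the timing check catches answers submitted faster than a person could read the item. Neither stops a determined attacker who drives a real browser.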

Limited Resources

  • shared hosting
  • 92 MB RAM

Solution: doNext() (dumb, but works)

public function doNext () {
    // validate and store response
    if ($this->inputValid('test_' . $this->session->Test->getSchema())) {
        $status = $this->session->Test->storeSubmittedQuestion($this->_sanitized);
        if (true === $status) {
            $this->session->Test->loadNextQuestionSet();
        }
    }
    // get next questions
    $question_set = $this->session->Test->getCurrentQuestionSet();
    // no questions left: clean up and close the session
    if (null === $question_set) {
        return $this->_completetTest(true);
    }
    // render each question
    foreach ( $question_set as $id => $question ) {
        $question->template = $this->_buildTemplatePath($question->template);
        $this->html[$id]    = Template::getInstance()
                            ->generate($question->template, $this->defaultData());
    }
    return $this->respond();
}

Group Administration

  • order of questions randomized (sort of)

random.png
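
The slide does not explain what "sort of" means here. One possible reading, offered only as an illustration, is a seeded shuffle: each test-taker in a group gets a different question order, but the same seed always reproduces the same order. The sketch below uses a Fisher-Yates shuffle driven by a small deterministic generator (mulberry32). `shuffledOrder` and the seeds are hypothetical, not the site's code.

```javascript
// Small deterministic PRNG (mulberry32): the same seed always yields the same sequence.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Fisher-Yates shuffle of a copy of the question ids, driven by the seeded PRNG.
function shuffledOrder(questionIds, seed) {
  const order = questionIds.slice();
  const rand = mulberry32(seed);
  for (let i = order.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [order[i], order[j]] = [order[j], order[i]];
  }
  return order;
}

const questionIds = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const firstVisit = shuffledOrder(questionIds, 639887);
const secondVisit = shuffledOrder(questionIds, 639887);

console.log(firstVisit);
console.log(JSON.stringify(firstVisit) === JSON.stringify(secondVisit)); // true: same seed, same order
```

Seeding per test-taker makes it harder for neighbours in a classroom to copy answers, and the order can be recomputed later without storing it.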

PII

  • by default, tests collect generalized demographic data
    • birth month and year
    • gender (male, female, other)
    • native language
    • time spent learning English
    • time spent living in an English-speaking country

How to identify students?

  • teachers can add PII questions to their test sessions
  • these data are only available to the session creator
  • when a session is deleted, the PII is also deleted
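
The three rules above can be modelled as a toy in-memory store. This is a sketch under stated assumptions, not the site's implementation (which stores its data in MySQL behind a PHP ORM). `SessionStore` and all of its method names are hypothetical.

```javascript
// Hypothetical in-memory model of teacher sessions and their PII answers.
class SessionStore {
  constructor() {
    // sessionId -> { ownerId, pii: Map(resultId -> answers) }
    this.sessions = new Map();
  }

  createSession(sessionId, ownerId) {
    this.sessions.set(sessionId, { ownerId, pii: new Map() });
  }

  // Store a student's answers to the teacher's extra PII questions.
  addPii(sessionId, resultId, answers) {
    const session = this.sessions.get(sessionId);
    if (!session) throw new Error('unknown session');
    session.pii.set(resultId, answers);
  }

  // Only the teacher who created the session may read its PII.
  readPii(sessionId, requesterId) {
    const session = this.sessions.get(sessionId);
    if (!session || session.ownerId !== requesterId) return null;
    return Array.from(session.pii.values());
  }

  // PII lives inside the session record, so deleting the session deletes it too.
  deleteSession(sessionId, requesterId) {
    const session = this.sessions.get(sessionId);
    if (!session || session.ownerId !== requesterId) return false;
    return this.sessions.delete(sessionId);
  }
}

const store = new SessionStore();
store.createSession('class-a', 'teacher-1');
store.addPii('class-a', 'result-1', { studentId: 'S001' });

console.log(store.readPii('class-a', 'teacher-2'));       // null: not the session creator
console.log(store.readPii('class-a', 'teacher-1'));       // [ { studentId: 'S001' } ]
console.log(store.deleteSession('class-a', 'teacher-1')); // true
console.log(store.readPii('class-a', 'teacher-1'));       // null: PII went with the session
```

In a relational schema the same effect is usually achieved by keying the PII rows to the session and deleting them together with it.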

External Resources

median_correct_decision_time.png

Automatic translation

  • no translate

but…

Why Test

— Godwin-Jones (2019) IALLT January Webinar 2019: Vocab Learning and Beyond-Using Quizlet in the FL Classroom

Use of vocabulary size

blog1.png

blog2.png

The only problem: It doesn't work.

cummulative_vs_and_text_coverage.png

vs_and_text_coverage_overlap.png

Summary

  • learn more words
  • when studying a foreign language, learn even more words
  • measure your progress every year or two
  • don't cheat when measuring your progress
  • don't stop until you die