10 May 2013









The Background

Language testing, as a methodology for inquiring into and investigating language ability, comes from a long and honorable tradition of practical teaching and learning needs. Being central to language teaching, it provides goals for language teaching and monitors both teachers’ and learners’ success in reaching those goals. Language testing also provides a methodology for experimenting with and investigating both language teaching and language learning (Behrahi).

The content of language testing is tests, assessment and evaluation. A test is an instrument or procedure designed to elicit performance from learners with the purpose of measuring their attainment of specified criteria. Assessment and evaluation are used by the teacher to analyze student performance in order to develop the students’ skills.

A test is used to measure the success of a teaching and learning process and of the educational process as a whole. A test should be constructed according to certain principles and in accordance with the treatment given to the test takers, so that the teaching and learning process can succeed.

Students of an English Education Department should know how to construct a good test, how to measure the effectiveness of the test used, and how to measure the success of the teaching and learning process and of the educational system. For this reason, the writer is interested in writing a paper entitled “Language Testing”.


The Objective of this paper

The objectives of this paper are to explain the definitions of test, assessment, and evaluation; the types of tests based on their categories; what and how to test in listening and reading; what and how to test in speaking and writing; and the principles of a good test.



The Benefit of this paper

This paper can benefit students: PBI students can understand language testing and differentiate between test, assessment and evaluation, and in the future, as teacher candidates, PBI UMY students can construct good tests for their own students.

The Outline of the Paper

In this paper the writer discusses language testing. First, the paper explains the definitions of test, assessment and evaluation. Second, it explains the types of tests based on their categories. Third, it explains what and how to test in listening, reading, speaking and writing. Finally, it explains the principles of a good test.



The definition of test, assessment and evaluation

  1. Test

Testing is a matter of concern to all teachers, whether they are in the classroom or engaged in syllabus/materials, administration or research (Suryanto, 2013). According to Suryanto (2013), a language test is an assessment to find out how far students are able to understand the language they have learned. Constructing a good test requires following a procedure, and the procedure is not easy.

  2. Assessment

Assessment is a process by which information is obtained relative to some known objective or goal. Assessment is a broad term that includes testing. A test is a special form of assessment. Tests are assessments made under contrived circumstances especially so that they may be administered. In other words, all tests are assessments, but not all assessments are tests (faiq (2013) quoting Kizlik, Bob (2009)). The central purpose of assessment is to provide information on student achievement and progress and to set the direction for ongoing teaching and learning. According to Suryanto (2013), assessment is broader in scope than testing and involves gathering information over a period of time. This information might include formal tests, classroom observation, student self-assessments, or other data sources.

  3. Evaluation

Evaluation uses methods and measures to judge student learning and understanding of the material for purposes of grading and reporting. Evaluation is feedback from the instructor to the student about the student’s learning. Another definition states: “Evaluation applies assessment data that have been scored and analyzed to make judgments, or draw inferences about students and educational programs” (Suryanto, 2013).


Types of tests based on the categories

  1. When to test?
  • Summative vs formative

Summative assessment is used primarily to make decisions for grading or to determine readiness for progression. Typically, summative assessment occurs at the end of an educational activity and is designed to judge the learner’s overall performance. In addition to providing the basis for grade assignment, summative assessment is used to communicate students’ abilities. Examples in Indonesia are the national examination and semester final exams.

Formative assessment is designed to assist the learning process by providing feedback to the learner, which can be used to identify strengths and weaknesses and hence improve future performance. Formative assessment is most appropriate where the results are to be used internally by those involved in the learning process (students, teachers, curriculum developers).


  • Pre-test vs post-test

A pre-test is a test conducted before the learning process begins.

A post-test is a test conducted after the learning process ends.


  • Placement

A placement test is designed to give students and teachers of English a quick way of assessing the approximate level of a student’s knowledge of English grammar and usage.


  • Aptitude

Aptitude tests measure a student’s probable performance. They look forward, but can be distinguished from proficiency tests. Aptitude tests assess proficiency in language for language use (e.g. will the student experience difficulty in identifying sounds or the grammatical structure of a new language?), while proficiency tests measure adequacy of control in the L2 for studying other things through the medium of that language.



  • Progress

Most classroom tests take this form. Progress tests assess the progress students make in mastering material taught in the classroom. They are often given to motivate students. They also enable students to assess the degree of success of teaching and learning and to identify areas of weakness and difficulty. Progress tests can also be diagnostic to some degree.


  • Achievement

An achievement test is related directly to classroom lessons, units, or even a total curriculum. Achievement tests are limited to particular material covered in a curriculum within a particular time frame, and are offered after a course has covered the objectives in question. Their purpose is to determine acquisition of course objectives at the end of a period of instruction.


  2. How to score?
  • Objective vs subjective

Objective assessment is a form of questioning which has a single correct answer. Its items are also known as Selected-Response Items. An example is the multiple-choice item.

Subjective assessment is a form of questioning which may have more than one correct answer (or more than one way of expressing the correct answer). Its items are known as Constructed-Response Items.


  3. How real is the test situation?
  • Direct vs indirect

A test is said to be direct when the test actually requires the candidate to demonstrate ability in the skill being sampled. It is a performance test. For example, if we wanted to find out if someone could drive a car, we would test this most effectively by actually asking him to drive the car.

An indirect test measures the ability or knowledge that underlies the skill we are trying to sample in our test. An example from language learning might be to test the learners’ pronunciation ability by asking them to match words that rhyme with each other.



  4. What are the students’ scores compared to?
  • Norm-referenced vs criterion-referenced

Norm-referenced tests compare an examinee’s performance to that of other examinees. The goal is to rank the set of examinees so that decisions about their opportunity for success (e.g. college entrance) can be made.

Criterion-referenced tests differ in that each examinee’s performance is compared to a pre-defined set of criteria or a standard. The goal of these tests is to determine whether or not the candidate has demonstrated mastery of a certain skill or set of skills.
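The contrast between the two interpretations of a score can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the sample scores and the cut score of 70 are assumptions made for this example and do not come from any standard or from the sources cited in this paper.

```python
def percentile_rank(score, all_scores):
    """Norm-referenced view: percentage of examinees who scored below `score`."""
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)


def meets_criterion(score, cut_score):
    """Criterion-referenced view: has the examinee reached the pre-defined standard?"""
    return score >= cut_score


# Hypothetical class results (marks out of 100).
scores = [45, 52, 60, 68, 75, 81, 90, 94]
CUT_SCORE = 70  # assumed mastery standard, chosen only for illustration

for s in (68, 75):
    # The same raw score is read in two ways: rank among peers vs. pass/fail.
    print(s, percentile_rank(s, scores), meets_criterion(s, CUT_SCORE))
```

The same raw score therefore yields two different kinds of information: a position relative to the other examinees, and a decision relative to a fixed standard that does not depend on how anyone else performed.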


  5. How much to test? One at a time or together?
  • Discrete-point vs integrative test

Discrete-point testing assumes that language knowledge can be divided into a number of independent facts: elements of grammar, vocabulary, spelling and punctuation, pronunciation, intonation and stress. These can be tested by pure items (usually multiple-choice recognition tasks). Discrete-point testing risks ignoring the systematic relationship between language elements; integrative testing risks ignoring accuracy of linguistic detail.

Integrative testing argues that any realistic language use requires the coordination of many kinds of knowledge in one linguistic event, and so uses items which combine those kinds of knowledge, like comprehension tasks, dictation, speaking and listening.

What and how to test listening and reading?

  1. Testing of listening skill

In the testing of listening, teachers usually use Barrett’s Taxonomy. A listening test is specified in terms of test sub-skills and test levels. The test sub-skills are: identifying main facts and details, relating cause and effect, identifying the sequence of events, predicting outcomes, and inferring meaning from contextual clues. The test levels are: literal comprehension, reorganization, inferential comprehension, evaluation, and appreciation.

There are several criteria for the selection of listening texts. The first is language, consisting of lexical complexity, semantic complexity and syntactic complexity. The next criteria are accessibility, text length, authenticity, audio quality, and ideas, which concern the students’ schemata, experiences, cultures, and language proficiency. The last criterion is exploitability, consisting of adaptability and simplification. Sources of listening texts include announcements, interviews, reports, stories, lectures, dialogues, poems, plays, songs, advertisements, speeches, talks, and news; teachers can use all of these to test students’ listening skill. Moreover, the teacher can test students using different listening text types (genres), such as descriptive, narrative, expository, discussion, speech, talk, interview, poem, play, announcement, report, and dialogue or conversation.


There are principles in testing listening skills:

  • Allow reading time for candidates to read every question prior to the listening.
  • The writing of responses should be kept to a minimum.
  • Ensure listening content and contexts are as authentic as possible.
  • Ensure the test venue has excellent acoustics.
  • Ensure the test rubric is clear and easy to follow.
  • Ensure both the listening tape and the cassette player are in excellent working condition.

The question types for testing listening skill are true/false, sequencing, matching, labeling, fill in the blanks / completion items, and MCQ, with or without pictures.




  2. Testing of reading skill

In testing reading skills, the teacher focuses on sub-skills and reading levels. The sub-skills consist of identifying main facts and details, relating cause and effect, identifying the sequence of events, predicting outcomes, and inferring meaning from contextual clues. The reading levels are literal comprehension, reorganization, inferential comprehension, evaluation and appreciation. There are several criteria for reading texts: exploitability (adaptability and simplification), language (lexical complexity, semantic complexity, and syntactic complexity), accessibility, text length, authenticity, and ideas (students’ schemata and complexity). In testing reading skills, the teacher can use many sources of reading texts, such as theaters, comics, reports, story books, reference books, the internet, textbooks, travel agencies, restaurants, manuals, and mass media (newspapers, magazines, radio or TV).

In addition, descriptive, poem, play, table, graph, chart, expository, report, and narrative texts can be used as reading text types (genres). The last aspect of testing reading skills is the reading test question type. Reading test question types consist of true or false, rearrangement, open-ended (subjective response, free writing), MCQ (WH questions, matching, completion), structured (controlled, guided), and cloze procedure.



In order to track how your students are processing information in their reading activities, there is a scale of comprehension called Barrett’s Taxonomy. By using Barrett’s scale of comprehension below, you can ensure a balance between all 5 levels.

  1. Literal comprehension

Students identify information directly stated. This is the lowest reading level.

  2. Reorganization

Students organize or order the information in a different way than it was presented.

  3. Inferential comprehension

Students respond to information implied but not directly stated.

  4. Evaluation

Students make judgments in light of the material.

  5. Appreciation

Students give an emotional or image-based response. This is the highest reading level.


What and how to test speaking and writing?

  1. Testing of speaking skills

In testing speaking skill, the teacher can use Bloom’s taxonomy. Several components need to be tested in speaking skills: language (vocabulary, syntax, grammar and sentence complexity), organization (appropriacy and format), pronunciation (accuracy and clarity), content/ideas (clarity, quality and quantity), turn-taking and fluency, confidence, eye contact and style.


  2. Testing of writing skills

Bloom’s taxonomy is usually used as the basis for testing writing. The skills that should be measured in testing writing are organization (appropriacy and paragraphing), content/ideas (quantity, quality/level and clarity) and language (vocabulary, sentence complexity, syntax and accuracy).



Benjamin Bloom created this taxonomy for categorizing the level of abstraction of questions that commonly occur in educational settings. The taxonomy provides a useful structure in which to categorize test questions, since professors will characteristically ask questions within particular levels, and if you can determine the levels of questions that will appear on your exams, you will be able to study using appropriate strategies. The categories are knowledge, comprehension, application, analysis, synthesis and evaluation.

The principles of a good test

A test is called a good test if it fulfills the standard criteria. These standard criteria are validity, reliability, discrimination and practicality.

  1. Validity

Validity concerns whether a test measures what it is supposed to measure and nothing else. The types of validity are:

  1. Face validity

Face validity refers to whether a test looks all right to you and to the test-takers. Face validity can be checked by conducting a try-out or a formal test and by getting opinions from colleagues, so that the teacher can eliminate any inappropriate items.

  1. Content validity

Content validity concerns whether the test is compatible with the purpose and content of the course or lesson. The test areas must be based on the curriculum or syllabus.

  1. Construct validity

Construct validity tells you whether the format, content and type of items used all match the theoretical assumptions, approach and method used in the teaching and learning process. For example, when the communicative approach is used in teaching, the test must also be communicative.

  1. Concurrent validity

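Unlike construct validity, which concerns the ability of a measuring instrument to measure understanding of the underlying concept (the construct), concurrent validity concerns whether the results of a test agree with the results of another established measure of the same ability taken at about the same time. For example, when we take TOEFL, TOEIC, and IELTS, the results of the tests should indicate a similar level or score.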

  1. Predictive validity

Predictive validity is the ability of a measurement instrument to predict what will happen in the future. For example, whether a school’s entrance test has predictive validity is determined by whether there is a significant correlation between the entrance test results and the students’ later achievement; if there is, the test has predictive validity.
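The correlation mentioned above can be illustrated with a small Python sketch. It computes the Pearson correlation coefficient, which is one common way to quantify such a relationship; the function name and the two lists of scores are hypothetical and are given only for illustration. The sketch does not test whether the correlation is statistically significant, which would require a further step.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: entrance test scores and the same students' later grade averages.
entrance_scores = [55, 62, 70, 78, 85, 91]
later_achievement = [2.4, 2.7, 3.0, 3.1, 3.5, 3.8]

r = pearson_r(entrance_scores, later_achievement)
print(round(r, 2))
```

A coefficient close to +1 would suggest that students who did well on the entrance test also tended to achieve well later, which is the pattern expected of a test with predictive validity.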


  1. Reliability

Reliability matters because a test is a measurement tool whose results are used to make many important decisions. A test is said to have good reliability if the scores it generates are consistent, do not fluctuate, and can be trusted. Reliability is also said to be a precondition for validity: if a test is not reliable, it is not valid (Lionova, 2013 quoting Fernandes, 1984: 43). Lionova, quoting Ebel (1986: 223), suggests that a test cannot be said to be good if it does not show the quality of reliability. Therefore, the higher the reliability of a test, the better the quality of the test. And if it is connected to the validity, reliability and validity of the statutes is the accuracy of the


  1. Discrimination

Discrimination is the ability of a test to distinguish between students of different ability levels. A language test should discriminate, and we can find out how well a test discriminates by calculating its discrimination index.
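One widely used way of calculating a discrimination index for a single item is to compare how many students in the highest-scoring group and in the lowest-scoring group answered the item correctly, and to divide the difference by the size of one group. The Python sketch below follows that upper/lower-group approach; the function name, the choice of group size and the sample data are assumptions made for this illustration.

```python
def discrimination_index(results, group_size):
    """Upper/lower-group discrimination index for one test item.

    results: list of (total_test_score, answered_item_correctly) pairs,
             one pair per student.
    group_size: number of students in each of the upper and lower groups.
    """
    if group_size < 1 or 2 * group_size > len(results):
        raise ValueError("group_size must be between 1 and half the number of students")
    # Rank students from highest to lowest total score.
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    correct_upper = sum(1 for _, ok in upper if ok)
    correct_lower = sum(1 for _, ok in lower if ok)
    return (correct_upper - correct_lower) / group_size


# Hypothetical results for one item: (total score, item answered correctly?).
results = [
    (95, True), (88, True), (82, True), (75, False),
    (60, True), (54, False), (47, False), (40, False),
]

d = discrimination_index(results, group_size=4)
print(d)
```

The index ranges from -1 to +1. A positive value means that stronger students were more likely to answer the item correctly than weaker students, a value near zero means the item does not separate the two groups, and a negative value suggests that the item may be flawed.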

  1. Practicality

“Practicality is the relationship between the resources that will be required in design, development, and use of the test and the resources that will be available for these activities” (quoting Bachman and Palmer, 1996: 36). They illustrate that this quality is unlike the others because it focuses on how the test is conducted. Moreover, there are four aspects of practicality: time, cost, administration and personnel.

This is a measure of:

  1. How easy it is to administer and score the test;
  2. How clear the instructions to the student(s) and rater(s) are;
  3. How easy it is to interpret the result;
  4. How economical (in terms of money, time and equipment) it is to administer the test.



Language testing is testing to find out how far students are able to understand the language or material they have learned. A test is a method of measuring a person’s skill, ability or knowledge in a given area. Assessment is a process of gathering, analyzing and interpreting information about student learning. Evaluation is a process of analyzing and summarizing assessment information and making judgments about students.

A test must be in accordance with what is to be measured so that it gives correct information. In other words, a test is a tool used by the tester to determine whether the desired state has been achieved, after the correct treatment has first been given to the test takers. A test should be constructed according to certain principles and in accordance with the treatment given to the test takers, so that the resulting information can be trusted. A test can be said to be good if it satisfies four factors, namely validity, reliability, discrimination and practicality.

In listening and reading tests, what should be tested is specified in terms of sub-skills and levels. The sub-skills and levels determine the level of difficulty of listening and reading tests. Barrett’s taxonomy is used to test listening and reading, while Bloom’s taxonomy is used to test speaking and writing. In speaking and writing tests there are components to be tested, such as content or ideas, language, and organization.





