

Posted by Ummu Muti'ah



Testing, assessment, and evaluation are part of life in this modern era. Many schools around the world are constantly assessed, both to observe their educational development and to evaluate the quality of their systems. Entrance to educational establishments, to professions and even to entire countries is sometimes controlled by tests. Tests play an essential and controversial role in allowing access to the limited resources and opportunities that our world provides. The importance of understanding what we test, how we test and the impact that the use of tests has on individuals and societies cannot be overstated. Testing is more than a technical activity; it is also an ethical enterprise. The practice of language testing draws upon, and also contributes to, all disciplines within applied linguistics. However, there is something fundamentally different about language testing.

Language testing is a methodology for inquiring into and investigating language ability and learning needs. According to Suryanto (2013), language testing is an assessment of how well students understand the language they are learning. Being central to language teaching and learning, it provides goals for language teaching and monitors both teachers’ and learners’ success in reaching those goals. Language testing also provides a methodology for conducting experiments and investigating both language teaching and language learning. Spolsky (2001) states that in the course of the first 2000 years that human abilities have been assessed formally, assessment and evaluation have become progressively more powerful. Language testing, a sub-field within applied linguistics, has evolved and expanded in a number of ways in the past decades. Bachman (1999) presents a brief review of language testing at the turn of the century in the newsletter of the American Association for Applied Linguistics. He describes a period when language testing practice, as reflected in large-scale institutional language testing and in most language testing textbooks, was informed essentially by a theoretical view of language ability as consisting of skills (listening, speaking, reading, and writing) and components (e.g., grammar, vocabulary, pronunciation). The approach taken to test design was based on the idea of testing isolated “discrete points” of language, while the primary concern was psychometric reliability. Language testing research was dominated largely by the hypothesis that language proficiency consisted of a single trait, and required a quantitative statistical research methodology.

Tests, assessment, and evaluation are the parts of language testing, and they are significant components of teaching and learning the English language. Without an effective evaluation program it is not possible to know whether students have learned, whether teaching has been effective, or how best to address students’ learning needs. The quality of testing, assessment, and evaluation in the educational process has a deep and established relation to student performance.

The purpose of this paper is to define test, assessment, and evaluation; to describe the types of tests by category; to explain what and how to test in listening and reading; to explain what and how to test in speaking and writing; and finally to explain the principles of a good test. This paper can help students and teachers to understand language testing.

The writer will first define test, assessment, and evaluation. The second part describes the types of tests by category. The third part explains what and how to test in listening and reading, and in speaking and writing. This is followed by an explanation of the principles of a good test, and the last part is the conclusion of this paper.

The definition of test, assessment, and evaluation

A.    Test

Testing is a matter of concern to all teachers, whether they are in the classroom or engaged in syllabus or materials design, administration, or research (Suryanto, 2013). According to Carroll (1968), “a psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual”. A test can therefore be seen as a fixed point at which a set of skills is measured at a certain time. Spolsky (2001) further mentions five major purposes for using tests: first, using tests as a competitive selection device; second, using tests to provide information on the quality of the “product” to those who are paying for an education system; third, using tests to certify that an individual has achieved a specific level of technical or professional skill; fourth, using tests for prediction or prognosis of the probable results of training; and last, using tests as an integral part of all good teaching.

B.     Assessment

Assessment is the systematic collection, review, and use of information about educational programs to improve student learning. Gipps and Lynch (2005) describe assessment as judging the ability of a learner, based on a test or otherwise, and using that judgment as a constructive element in learning over time. Assessment focuses on what students know, what they are able to do, and what values they have when they graduate. It is concerned with the collective impact of a program on student learning, and it provides feedback on knowledge, skills, attitudes, and work products for the purpose of improving future performance and learning outcomes. According to Suryanto (2013), assessment is broader in scope than testing and involves gathering information over a period of time. This information might include formal tests, classroom observation, student self-assessments, or other data sources.

Assessment provides information on whether teaching and learning have been successful. However, the information it provides has a number of potential audiences whose precise requirements may vary. Classroom teachers need regular information on how pupils’ knowledge, skills and understanding are developing, both to inform how they should adjust their teaching and to determine what kind of feedback is needed to improve pupils’ learning. School principals and policy makers, on the other hand, need additional, broader information on the quality of education in a school or country. The sort of comparative data required for this purpose needs a high level of reliability and uniformity. In the case of language as a school subject this requirement is challenging, because it is difficult to create tests which are manageable but at the same time faithful to the aims of the subject. Employers and society at large also need reliable information which can help certify achievement and provide a basis for selection. Parents too require information which can help them understand their children’s achievements. Learners also need to know how they are progressing and how to improve their performance, but they may need to be protected from the potentially negative effects of assessment. A starting point for resolving tensions related to assessment is to develop an understanding of other points of view. A key challenge is to develop a system of assessment that acknowledges the different functions of assessment and treats them as complementary rather than as being in opposition to each other.

C.     Evaluation

Andrews and Werner (1988) provide a fairly comprehensive definition of evaluation. According to them, “to evaluate is to make an explicit judgment about the worth of all or part of a program by collecting evidence to determine if acceptable standards have been met.” This definition of evaluation has two key terms. Standards are ideals or desired qualities or conditions against which actual objectives are to be measured. Evidence is the information needed to confirm whether or not the required standards have been met by the program. For example, adoption of no-till practices in a watershed is the standard, and the percentage of farmers adopting the no-till practice within the first five years of the project is the evidence. The standards, or desired qualities or conditions against which program outcomes are measured, come straight from the written goals and objectives of the program.

Tests can be grouped into several types according to the following categories:

1.      When to test?

   A. Summative vs. formative

A summative test attempts to summarize students’ learning at some point in time, usually at the end of a course. An example of a summative test is the UN (Ujian Nasional, the national examination). The purpose of this test is to determine the ability of the students. A formative test, by contrast, is held to find out whether or not the program is running as intended. It is given to students or participants so that they can plan what to do next after taking the test. Examples of formative tests are daily tests given after a unit of material and mid-semester tests.

     B. Pre-test vs. post-test

      A pre-test is a preliminary test given before the teaching and learning activity to determine a student’s baseline knowledge or preparedness for an educational experience or course of study. A post-test is a test given after the teaching and learning activity has ended.

C.     Placement

The aim of a placement test is to help sort new students into teaching groups of roughly the same level. Because they are not related to any particular course, these tests often start with simple items and become more difficult in order to cater for a range of abilities. A placement test can identify the performance level of each student and place test-takers at an appropriate level of instruction.

D.    Aptitude

     Aptitude tests are similar to skill and intelligence tests, and are used to determine an individual’s capability in performing particular tasks. Aptitude tests frequently consist of items that are intended to evaluate the taker’s special abilities inside a designated area. There are many kinds of aptitude tests that seek to gauge one’s capacity in a certain area, such as verbal, numerical, clerical, sensory, spatial or mechanical, and logic and reasoning skills.

E.     Progress

A progress test consists of activities based on the material the teacher wants to check. To evaluate it, the teacher can work out a system of points that together make up a mark. Typically, such tests do not influence the students’ final mark at the end of the year.

F.      Achievement

Achievement tests are meant to check the learners’ mastery of the material covered. The test is based on the syllabus studied or the book used during the course. It could be described as a fair test, because it focuses mainly on the detailed material that the students are supposed to have studied. In other words, it is a test given to find out how much of the material the students have mastered after taking part in the learning process.

2.      How to score?

   Subjective vs. objective

     A subjective test is one in which the impression or opinion of the scorer determines the score or evaluation of performance. An objective test is one for which the scoring procedure is completely specified, enabling agreement among different scorers. This means that an answer key can be prepared in advance, and a correct answer must be the same as the one in the answer key.
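Objective scoring against an answer key can be illustrated with a short sketch. The function name `score_objective` and the sample answers are assumptions made for illustration; they do not come from any cited source.

```python
def score_objective(answers, answer_key):
    """Count the answers that match the prepared answer key exactly.

    Because the key fully specifies the scoring procedure, any scorer
    who applies it to the same answers arrives at the same score.
    """
    if len(answers) != len(answer_key):
        raise ValueError("answers and answer_key must have the same length")
    return sum(1 for given, correct in zip(answers, answer_key) if given == correct)


# Hypothetical four-item multiple-choice test.
key = ["A", "B", "B", "D"]
student = ["A", "C", "B", "D"]
print(score_objective(student, key))
```

A subjective test has no such key, so two scorers may give different marks to the same performance.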

3.      How much to test?

      Discrete-point vs. integrative

According to the Longman Dictionary, a discrete-point test is a language test that is meant to test a particular language item, e.g. tenses. The basis of this type of test is that it can test components of the language (grammar, vocabulary, pronunciation, and spelling) and language skills (listening, reading, speaking, and writing) separately. The discrete-point test is commonly used by teachers in our schools. For example, after the students have studied a grammar topic or new vocabulary and practiced it a great deal, the teacher gives a test based on the covered material. Such a test usually includes only the items that were studied and nothing from a different field. According to the Longman Dictionary, the integrative test is intended to check several language skills and language components together or simultaneously. Hughes (1989:15) states that integrative tests display the learners’ knowledge of grammar, vocabulary, and spelling together, not as separate skills or items. The teacher should incorporate both types of testing for an effective evaluation of the students’ true language abilities.

4.      How real is the test situation?

        Direct vs. indirect

Direct testing is intended to test certain specific abilities, and preparation for it usually involves persistent practice of certain skills. Indirect testing tests the use of the language in real-life situations. Moreover, it suits all situations, whereas direct testing is bound to certain tasks intended to check a certain skill. Hughes (ibid.) assumes that indirect testing is more effective than direct testing, because it covers a broader part of the language. This means that the learners are not constrained to one particular skill and a relevant exercise. They are free to use all four skills; what is checked is their ability to operate with those skills and apply them in various, even unpredictable, situations. This is the true indicator of the learner’s real knowledge of the language.

5.         How and what are the students’ scores compared to?

        Norm-referenced vs. criterion-referenced

      A norm-referenced test measures the knowledge of the learner and compares it with the knowledge of the other members of his or her group. The learner’s score is compared with the scores of the other students; in other words, it is a test that compares results across the population of test-takers. It has no fixed criteria and no clear cut-off score. An example is the UMPTN test. A criterion-referenced test, by contrast, measures the knowledge of the students according to set standards or criteria. This means that there are certain criteria according to which the students are assessed, and there are different criteria for different levels of the students’ language knowledge. Here the aim of testing is not to compare the results of the students with one another; it is concerned with the learners’ knowledge of the subject. An example is the TOEFL test.
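The difference between the two ways of interpreting a score can be sketched in a few lines. This is a minimal illustration, not a standard scoring procedure: the function names, the sample scores, and the cut-off of 75 are all assumptions.

```python
def percentile_rank(score, group_scores):
    """Norm-referenced view: percentage of the group scoring below this score."""
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


def meets_criterion(score, cut_score):
    """Criterion-referenced view: compare the score with a fixed standard."""
    return score >= cut_score


# Hypothetical class results and a hypothetical cut-off score.
group = [50, 60, 70, 80, 90]
print(percentile_rank(70, group))   # position relative to the other test-takers
print(meets_criterion(70, 75))      # pass or fail against the set standard
```

The same raw score of 70 is read in two ways: its rank depends on who else took the test, while the criterion decision depends only on the standard that was set in advance.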

What and how to test listening and reading (Barret’s Taxonomy)?

Language testing involves skills and sub-skills. The most popular framework is Barret’s Taxonomy, which is a good basis for testing listening and reading. The levels in listening and reading consist of literal comprehension, reorganization, inferential comprehension, evaluation, and appreciation. The first level is appreciation (highest): students give an emotional or image-based response, and typical tasks are to critique, appraise, comment, and appreciate. The second is evaluation: students make judgments in light of the material, and typical tasks are to analyze, appraise, evaluate, justify, reason, criticize, and judge. The third is inference: students respond to information that is implied but not directly stated, and typical tasks are to predict, infer, and guess. The fourth is reorganization: students organize or order the information in a different way than it was presented, and typical tasks are to classify, regroup, rearrange, assemble, collect, and categorize. The last is literal comprehension (lowest): students identify information that is directly stated, and typical tasks are to label, list, name, relate, recall, repeat, and state.

a.          Testing of listening skills

Listening skills consist of sub-skills and levels. The sub-skills to be tested are identifying main facts and details, relating cause and effect, identifying the sequence of events, predicting outcomes, and inferring meaning from contextual clues. There are several criteria for the selection of listening texts. The first is language, which covers lexical, semantic, and syntactical complexity. The next criteria are accessibility, text length, authenticity, audio quality, and ideas that fit the students’ schemata, experiences, cultures, and language proficiency. The last criterion is exploitability, which covers adaptability and simplification. The sources of listening texts include announcements, interviews, reports, stories, lectures, dialogues, poems, plays, songs, advertisements, speeches, talks, and news, all of which the teacher can use to test students’ listening skills. Moreover, the teacher can test students with different listening text types (genres), such as descriptive, narrative, expository, discussion, speech, talk, interview, and poem.

b.      Testing Reading Skill

Testing reading skills involves several criteria for the selection of reading texts: ideas, exploitability, language, accessibility, and text length. Sources of reading texts include journals, comics, story books, reports, reference books, textbooks, manuals, the internet, theaters, travel agencies, restaurants, and mass media such as newspapers, magazines, radio, or TV, among many others. The types of reading test question are: true or false, rearrangement, structured (controlled, guided), open-ended (subjective response, free writing), MCQ (question, matching, and completion with some options), and cloze procedure. In addition, descriptive, poem, play, table, graph, chart, expository, report, and narrative texts can be used to test different reading text types or genres.
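The cloze procedure listed among the question types is commonly built by deleting words at a fixed interval from a reading passage. The sketch below assumes that fixed-interval approach; the function name `make_cloze`, the default interval of 7, and the blank marker are illustrative choices, not a prescribed standard.

```python
def make_cloze(text, n=7, blank="_____"):
    """Replace every n-th word of the passage with a blank.

    Returns the gapped passage and the list of deleted words, which
    serves as the answer key. Words are split on whitespace, so any
    punctuation attached to a deleted word is removed along with it.
    """
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):
        answers.append(words[i])
        words[i] = blank
    return " ".join(words), answers


passage = "The aim of a placement test is to help sort new students into teaching groups"
gapped, key = make_cloze(passage, n=5)
print(gapped)
print(key)
```

Because the deleted words form an answer key, a cloze test built this way can be scored objectively.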

What and how to test speaking and writing?

        Speaking and writing can be tested using the Bloom’s Taxonomy model, which was created by Benjamin Bloom. Bloom’s taxonomy is a very useful structure for categorizing test questions. The following chart lists the verbs associated with each level of Bloom’s taxonomy.

Knowledge: arrange, define, label, list, memorize, name, relate, recall, repeat, state.

Comprehension: classify, describe, discuss, explain, express, identify, indicate, locate, recognize, report.

Application: apply, choose, demonstrate, illustrate, interpret, operate, solve, use, employ.

Analysis: analyze, appraise, calculate, categorize, contrast, criticize, differentiate, distinguish.

Synthesis: arrange, assemble, collect, compose, plan, construct, create, design, develop, propose.

Evaluation: appraise, argue, assess, justify, judge, rate, support, value, evaluate.

a.       Testing of speaking skills

        When testing speaking skills, the teacher can use Bloom’s taxonomy. Several components need to be tested: language (vocabulary, syntax, grammar, and sentence complexity); organization (appropriateness and format); pronunciation (accuracy and clarity); content/ideas (clarity, quality, and quantity); and turn-taking, fluency, confidence, eye contact, and style. Examples of speaking tests are giving a speech, giving a talk, and singing.

b.      Testing of writing skills

         The last is the testing of writing skills. The aspects that should be measured are organization (appropriateness and paragraphing), content/ideas (quantity, quality/level, and clarity), and language use (vocabulary, sentence complexity, syntax, and accuracy). One example of a writing test is writing a paper.

The Principles of a Good Test

        A good test follows several principles: reliability, validity, practicality, and discrimination.

·         Reliability

            Reliability means the stability of test scores. A test cannot measure anything well unless it measures consistently. Two somewhat different types of consistency or reliability are involved: reliability of the test itself, and reliability of the scoring of the test. Test reliability may be estimated in a number of ways: retesting the same individuals with the same test; using alternate or parallel forms, that is, different versions of the same test; and giving a single administration of one form of the test and then estimating reliability from that administration. Finally, it must always be remembered that reliability refers purely and simply to the precision with which the test measures. No matter how high the reliability quotient, it is by no means a guarantee that the test measures what the test user wants to measure. Data concerning what the test measures must be sought from some source outside the test itself. This problem will be considered in the following section.
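Retesting the same individuals with the same test is usually summarized as a correlation between the two sets of scores. The sketch below uses the Pearson correlation coefficient for this purpose; the choice of statistic, the function name `pearson_r`, and the sample scores are assumptions for illustration and are not taken from the sources cited here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of five students on two sittings of the same test.
first_sitting = [60, 72, 85, 90, 55]
second_sitting = [62, 70, 88, 91, 58]
print(round(pearson_r(first_sitting, second_sitting), 3))
```

A value close to 1 suggests that students keep roughly the same relative position across sittings, which is what stable scores look like. It says nothing about whether the test measures the intended ability.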

·         Validity

Empirical validity is of two general kinds, predictive and concurrent (or status) validity, depending on whether test scores are correlated with subsequent or concurrent criterion measures. According to Bynom (Forum, 2001), validity deals with what is tested and the degree to which a test measures what it is supposed to measure (Longman Dictionary, LTAL). For example, if we test students’ writing skills by giving them a composition test on Ways of Cooking, we cannot call such a test valid, because it can be argued that it tests not their ability to write but their knowledge of cooking as a skill. Publishers of standardized tests should be expected to provide evidence of the validity of their measures. Reliability also matters here, because a test is a measurement tool and its results are used to make important decisions. A score is generally regarded as reliable when it comes from consistent measurement results that do not fluctuate and can be trusted. Lionova, quoting Ebel (1986: 223), suggests that a test cannot be said to be good if it does not show the quality of reliability.

·         Practicality

    In writing or selecting a test, we should certainly pay some attention to how long the administering and scoring of it will take. Our task, then, is to select an instrument which is of sufficient length to yield dependable and meaningful results but which will also fit comfortably into the time that can be made available for testing. However, we need to have some general guidance as to the meaning of test scores to begin with, for without this it is extremely difficult to use an instrument in an efficient manner.

·         Discrimination

Discrimination is used to distinguish between students of different ability levels. Language tests differ in how well they discriminate, and we can find out how well a test discriminates by calculating its discrimination index.
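One common way to calculate a discrimination index for a single item is to compare how often the highest-scoring and lowest-scoring students answered it correctly. The sketch below assumes this upper-lower group method with groups of 27% of the test-takers, a widely used convention; the function name and the sample data are hypothetical.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-lower discrimination index for one test item.

    item_correct: 1 if the student answered the item correctly, else 0.
    total_scores: each student's total score on the whole test.
    fraction:     share of students placed in each of the two groups.
    Returns the proportion correct in the upper group minus the
    proportion correct in the lower group (between -1 and 1).
    """
    if len(item_correct) != len(total_scores) or not total_scores:
        raise ValueError("need one item result and one total score per student")
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    # Rank students from highest to lowest total score.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper = order[:group_size]
    lower = order[-group_size:]
    p_upper = sum(item_correct[i] for i in upper) / group_size
    p_lower = sum(item_correct[i] for i in lower) / group_size
    return p_upper - p_lower


# Hypothetical data: six students, one item, groups formed from halves.
item = [1, 1, 1, 0, 0, 0]
totals = [10, 9, 8, 3, 2, 1]
print(discrimination_index(item, totals, fraction=0.5))
```

An index near 1 means that strong students tend to get the item right and weak students tend to get it wrong. An index near 0 means the item does not separate the two groups, and a negative index signals an item that may be flawed.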



















A language test is a procedure for gathering evidence of general or specific language abilities from performance on tasks designed to provide a basis for predictions about an individual’s use of those abilities in real-world contexts. The importance of language testing and the evolution of modern linguistics have made teachers and testers aware of the need for a comprehensive analysis of the language under consideration. Teachers are therefore constantly revising their teaching strategies in the light of advances in modern linguistics and psychology, using tests, assessment, and evaluation.

Language tests use models to set the level of difficulty of the test. Test design commonly draws on Bloom’s Taxonomy for speaking and writing tests and on Barret’s Taxonomy for listening and reading tests. Testers, for their part, are trying to improve their techniques for testing language ability and to apply the principles of a good test, such as validity, reliability, practicality, and discrimination, in compliance with advances in teaching. Certainly, not all innovations in language science have had equal or similar effects upon teaching and testing; each has paid certain attention to the relative importance of each skill or component. Thus, it is the responsibility of the teacher to choose the most appropriate method of estimating the learner’s knowledge or ability, particularly where learning a second language is concerned.











Davidson, G. F. (2007). Language Testing and Assessment. New York: British Library.

Suryanto, Jati. (2013). Bloomfield and Barret’s Taxonomy as the basis for making a good language testing. University of Muhammadiyah Yogyakarta.

Banerjee, J. C. (2011). Language testing and assessment (Part I). Cambridge University Press, 34, 213-236.

Gipps and Lynch. (2005). An introduction to (English) Language testing. Retrieved May 3, 2013, from https://www.An+introduction+to+%28English%29+Language+testing.com.

Ozerova, Anzelika. (2004). Types of Tests Used in English Language. University of Latvia, 12 May 2004. From www.bestreferat.ru

Martyniuk, et al. (2007). Evaluation and assessment within the domain of Language(s) of Education. Journal of Education, November 10, 2007. From

Spinello, Serena. (2010). The type of aptitude test. From

Suvedi, Murari. (2011). Evaluation. From

Leila, Behrahi. (2010). The history of language testing. April 21, 2010. From http://www.lorenglish.blogfa.com/post-5.aspx



