Office Proficiency Assessment and Certification®
OPAC®
White Paper
OPAC® is a product of Biddle Consulting Group, Inc.
OPAC® and Office Proficiency Assessment and Certification® are registered trademarks of Biddle Consulting Group, Inc.
Copyright © 1989-2006 Biddle Consulting Group, Inc.
Preface
The following is a compilation of all major validation studies involving the Office Proficiency
Assessment and Certification System (OPAC). This compilation does not include instructions
for operating the OPAC System, and users should refer to either the OPAC Candidate
Manual or the OPAC Administrator Manual for such information. The validation studies
presented in this compilation date from 1989 to 2005, and certain older validity reports may
contain information that is no longer relevant as the OPAC System has been updated and
improved over time. Unless otherwise stated, all material presented in this compilation is
copyrighted © by Biddle Consulting Group, Inc.
OPAC was distributed by Biddle & Associates, Inc. until 2001. It is currently distributed by
Biddle Consulting Group, Inc., which was formed out of Biddle & Associates, Inc. in 2001.
Correspondence should be directed to:
Biddle Consulting Group, Inc.
Attn: James E. Kuthy, Senior Consultant
193 Blue Ravine Road, Suite 270
Folsom, CA 95630
Toll-free: 800-999-0438
Voice (916) 294-4250 · Fax (916) 294-4255
www.biddle.com
This paper was last updated in January 2006.
Table of Contents
Office Proficiency Assessment and Certification® .................................................. 9
Certification Information ...................................................................................... 10
Standards for Certification.................................................................................... 10
Procedures for Certification................................................................................... 12
OPAC Research ................................................................................................... 12
Content Validity Study ......................................................................................... 13
Literature Review................................................................................................ 13
Competency Development.................................................................................... 13
Survey Work ...................................................................................................... 14
Data Analysis and Competency Modification ............................................................ 14
Results and Reporting.......................................................................................... 14
Content Development .......................................................................................... 15
Field Test........................................................................................................... 15
Ongoing Research ............................................................................................... 15
OPAC (Entry-Level) Content Validity Study: Areas, Competencies, and Tasks............... 15
Validation Study for Secretarial/Administrative Classifications Using Computer-
Based Testing (1991).......................................................................................... 23
Abstract............................................................................................................. 24
Method .............................................................................................................. 25
Job Analysis - Part I ............................................................................................ 26
Criterion Development and Sampling ..................................................................... 26
Job Analysis - Part II, and Testing of Incumbents on Computers ................................ 26
Results .............................................................................................................. 27
Content Validity ............................................................................................... 27
Concurrent Criterion-Related Validity................................................................... 27
Alternative Procedures Investigated .................................................................... 28
Discussion.......................................................................................................... 29
Content and Concurrent Criterion-Related Validity for Some OPAC® Tests ........... 31
Introduction ....................................................................................................... 31
Introduction ....................................................................................................... 32
Experimental Test Battery .................................................................................... 32
Identification of Sample ....................................................................................... 32
Job Performance Ratings ...................................................................................... 33
Data on the Experimental Tests............................................................................. 33
Job Analysis and Test Evaluation ........................................................................... 33
Content Validity Results ....................................................................................... 34
Concurrent Criterion-Related Validity Results........................................................... 35
OPAC Tests ........................................................................................................ 35
Correlations Between OPAC Test Scales and Supervisory Ratings ............................... 36
Alternative Procedure Analysis .............................................................................. 37
Overall Conclusions Considering Validity and Adverse Impact .................................... 38
Content and Criterion-Related Validity Report for the OPAC® System (1994) ...... 39
Experimental Test Battery .................................................................................... 40
Identification of Sample ....................................................................................... 40
Job Performance Ratings ...................................................................................... 41
Data on the Experimental Tests............................................................................. 41
Job Analysis and Test Evaluation ........................................................................... 42
Content Validity Results ....................................................................................... 42
Concurrent Criterion-Related Validity Results........................................................... 43
Criterion-Related Validity Correlations .................................................................... 43
Correlations Between OPAC Test Scales and Supervisory Ratings ............................... 44
Alternative Procedure Analysis .............................................................................. 46
Overall Conclusions Considering Validity and Adverse Impact .................................... 47
Content Validity Report for OPAC® Module Four (March 1997) ............................ 48
Test Description.................................................................................................. 49
Vendor Test........................................................................................................ 49
Inventory Test.................................................................................................... 49
Invoice Test ....................................................................................................... 50
Review by Biddle Consulting Group, Inc.................................................................. 50
Review by Subject-Matter Experts ......................................................................... 51
Development of Certification Levels ....................................................................... 51
Recommended Certification Levels for Three Data Entry Tests .................................. 52
Data Entry 1 – Vendor ...................................................................................... 52
Data Entry 2 – Inventory................................................................................... 52
Data Entry 3 – Invoice ...................................................................................... 52
Accuracy and Completeness.................................................................................. 53
Validation Report for the Medical and Legal Terminology Tests (August 1997) ... 54
Introduction ....................................................................................................... 55
User(s), Locations(s), and Date(s) of the Study....................................................... 55
Problem and Setting ............................................................................................ 55
Job Analysis ....................................................................................................... 57
Selection Procedure and Contents.......................................................................... 58
Relationship between the Selection Procedure and the Job ........................................ 60
Test Form A ....................................................................................................... 61
Medical Test Form B ............................................................................................ 61
Alternative procedures investigated ....................................................................... 61
Uses and applications .......................................................................................... 61
Accuracy and completeness .................................................................................. 63
Development Report for OPAC® System 5.0 Legal Keyboarding and Language Arts Tests ..................................................................................... 64
Background........................................................................................................ 67
Early Development .............................................................................................. 67
Industry Experts ................................................................................................. 67
Test Descriptions ................................................................................................ 67
Testing Site........................................................................................................ 68
Method .............................................................................................................. 68
Participants ..................................................................................................... 68
Materials ......................................................................................................... 68
Procedure........................................................................................................ 69
Results .............................................................................................................. 69
Legal Keyboarding............................................................................................ 70
Legal Language Arts ......................................................................................... 71
Angoff Scores .................................................................................................. 71
Performance Differentiation ............................................................................... 72
Job Duty/KSA Linkage....................................................................................... 72
Discussion.......................................................................................................... 73
Development Report for OPAC® System 5.0 Medical Keyboarding and Language Arts Tests ............................................................................... 74
Background........................................................................................................ 77
Early Development .............................................................................................. 77
Industry Experts ................................................................................................. 77
Test Descriptions ................................................................................................ 77
Testing Site........................................................................................................ 78
Method .............................................................................................................. 78
Participants ..................................................................................................... 78
Materials ......................................................................................................... 79
Procedure........................................................................................................ 79
Results .............................................................................................................. 80
Medical Keyboarding ......................................................................................... 80
Medical Language Arts ...................................................................................... 81
Angoff Scores .................................................................................................. 81
Performance Differentiation ............................................................................... 82
Job Duty/KSA Linkage....................................................................................... 82
Discussion.......................................................................................................... 83
Development Report for OPAC® 5.3 Legal and Medical Transcription Tests .......... 84
Background........................................................................................................ 87
Early Development .............................................................................................. 87
Industry Experts ................................................................................................. 87
Test Descriptions ................................................................................................ 87
Testing Site........................................................................................................ 88
Method .............................................................................................................. 88
Legal Participants ............................................................................................. 88
Medical Participants .......................................................................................... 89
Materials ......................................................................................................... 89
Procedure .......................................................................................................... 90
Results .............................................................................................................. 90
Legal Transcription ........................................................................................... 90
Medical Transcription ........................................................................................ 91
Angoff Scores .................................................................................................. 92
Performance Differentiation ............................................................................... 93
Job Duty/KSA Linkage....................................................................................... 93
Discussion.......................................................................................................... 94
References ......................................................................................................... 95
Development Report for OPAC® PowerPoint Test (2002) ................................... 95
Development Report for OPAC® Intermediate Excel Test (2002) ....................... 99
Development Report for OPAC® Intermediate Word Test (2003) ....................... 103
Development Report for OPAC® Basic Word Test (2005) ................................... 107
Development Report for OPAC® Basic Excel Test (2005).................................... 112
Development Report for OPAC® Contemporary Keyboarding Test (2005)........... 117
Office Proficiency Assessment and Certification®
Certification Standards
Project and Development Team:
Susie H. VanHuss, Ph.D. Project Director
Richard J. Rovinelli, Ph.D. Project Systems Analyst
Carolyn S. Hayes, B.S., CPS Project Coordinator
L. Joyce Arntson, MBA Instructional Developer
Anne L. Matthews, Ed.D. Instructional Developer
Elizabeth W. Tweeten, Ph.D. Instructional Developer
This section originally Copyrighted © 1989, 1990
By
International Association of Administrative Professionals®
(Formerly Professional Secretaries International®)
10502 NW Ambassador Drive
Kansas City, MO 64195-0404
816 891-6600
All Rights Reserved
Certification Standards
Candidates who take all required modules and units of the OPAC program and meet the
standards specified in this section are offered certification by Professional Secretaries
International (PSI), since 1942 the world's leading organization for office professionals.
Certification Information
Certification provides benefits to both candidates who earn the certification and
organizations that support candidates in their bid to earn certification. Candidates gain
prestige from certification by the recognized international association for the office
profession. Certification enhances personal satisfaction and builds self-confidence. It
provides an incentive to continue career development. In addition, candidates receive
objective information about their strengths and weaknesses that helps them to formulate
realistic plans for career growth. Organizations benefit from the increased professionalism
of their entry-level employees. Certification helps to establish a standard barometer for
competency within the industry and provides an incentive for career growth.
Candidates who have taken all required modules and units of the OPAC program and who
have met the standards specified in the next section may apply for certification. The
application process consists of having the test administrator export the data from the hard
disk of the system to a blank floppy diskette. The diskette must be sent to the OPAC
Support Office, 410-C Veterans Road, Columbia, SC 29209 with a check made payable to
PSI for $30. Procedures for exporting data are provided in the OPAC System Installation
Manual. A form is provided to facilitate the certification process. The OPAC Support Office
verifies that the standards have been met and notifies PSI. The certification is then issued
by Professional Secretaries International. For additional information about certification,
write Professional Secretaries International, P.O. Box 20404, Kansas City, MO 64195-0404, or
call (816) 891-6600, Extension 238.
Standards for Certification
Candidates who wish to receive certification from Professional Secretaries
International must meet the standards specified in the next section of this manual.
Candidates may repeat those modules and units on which they did not meet certification
standards. The OPAC system stores and maintains the response data and results the first
time the candidate takes each unit. When units are repeated, the system maintains the
response data and the results of both the initial time and the most recent time they have
taken the examination. Therefore, once candidates have completed any units successfully,
they should repeat only those units in which their scores did not meet the standards
specified.
Module 1
The candidate must key at least 45 gross words per minute on the five-minute timed writing
with a maximum of 10 errors.
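For illustration only, the sketch below shows how this standard could be checked. It assumes the common convention that gross words per minute equals total keystrokes divided by five, divided by elapsed minutes; the keystroke and error counts shown are hypothetical, not taken from the study.

# Illustrative sketch; assumes the common five-keystrokes-per-word convention.
# The keystroke and error counts below are hypothetical.
def gross_wpm(total_keystrokes: int, minutes: float) -> float:
    """Gross words per minute on a timed writing."""
    return (total_keystrokes / 5) / minutes

def meets_module1_standard(total_keystrokes: int, errors: int, minutes: float = 5.0) -> bool:
    """At least 45 gross WPM on the five-minute writing, with no more than 10 errors."""
    return gross_wpm(total_keystrokes, minutes) >= 45 and errors <= 10

print(meets_module1_standard(total_keystrokes=1200, errors=7))  # 48 gross WPM -> True
print(meets_module1_standard(total_keystrokes=1000, errors=3))  # 40 gross WPM -> False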
Module 1, Unit 2
The candidate must demonstrate the ability to use all of the following word-processing
functions:
bold block indent center
copy delete hard hyphen
hard page break hard return hard space
insert move spell check
printing underscore widows/orphans
Module 1, Unit 3
The candidate must select the appropriate paragraphs and merge them, and the letter must
be formatted correctly in the style indicated. The letter is checked to determine that the
proper paragraphs were selected, all appropriate parts of the letter are included, and the
positioning of the letter parts is appropriate. The standard is 100 percent. The document
is either correct or incorrect.
Module 1, Units 4, 5, and 6
The standard for these three combined units is 70 percent. This standard is applied to the
last half of Module 1 (Units 4, 5, and 6) rather than on a unit-by-unit basis. A candidate
who has scored an average of 70 percent on the three units will be certified.
Unit 7 of Module 1 is not required for certification.
Module 2, Unit 1
The standard for Module 2 Unit 1 is 70 percent.
Units 2 and 3 of Module 2 are not required for certification.
Module 3, Units 1, 2, 3, and 4
The standard for the entire module is 70 percent. The standard is applied to the total
module rather than on a unit-by-unit basis.
Module 4, Units 1 and 2
The standard for the composite of these two units is 70 percent. The standard is applied to
the combined score rather than on a unit-by-unit basis.
Unit 3 of Module 4 is not required for certification.
Module 5, Units 1, 2, and 3
The standard for the composite of these three units is 70 percent. The standard is applied to
the combined score rather than on a unit-by-unit basis.
Units 4, 5, and 6 of Module 5 are not required for certification.
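Because Modules 1 (Units 4-6), 3, 4, and 5 apply the 70 percent standard to a combined score rather than unit by unit, a candidate may fall below 70 percent on one unit and still meet the standard. A minimal sketch of that composite rule follows; the unit scores are hypothetical percentages.

# Illustrative sketch; the unit scores below are hypothetical percentages.
def meets_composite_standard(unit_scores, standard=70.0):
    """True if the average of the combined unit scores meets the standard."""
    return sum(unit_scores) / len(unit_scores) >= standard

# Example for Module 1, Units 4-6: one unit below 70 is offset by the others.
print(meets_composite_standard([65.0, 72.0, 78.0]))  # average ~71.7 -> True
print(meets_composite_standard([60.0, 68.0, 74.0]))  # average ~67.3 -> False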
Repetition of Modules
Candidates who do not successfully meet the standards specified on all modules on the
assessment may repeat those modules that were not successfully completed. PSI
recommends that candidates do additional preparation and/or practice before repeating the
modules. The tutorial (OPAC Special Version) should be used for practice in taking the
assessment before repeating the actual assessment.
PSI does not limit the number of times a candidate may repeat the entire assessment or
any unit of the assessment. PSI does recommend to test administrators that candidates be
allowed to take the assessment three times. Only those modules that were not successfully
completed need to be repeated.
Procedures for Certification
Candidates who believe they have met the standards on all units required for certification
should have the test administrator extract the test results from the hard disk and export the
data to a floppy diskette. The diskette must be sent to the OPAC Technical Support Office
for verification that standards have been met. The detailed procedures and a transmittal
form for accomplishing this task are contained in the Test Administrator Manual.
The transmittal form and data diskette become the candidate's application for certification.
After the scores have been verified, the OPAC Technical Support Office forwards the
application to PSI headquarters and notifies PSI that the candidate has met all standards for
certification. PSI then issues the certification.
The standard, nonrefundable fee for processing applications, verifying results, and certifying
candidates is $30 for each candidate. The certification fee must accompany the application
for certification. The check must be made payable to Professional Secretaries International.
The candidate's name and identifying number (Social Security Number for U. S. candidates
or Canadian National Insurance Number for Canadian candidates) should appear on the
check as well as on the transmittal form.
Candidates should not apply for certification until they have met the standards on all
required modules and units. The OPAC system captures and maintains data for the initial
try and for the most recent repetition of all modules and units.
OPAC Research
Research for the OPAC program is segmented into three phases. The initial phase consisted
of a two-year content validity study sponsored by Professional Secretaries International
(PSI) that defined the domain of knowledge, skills, and abilities of entry-level office
employees and provided information concerning the positions of entry-level office
employees. The second phase consisted of developing and field testing the instruments
used to assess the competencies identified in the content validity study. The final phase is
an ongoing research component that will analyze all data collected during a three-to-five
year period of use of the assessment in practical settings.
Content Validity Study
The validity study was organized into five components:
1. Literature review
2. Competency development
3. Survey work
4. Data analysis and competency modification
5. Reporting
A brief review of each phase follows.
Literature Review
The purpose of the literature review was to provide a starting point for the competency
development component of the study. A comprehensive literature search provided
numerous articles and research studies written in the past five years dealing with
competencies needed by secretarial employees, word processing employees, and employees
in general office/clerical-type positions.
The major studies which identified and validated a list of specific competencies needed by
entry-level workers included DACUM (Developing A Curriculum) studies; V-TECS
(Vocational-Technical Education Consortium of States) catalogs of tasks, performance
objectives, and performance guides; and studies conducted or sponsored by state
departments of education. The remainder of the studies consisted primarily of master's
theses, doctoral dissertations, and studies by independent researchers.
The literature review produced a massive list of competencies that had been identified as
essential for office employees. This list provided the starting point for the competency
development component of the PSI study.
Competency Development
The first phase of the competency development process consisted of hiring a content expert
to develop an initial set of competencies, knowledges, skills, and abilities utilizing the list of
competencies obtained in the literature review. Duplicate competencies were eliminated
and similar competencies were combined.
The second phase of the competency development process was an iterative process of
writing, reviewing, and refining the competencies. Managers, business educators, and
secretaries who were members of the Institute for Certifying Secretaries participated in this
phase. A psychometrician was employed to facilitate the group discussion. This synergistic
process was used to help validate the relevancy of each competency; ensure that the scope of
the domain of entry-level knowledges, skills, and abilities was adequately covered; ensure
that the competencies were clearly and accurately presented; and organize the
competencies into meaningful content dimensions.
The third phase of the competency development process was the review by the Institute
Task Force on Entry-Level Certification. This second group of managers, business
educators, and secretaries reviewed and refined the competencies. The Task Force was also
given oversight responsibility for the study. The resulting product of the competency
development component was a list of 49 competencies that were organized into eleven job-
content dimensions. These competencies were then used to develop surveys that were
administered to random samples of secretaries, business educators, and managers.
Survey Work
This component consisted of two surveys and a job function diary. Each survey was mailed
to a random sample of members of Professional Secretaries International, business
educators, and managers. Participants provided ratings on the importance of each of the
49 competencies, as well as the frequency with which the competency would be used by an
entry-level person, and whether or not the competency represented an essential skill,
knowledge, or ability. Participants also provided both importance ratings and an estimate of
the percentage of time an entry-level person would spend in each of the eleven job
dimensions. Bio-demographic data were also collected.
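As one way to picture this aggregation step, the sketch below averages importance ratings for a single competency and tallies the share of respondents marking it essential. The ratings and the decision threshold are hypothetical; the study's actual analysis rules are not reproduced here.

# Illustrative sketch; the ratings and the 50% threshold are hypothetical,
# not the decision rules used in the PSI study.
from statistics import mean

def summarize_competency(importance_ratings, essential_flags, threshold=0.5):
    """Mean importance (1-5 scale) and whether enough respondents marked the competency essential."""
    share_essential = sum(essential_flags) / len(essential_flags)
    return {
        "mean_importance": round(mean(importance_ratings), 2),
        "share_essential": round(share_essential, 2),
        "flag_essential": share_essential >= threshold,
    }

# Seven hypothetical respondents rating one competency (1 = marked essential).
print(summarize_competency([4, 5, 4, 3, 5, 4, 4], [1, 1, 1, 0, 1, 1, 0]))
# {'mean_importance': 4.14, 'share_essential': 0.71, 'flag_essential': True}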
To obtain more data to augment the "essential/non-essential" data for the study, a job
function diary study was conducted. The purposes of the diary study were:
1. To determine what tasks and skills are performed by entry-level personnel during
specified work periods.
2. To determine if size of organization makes a difference in the types of tasks required of
entry-level personnel.
3. To determine if the tasks and skills identified were covered by an existing entry-level
competency.
Data Analysis and Competency Modification
A cyclical process was used to integrate this component with the survey work. Data were
analyzed and reviewed after each survey and after the job function diary study.
Competencies were modified based on the survey work.
Results and Reporting
The final report of the content validity study was prepared and presented to the Institute
Task Force on Entry-Level Certification for approval. The report consisted of the survey
results and an approved, validated set of competencies that were later used in the
development of the entry-level examination program.
The list of competencies was comprehensive as judged by the results of two surveys, as
well as by the efforts of members of the Institute for Certifying Secretaries and the Institute
Task Force on Entry-Level Certification. Of the 49 competencies, 36 were considered
essential for the successful performance of an entry-level office employee. In addition to
the delineation of the essential competencies, the study also provided specific information
on the importance and frequency of use of these competencies.
Content Development
The content validity study defined the content domain in terms of the knowledge, skills, and
abilities required for successful performance as an entry-level office employee. A total of 36
competencies were identified as essential for successful performance.
The relevancy (importance) of each competency and the representativeness (frequency) of
each competency were identified. These data served as the basis for the examination
blueprint and specifications. For a list of the specific competencies, refer to the section of
the manual entitled, Competencies.
Field Test
Over 300 individuals representing over 30 educational institutions and organizations
participated in the field test. Field test data were used to make minor content
modifications, determine appropriate time frames for the various units, and to set the
standards for certification. Information from the technical report is available from
Professional Secretaries International.
Ongoing Research
Performance data from candidates who take the assessment will be collected and analyzed
over a three-to-five year period. The ongoing research component will be used to
determine the extent to which the assessment data meets employment testing standards
and to study the relationship between results on the assessment and job performance.
OPAC (Entry-Level) Content Validity Study: Areas,
Competencies, and Tasks
Status Codes:
A Assessed in the Office Proficiency Assessment and Certification Program.
N Competencies that were identified in the Content Validity Study as not being essential
for entry-level employees.
E Competencies that were identified as essential, but that are not assessed in the OPAC
program at this time. Exploratory work is being done for future assessment.
Company Organization and Policies
N 1.0 Is knowledgeable about the products or services of the company.
N 1.1 Is knowledgeable about the organizational structure of the company.
N 1.2 Is knowledgeable about company policies, both formal and informal.
Technology in the Office
A 2.0 Is knowledgeable about technological changes and innovations and their
impact on a business office.
A 2.1 Understands data and information processing concepts and is familiar with the
basic terminology relating to data and information processing.
A 2.2 Understands the basic concepts of telecommunications such as electronic mail,
facsimile communications, etc., and their impact on distributing information.
A 2.3 Understands the role information processing plays in an office information
system and knows terminology common to information processing.
A 2.4 Understands the role of computers in an office information system and is able
to utilize the word-processing function of computers.
N 2.5 Is familiar with different types of information equipment and systems and
understands how various software packages can extend the capacity of the
information processing equipment.
A 2.6 Is able to follow general instructions for operating information-processing
equipment.
A 2.7 Understands the function of and is able to operate printers.
Human (Interpersonal) Relations
E 3.0 Realizes the importance of developing and promoting good human relations
and is aware of his/her role in relation to superiors, peers, subordinates,
clients or customers, and sales or service personnel.
a. Displays an understanding and acceptance of himself/herself.
b. Recognizes the needs and personal characteristics of others with whom
he/she works.
c. Recognizes the importance of working cooperatively and getting along with
others.
d. Demonstrates tact in sensitive and/or difficult situations.
e. Conducts his/her office activities in an ethical manner.
f. Exhibits consideration and respect for others in the workplace.
g. Develops and maintains a positive work attitude and exhibits responsible
work behavior.
E 3.1 Recognizes that effective career planning and career advancement require that
the objectives of the individuals must be compatible with the objectives of the
organization.
E 3.2 Is able to communicate clearly with employers, fellow workers, and people
outside the company both orally and in writing.
a. Understands the communication process and its value in human and
business relations.
b. Recognizes some of the problems in maintaining effective communications.
c. Strives to improve communications by improving listening skills, using
direct simple language, utilizing feedback, and timing messages carefully.
E 3.3 Recognizes that listening is an important phase of the communication process
and is able to use effective listening techniques.
a. Recognizes barriers to effective listening.
b. Improves listening skills through the use of listening techniques such as
respecting the speaker, withholding evaluation of what the speaker is saying,
watching for nonverbal communication, organizing what he/she hears, and
minimizing blocks and filters.
E 3.4 Knows and is able to use business etiquette appropriate to office situations.
a. Knows the general rules for introductions.
b. Follows appropriate office procedures in treatment of executives,
client/customers, distinguished visitors, and other office callers.
Basic Office Skills
A 4.0 Demonstrates efficient and effective ways of organizing his/her time to
complete work assignments.
a. Analyzes the jobs he/she performs daily and devises a structured plan in
order to reduce or eliminate wasted time and increase productivity.
b. Uses good judgment and careful thinking; establishes priorities for
handling the work assigned.
c. Demonstrates the ability to use timesaving procedures and devices.
A 4.1 Maintains a well-organized work station to ensure a smooth flow of work.
A 4.2 Performs document-producing tasks and keyboarding functions using a variety
of information processing equipment.
A 4.3 Operates information processing equipment to record, edit, print, store, and
revise correspondence, reports, statistical data, forms, lists, and other
materials. This equipment includes automatic typewriters, text-editing
machines, transcription machines, printers, OCRs, and other equipment that
extends information processing capabilities.
a. Exhibits expert keyboarding skills essential to document-producing tasks.
b. Understands and is able to use all special features of information
processing equipment such as merge, pagination, etc.
A 4.4 Produces mailable business communications and carries out instructions from
manual or machine dictation.
a. Keyboards from both longhand and typewritten rough drafts, pre-recorded
dictation, and machine dictation.
b. Edits rough drafts and unarranged copy for proper punctuation,
paragraphing, grammar, etc.
c. Knows basic operating procedures for transcription machines and uses
proper machine transcription techniques.
d. Uses listening and decision-making skills when transcribing from machine
transcription.
e. Is able to follow special instructions for dictated materials and uses
effective techniques of planning, transcribing, and distributing the work.
f. Selects proper stationery and plans the proper format for assigned tasks.
A 4.5 Utilizes basic business knowledge, skills, and vocabulary in processing work
assignments.
A 4.6 Is able to accept responsibility and to carry out assignments with limited
guidance or supervision.
a. Grasps and follows instructions quickly.
b. Organizes materials and supplies for efficient handling and uses equipment
and resources effectively.
c. Meets expected deadlines within a regular working day (except in unusual
situations).
N 4.7 Knows company standards and procedures for processing documents.
a. Follows company procedure manuals.
b. Uses standard company formats and is able to adapt standard formats to
special situations.
c. Meets established quality standards and production deadlines.
A 4.8 Exhibits a high level of mental concentration and demonstrates the ability to
work under pressure of production requirements.
E 4.9 Is able to select and purchase appropriate stationery; typewriter, filing, and
mail supplies; desk accessories; and other office supplies.
a. Identifies and keeps a file on all sources of office supplies.
b. Prepares all requisitions, purchase orders, and/or invoices for replenishing
office supplies.
c. Develops a procedure for maintaining the proper inventory level of all
supplies.
d. Maintains an orderly supply cabinet with supplies arranged conveniently for
general use.
A 4.10 Is responsible for the reproduction of all types of typewritten and printed
documents.
a. Is familiar with the different reprographic processes and is able to select the
appropriate process for the given situation.
b. Prepares materials to be photocopied and is able to operate his/her firm's
copying machines.
c. Prepares requisitions and instructions for materials to be reproduced.
d. Knows copyright guidelines and follows them in making decisions about legal
or illegal copying.
N 4.11 Assists the executive and other professionals in gathering, processing, and
verifying information needed for preparing reports, presentations, manuals,
and other publications.
a. Knows what reference sources are available and how to use those
resources.
b. Gathers data from resource documents and research materials and
organizes data into a usable form.
Language Arts Skills
A 5.0 Applies basic language arts skills in the composition and keyboarding of all
documents.
a. Knows correct grammar.
b. Knows how to spell words commonly used in business.
c. Knows how to punctuate correctly.
d. Knows how to use capitalization effectively.
e. Knows how to use possessives properly.
f. Knows rules for correct number usage.
g. Knows how to use abbreviations correctly.
A 5.1 Carefully checks all documents for accuracy.
a. Proofreads letters, memos, and reports for correct grammar; punctuation;
spelling; logical, clear content; and correct and complete data.
b. Proofreads statistical copy for accuracy and adds columns of figures if a
total is given.
Mail, Telephone, and Appointments
A 6.0 Processes mail quickly and efficiently.
a. Sorts, opens, date stamps, prioritizes, and distributes mail to specified
individuals and/or departments.
b. Expedites the executive's handling of the mail by providing background
information and/or pertinent files where appropriate.
c. Keeps a mail register when required by company policy.
d. Prepares outgoing mail so that it can be processed quickly and accurately
by the Postal Service.
e. Addresses envelopes in accordance with Postal Service rules.
f. Includes all enclosures and folds and inserts letters properly in the
envelopes.
g. Knows the different classes of mail and the special mail services available
so that he/she can determine the appropriate class to be used on outgoing
mail.
h. Is familiar with and practices ways to reduce mailing costs.
i. Is able to handle any special problems that arise in processing mail, i.e.,
mailing currency, retrieving mail incorrectly addressed, changing
addresses, forwarding mail, etc.
j. Is familiar with international mail regulations and services.
k. Is knowledgeable about other mailing and shipping services and is able to
make decisions about other means of shipment based on cost, speed of
delivery, and convenience to shipper and receiver.
A 6.1 Has knowledge of telephone services and is able to handle telephone duties
skillfully.
a. Takes appropriate action in given situations, i.e., handling problem calls,
putting callers on hold, transferring calls, placing long distance calls, etc.
b. Uses appropriate techniques in placing and receiving telephone calls
promptly and efficiently.
c. Develops a good telephone personality and uses proper telephone
etiquette.
d. Records telephone messages completely and correctly and delivers them
promptly.
A 6.2 Is responsible for scheduling appointments, maintaining office calendars, and
receiving office callers.
a. Maintains a business-like office atmosphere and exhibits professional
behavior when receiving callers and scheduling appointments.
b. Follows the employer's preferences when scheduling appointments.
c. Records complete information regarding date, time, place, purpose, and
participants when scheduling appointments.
d. Coordinates his/her appointment calendar with that of his/her employer.
e. Keeps a record of office callers and refers them to the proper person(s).
Written Communications
A 7.0 Composes routine business documents (letters, memos, reports, etc.) and
presents them in a clean, error-free typewritten format that is consistent with
accepted business practices and styles.
a. Knows the characteristics of an effective business letter and includes those
characteristics in composing business letters.
b. Plans the letter before composing, i.e., gathers all the facts, determines
what must be included in the letter, decides upon the order of
presentation, and develops an effective beginning and ending.
c. Identifies the different parts of a letter and knows their correct placement
within the letter.
d. Selects appropriate salutations and complimentary closes.
e. Formats documents appropriately.
f. Has knowledge of different types of business reports and is able to prepare
them according to accepted styles and formats.
g. Differentiates between formal and informal reports.
h. Knows the parts of different kinds of reports and is able to arrange them
properly within the report.
i. Knows how to construct and format charts and tables.
j. Is familiar with the commonly used business forms in his/her office and is
able to locate the information necessary to complete the forms and to fill in
that information correctly.
Records Management
A 8.0 Understands the principles of records management.
A 8.1 Is able to arrange business records in accordance with a systematic plan and
file them in such a manner that they can be located quickly.
a. Identifies basic filing methods and determines the best filing method for
the active records.
b. Knows and applies basic filing rules.
c. Prepares records for filing by indexing, coding, sorting, and (if necessary)
cross-referencing.
d. Prepares folders and labels for records to be filed.
e. Maintains the confidentiality of records under his/her responsibility.
f. Determines the effectiveness of existing filing systems and makes
recommendations for reorganization where applicable.
g. Is familiar with filing supplies and equipment and makes recommendations
or provides for the acquisition of such.
A 8.2 Assists users in the retrieval and use of records.
a. Develops chargeout and follow-up procedures.
b. Retrieves records from inactive files when requested.
A 8.3 Is familiar with automated filing systems that utilize information processing
and computer technology.
a. Understands the linkage of computer and microfilm as a quick and efficient
means of storing and retrieving records.
b. Is familiar with Computer Output Microfilm (COM) and Computer Assisted
Retrieval.
N 8.4 Understands and follows records retention schedules.
a. Processes records to be transferred from active to inactive files.
b. Processes records for destruction.
N 8.5 Organizes and maintains a filing system for stored and recorded data.
a. Knows the types of storage media and how jobs are assigned and
identified on the storage media.
b. Copies files for backup.
c. Retrieves text and data from stored files either for reprocessing or
continued processing.
d. Files and stores magnetic media according to established procedures.
Financial Records
N 9.0 Retrieves information from financial reports.
A 9.1 Performs simple accounting tasks and keeps some permanent records.
a. Establishes and maintains a petty cash fund.
b. Keyboards financial statements such as balance sheets and income
statements.
c. Maintains basic financial records.
A 9.2 Performs banking activities.
a. Makes deposits, writes and records checks, endorses checks, and
reconciles bank statements.
b. Understands electronic fund transfer, direct deposit or payment, telephone
transfer, etc.
A 9.3 Operates office machines that are widely used in computing and producing
financial records.
a. Performs basic math functions on an electronic calculator or other similar
machines.
b. Utilizes office machines to solve business math problems encountered in
financial-related tasks.
Travel
N 10.0 Makes business travel arrangements according to company policies and
procedures.
a. Knows executive preferences with regard to transportation and
accommodations.
b. Sets up a trip file to accumulate all the information and materials relating
to a particular trip.
c. Makes hotel/motel and transportation reservations, processes requests for
use of company auto or plane, and makes arrangements for travel funds.
d. Prepares a complete itinerary.
e. Notifies associates of the executive's plans to be away from the office and
attends to the cancellation and/or rescheduling of meetings scheduled
during the executive's absence.
f. Collects and organizes materials that are necessary for the successful
completion of the trip.
N 10.1 Assumes responsibility for routine office activities during the executive's
absence.
a. Handles daily communications and activities within the scope of his/her
authority and refers exceptions to appropriate personnel.
b. Forwards mail if necessary and maintains a file or files of mail and other
communications and information that will be held for action until the
executive returns.
c. Follows proper procedures for handling the executive's mail during his/her
absence.
N 10.2 Is responsible for compiling and preparing expense reports.
a. Collects the receipts and information necessary for completing an expense
report.
b. Verifies amounts, dates, and places, and enters the information in the
proper form.
Meetings
N 11.0 Assists in the planning, organizing, and implementing of business meetings.
a. Assists in site selection and reserving meeting rooms.
b. Notifies participants of date, time, place, and purpose of the meeting.
c. Prepares and distributes the meeting agenda.
d. Reserves the equipment needed to conduct the meeting and prepares and
assembles materials to be used during the meeting.
e. Performs any follow-up activities required after the meeting.
A 11.1 Records, transcribes, and distributes the minutes of the meeting.
Validation Study for
Secretarial/Administrative Classifications
Using Computer-Based Testing (1991)
Janet M. Burns
Los Alamos National Laboratory
Los Alamos National Laboratory is a United States Department of Energy (DOE)
national laboratory, managed by the University of California.
Paper presented at the International Personnel Management Association Assessment Council Fifteenth
Annual Conference. Chicago, Illinois. June 1991.
Abstract
This paper presents the results of a content and concurrent criterion-related validity study
conducted at Los Alamos National Laboratory for clerical, secretarial and administrative
classifications using computer-based testing. The advantages and disadvantages of
different types of testing software incorporated in the study are explored. Job analysis
methodology, procedures for establishing cut-off scores, and comparative adverse impact and
validity are analyzed.
The purpose of this paper is to present the results of a content and concurrent criterion-
related validation study conducted on the administrative and secretarial classifications at
Los Alamos National Laboratory. The Laboratory retained Biddle & Associates/Biddle
Consulting Group Inc. to assist in the design, methodology, form development, analysis and
preparation of the initial compliance report. The objective of the study is to replace the
Lab's traditional typing test administered on IBM Selectric typewriters with a computer-
scored testing environment, including a word processing assessment, for the selection of
secretarial and administrative classifications. Phase II of the study will expand the office
skills assessed to include spreadsheets and data entry.
Los Alamos National Laboratory has administered a traditional typing test to applicants for
secretarial, clerical and administrative positions for over two decades. Test scores are only
one of the criteria considered in the selection process. As in many organizations, the
technology being applied on the job today is far more advanced and thus has outdated the
traditional types of tests used to select secretarial personnel. As this gap continues to
diverge, the current test fails to provide the necessary information required to make sound
selection decisions. Applicants and hiring managers have requested more state-of-the-art
procedures. Our goal is to select and implement a system, which more accurately assesses
a broader range of skills including word processing, and provides more in depth information
about those skills than just speed and accuracy on a typewriter.
The Lab's administrative population is close to 1200 individuals across 24 job titles. The
Lab's total population is approximately 7400. Between 70 and 130 candidates test each
month with a significant number of selections made annually for these titles. Historically,
applicants have had to pass the typing test at one of two different speeds depending on the
requirements of the position in order to progress to the next phase of the selection process.
The cutoff score is 55 words per minute and 5 errors for the secretarial test, and 25 words
per minute and 5 errors for the clerical test. Early investigation of these job titles and job
content indicated that the required office skills varied within and between classifications.
This situation becomes even more complex when incumbents use multiple software
packages and an organizational standard for word processing software does not exist.
These and other factors to be discussed strongly influenced the design of this study. The
following sections will explain how we dealt with the uniqueness of this study and the
results which followed.
Method
This study was conducted following Section 15C of the Uniform Guidelines on Employee
Selection Procedures (1978). The primary methodology is content validity. Criterion-
related validity was included to augment the study and answer some additional questions.
Multiple classifications, multiple kinds of word processing software, a variety of required office
skills within and between classifications, and the need for a computer-based testing and
scoring system greatly influenced our approach.
Identification of Tests and Test Publishers
Four test publishers were identified for inclusion in the study based on the types of
computer scored tests available, the range of skills which could be assessed, existing
validation research, cost, and whether specialized word processing software tests (e.g.,
WordPerfect, MultiMate, etc.) were available. We were also interested in "generic" clerical
tests for individuals without word processing experience and for positions which might
require assessing office skills other than word processing, such as editing, grammar, or
spelling.
MANPOWER INC.¹, Tap Dance², QWIZ Inc.³, and Office Proficiency Assessment and Certification®⁴
(hereinafter called OPAC®) submitted tests for the study. A total of 12 tests were selected for
this phase and are listed in Appendix A according to the type of test, along with a short
descriptive footnote for each test. Appendix B lists a total of 41 test scales measured by the
12 tests. The number of tests, test scales, and different word processing software packages
complicated the analysis significantly.

¹ MANPOWER Inc. is a registered trademark of MANPOWER Inc.
² Tap Dance is a trademark of International Testing Services Inc.
³ QWIZ Inc. is a registered trademark of QWIZ Inc.
⁴ OPAC is a registered trademark of Professional Secretaries International.
Job Analysis - Part I
A survey was sent to incumbents of each of the 24 clerical, secretarial and administrative
classifications for which the current typing test was being used to see if a word processor
was being used, and if so, on what equipment and which software package. Of the 1139
incumbents, 749 responded to this first survey. WordPerfect, Microsoft Word for the IBM,
Microsoft Word for the Macintosh and MultiMate were identified as the software packages
used most frequently. The number of users for WordPerfect and Microsoft Word for the
Macintosh were 249 and 259 respectively. Only 43 of the 1139 incumbents indicated they
were not using word processing on the job and thus were not included in the study. The fact
that 96% of the Lab's administrative population is using word processing confirmed the need
for a replacement for the current test, which does not measure this skill.
Criterion Development and Sampling
Supervisors of the 749 job incumbents who responded to the first survey were invited to a
supervisory workshop. The purpose of the workshop was to identify the skills and levels of
skills used in the affected classifications. The form used to gather the data on skill ratings is
in Appendix C. The rating scales were developed based on what can be measured by the
selected tests. Using a scale from 1-5 supervisor's were asked to rate the employee's
speed, accuracy, and (where requested) levels of skill for the nine office skills listed. A
description of each skill and definitions for each rating scale were provided. Supervisors
were instructed to provide ratings only for the skills for which they had first-hand
knowledge. As mentioned earlier, the rating form includes skills such as spreadsheet,
database management and data entry, which will be part of Phase II.
Over 100 supervisors participated in the criterion workshops, resulting in ratings for 292
unique individuals; 259 incumbents received a single rating, while 33 received more than one
rating. The multiple ratings were averaged for the analysis.
Job Analysis - Part II, and Testing of Incumbents on
Computers
Incumbents who received ratings by their supervisors were invited to take the tests
included in the experimental test battery. Participation was voluntary. At the time the
incumbents took the tests, they were asked to complete a form that gathered job analysis
and test evaluation information. Incumbents were asked several questions as subject
matter experts: whether some level of the skill being measured is needed; to identify a duty for which
the skill is required; whether the skill is distinguishing; whether the test is a representative
sample of the skill used on the job; whether the test required more skill from the test taker
than is required on the job; whether the skill can be learned in less than 8 hours; whether the
test resembles one or more job duties; and whether the product of the test resembles a work
product. After taking the tests, subject matter experts were asked to estimate what a
minimally qualified applicant's score should be on each test scale. Using a 1-5 importance
rating scale, incumbents were also asked to rate the job duty. A questionnaire was completed
for every test administered.
This information was captured in several databases. Every question and each test scale
were analyzed for the content validity report.
While 113 incumbents participated in the testing, they were asked to take only the tests that
they linked to one or more job duties, and some did not complete the entire test battery for
various reasons. Therefore, the sample sizes for the individual tests are not equal.
Tests were administered on five IBM PC's and one Macintosh SE over a seven-week period.
The length of time to complete all 12 tests ranged from 5 to 9 hours depending on the
individual.
A survey was also conducted of nine local high schools and community colleges to verify
that students were being taught with word processors rather than typewriters. Every school
surveyed is using word processors except for the one private school in the sample.
WordPerfect was the most prevalent software being used.
Results
Content Validity
Evidence demonstrating that each test scale is a representative sample of a duty performed
on the job was established through the content validity questionnaire. Agreement by at least
50% of the incumbents was set as the minimum standard of acceptance; that is, 50% of the
incumbents had to agree on each of the content validity questions described in the
methodology. A standard of 70% was set as the preferred standard. Each test scale easily
passed all the minimum standards except for Manpower's multiple choice word processing
test for WordPerfect users. Incumbent responses indicated that this test requires more skill
than is needed on their particular jobs.
Concurrent Criterion-Related Validity
Hypotheses and the anticipated direction were determined for each test scale to each
relevant rating scale. A one-tail .05 level of statistical significance was applied with the
specified direction. Pearson Product-Moment Correlations were calculated for each of the
hypothesized relationships. Two of the test publishers had test scales that correlated
significantly (at the .05 level) with speed ratings from the supervisors: OPAC and
Tap Dance. Both publishers' 5-Minute Tests demonstrated statistically significant relationships
with the speed performance ratings, and Tap Dance's Word Processing Test also showed a
statistically significant relationship with the speed ratings.
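For readers who want to see how such a check can be carried out, the sketch below computes a Pearson correlation and a one-tailed p-value at the .05 level. It is only an illustration: the data are hypothetical and the snippet is not the analysis program used in the study.

    import math
    from scipy import stats

    def one_tailed_pearson(test_scores, ratings, alpha=0.05):
        """Pearson r between test scores and supervisory ratings,
        with a one-tailed p-value for the hypothesis that r > 0."""
        n = len(test_scores)
        r, _ = stats.pearsonr(test_scores, ratings)   # two-tailed p ignored here
        t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)
        p_one_tailed = stats.t.sf(t, df=n - 2)        # upper-tail probability
        return r, p_one_tailed, p_one_tailed < alpha

    # Hypothetical data: 5-minute typing scores and 1-5 supervisory speed ratings
    wpm = [38, 45, 52, 61, 47, 55, 40, 66]
    speed_rating = [2, 3, 3, 4, 3, 4, 2, 5]
    print(one_tailed_pearson(wpm, speed_rating))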
Ratings of accuracy of work performed were correlated significantly with scales from three
of the test publishers: Manpower's Ultraskill, OPAC's Language Arts, and Tap Dance's
Editing and Word processing tests.
Level 1 word processing skill as evaluated by supervisors was correlated significantly with
tests from all four test publishers. Level 2 ratings were correlated significantly with tests
from three test publishers: Manpower, OPAC and QWIZ.
When the data is analyzed separately by software, sample sizes decrease and statistical
significance is not achieved for all software-specific correlations. The 5-minute typing tests
from OPAC, Tap Dance, and the Lab all correlated significantly with speed, while none of the
5-minute tests correlated with accuracy. QWIZ did not predict speed or accuracy, though
the sample sizes were smaller. Manpower does not measure speed with a 5-minute timed
typing test.
Alternative Procedures Investigated
Each of the test scales showing statistical significance with supervisory ratings was analyzed
through the Biddle Consulting Group Statistical Cutoff Program. The program calculates
statistical significance between groups for each score and calculates practical significance as
well. According to Section 4D of the Uniform Guidelines on Employee Selection Procedures
(1978) both statistical and practical significance must be shown in order for adverse impact
to exist.
Cochran's correction to the chi-square was used to test statistical significance at the .05 level
of probability. Practical significance was found to exist if it would take more than two
additional people in the disadvantaged group's passing number to change the statistical
significance finding, more than three additional people to change the result of the 80% Rule
of Thumb test, and more than four additional people to bring the passing rates of the two
groups to within 2.1% of each other.
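The sketch below illustrates the general logic of such a screen. It pairs an ordinary 2x2 chi-square test (used here only as a stand-in for Cochran's corrected statistic) with the 80% Rule of Thumb and a simple count of how many additional passers the disadvantaged group would need. It is not the Biddle Consulting Group Statistical Cutoff Program, and all figures are hypothetical.

    from scipy.stats import chi2_contingency

    def adverse_impact_screen(pass_a, fail_a, pass_b, fail_b, alpha=0.05):
        """Group A = higher-scoring group, Group B = disadvantaged group.
        Reports a chi-square p-value (a stand-in for Cochran's corrected
        statistic), the 80% Rule impact ratio, and how many additional
        passers Group B would need to satisfy the 80% Rule."""
        rate_a = pass_a / (pass_a + fail_a)
        rate_b = pass_b / (pass_b + fail_b)
        impact_ratio = rate_b / rate_a                      # 80% Rule of Thumb
        _, p, _, _ = chi2_contingency([[pass_a, fail_a],    # Yates-corrected 2x2 test
                                       [pass_b, fail_b]])
        added = 0
        while (pass_b + added) / (pass_b + fail_b) < 0.8 * rate_a:
            added += 1                                       # people added to the passing number
        return {"p_value": p,
                "statistically_significant": p < alpha,
                "impact_ratio": impact_ratio,
                "passers_needed_for_80_rule": added}

    # Hypothetical passing figures for two applicant groups
    print(adverse_impact_screen(pass_a=60, fail_a=20, pass_b=30, fail_b=30))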
Using these rules, adverse impact was found for one or more scores in the samples for
Manpower's spelling scale, and OPAC's words per minute and keystrokes scale. For each of
these test scales there were several scores without adverse impact. However, because the
incumbents taking these tests had already passed the Los Alamos Typing Test, these results
need to be reanalyzed with unrestricted data from applicants in general.
When evaluating test publishers on the basis of statistical validity, content validity, and
adverse impact against the accuracy rating scale, the OPAC and Tap Dance tests produce
validity without adverse impact. The Manpower Ultraskill spelling scale produces adverse
impact with validity, while Manpower's other scales produce validity without adverse impact.
No QWIZ test scale produced statistical validity with the accuracy ratings. When evaluating
test publishers on the basis of statistical validity, content validity and adverse impact
against the speed rating scale, the OPAC 5-Minute Test produced adverse impact and
validity. The Tap Dance 5-Minute Test and Word processing tests produced validity without
adverse impact.
When evaluating test publishers on the basis of statistical validity, content validity and
adverse impact against the word processing level 1 rating, all test publishers produced
validity without adverse impact.
When evaluating test publishers on the basis of statistical validity, content validity and
adverse impact against the word processing level 2 rating, all four test publishers produced
validity without adverse impact. However, the statistical validity for Tap Dance's Word
processing Test was found for only the Microsoft sample.
Test-to-test correlations were also calculated for samples greater than or equal to ten. A
procedure similar to the one used to identify the test-to-ratings relationships was applied.
Each test had many scales. Only those scales that we hypothesized to have the most obvious
relationships were selected for the analysis. It is possible that correlations could exist
between scales that were not analyzed.
Discussion
An overwhelming amount of data has been collected and analyzed in this study. This
discussion will focus on three important areas that emerged from the analysis: content
validity design, equal validities and adverse impact, and the intercorrelations.
The content validity approach used in this study allowed us to validate a number of different
tests across numerous secretarial and administrative job classifications where incumbents
within a classification are using word processing at different levels. This was a non-
traditional approach to job analysis that addresses Section 14C of the Uniform Guidelines on
Employee Selection Procedures (1978). The responses to the questionnaires showed
outstanding support for almost all of the tests, with cutoff scores established at the point at
which 70% of the incumbents agreed on that score or a more stringent one. Regardless
of the test the Lab selects, as job openings occur, hiring managers will have to identify
whether or not word processing is a requirement for that position within a classification. By
focusing on the common skills the testing function will be more responsive to changing and
varied job requirements at the Lab. Job content will drive the process rather than strictly
job title.
This study presented a unique illustration of Section 3B, Suitable Alternatives, of the
Uniform Guidelines on Employee Selection Procedures (1978). When alternative selection
procedures (i.e., different tests or test scales) or alternate uses of a selection test (i.e.,
different weights within a job-related range or alternate cutoffs) are substantially equally
valid for a given purpose, the one with less adverse impact should be used. The OPAC and
Tap Dance 5-Minute Timed Typing Tests each produced statistically significant validities with
speed ratings, r = .28, n = 62 and r = .31, n = 57, with and without adverse impact,
respectively. The correlation between the two tests was r = .88, n = 55. The validity
coefficients are not significantly different. The Tap Dance Word Processing test also
predicted speed, r = .43, n = 49, without adverse impact. The OPAC 5-Minute Timed
Typing Test and Tap Dance Word Processing Test validity coefficients are not significantly
different. The intercorrelation of .47, n = 45, is significant.
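As a rough illustration of how two validity coefficients can be compared, the sketch below applies a Fisher r-to-z test. Treating the two samples as independent is a simplifying assumption (the incumbent samples overlapped), so this is only an approximation of the kind of comparison reported, not the study's exact procedure.

    import math
    from scipy.stats import norm

    def compare_correlations(r1, n1, r2, n2):
        """Fisher r-to-z test of the difference between two correlations,
        assuming independent samples (an approximation in this context)."""
        z1 = math.atanh(r1)                      # Fisher transformation
        z2 = math.atanh(r2)
        se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
        z = (z1 - z2) / se
        p_two_tailed = 2 * norm.sf(abs(z))
        return z, p_two_tailed

    # Validity coefficients reported in the text: a small z and a p well above .05
    print(compare_correlations(0.28, 62, 0.31, 57))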
The Manpower Ultraskill spelling scale and Tap Dance Editing error scale evaluated against
accuracy also produced substantially equal validities, r = .34, n = 72 and r = .49, n = 53,
with and without adverse impact, respectively. The correlation between the two tests was
significant, r = .36, n = 47.
When the Manpower Ultraskill spelling scale and OPAC Language Arts spelling scale are
evaluated against the accuracy rating scale, both tests produced substantially equal
validities, r = .34, n = 72 and r = .44, n = 39, respectively. Only the Manpower spelling
test produced adverse impact. Though both are spelling test scales, the intercorrelation of
.19 was not significant. It appears each test is measuring different parts of the accuracy
criterion.
Correlations between the tests are interesting; however, the sample sizes restrict any
definitive conclusions. Analyzing the data separately for each specific software package
unavoidably reduced the sample sizes. Although a major effort was made to have every
incumbent take every test, this was not always possible. Of the correlations analyzed, OPAC
has a moderate relationship with Tap Dance and the Manpower RAP written test, and none
with the Manpower Ultraskill test. Tap Dance appears to be measuring some of the same skills as
OPAC and the Manpower tests, though the relationship with the written test is stronger. It
is the written knowledge test of Manpower that seems to be similar to the other word
processing tests. Since the Manpower Ultraskill test is not a "pure" word processing test,
and is considered an assessment of clerical skills on a word processor, it is not surprising
that minimal or no relationships exist with the other tests. It is not intended to measure
"word processing", but is administered on the specific word processing package with which
a person must be familiar. There is some relationship however between the two Manpower
tests. QWIZ has a very low relationship to OPAC and no relationship with either Manpower
test.
The Los Alamos, Tap Dance and OPAC 5-Minute Typing tests predicted ratings of speed but
not accuracy. Only Tap Dance measured speed without adverse impact. As hypothesized,
the 5-Minute Typing Tests show strong intercorrelations with larger samples. All three test
publishers and the Los Alamos typing test appear to be measuring a very similar skill. The
Los Alamos typing test was very highly correlated with each of the computer-based 5-minute
typing tests. Direct restriction of range is present with the Lab's current typing test, as well
as indirect restriction of range to the extent the others are correlated with it. When correcting
for restriction of range, the validity coefficient increases from r = .3047 to r = .395.
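The correction mentioned above can be illustrated with the standard formula for direct restriction of range on the predictor (often called Thorndike's Case 2). The report does not state which formula or standard deviation ratio was used, so the ratio in the example is hypothetical and the snippet is only a sketch of the general technique.

    import math

    def correct_for_range_restriction(r, sd_unrestricted, sd_restricted):
        """Correct a validity coefficient for direct restriction of range
        on the predictor (Thorndike Case 2 formula)."""
        u = sd_unrestricted / sd_restricted
        return (r * u) / math.sqrt(1.0 - r ** 2 + (r ** 2) * (u ** 2))

    # Hypothetical SD ratio, chosen only to show the direction of the correction
    print(correct_for_range_restriction(r=0.3047, sd_unrestricted=12.0, sd_restricted=9.0))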
This data is only applicable to the samples used at Los Alamos National Laboratory. All
tests included in the study offer their own unique advantages that must be considered along
with the statistical results and other practical concerns for each organization. Manpower is
the only test publisher with a test for the Macintosh. This is an important issue for the Lab
since the number of Mac users is increasing daily. Several criteria will be applied to each of
the tests before a decision is made. A predictive study is planned as a follow-up.
Note: No reference to this study should imply an endorsement or criticism of the test
publishers or their tests.
The author gratefully acknowledges Charlotte Garcia, the Laboratory's Test Administrator,
for her outstanding work and contribution to this project.
Content and Concurrent Criterion-Related Validity for Some OPAC® Tests
Richard E. Biddle
Introduction
A study was conducted at an employer with more than 5000 employees to examine the validity
of several OPAC (Office Proficiency Assessment and Certification) tests. The OPAC tests
were originally developed and content validated by Professional Secretaries International.
The OPAC tests are computer administered and computer scored.
The employer involved in the study was searching for a word-processing test that could be
administered and scored in as independent a test environment as was feasible to replace its
traditional 5-minute timed typing test. The traditional typing test measured speed and
accuracy using IBM Selectric typewriters and required extensive test administrator supervision.
The employer wanted a test that could measure an applicant's skill at creating, formatting,
proofing, and editing documents, while also measuring word-processing skill using a word-
processing type program. An applicant's speed and accuracy were also important factors for
the test to measure. Also, the employer wanted to minimize the test administrator's time.
More than 20 classifications needed a selection testing procedure that measured keyboarding
speed and accuracy as well as some level of word-processing skill.
To add to the problem, many different types of word-processing software were being used.
The new selection procedure needed to test applicants using different word processing
programs.
Experimental Test Battery
OPAC tests of Language Arts 1, Editing/Formatting from Rough Draft, and Keyboarding were
used as part of an experimental test battery.
The OPAC Language Arts 1 test evaluated in the study was used to measure skills in proofing a
document to identify errors in grammar, spelling, punctuation, capitalization, possessiveness,
number usage, and abbreviations.
The OPAC Editing/Formatting from Rough Draft test was used to measure skills in operating
features and functions of a specific word-processing program.
The OPAC Keyboarding test was used to measure an individual's speed and accuracy of typing
text on a keyboard.
Identification of Sample
Incumbents of 24 secretarial, clerical, and administrative classifications were sent a survey.
The survey asked about the use of word-processing equipment and software. Of the 1139
incumbents who were sent surveys, 65.8% responded (749). The responses showed that
WordPerfect, Microsoft Word for the IBM and Macintosh, and MultiMate were the word-
processing software most frequently used. Of the 749 incumbents who responded, 94.3%
(706) indicated that they were using some form of word-processing on the job. About half
(378) used more than one word processor, including text editors or desktop publishing. The
5.7% (43) who used no word-processing on the job were not included further in the study.
Job Performance Ratings
A series of workshops were conducted for those who supervised the survey respondents to
obtain ratings of job performance. Supervisors evaluated job performance on a rating scale
that ranged from 1-5. The rating scales covered speed and accuracy for nine office skills. The
scales also incorporated levels of skill in three areas (when a rating scale was relevant to the
job). The nine office skills were: (1) text from hard copy, (2) text from machine dictation, (3)
charts/tables/statistics from hard copy, (4) spreadsheet skill, (5) database management skill,
(6) data entry skill: numeric, (7) data entry skill: alpha-numeric, (8) ten-key skill, and (9)
shorthand/speed writing and transcription skill.
Skills (1) text from hard copy, (2) text from machine dictation, and (3) charts/tables/statistics
from hard copy were grouped for a word-processing level of skill rating. Level I included
setting tabs, margins, and justification to format documents; using common function keys,
such as bold, underline, and center; making simple edits by using delete and insert keys;
typing information on pre-printed forms; and naming, saving, printing, and retrieving
documents. Level II included setting up, editing, copying, and moving columns; using headers
and footers; creating templates and boilerplate formats; creating forms; merging form letters
and forms with variable data; creating and printing labels; using DOS commands; using
various sizes and styles of lettering; and archiving. Level III included creating and using
macros; using graphics; converting documents to ASCII; using math functions; creating a
dictionary for the system; and linking spreadsheets or database system information with word-
processing documents.
Definitions were provided to the supervisors for each of the skills and scales. Further,
supervisors were instructed to only provide input where they had first-hand knowledge.
More than 100 supervisors gave ratings for 292 incumbents during the workshops. Of the 292
incumbents who received ratings, 33 received multiple ratings or ratings from more than one
supervisor. For analysis purposes, the multiple ratings were averaged.
Data on the Experimental Tests
The 292 incumbents who received ratings by their supervisors were invited to participate in
parts of the experimental test battery. Since supervisors only rated incumbents on skills that
were relevant in their situation, and when they had first-hand knowledge of the work
behaviors, not all of the 292 incumbents received ratings on all of the skills. Since
involvement in the study was voluntary, not all of the 292 incumbents who had received
ratings took all of the experimental tests. Testing was conducted over a seven-week period
using six PC's. Of the 292 incumbents with ratings, 110 actually took one or more of the tests
in the experimental test battery. Of the 110 incumbents taking the tests, 75 took the OPAC
Keyboarding Test, 68 took the OPAC Editing/Formatting from Rough Draft Test, and 50 took
the OPAC Language Arts 1 Test. A variety of conditions dictated which tests were
administered to each incumbent, including the duties the incumbent performed, amount of
time the incumbent could spend taking the experimental tests, software and hardware the
incumbent used on the job, software and hardware available for testing at the time, sample
already obtained in the study, etc. Because of these conditions, the samples varied from test
to test.
Job Analysis and Test Evaluation
Some of the incumbents who took the experimental tests also evaluated the tests and
provided data as subject matter experts (SME's). After taking an experimental test, the
incumbent was asked to answer (as a subject matter expert) a content validity survey form for
that test. If the subject matter expert stated that some level of the skill measured by the test
was a necessary prerequisite for successful performance of a critical or important
job duty, then several other questions were asked. These additional questions asked for a
description of the critical or important duties which required use of the skill, then asked for
ratings of the degree of importance of that skill. Additional questions subject matter experts
answered dealt with the level of the skill which resulted in better performance, if the test was a
representative sample of the skill, if the test required more skill from the test taker than was
required on the job, if the skill could be learned in a brief orientation, and if the work product
of the test closely resembled a work product produced on the job. To obtain information for a
job-related cutoff, subject matter experts were given their score and then asked to provide
their opinion of the minimum score necessary to pass minimally qualified applicants (following
some of the basics of the Angoff model). (See Angoff 1971.)
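As a simplified illustration of that Angoff-style step, the sketch below averages hypothetical subject matter expert judgments of a minimally qualified applicant's score to suggest a provisional cutoff. The study itself set cutoffs where at least 70% of the SMEs agreed, so this is a sketch of the underlying idea rather than the exact procedure used.

    def angoff_style_cutoff(sme_estimates):
        """Average the SMEs' minimum-score judgments to suggest a cutoff,
        one of the basic ideas behind the Angoff (1971) approach."""
        return sum(sme_estimates) / len(sme_estimates)

    # Hypothetical SME judgments of the minimum passing words-per-minute score
    estimates = [50, 55, 55, 60, 52, 58]
    print(round(angoff_style_cutoff(estimates), 1))   # 55.0 for these judgments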
Content Validity Results
If a test product results in adverse impact against a protected group (i.e., one sex, race, or
ethnic origin group scores disproportionately lower than another group on the test), the
Uniform Guidelines specifically allow content validity as a method of showing business
necessity for the test. (See Uniform Guidelines Sections II, 5A, and 14C.) According to the
Uniform Guidelines, content validity is:
Demonstrated by data showing that the content of a selection procedure is
representative of important aspects of performance on the job. (See Uniform
Guidelines Section 16D.)
In Contreras v. City of Los Angeles, five of seven subject matter experts had to agree on
decisions dealing with job relatedness. This standard was accepted by the Court. (See
Contreras 1981.) In U.S. v. South Carolina, 50% of the subject matter experts had to agree
for test items to be judged job-related. This standard was accepted by the Court. (See South
Carolina 1978.) Therefore, in this study, a minimum standard was set for the content validity
of a test when 50% of the incumbents agreed on all the questions in the content validity
survey. The preferred standard was set at 70%.
Each test passed all the minimum content validity standards. Therefore, each of the three
tests was content valid. In addition, with the exception of word-processing being learned in a
brief orientation, every test passed the preferred standards for content validity, as can be
seen in the chart below:
Percent of SME's Who Say:               Keyboarding (WPM)  Word Processing  Language Arts I  Total
Some level of skill needed                    100               100              100          100
Skill is distinguishing                        73                70               81           92
Test is representative sample                  77                80               88           90
Test does not require more than job            97                97               91           80
Skill cannot be learned in 8 hours             76                76               62           86
Test resembles job duty                        89                91               99           94
Test resembles work product                    85                88               94           90
The following cutoffs were agreed to by at least 70% of the subject matter experts:

Keyboarding Test                            Error Count Scale               5.00
Keyboarding Test                            Speed Words Per Minute Scale   55.00
Keyboarding Test                            Gross Key Strokes Scale      1443.00
Editing/Formatting from Rough Draft Test    Total Score Scale              13.00
Language Arts 1 Test                        Capitalization Scale             .60
Language Arts 1 Test                        Possessives Scale                .50
Language Arts 1 Test                        Number Usage Scale               .50
Language Arts 1 Test                        Abbreviations Scale              .50
Language Arts 1 Test                        Punctuation Scale                .50
Language Arts 1 Test                        Spelling Scale                   .50
Language Arts 1 Test                        Grammar Scale                    .70
Language Arts 1 Test                        Total Score Scale              50.00
Concurrent Criterion-Related Validity Results
If a test results in adverse impact against a protected group, the Uniform Guidelines
specifically allow concurrent criterion-related validity as a method of showing business
necessity for the test. (See Uniform Guidelines Sections II, 5A, and 14B(4).) According to the
Uniform Guidelines, criterion-related validity is defined as follows:
Demonstrated by empirical data showing that the selection procedure is predictive of
or significantly correlated with important elements of work behavior. (See
Guidelines Section 16F.)
Concurrent criterion-related validity usually uses current employees as the sample, obtaining
test scores and criteria data (e.g., supervisory ratings) during relatively the same time period.
Predictive criterion-related validity often uses applicants as the sample, obtaining test scores
during one period of time, then waiting to gather criteria data later. This study used
concurrent criterion-related validity.
Directional hypotheses were set for each scale of the experimental tests. Statistical
significance was set at the one-tailed .05 level, specifying the direction of the relationship.
Correlations were calculated only for each of the hypothesized relationships using the Pearson
Product Moment formula. Restriction in range was a concern, as all the incumbents had
passed the employer's 5 minute timed test in order to obtain their jobs initially. It was
suspected that many of the tests in the experimental test battery would correlate with the
local 5 minute timed test. However, because the Equal Employment Opportunity field has not
established a clear rule allowing for correcting correlations not quite significant into
significance, no corrections were made for any possible indirect restriction in range. All
correlations presented are uncorrected.
OPAC Tests
Below are the correlations calculated between the OPAC tests and supervisory ratings of job
performance hypothesized as having a possible relationship. Correlations are index numbers
which show the degree of relationship between the test score and supervisory ratings.
Correlations range from 1.0, showing a perfect relationship, to 0.0, showing absolutely no
relationship. A -1.0 means a perfect inverse relationship: as one score goes up, the other
score goes down. Correlations shown below with an asterisk (*) are statistically significant
correlations. This means that the degree of relationship is so strong that the relationship is
unlikely to be due to chance and chance alone except maybe 5% of the time or less.
Statistically significant correlations were found between the OPAC tests and the Accuracy
Rating, Speed Rating, ratings of Level I of Word-processing Skill, and ratings of Level II Word-
processing Skill. Very few ratings were obtained on Level III ratings of Word-processing Skill.
Each of the Language Arts Test scales statistically significantly correlated with the Accuracy
Rating independently, except for the Possessives Scale. The Possessives Scale was close to
statistical significance (.259 obtained and .268 needed). The N shown below refers to the
number of incumbents who took the tests and
had supervisory ratings used in the correlation
calculations.
Correlations Between OPAC Test Scales and Supervisory Ratings

Language Arts 1 Test                                     Supervisory Performance Ratings
Publisher   Test              Test Scale        Accuracy⁵   Speed    Level I    Level II
OPAC        Language Arts     Abbreviations       .29*
OPAC        Language Arts     Capitalization      .40*
OPAC        Language Arts     Grammar             .42*
OPAC        Language Arts     Number Usage        .38*
OPAC        Language Arts     Percent Score       .55*
OPAC        Language Arts     Possessives         .26
OPAC        Language Arts     Punctuation         .44*
OPAC        Language Arts     Spelling            .44*
OPAC        Language Arts     Total               .55*

Editing/Formatting from a Rough Draft                    Supervisory Performance Ratings
Publisher   Test              Test Scale        Accuracy    Speed    Level I⁶   Level II⁷
OPAC        Word Processing   % Score                                  .25*       .25*

Keyboarding                                              Supervisory Performance Ratings
Publisher   Test              Test Scale        Accuracy    Speed    Level I    Level II
OPAC        5 Min Typing      Incorrect          -.12⁸
OPAC        5 Min Typing      Strokes                        .28*⁹
OPAC        5 Min Typing      WPM                            .28*¹⁰

⁵ N = 39.   ⁶ N = 66.   ⁷ N = 54.   ⁸ N = 62.   ⁹ N = 59.   ¹⁰ N = 62.
Alternative Procedure Analysis
The Uniform Guidelines state that:
Where two or more selection procedures are available which serve the user's
legitimate interest in efficient and trustworthy workmanship, and which are
substantially equally valid for a given purpose, the user should use the procedure
which has been demonstrated to have the lesser adverse impact. (See Uniform
Guidelines Section 3B.)
Three other test batteries were included in the study. Therefore, data was available to
evaluate "substantially equally valid" and the relative adverse impact of the four test batteries.
Substantially equally valid can be evaluated using content validity and concurrent criterion-
related validity. Using content validity, the four test batteries included in this study had
substantially equally valid tests with the exception of one test battery's word-processing test
for WordPerfect. Using concurrent criterion-related validity, several of the test batteries were
close. Many of the key correlations were compared and found to be not significantly different.
However, the OPAC test battery was the only test battery to have statistically significant
correlations to all four ratings: Speed Rating, Accuracy Rating, Level I of Word-processing
Skill Rating, and Level II of Word-processing Skill Rating. Therefore, under concurrent
criterion-related validity, no other test battery was substantially equally valid to the OPAC test
battery.
An analysis of adverse impact was nevertheless conducted. Since about half or more of the
participants in the samples for the tests were Hispanic, adverse impact analyses were feasible.
The Uniform Guidelines requires an analysis of both statistical significance and practical
significance in determining adverse impact. (See Uniform Guidelines Section 4D.) For speed
purposes, Cochran's correction to the chi-square was used to best approximate statistical
significance at the .05 level. (See Haber 1980.) This is a two-sample hypergeometric test.
Practical significance needs to be addressed to complete the evaluation of adverse impact.
(See Uniform Guidelines Section 4D and Baldus 1980.) Practical significance with rate
differences involves at least three calculations, each of which looks at the effects of small
number changes on other statistics: how many more people would need to be added to the
disadvantaged group's passing number to (1) change the statistical significance conclusion,
(2) change the 80 Percent Rule of Thumb conclusion, or (3) change the selection rates
themselves from being different to being the same or very close to the same. When 2 or
fewer people added to the disadvantaged group can alter the statistical conclusion, the results
were found to be not practically significant. (See Waisome 1991). When 3 or fewer people
added to the disadvantaged group alters the 80 Percent Rule of Thumb conclusion or adding 4
or fewer people brings the selection rates to being very close to one another (within 2.1%),
then the results were found to be not practically significant. (See Contreras 1981). Both of
these court case citations are Federal circuit court decisions. (For a more detailed discussion
of adverse impact see reference: Biddle 1992.)
Using the statistical significance and practical significance rules described above, adverse
impact was found for another test battery's spelling scale and OPAC's Keyboarding test for
words per minute and key strokes scales. (Since the time of the study, OPAC's Keyboarding
test format has been changed. The new format preserves the test that showed the content
and criterion-related validity, but now allows the candidate to take the test in a scrolling mode
or from hard copy text.)
Overall Conclusions Considering Validity and Adverse Impact
Using criterion-related validity as the standard for "substantially equally valid for a given
purpose" for the Section 3B analysis described in this paper, OPAC was the only test battery
(in the experimental test batteries) with tests that correlated statistically significantly to all
four of the employer's criteria (Speed Ratings, Accuracy Ratings, Level I of Word-processing
Skill, and Level II of Word-processing Skill). Since the OPAC test battery was the only test
battery that correlated statistically significantly with all four criteria, the other three test
batteries cannot be considered "substantially equally valid for a given purpose." The OPAC
test battery was able to correlate "above chance levels" to criteria the other tests did not in
this situation.
Content and Criterion-Related Validity Report for the OPAC® System (1994)

A study was conducted at a large federal employer with more than 5000 employees to
examine the validity of several OPAC® (Office Proficiency Assessment and Certification®)
System tests. The OPAC tests were originally developed and content validated by
Professional Secretaries International®. The OPAC tests are computer administered and
computer scored.
The employer involved in the study was searching for a self-administered, self-scored
computerized word-processing test to replace its traditional 5-minute timed typing test.
The traditional typing test measured speed and accuracy using IBM Selectric typewriters
and required extensive test administrator supervision. The employer also wanted a test
that could measure an applicant's skill at creating, formatting, proofing, and editing
documents, while also measuring word-processing skills using a word-processing type
program. An applicant's speed and accuracy were also important factors for the test to
measure. Additionally, the employer wanted to minimize the time necessary for test
administration.
More than 20 job classifications needed a selection testing procedure that measured
keyboarding speed and accuracy as well as some level of word-processing skill.
Experimental Test Battery
OPAC tests of Language Arts 1, Editing/Formatting from Rough Draft, Advanced
Editing/Formatting from Rough Draft, and Keyboarding were used as part of an
experimental test battery.
The OPAC Language Arts 1 test evaluated in the study was used to measure skills in
proofing a document to identify errors in grammar, spelling, punctuation, capitalization,
possessiveness, number usage, and abbreviations.
The OPAC Editing/Formatting from Rough Draft test was used to measure skills in operating
features and functions of specific word-processing programs.
The OPAC Advanced Editing/Formatting from Rough Draft test was used to measure skills in
operating advanced features and functions of specific word-processing programs.
The OPAC Keyboarding test was used to measure an individual's speed and accuracy of
typing text on a keyboard.
Identification of Sample
Incumbents of 24 secretarial, clerical, and administrative classifications were sent a survey.
The survey asked about the use of word-processing equipment and software. Of the 1139
incumbents who were sent surveys, 65.8% responded (749). The responses showed that
WordPerfect, Microsoft Word for the IBM and Macintosh, and MultiMate were the word-
processing software most frequently used. Of the 749 incumbents who responded, 94.3%
(706) indicated that they were using some form of word-processing on the job. About half
(378) used more than one word processor, including text editors or desktop publishing. The
5.7% (43) who used no word-processing on the job were no longer included in the study.
Job Performance Ratings
A series of workshops were conducted for those who supervised the survey respondents to
obtain ratings of job performance. Supervisors evaluated job performance on a rating scale
that ranged from 1-5. The rating scales covered speed and accuracy for nine office skills.
The scales also incorporated levels of skill in three areas (when a rating scale was relevant
to the job). The nine office skills were: (1) text from hard copy, (2) text from machine
dictation, (3) charts/tables/statistics from hard copy, (4) spreadsheet skill, (5) database
management skill, (6) data entry skill: numeric, (7) data entry skill: alpha-numeric, (8) ten-
key skill, and (9) shorthand/speed writing and transcription skill.
Skills (1) text from hard copy, (2) text from machine dictation, and (3)
charts/tables/statistics from hard copy were grouped for a word-processing level of skill
rating. Level I included setting tabs, margins, and justification to format documents using
common function keys such as bold, underline, and center, making simple edits by using
delete and insert keys, typing information on pre-printed forms, and naming, saving,
printing, and retrieving documents. Level II included setting up, editing, copying, and
moving columns, using headers and footers, creating templates and boilerplate formats,
creating forms, merging form letters and forms with variable data, creating and printing
labels, using DOS commands, using various sizes and styles of lettering, and archiving.
Level III included creating and using macros, using graphics, converting documents to
ASCII, using math functions, creating a dictionary for the system, and linking spreadsheets
or database system information with word-processing documents.
Definitions were provided to the supervisors for each of the skills and scales. Further,
supervisors were instructed to only provide input where they had first-hand knowledge.
More than 100 supervisors gave ratings for 292 incumbents during the workshops. Of the
292 incumbents who received ratings, 33 received multiple ratings or ratings from more
than one supervisor. For analysis purposes, the multiple ratings were averaged.
Data on the Experimental Tests
The 292 incumbents who received ratings by their supervisors were invited to participate in
parts of the experimental test battery. Since supervisors only rated incumbents on skills
that were relevant in their situation, and when they had first-hand knowledge of the work
behaviors, not all of the 292 incumbents received ratings on all of the skills. Since
involvement in the study was voluntary, not all of the 292 incumbents who had received
ratings took all of the experimental tests. Testing was conducted over a seven-week period
using six PC's. Of the 292 incumbents with ratings, 110 actually took one or more of the
tests in the experimental test battery. Of the 110 incumbents taking the tests, 75 took the
OPAC Keyboarding Test, 68 took the OPAC Editing/Formatting from Rough Draft Test, and
50 took the OPAC Language Arts 1 Test. A variety of conditions dictated which tests were
administered to each incumbent, including the duties the incumbent performed, amount of
time the incumbent could spend taking the experimental tests, software and hardware the
incumbent used on the job, software and hardware available for testing at the time, sample
already obtained in the study, etc. Because of these conditions, the samples varied from
test to test.
Job Analysis and Test Evaluation
Some of the incumbents who took the experimental tests also evaluated the tests and
provided data as subject-matter experts (SMEs). After taking an experimental test, the
incumbent was asked to answer (as a subject-matter expert) a content validity survey form
for that test. If the subject-matter expert stated that some level of the skill measured by the
test was a necessary prerequisite for successful performance of a critical or important
job duty, then several other questions were asked. These additional questions asked for a
description of the critical or important duties which required use of the skill, then asked for
ratings of the degree of importance of that skill. Additional questions subject-matter
experts answered dealt with the level of the skill which resulted in better performance, if the
test was a representative sample of the skill, if the test required more skill from the test
taker than was required on the job, if the skill could be learned in a brief orientation, and if
the work product of the test closely resembled a work product produced on the job. To
obtain information for a job-related cutoff, subject-matter experts were given their score
and then asked to provide their opinion of the minimum score necessary to pass minimally
qualified applicants following some of the basics of the Angoff model. (See Angoff 1971.)
Content Validity Results
If a test product results in adverse impact against a protected group (i.e., one sex, race, or
ethnic origin group scores disproportionately lower than another group on the test), the
Uniform Guidelines specifically allow content validity as a method of showing business
necessity for the test. (See Uniform Guidelines Sections II, 5A, and 14C.) According to the
Uniform Guidelines, content validity is:
Demonstrated by data showing that the content of a selection
procedure is representative of important aspects of performance
on the job. (See Uniform Guidelines Section 16D.)
In Contreras v. City of Los Angeles, five of seven subject-matter experts had to agree on
decisions dealing with job relatedness. This standard was accepted by the Court. (See
Contreras 1981.) In U.S. v. South Carolina, 50% of the subject-matter experts had to agree
for test items to be judged job-related. This standard was accepted by the Court. (See South
Carolina 1978.) Therefore, in this study, a minimum standard was set for the
content validity of a test when 50% of the incumbents agreed on all the questions in the
content validity survey. The preferred standard was set at 70%.
Each test passed all the minimum content validity standards. Therefore, each of the three
tests was content valid. In addition, with the exception of word-processing being learned in
a brief orientation, every test passed the preferred
standards for content validity, as can be
seen in the chart below:
(Note: the following cutoffs were agreed to by at least 70% of the subject-matter experts.)
Keyboarding Test Error Count Scale 5.00
Keyboarding Test Speed Words Per Minute Scale 55.00
Keyboarding Test Gross Key Strokes Scale 1443.00
Editing/Formatting from Rough Draft
Test
Total Score Scale 13.00
Language Arts 1 Test Capitalization Scale .60
Language Arts 1 Test Possessives Scale .50
Language Arts 1 Test Number Usage Scale .50
Language Arts 1 Test Abbreviations Scale .50
Language Arts 1 Test Punctuation Scale .50
Language Arts 1 Test Spelling Scale .50
Language Arts 1 Test Grammar Scale .70
Language Arts 1 Test Total Score Scale 50.00
Concurrent Criterion-Related Validity Results
If a test results in adverse impact against a protected group, the Uniform
Guidelines specifically allow concurrent criterion-related validity as a method of showing
business necessity for the test. (See Uniform Guidelines Sections II, 5A, and 14B(4).)
According to the Uniform Guidelines, criterion-related validity is defined as follows:
Demonstrated by empirical data showing that the selection procedure is predictive of
or significantly correlated with important elements of work behavior. (Guidelines
Section 16F.)
Concurrent criterion-related validity usually uses current employees as the sample,
obtaining test scores and criteria data (e.g., supervisory ratings) during relatively the same
time period. Predictive criterion-related validity often uses applicants as the sample,
obtaining test scores during one period of time, then waiting to gather criteria data later.
This study used concurrent criterion-related validity.
Directional hypotheses were set for each scale of the experimental tests. Statistical
significance was set at the one-tailed .05 level, specifying the direction of the relationship.
Correlations were calculated only for each of the hypothesized relationships using the
Pearson Product Moment formula. Restriction in range was a concern, as all the incumbents
had passed the employer's 5 minute timed test in order to obtain their jobs initially. It was
suspected that many of the tests in the experimental test battery would correlate with the
local 5-minute timed test. However, because the Equal Employment Opportunity field has
not established a clear rule allowing for correcting correlations not quite significant into
significance, no corrections were made for any possible indirect restriction in range. All
correlations presented are uncorrected.
Criterion-Related Validity Correlations
Below are the correlations calculated between the OPAC tests and supervisory ratings of job
performance hypothesized as having a possible relationship. Correlations are index numbers
that show the degree of relationship between the test score and supervisory ratings.
Correlations range from 1.0, showing a perfect relationship, to 0.0, showing absolutely no
relationship. A -1.0 means a perfect inverse relationship: as one score goes up, the other
score goes down. Correlations that exceed the "needed for validity" values noted with the
following charts are statistically significant correlations. This means that the degree of
relationship is so strong that the relationship is unlikely to be due to chance and chance
alone except maybe 5% of the time or less. Statistically significant correlations were found
between the OPAC tests and the Accuracy Rating, Speed Rating, ratings of Level I of Word-
processing Skill, and ratings of Level II Word-processing Skill. Very few ratings were
obtained on Level III ratings of Word-processing Skill. Each of the Language Arts Test scales
statistically significantly correlated with the Accuracy Rating independently, except for the
Possessives Scale. The Possessives Scale was close to statistical significance (.259 obtained
and .268 needed). The “n” shown below refers to the number of incumbents who took the
tests and
had supervisory ratings used in the correlation calculations.
Correlations Between OPAC Test Scales and Supervisory
Ratings
Language Arts 1 Test (n = 39). Chart: Language Arts Test Scale Validity (.268 needed for
validity). Correlations by scale: Abbreviations .29, Capitalization .40, Grammar .42, Number
Usage .38, Possessives .26, Punctuation .44, Spelling .44, Overall .55.
Editing/Formatting from Rough Draft (Level I n = 66; Level II n = 54). Chart: Word-processing
Levels I and II (.210 needed for validity for Level I; .230 needed for Level II). Correlations:
Level I .25, Level II .25.

Keyboarding (Keystrokes n = 59; Words Per Minute n = 62). Chart: Keyboarding Test (.268 and
.252 needed for validity). Correlations: Keystrokes .28, Words Per Minute .28.
Alternative Procedure Analysis
The Uniform Guidelines state that:
Where two or more selection procedures are available which serve the user's
legitimate interest in efficient and trustworthy workmanship, and which are
substantially equally valid for a given purpose, the user should use the procedure
which has been demonstrated to have the lesser adverse impact. (See Uniform
Guidelines Section 3B.)
Three other test batteries were included in the study. Therefore, data was available to
evaluate "substantially equally valid" and the relative adverse impact of the four test
batteries.
Substantially equally valid can be evaluated using content validity and concurrent criterion-
related validity. Using content validity, the four test batteries included in this study had
substantially equally valid tests with the exception of one test battery's word-processing
test for WordPerfect. Using concurrent criterion-related validity, several of the test
batteries were close. Many of the key correlations were compared and found to be not
significantly different. However, the OPAC test battery was the only test battery to have
statistically significant correlations to all four ratings: Speed Rating, Accuracy Rating, Level
I of Word-processing Skill Rating, and Level II of Word-processing Skill Rating. Therefore,
under concurrent criterion-related validity, no other test battery was substantially
equally valid to the OPAC test battery.
An analysis of adverse impact was nevertheless conducted. Since about half or more of the
participants in the samples for the tests were Hispanic, adverse impact analyses were
feasible.
The Uniform Guidelines requires an analysis of both statistical significance and practical
significance in determining adverse impact. (See Uniform Guidelines Section 4D.) For
speed purposes, Cochran's correction to the chi-square was used to best approximate
statistical significance at the .05 level. (See Haber 1980.) This is a two-sample
hypergeometric test. Practical significance needs to be addressed to complete the
evaluation of adverse impact. (See Uniform Guidelines Section 4D and Baldus 1980).
Practical significance with rate differences involves at least three calculations, each of which
looks at the effects of small number changes on other statistics: how many more people
would need to be added to the disadvantaged group's passing number to (1) change
the statistical significance conclusion, (2) change the 80 Percent Rule of Thumb conclusion,
or (3) change the selection rates themselves from being different to being the same or very
close to the same. When 2 or fewer people added to the disadvantaged group can
alter the statistical conclusion, the results were found to be not practically significant. (See
Waisome 1991.) When 3 or fewer people added to the disadvantaged group alters the 80
Percent Rule of Thumb conclusion or adding 4 or fewer people brings the selection rates to
being very close to one another (within 2.1%), then the results were found to be not
practically significant. (See Contreras 1981.) Both of these court case citations are Federal
circuit court decisions. (For a more detailed discussion of adverse impact see reference:
Biddle 1992.)
Using the statistical significance and practical significance rules described above, adverse
impact was found for another test battery's spelling scale and OPAC's Keyboarding test for
words per minute and keystrokes scales. (Since the time of the study, OPAC's Keyboarding
test format has been changed. The new format preserves the test that showed the content
and criterion-related validity, but now allows the candidate to take the test in a scrolling
mode or from hard copy text.)
Overall Conclusions Considering Validity and Adverse
Impact
Using criterion-related validity as the standard for "substantially equally valid for a given
purpose" for the Section b analysis described in this paper, the OPAC System was the only
test battery (in the experimental test batteries) with tests that correlated statistically
significantly to all four
of the employer's criteria (Speed Ratings, Accuracy Ratings, Level I
of Word-processing Skill, and Level II of Word-processing Skill). Since the OPAC test
battery was the only
test battery that correlated statistically significantly with all four
criteria, the other three test batteries cannot be considered "substantially equally valid for a
given purpose." The OPAC test battery was able to correlate "above chance levels" to
criteria the other tests did not in this situation.
Content Validity Report for OPAC® Module Four (March 1997)
OPAC Validity Report: Module 4
Test Description
Biddle Consulting Group, Inc. recently developed a fourth module for the Office Proficiency
Assessment and Certification® (OPAC®) System. This module was entitled “10-Key/Data
Entry.”
In addition to a 10-Key Test, three different tests were included for the evaluation of data
entry skills: the Vendor Test, the Inventory Test, and the Invoice Test. Three data entry
tests are included within this module to allow an employer to choose the tests that are most
appropriate for the job in question. In order to ensure the closest match between the job
content and the test materials it is recommended (within Biddle Consulting Group’s OPAC
manual) that employers evaluate the content
of the tests, the format of the tests, and the
percentage of alpha and numeric keystrokes
within each document. The goal is for
employers to utilize the tests which are closest in content and format to the types of data
prospective employees will be expected to enter on the job. Although these three types of
tests cannot possibly replicate all types of materials that applicants might come into contact
with on the job, they are designed to simulate the format of the most commonly utilized
data entry designs. Consequently, an applicant’s performance on these tests will enable an
employer to evaluate the individual’s general data entry capability.
Biddle Consulting Group recommended that employers evaluate each test before deciding
which ones would best serve their business needs and be the most job-related.
The following information should aid employers in deciding which test(s) are most
appropriate for the job classification under consideration. The tests are presented (as
they are within the program) by level of difficulty, with the last test having the
highest difficulty level.
Vendor Test
These test forms are designed to simulate typical vendor entry sheets. The content of the
sheets includes a vendor number, company name, company address, and contact
information. These tests are the least difficult of the three data entry tests due to their high
percentage of alpha strokes and the general field separation of alphabetic and numeric
content (i.e. most fields are either
alpha or numeric, except for the address field). The
average breakdown of each Vendor test includes 75% alphabetic entry and 25% numeric
entry. If individuals hired will be expected to enter fields of information similar to those
included on these test forms, these data entry tests may be appropriate to use in a
selection process.
Inventory Test
These test forms are designed to simulate typical inventory sheets. The content of the
sheets includes item information and vendor number. These tests are more difficult than the
Vendor Tests for several reasons. First, the Inventory Tests have a higher numeric content.
Second, several fields are not stroke specific. That is, they are a mix of both alphabetic and
numeric key strokes, causing more transitions between these two areas on the keyboard.
The average breakdown of each Inventory test includes 64% alpha entry and 36% numeric
entry. If individuals hired will be expected to enter fields of information similar to those
included on these test forms, these data entry tests may be appropriate to use in your
selection process.
Invoice Test
These test forms are designed to simulate typical invoice sheets. The content of the sheets
includes order number, representative number, date, destination information, and product
information. These tests are the most difficult of the data entry tests for two main reasons.
First, these tests utilize the highest percentage of numeric keystrokes. Second, although the
fields within these tests are generally separated as to alpha and numeric content (most
fields are either
alpha or numeric, except for the address field), these test forms contain the
largest number of data fields. The average breakdown of each Invoice test includes 37%
alpha entry and 63% numeric entry. If individuals hired will be expected to enter fields of
information similar to those included on the test forms, these data entry tests may be
appropriate to use in your selection process.
If you are hiring for a position that embodies different types of data entry, several
tests can be given to a candidate to obtain all relevant skill information. It is
important to remember, however, that each test varies in content and difficulty
level. Candidates will not score the same on each test! It is this variance in difficulty
level that led to different certification standards for each test form (see below). The
more difficult the test, the lower the resulting SPH (strokes per hour) score.
Test % Alpha % Numeric Certification Level*
10-Key 0% 100% 8000 SPH – 95% Accuracy
Vendor 75% 25% 6200 SPH – 95% Accuracy
Inventory 66% 34% 5600 SPH – 95% Accuracy
Invoice 37% 63% 5200 SPH – 95% Accuracy
* These certification levels are not based on national norms. These are preliminary standards, which will be re-
evaluated upon further study.
Review by Biddle & Associates, Inc./Biddle Consulting
Group, Inc.
Module 4 was designed with three test versions within each component. For example, the
10-Key component has a Test Version 1, Test Version 2, and Test Version 3. Each test
within Module 4 was reviewed by 13 permanent employees of Biddle & Associates, and
nine temporary employees (a total of 22 initial in-house reviews). All 22 individuals
evaluated the 12 tests included in Module 4. This initial review included an analysis of the
instruction screens, ease of understanding and use of the tests, the testing documents, and
the candidate manual. Based on comments from the in-house reviews, improvements and
modifications were made to all aspects of the Module 4 program and test forms.
Additional modifications were made to the testing documents after difficulty analyses were
performed on all testing materials. Difficulty of the materials was determined by the
alphabetic/numeric ratio within and between documents. Alphabetic and numeric characters
were described as follows:
Alphabetic Characters: For the purposes of the difficulty calculations, an
alphabetic character included any character that was not a number. Therefore,
alphabetic characters included letters, blank spaces, and symbols (such as &, $,
etc.). Punctuation marks were also considered alphabetic because they are incorporated
within the alphabetic keys and are no more difficult to type (on average) than the letters on
the keyboard. Symbols were included within this category because there were so few
symbols on the testing documents that they did not merit their own category.
Numeric Characters: For the purposes of the difficulty calculations, a numeric
character included any number. Decimal points were not counted when they were part of a
monetary figure (e.g., 19.95, 29.95).
The symbols that were hard-coded on the screen were not counted within the
documents; that is, they did not contribute to the total keystroke calculations. These
included the hyphens (-) within the phone number field (555-555-5555) and the
slashes (/) in the date field (09/09/99).
All tests within each component of Module 4 were evaluated. Each test within a component
(e.g., Data Entry 1: Vendor) was modified to ensure that it contained approximately the
same percentages of alphabetic and numeric characters. The tests within each component
do not differ statistically with regard to the alphabetic/numeric ratio. In addition, each test
was divided into four quadrants, and the alphabetic/numeric ratio of each quadrant was
compared to ensure that the difficulty level did not differ statistically between different
sections of the same test.
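
To make this ratio concrete, the short Python sketch below shows one plausible way to compute the alphabetic/numeric breakdown described above. The character classes follow the definitions in this section, but the helper names, the regular expression for monetary decimal points, and the blanket exclusion of hyphens and slashes are illustrative assumptions rather than the actual Module 4 counting code.

# Illustrative sketch (not part of the OPAC software) of the alphabetic/numeric
# difficulty calculation described above: every character that is not a digit counts
# as "alphabetic" (letters, spaces, punctuation, symbols), digits count as "numeric,"
# and hard-coded field symbols (hyphens in phone numbers, slashes in dates) and
# decimal points in monetary figures are excluded from the totals.
import re

HARD_CODED = set("-/")                              # assumed hard-coded field separators
MONEY_POINT = re.compile(r"(?<=\d)\.(?=\d{2}\b)")   # decimal point inside e.g. 19.95

def alpha_numeric_breakdown(text: str) -> tuple[float, float]:
    """Return (percent alphabetic, percent numeric) for a test document."""
    text = MONEY_POINT.sub("", text)                # drop decimal points in money figures
    counted = [c for c in text if c not in HARD_CODED and c != "\n"]
    numeric = sum(c.isdigit() for c in counted)
    alpha = len(counted) - numeric                  # everything that is not a digit
    total = alpha + numeric
    return 100 * alpha / total, 100 * numeric / total

# Example: a single vendor-style record
pct_alpha, pct_numeric = alpha_numeric_breakdown("Acme Supply Co. 10482 Elm St. $19.95")
print(f"{pct_alpha:.0f}% alpha / {pct_numeric:.0f}% numeric")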
Review by Subject-Matter Experts
After the in-house (or alpha) review of Module 4, Biddle & Associates conducted an
evaluation by individuals outside the company (a beta review). This group included 73
individuals from MTI Business College and Heald Business College in Sacramento, California.
Based on comments from the beta review, a number of improvements and modifications
were made to the testing program and documents.
Subject-matter experts included individuals from a variety of ethnicities. Participants were
predominantly female, as data-entry positions generally have an over-utilization of females.
Development of Certification Levels
Data from the subject-matter experts from MTI and Heald were also utilized to develop
recommended certification levels for the four new sets of tests within Module 4.
The certification levels for the Data Entry tests were developed with a test/test correlation
model that utilized a keyboarding score of 45 wpm to predict Data Entry test scores (see
the tables below). The certification levels for each Data Entry test are different
due to the varying alphabetic/numeric content, field lengths, and number of fields per
record. Since both speed and accuracy are critical to employers for data entry applications,
both were included as certification criteria.
The certification standards for the 10-Key tests were developed by analyzing industry
standards for jobs requiring some level of 10-key data entry (many employers require at
least 10,000 SPH, but 8,000 SPH is widely accepted as a minimum level for 10-Key speed)
and by analyzing beta test scores.
Recommended Certification Levels for Three Data Entry
Tests - Biddle & Associates, Inc./Biddle Consulting Group,
Inc.
Data Entry 1 – Vendor
WPM    Predicted Score    Lowest Expected Score    Highest Expected Score*
40 5611 3265 7957
45 6175 3829 8521
50 6739 4393 9085
55 7304 4957 9649
60 7867 5521 10213
Pearson R = 0.69
Average Errors = 16
Recommended Certification Level: 6200 SPH and 95% Accuracy Rate
Data Entry 2 – Inventory
WPM    Predicted Score    Lowest Expected Score    Highest Expected Score*
40 5245 3460 7029
45 5610 3826 7395
50 5975 4191 7760
55 6341 4556 8125
60 6706 4922 8491
Pearson R = 0.64
Average Errors = 15
Recommended Certification Level: 5600 SPH and 95% Accuracy Rate
Data Entry 3 – Invoice
WPM    Predicted Score    Lowest Expected Score    Highest Expected Score*
40 4701 1774 7627
45 5222 2296 8149
50 5744 2817 8670
55 6265 3339 9192
60 6787 3860 9713
Pearson R = 0.57
Average Errors = 12
Recommended Certification Level: 5200 SPH and 95% Accuracy Rate
* Scores based on a 95% Confidence Interval.
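
The regression parameters behind these tables are not reported, but they can be recovered approximately from the published values. The Python sketch below is illustrative only: it fits a line to the Vendor table's WPM/predicted-score pairs and uses the interval half-width implied by the 45-wpm row, so the recovered slope, intercept, and confidence-band width should be read as assumptions derived from the table rather than as the original analysis.

# Sketch of the test/test correlation model behind the Data Entry certification levels:
# a keyboarding speed (wpm) predicts a Data Entry SPH score, and a 95% confidence band
# around the prediction gives the lowest/highest expected scores. Slope, intercept, and
# band width are recovered from the published Vendor table, not taken from the study.
wpm_points = [40, 45, 50, 55, 60]
vendor_predicted = [5611, 6175, 6739, 7304, 7867]     # from the Vendor table above

n = len(wpm_points)
mean_x = sum(wpm_points) / n
mean_y = sum(vendor_predicted) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(wpm_points, vendor_predicted)) \
        / sum((x - mean_x) ** 2 for x in wpm_points)
intercept = mean_y - slope * mean_x

half_width = 8521 - 6175            # 95% interval half-width implied by the 45-wpm row

def expected_vendor_sph(wpm: float) -> tuple[float, float, float]:
    """Predicted Vendor SPH and its 95% interval for a given keyboarding speed."""
    predicted = intercept + slope * wpm
    return predicted - half_width, predicted, predicted + half_width

low, mid, high = expected_vendor_sph(45)
print(f"45 wpm -> predicted {mid:.0f} SPH (95% interval {low:.0f}-{high:.0f})")
# The recommended Vendor certification level (6200 SPH) sits just above this prediction.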
Accuracy and Completeness
After all modifications were made, all testing documents were entered into the program and
checked for 100% accuracy to the key by at least two individuals.
Validation Report for the Medical and Legal
Terminology Tests (August 1997)
THE OPAC® SYSTEM version 5.0, Module 5
Introduction
This report contains information regarding the development and content validation of the
medical and legal terminology tests within the OPAC® System. The medical and legal
terminology tests can be used for the employment, education, or certification of medical
assistants and legal assistants or legal secretaries.
Both the medical and legal terminology tests were designed based on content validity
standards outlined by the Uniform Guidelines on Employee Selection Procedures, section
14(C).[11] The Uniform Guidelines on Employee Selection Procedures provide a single set of
principles designed to assist employers, labor organizations, employment agencies, and
licensing or certification boards in complying with Federal law prohibiting employment
practices that discriminate on grounds of race, color, religion, sex, and national origin. These
guidelines are a framework for the proper use of tests and other selection procedures.
This report is structured according to the following sub-topics on reporting content
validation studies stipulated in section 15(C) of the Uniform Guidelines on Employee
Selection Procedures:
1. User(s), location(s), and date(s) of the study
2. Problem and setting
3. Job analysis
4. Selection procedure and its content
5. Relationship between the selection procedure and the job
6. Alternative procedures investigated
7. Uses and applications
8. Contact person
9. Accuracy and completeness
Although separate validation studies were conducted for the medical and legal terminology
tests, these studies will be referred to as one in this report for efficiency and uniformity.
User(s), Location(s), and Date(s) of the Study
The validation study for the medical and legal terminology tests was completed in July
1997, at Biddle & Associates, Inc./Biddle Consulting Group, Inc., located in Sacramento,
California. Medical Assistants were selected for participation in this study from one of the
nation’s largest Health Maintenance Organizations located at one of its sites in Sacramento,
California. Legal Assistants and Legal Secretaries were selected for participation in this
study from four large, full service law firms also located in Sacramento, California.
Problem and Setting
The purpose of this validation study was to determine if the knowledge-based terminology
tests are a representative sample of the body of learned information that is used, and is a
necessary prerequisite for, the successful job performance of an Entry-level medical
assistant and entry-level legal assistant/secretary.

[11] The Uniform Guidelines on Employee Selection Procedures were adopted in 1978
by the Equal Employment Opportunity Commission, Civil Service Commission, Department
of Labor, and the Department of Justice.
This validation study is predicated upon several important factors. Two industry experts
from the medical assistant profession and two from the legal assistant/secretarial field were
selected to provide terms and write test items for the medical and legal terminology tests,
respectively. (See Appendix for these experts’ qualifications.) These experts were given
content validity-based criteria for writing test items. (See Appendix for Industry Expert
criteria for selection of terms and writing of test items.)
The industry experts from the medical assistant field provided two separate lists consisting
of two hundred (200) terms each for a combined total of four hundred (400) terms. Both
lists were compared and discussed by the experts. Terms that were on both experts’ lists
were automatically selected for test item writing. Terms that were not on both lists were
discussed by the experts and either discarded or selected for item writing. This process
resulted in the selection of two hundred (200) medical terms for test item construction. A
preliminary medical terminology test was designed consisting of the selected terms.
The same process as stated above went into the design of the preliminary legal terminology
test. The only difference is that one hundred and fifty terms (150) were selected for the
preliminary legal terminology test design. Thus, the preliminary test for legal terminology
contains one hundred and fifty (150) test items.
Thirty-nine (39) Medical Assistants and twenty-five (25) Legal Assistants/Secretaries were
selected as Subject Matter Experts. All Subject Matter Experts were required to currently
hold the job title for the target positions and have at least one year of experience. A majority
of the Subject Matter Experts selected had several years’ experience individually in the
above classifications.
The thirty-nine (39) Medical Assistants and twenty-five (25) Legal Assistants and Legal
Secretaries took the preliminary medical and legal terminology tests, respectively.
The tests and surveys were collected and processed. The tests, along with Scantron answer
sheets, were loaded into the Test Scoring & Analysis System program, which is a testing
software package developed by Biddle Consulting Group, Inc.[12] After this process was
complete, a minimum cutoff score (pass/fail score) was set based upon calculations using
the modified Angoff method.[13]
Survey forms were also designed to assess the content validity of each item on both the
medical and legal terminology tests. The survey forms are titled the Test Survey Response.
Twenty-two (22) of the Medical Assistant Subject Matter Experts and twenty-five (25) Legal
Assistants and Legal Secretaries completed the Test Survey Response forms.
[12] Test Scoring & Analysis System is a comprehensive and proven tool for developing,
administering, scoring, and tracking objectively scored tests. This system also has programs
that provide data on test-item analysis, test distribution results, and statistical cutoff-score
analysis.

[13] The modified Angoff method involves the setting of a job-related minimum cutoff score
for a test and has been approved by the United States Supreme Court in the case U.S. v.
South Carolina, 15 EPD 7,920, 445 F. Supp. 1094 (D.S.C. 1977) and 15 EPD 8,027, 434 U.S.
1026 (1978).
Results from the surveys were used to conduct the validation analysis. Several test items were
eliminated during this process. After the validation analysis, the medical and legal
terminology tests contained one hundred and sixty-two (162) and seventy-five (75) test
items, respectively.
There are limiting factors in the size and scope of this validation study that may affect the
validity of these tests for general use. The test items were constructed based on the
opinions and experience of industry experts from one city. The subject matter experts who
took the test and responded to the survey forms were also selected from one city. There
were no studies conducted using a control group to show that the test distinguishes
statistically between candidates who have the prerequisite knowledge to perform the task
associated with the specified positions and those who do not.
In addition, there were no validity analyses for the rank ordering of test scores above the
minimum cutoff scores. Therefore, the recommended minimum cutoff score for each test is
valid only for pass/fail purposes. In other words, the test distinguishes only between
passing or failing scores and does not provide a basis for ranking scores above the cutoff
score.
Given these limiting factors, Biddle Consulting Group, Inc., recommends that users convene
a group of their own subject-matter experts to determine if the test is valid for their
purpose and specific job classifications. The OPAC System has a validation module that is
designed for users to conduct their own validity study.
Job Analysis
Knowledge of medical or legal terminology was deemed a necessary prerequisite for the
performance of the medical assistant or legal assistant/secretary classifications based on job
descriptions, industry experts’ opinions, and surveys completed by subject-matter experts.
Thus, the focus of this study centers on the identification of the specific knowledge that is
used in, and is a necessary prerequisite for, the work behaviors of medical assistants and
legal assistants and/or legal secretaries.
Knowledge of medical terminology has a direct relationship to the work behavior of a
Medical Assistant because it is important and necessary for communication, record
maintenance, and treatment of patients. Knowledge of legal terminology is related to the
work behavior of Legal Assistant/Secretaries because it is important and necessary for
communication, research, and preparation of legal documents.
Moreover, the focus of this validation study is based on the analysis of medical and legal
terminology test items that meet the standards of content validity for knowledge-based
selection procedures outlined by the Uniform Guidelines on Employee Selection Procedures,
section 14(C) 4, which hold:
For any [test] measuring a knowledge...the user should show that (a) the [test] measures
and is a representative sample of that knowledge...and (b) that knowledge...is used in and
is a necessary prerequisite to performance of critical or important work behavior(s).
Subject-matter experts for the chosen classifications were given criteria to analyze each test
item using the Test Survey Response form, which addresses the above guidelines. This survey
was constructed based on models presented in the Guidelines Oriented Job Analysis (GOJA)
offered by Biddle (1996). The GOJA® method has been supported in numerous court cases
for a variety of jobs. Essential components of the survey used by the subject-matter
experts follow:
Categories Ratings with Explanations
Correct Ans. Write “yes” or “no” to indicate whether the answer provided is
correct.
Frequency Rating Write the letter(s) that indicate the frequency (how often) the term
is used on the job (e.g. in correspondence, reading material,
conversation).
D = Daily
W = Weekly
BW = Bi-Weekly
M = Monthly
BM = Bi-Monthly (every two months)
Q = Quarterly
SA = Semi-Annually
A = Annually
LA = Less often than once a year
Importance Provide one of the following ratings to indicate how important
knowledge of the term is to the job.
1. NOT IMPORTANT: Trivial or minor significance to the
performance of the job.
2. SOMEWHAT IMPORTANT: Somewhat helpful, useful, and /or
meaningful to performance of the job.
3. IMPORTANT: Helpful, useful, and/or meaningful to the
performance of the job.
4. CRITICAL: Necessary for the performance of the job.
5. EXTREMELY CRITICAL: Necessary for the performance of the
job, but with more extreme consequences
% of Qualified Apps. Provide your best estimate of the percentage of minimally qualified
applicants that would be expected to answer the particular question
correctly.
When Required Write the letter that indicates when knowledge of this term must be
known.
A. Required at time of hire
B. Learned on the job
The relationship between the terminology tests and the target positions was established
based on the averages of Importance, Frequency, and When Required ratings that all
subject-matter experts provided for each test item.
Item difficulty ratings were obtained for each test question along with an overall item
difficulty rating for both the medical and legal terminology tests. These ratings were
obtained from the Test Scoring & Analysis System software. Item difficulty shows the
proportion of subject-matter experts who answered an item correctly.
Selection Procedure and Contents
There are two medical terminology tests consisting of eighty (80) multiple choice test items
each, labeled Medical Test Form “A” and Medical Test Form “B.” There is one legal
terminology test consisting of seventy-five (75) multiple choice test items.
The above tests are part of the 1997 release of the OPAC® System version 5.0. This
version of the OPAC System is commercially available and distributed by Biddle Consulting
Group, Inc.
Industry experts along with the Product Development Analyst wrote the test items
according to item writing criteria. The criteria for Test Item Writing were composed by the
Product Development Analyst. These criteria are based on data for writing test items
provided by Biddle Consulting Group, Inc., and principles for constructing test items offered
by Osterlind (1989).
As stated above, one-hundred-and-sixty-two (162) medical terminology test items were
selected for the final item bank. Selection of these items was based on validation criteria
applied to averaged results obtained from the Test Survey Response forms. Fifty percent
(50%) or more of the Subject-matter experts assigned an importance rating of “3” or
greater to each one of the 162 test items. This means that knowledge of each of the 162
medical test items is considered either important, critical, or extremely critical to the
performance of the Medical Assistant classification by at least fifty percent[14] of the
Subject-matter experts surveyed.
Two parallel test forms were created from the 162 test items which represent Medical
Terminology Test Form “A” and Medical Terminology Test Form “B.” Both test forms contain
eighty (80) test items each. These test forms have the same type of material, with the
same level of difficulty, but different test items.
Medical terminology test forms “A” and “B” each have been determined to measure and
represent a sample of the knowledge of medical terminology that is used and is a necessary
prerequisite in the job performance of the Entry-level medical assistant classification. This
determination is based on (a) the results of the validation study involving the analysis of
responses to twenty-two (22) medical assistant Test Survey Response forms and (b) analysis of
the test distribution results.
The Legal Terminology test also has been determined to measure and represent a sample of
the knowledge of legal terminology that is used and is a necessary prerequisite to the job
performance of an Entry-level legal assistant and/or Legal Secretary. This determination is
also based on (a) the results of the validation study involving the analysis of responses to
twenty-five (25) Legal Assistant and Legal Secretary Test Survey Response forms and (b)
analysis of the test distribution results.
Seventy-five (75) legal terminology test items were selected for the final item bank.
Selection of these items is based on validation criteria applied to averaged results obtained
from the Test Survey Response forms. Fifty percent (50%) or more of the Subject-Matter
Experts assigned an importance rating of “3” or greater to each one of the 75 test items
(see Importance ratings above). This means that knowledge of each of the 75 test items is
considered either important, critical, or extremely critical to the performance of the Legal
Assistant and/or Legal Secretary classification by at least fifty percent of the Subject-matter
experts surveyed.

[14] The standard that at least 50 percent of the Subject-Matter Experts need to agree
on issues that determine inclusion of an item on a test was approved by the U.S. Supreme
Court in the court case U.S. v. South Carolina, 434 US 1026 (1978).
Relationship between the Selection Procedure and the Job
The evidence demonstrating that the medical and legal terminology test items are a
representative sample of the knowledge used as a part of the work behavior of medical
assistants and legal assistants/secretaries was obtained from information on the Test
Survey Response forms reported by subject-matter experts.
Entries from the surveys were compiled into two reports--medical and legal--using a
spreadsheet program. The report from the Medical Assistant subject-matter experts
surveyed has thirty-one (31) pages and contains twenty-two thousand (22,000) entries. This
report will be referred to as the Medical Survey Report. The report from the Legal subject-
matter experts has twenty (20) pages and contains more than eighteen thousand (18,000+)
entries. This report will be referred to as the Legal Survey Report.
Both reports were then imported into a database program, and subset reports were then
created from them. The subset reports provide average ratings for each test item, calculated
by category--Correct Ans., Frequency, Importance, and When Required.
The subset reports were used to conduct a validation analysis of each test item. Test items
were selected or deselected for the final test item bank using the following criteria (which
will be referred to as the Test-Item Validation Criteria):
1. At least 50 percent of the Subject Matter Experts surveyed agree that the
knowledge of the specific test item is required at the time of hire.
2. At least 50 percent of the Subject Matter Experts surveyed rated that knowledge of
the specific test item “3” or greater in Importance.
AND
3. At least 50 percent of the Subject Matter Experts surveyed indicated that the term is
used annually or more frequently on the job.
As noted above, the United States Supreme Court has approved fifty percent agreement
among subject-matter experts as an acceptable standard for the inclusion of an item on a
test (U.S. v. South Carolina, 1978).
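
A hypothetical Python sketch of how these Test-Item Validation Criteria could be applied to the subset-report averages follows. The field names are invented for illustration (the actual work was done in spreadsheet and database software), and the sketch assumes that all three 50% criteria must be met, reading the single "AND" in the list above as joining all of them.

# Hypothetical sketch of applying the Test-Item Validation Criteria to per-item
# averages from the subset reports. Names are illustrative, not from the TSA software.
from dataclasses import dataclass

@dataclass
class ItemSummary:
    item_id: int
    pct_required_at_hire: float    # % of SMEs saying the knowledge is required at hire
    pct_importance_3_plus: float   # % of SMEs rating Importance "3" or greater
    pct_used_annually: float       # % of SMEs reporting annual-or-more-frequent use

def meets_validation_criteria(item: ItemSummary, threshold: float = 50.0) -> bool:
    """Retain an item only if all three 50%-agreement criteria are met."""
    return (item.pct_required_at_hire >= threshold
            and item.pct_importance_3_plus >= threshold
            and item.pct_used_annually >= threshold)

items = [ItemSummary(1, 72.0, 88.0, 95.0), ItemSummary(2, 40.0, 91.0, 60.0)]
final_bank = [i for i in items if meets_validation_criteria(i)]
print([i.item_id for i in final_bank])    # -> [1]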
Micro-reports for both the medical and legal tests were created from the subset reports to
show the specific criteria that each selected test item meets.
Two parallel medical test forms were created from the 162 selected test items. Again, both
test forms contain eighty (80) test items each. These test forms have the same type of
material, with the same level of difficulty, but different test items.
Results of the validation study indicate that Medical test forms “A” and “B” each measure
and represent a sample of the knowledge of medical terminology that is used and is a
necessary prerequisite in communication, record keeping, and treatment of patients to
perform the job of an Entry-level Medical Assistant.
Measures of central tendency, standard deviation, and estimates of reliability were
computed using the Test Scoring & Analysis System software for Medical Terminology Tests
forms “A”& “B.” The following test results are based on the Subject Matter Experts’ test
scores:
Medical Test Form A
Number of Items = 80
Number of Subjects = 39
Test Mean = 65.85
Standard Deviation = 8.639
Test Reliability = .8837
Average Test Difficulty = .8230
Medical Test Form B
Number of Items = 80
Number of Subjects = 39
Test Mean = 62.26
Standard Deviation = 9.901
Test Reliability = .8958
Average Test Difficulty = .7782
Results of the validation study indicate that the legal test form measures and represents a
sample of the knowledge of legal terminology that is used and is a necessary prerequisite in
communication, record keeping, and preparation of legal documents to perform the job
of a Legal Assistant or Legal Secretary.
Measures of central tendency, standard deviation, and estimates of reliability were also
computed for the legal terminology test. Some of these results follow:
Legal Form
Number of Items = 75
Number of Subjects = 25
Test Mean = 64.52
Standard Deviation = 8.080
Test Reliability = .9025
Average Test Difficulty = .8602
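
The form statistics reported above (mean, standard deviation, reliability, and average difficulty) can all be computed from a simple examinee-by-item matrix of right/wrong responses. The Python sketch below is an assumed reconstruction, not the Test Scoring & Analysis System itself: it uses KR-20 as the reliability estimate and the mean proportion correct as the average test difficulty, and the simulated responses are invented purely to exercise the calculation.

# Assumed reconstruction of the per-form statistics: KR-20 reliability and average item
# difficulty (proportion answering each item correctly) from a 0/1 response matrix.
import numpy as np

def form_statistics(responses: np.ndarray) -> dict:
    """responses: examinees x items matrix of 1 (correct) / 0 (incorrect)."""
    totals = responses.sum(axis=1)                    # each examinee's raw score
    p = responses.mean(axis=0)                        # item difficulty (proportion correct)
    k = responses.shape[1]
    kr20 = (k / (k - 1)) * (1 - (p * (1 - p)).sum() / totals.var(ddof=1))
    return {
        "Number of Items": k,
        "Number of Subjects": responses.shape[0],
        "Test Mean": round(totals.mean(), 2),
        "Standard Deviation": round(totals.std(ddof=1), 3),
        "Test Reliability": round(kr20, 4),
        "Average Test Difficulty": round(p.mean(), 4),
    }

# Simulated data shaped like Medical Test Form A (39 examinees, 80 items); the ability
# and difficulty values are invented so the example produces plausible statistics.
rng = np.random.default_rng(0)
ability = rng.normal(1.5, 1.0, size=(39, 1))
difficulty = rng.normal(0.0, 1.0, size=(1, 80))
responses = (rng.random((39, 80)) < 1 / (1 + np.exp(-(ability - difficulty)))).astype(int)
print(form_statistics(responses))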
Alternative procedures investigated
No alternative test or selection procedure was investigated for this study. Nor were adverse
impact analyses conducted. Nevertheless, content validity has been demonstrated for all
the tests, which justifies their use on the grounds of business necessity. The Uniform
Guidelines, section II, specifically allow content validity as a method of showing business
necessity for the use of a selection procedure (test).
Uses and applications
The medical and legal terminology tests are intended for use in employment, training,
education, certification, or other related purposes. As indicated above, these knowledge-
based tests have been shown to represent a sample of the knowledge that is used and is a
necessary prerequisite for the successful job performance of an Entry-level Medical
Assistant and a Legal Assistant and or Legal Secretary.
These tests were designed to be used primarily as a screening device for hiring, training,
education, licensing, certification, or other related purposes. The test scores should be used
on a pass/fail basis only. The following cutoff scores (pass or fail scores) are recommended:
Test Pass Score
Medical Terminology Form “A” 55
Medical Terminology Form “B” 53
Legal Terminology 50
Each test item on all tests is weighted one (1.0), or worth one point. A score of fifty-five (55)
out of a total possible score of eighty (80) is the minimum passing score recommended for
Medical Terminology Form “A”. A score of fifty-three (53) out of a total possible score of
eighty (80) is recommended for Form “B”. Similarly, the minimum passing score
recommended for the Legal Terminology Test is fifty (50).
The purpose for setting these cutoff scores is to distinguish between candidates who have
demonstrable knowledge of medical or legal terminology that is used and is a necessary
prerequisite for successful job performance (for the jobs stated above) and those who do
not have this knowledge.
The above cutoff scores were derived from a job-related cutoff-setting process called the
modified Angoff method. This method involves establishing an overall average level of
minimum proficiency using several subject-matter experts and then lowering the average
rating by one standard error of measurement.[15] The United States Supreme Court has
accepted the modified Angoff method for setting job-related cutoff scores for tests (U.S. v.
South Carolina, 1978).
All subject-matter experts were required to provide a percentage rating via the Test Survey
Response form for each test item based on their opinion of the percentage of minimally
qualified applicants that would be expected to answer the question correctly. This category
is termed Percentage of Qualified Applicants and appears under column four on the Test
Survey Response form. Averages were calculated per test item. These averages were loaded
into the Job Related Cutoff program of the TSA software system. This program calculated an
overall average percentage using the averaged score per test item for each test. The overall
average score was then lowered by one standard error of measurement (modified Angoff
method). The standard error of measurement was calculated by TSA and appears as part of
the Test Distribution Results. This process resulted in the setting of the cutoff scores, listed
above, for each test.
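
As a rough illustration of the calculation just described, the Python sketch below converts per-item Percentage of Qualified Applicants averages into an expected raw score and lowers it by one standard error of measurement. This is an assumption about what the TSA Job Related Cutoff program computes, not its actual code, and the ratings and SEM used in the example are invented.

# Hypothetical sketch of the job-related (modified Angoff) cutoff process: average the
# per-item "% of minimally qualified applicants expected to answer correctly" ratings
# into an expected raw score, then subtract one standard error of measurement (SEM)
# taken from the Test Distribution Results.
import math

def job_related_cutoff(item_avg_pcts: list[float], sem: float) -> int:
    """item_avg_pcts: per-item averages (0-100) across SMEs; sem: SEM for the test."""
    expected_raw_score = sum(p / 100 for p in item_avg_pcts)
    return math.floor(expected_raw_score - sem)

# Invented illustration: 80 items whose SME ratings average about 72%, with an assumed
# SEM of 2.9 raw-score points.
ratings = [72.0] * 80
print(job_related_cutoff(ratings, sem=2.9))    # -> 54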
Contact person
The person who may be contacted for further information about this validity study is:
James Kuthy, M.A.
Senior Consultant
Biddle Consulting Group, Inc.
193 Blue Ravine Road, Suite 270
Folsom, CA 95630

[15] The standard error of measurement is designed for interpreting the reliability of
test scores. It is used to distinguish between test scores that are statistically different.
Accuracy and completeness
To ensure accuracy and completeness, all survey entries were checked and compared. Item
difficulty levels were compared to subject-matter experts' minimally-qualified-applicant
ratings. Wherever an item difficulty rating was significantly lower than the subject-matter
experts' expected proficiency rating, the subject-matter experts' rating was adjusted to
equal the item difficulty rating. This procedure prevents overestimation of ratings, which
avoids inflated cutoff scores. The Correct Answer columns were checked for all survey
responses. Any test item that did not receive one hundred percent (100%) agreement
regarding its correctness was checked thoroughly and adjusted where necessary. Any test
item showing a negative correlation with the key was checked and adjusted (this correlation
was provided by the Item Analysis program in TSA).
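
The adjustment rule above can be expressed compactly. The Python sketch below is a hypothetical rendering of it (names invented); note that it applies the adjustment whenever the observed difficulty falls below the expected-proficiency rating, whereas the report applied it only when the difference was significant.

# Hypothetical sketch of the rating adjustment: if an item proved harder than the SMEs
# expected (observed proportion correct below their expected-proficiency rating), pull
# the rating down to the observed difficulty so the cutoff score is not inflated.
def adjust_expected_rating(sme_rating_pct: float, item_difficulty: float) -> float:
    """sme_rating_pct: SME estimate (0-100); item_difficulty: proportion correct (0-1)."""
    observed_pct = item_difficulty * 100
    return min(sme_rating_pct, observed_pct)

print(adjust_expected_rating(85.0, 0.70))   # SMEs expected 85%, only 70% correct -> 70.0
print(adjust_expected_rating(60.0, 0.80))   # expectation already below difficulty -> 60.0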
Development Report for
OPAC® System 5.0
Legal Keyboarding and Language Arts Tests
October 1998
Disclaimer
Though the research conducted for this report is thorough and complete, it should in no
way be construed as a final validation study. Rather, it is a good faith effort on the part
of Biddle Consulting Group, Inc., to demonstrate that the tests described in this report
have been pilot tested, and that they do provide a meaningful measurement of the
skill(s) being tested. Because this study was conducted at only one employer, its results
and applications may or may not be relevant in other geographical areas, employers,
specific areas of practice, or job positions. Biddle Consulting Group recommends
conducting an in-house validation study of all tests before using them as a selection
device, as such a study would help establish that the skills measured by the tests in this
report are essential to the specific job environment in which the in-house validation
study was conducted.
Abstract
Legal keyboarding and language arts tests were developed to aid in the selection of properly
qualified candidates in the legal assistant and legal secretary job classifications. Three
alternative versions of each test were developed. The legal keyboarding test was designed
to measure the speed and accuracy of applicants typing legal text. The legal language arts
test was designed to measure an applicant’s ability to proofread and spot various
grammatical errors in documents that legal assistants and secretaries would typically be
expected to analyze and proofread. Two legal industry experts assisted in the development
of both tests, and 39 subject-matter experts participated in the evaluation of the new tests.
One hundred percent of legal subject-matter experts who examined the keyboarding test
agreed that the test appropriately measured the skill being assessed, and 100% of subject-
matter experts who examined the language arts test also agreed that the test appropriately
measured the skills being assessed. Legal subject-matter experts were administered all
alternative forms of both tests, and their input established alternate form reliability
coefficients for each test. Cutoff scores were also derived, based upon the scores of job
incumbent subject-matter experts.
Background
The following is a report describing the development process of the OPAC System legal
keyboarding and language arts tests. The reason for developing these tests was twofold.
First, a product development decision had been made to orient the OPAC System towards
the legal industry, as there is a high need for clerical skills in this field and a perceived high
demand for skills testing in the legal industry. Second, informal feedback from representatives
of the legal industry (solicited mainly from tradeshow conventions and telephone
interviews) suggested that legal keyboarding and language arts tests might be the most
needed tests for the industry, and thus the most likely tests to develop. Additionally, the
OPAC System already contained general versions of these tests, so there was both a
product history and test format from which to develop the new instruments.
Early Development
Although an informal review of the job occupations of legal assistant and secretary
revealed that both keyboarding and language arts skills were important to successful
performance in these job classifications, more quantifiable evidence needed to be obtained.
To that end, 241 law offices throughout the United States were contacted via facsimile and
asked to provide job descriptions for the positions of legal assistant and secretary. Out of
the 241 offices contacted, 11 provided complete job descriptions for these positions. All of
the received job descriptions indicated that at least some level of minimum competency in
the skills of keyboarding and language arts was needed for successful performance in these
job classifications. This information provided enough evidence to justify the full
development of selection tests measuring keyboarding and language arts skills. Appendix 1
contains all received legal assistant and secretary job descriptions.
Industry Experts
Industry experts were recruited to provide guidance and direction in the test development
process. Two industry experts participated in constructing the tests. All industry experts
were required to have at least five years of experience in a job classification at or above the
level of legal assistant or secretary (the qualifications of these experts are provided in
Appendix 2). It was the duty of the industry experts to first provide materials from which to
develop the tests, and then to provide feedback and advice on how to develop the tests.
Based on the material provided by industry experts, three alternate versions of each test
were developed. Once completed, the tests were shown to industry experts, who then
evaluated them as to their content and provided recommended changes. The tests were
revised and again presented to industry experts for final approval. Industry experts were
compensated for their participation in the test development process.
Test Descriptions
The legal keyboarding test was designed to measure typing speed and accuracy specific to
legal documents frequently typed by legal assistants and secretaries. Three alternate
versions of the test were constructed. Each version had between 640 and 692 words of text.
The text material was selected from actual documents that had been used in several law
offices, and the tests were similarly formatted to take into account the form, content, and layout
of the presented text. All tests were constructed to have roughly the same overall level of
difficulty. To distinguish it from regular typing tests, the legal keyboarding test contains
frequently used legal terminology and other such legal-specific contents. Because of the
frequent technical and numeric information contained in the test, it was thought that skill
performance differences between the legal keyboarding test and a non-specific keyboarding
test might vary, with test takers performing better on a non-specific test (that does not
contain the highly technical information found in the legal keyboarding test). In its final
format, the legal keyboarding test will be presented to candidates either on a computer
screen, or on a hardcopy printout. Appendix 3 contains all three versions of the test.
The legal language arts tests were designed to measure grammar and proofreading skills.
As with the legal keyboarding test, three alternate versions of the test were constructed,
and these versions were constructed with the intention of being similar in both structure
and difficulty level. The test was designed to simulate actual legal documents, such as a
Request for Production or a will. A series of errors was embedded in the text, the goal for
the test taker being to locate and correct these errors. The errors were divided into the
classifications of spelling, grammar, punctuation, number usage, possessives, and
capitalization. To successfully complete the test, candidates must not only identify the
errors (demonstrating proofreading skill), but also have the knowledge to correct the
uncovered errors. Each alternate version of the test had between 78 and 82 errors
embedded in the text document, which was between 329 and 356 words long. Error-to-
total-word ratios ranged from .23 to .24, which is similar to the level found in current OPAC
System language arts tests. Appendix 4 contains all three versions of this test.
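
As a quick arithmetic check, the reported extremes of error counts and document lengths bound the possible error-to-total-word ratios, and the quoted range of .23 to .24 falls inside those bounds:

# Bounds implied by 78-82 embedded errors in documents of 329-356 words.
for errors, words in [(78, 356), (78, 329), (82, 356), (82, 329)]:
    print(f"{errors}/{words} = {errors / words:.3f}")    # ranges from about .219 to .249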
Testing Site
After construction of the tests was complete, it became necessary to locate a suitable
testing site from which to pilot test the new instruments. For the legal keyboarding and
language arts tests, a large law office located in Menlo Park, California was selected as the
testing site for the new instruments. This test site offered a large pool of subject-matter
experts from which to draw, and it also provided subject-matter experts who had some
diversity in their particular area of law practice. Subject-matter experts from several fields
of law were able to participate in the study.
Method
Participants
Thirty-nine subject-matter experts took part in the beta testing of the legal keyboarding and
language arts tests (N = 39). All subject-matter experts were either legal assistants or legal
secretaries (or of similar classification) and had at least one year of experience working in
that job occupation. The overall mean years of job experience for the subject-matter
experts was 9.81 (M = 9.81, SD = 7.13). Subject-matter experts spent approximately one
hour taking and evaluating all three versions of both tests. Upon completing the test
evaluation, subject-matter experts were thanked for their participation and compensated for
their time with gift certificates from a local department store.
Materials
Legal Keyboarding and Language Arts Tests.
Final beta versions of the legal keyboarding and language arts tests were administered to
subject-matter experts. The tests were contained in a special beta version of OPAC 5.0 skills
testing software that had been installed onto six computers in the law office’s training room.
Candidates were able to open the program by selecting an icon located on the desktop of
the computer. Once the program was opened, it automatically launched the tests, and candidates
completed all three versions of each test.
Validation Survey.
The validation survey was used to evaluate the quality and content validity of each test
being examined. The survey was constructed based on a validation report included in OPAC
5.0, and addresses the content validation requirements described in the Uniform Guidelines
(1978). Data on each of the following topics were gathered in the survey:
Whether or not the test measured the skill it was designed to measure
Whether or not the skill being measured is required at job entry
The importance of the skill
The difficulty level of the test
The subject-matter expert’s score on the test
The subject-matter expert’s opinion as to what a minimally qualified candidate’s
score on the test should be to be considered for employment/promotion
The survey was also designed to capture subject-matter expert demographic information
such as name, gender, ethnicity, job title, and years of work experience. All versions of both
tests were examined separately, and subject-matter experts completed validation surveys
for all versions of each test.
Procedure
A training supervisor at the law office was placed in charge of the test site. Subject-matter
experts were tested in groups of five or six during their lunch hour. These testing sessions
were staggered over a one-week period so as to allow sufficient time for each subject-
matter expert to be able to participate. Subject-matter experts were seated at the computer
which had the beta version of the OPAC software installed. Once seated, subject-matter
experts were given the validation survey, which contained full instructions on how the
testing process was to proceed. In order to keep track of their scores on the computer,
subject-matter experts entered their social security number when prompted to do so by the
computer. The computer then administered each version of both tests to subject-matter
experts, who had five minutes to complete each keyboarding test, and 13 minutes to
complete each language arts test. The order in which the tests were presented was
randomized so as to lessen any carry-over or practice effects. Between each test, the
computer was paused, allowing subject-matter experts to answer validation questions about
each of the tests in the survey.
After all six tests were completed, the subject-matter experts were asked to attest that they
gave each test their best effort, which they did by checking a box on the last page of the
survey that indicated as such. Subject-matter experts were thanked for their time and
escorted from the test site.
Results
In order to establish basic content validity for each test, at least 50% of subject-matter
experts must agree that proficiency in the skill which the test measures is essential for
successful performance of the job being selected for. One hundred percent of subject-
matter experts agreed that proficiency in language arts was essential to successful
performance in the job of legal assistant or secretary, and all agreed that keyboarding skills
were necessary for successful performance of the job of legal assistant or secretary.
It is also essential to demonstrate that a skill being tested for is required at the time of job
entry, and cannot be learned during a brief orientation. To that end, subject-matter experts
were asked whether or not keyboarding and language arts skills were required at time of
job entry or if they could be learned while on the job. Eighty-eight percent of subject-matter
experts agreed that language arts skills were essential at time of job entry, and 82%
agreed that keyboarding skills were essential at time of job entry.
Legal Keyboarding
Each alternate version of the legal keyboarding test was examined to determine mean
scores and difficulty levels for each. Mean scores and standard deviations for the legal
keyboarding test versions one, two, and three were highly comparable (M = 63.16, SD =
14.79; M = 64.08, SD = 15.42; M = 65.72, SD = 14.97), suggesting that the tests
contained similar content and had a similar level of difficulty. The overall mean standard
error of measurement was 5.81. In order to determine consistency between the different
versions of the test, an alternate form reliability analysis was conducted. The Pearson
product-moment correlation coefficient was used to determine the reliability of each version
of the test. From this analysis, the following matrix was developed.
Table 1: Product-moment correlations between each version of the Legal Keyboarding Test.

                                     Version One    Version Two    Version Three
Legal Keyboarding Version One           1.00           .930*          .819*
Legal Keyboarding Version Two           .930*          1.00           .799*
Legal Keyboarding Version Three         .819*          .799*          1.00

*Significant at the 0.01 level.
Based upon the correlations between each version of the test, an overall mean correlation
was determined, R (38) = .85, p < .01. This is a strong reliability coefficient, and indicates
consistency between different versions of the test. Subject-matter experts were also asked
to rate the difficulty level of the test. Using a simple Likert-type scale ranging from 1 to 3 (1
indicating that the test was too easy, 2 indicating that the test had the appropriate level of
difficulty, and 3 indicating that the test was too difficult), subject-matter experts rated the
overall difficulty of the test. Subject-matter experts rated the tests with a mean difficulty
level of M = 2.09, SD = 0.55, indicating that the tests are set at an appropriate level of
difficulty.
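
A Python sketch of the alternate-form reliability computation follows. It is illustrative only (the subject-matter expert scores are simulated), but it shows how the pairwise Pearson correlations in Table 1 and the overall mean correlation would be obtained from a scores-by-version matrix.

# Sketch (not the original analysis code) of alternate-form reliability: pairwise
# Pearson correlations among the three versions, plus the overall mean correlation.
import numpy as np

def alternate_form_reliability(scores: np.ndarray) -> tuple[np.ndarray, float]:
    """scores: examinees x versions matrix of test scores."""
    corr = np.corrcoef(scores, rowvar=False)          # versions x versions matrix
    pairwise = corr[np.triu_indices_from(corr, k=1)]  # the three off-diagonal values
    return corr, float(pairwise.mean())

# Simulated scores for 39 SMEs on three parallel versions (values invented).
rng = np.random.default_rng(1)
true_skill = rng.normal(64, 14, size=(39, 1))
scores = true_skill + rng.normal(0, 6, size=(39, 3))
matrix, mean_r = alternate_form_reliability(scores)
print(np.round(matrix, 3), f"mean r = {mean_r:.2f}")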
Legal Language Arts
As with the legal keyboarding tests, each alternate version of the legal language arts test
was examined to determine mean scores and difficulty levels. Mean scores and standard
deviations for the legal language arts test versions one, two, and three were consistent (M
= 63.77, SD = 9.21; M = 60.24, SD = 11.19; M = 64.38, SD = 7.90), suggesting that the
tests contained similar content and had a similar level of difficulty. Overall, the mean
standard error of measurement was 4.50. As with the legal keyboarding tests, a reliability
analysis was conducted. The Pearson product-moment correlation coefficient was again
used to determine the reliability of each version of the test. From this analysis, the following
matrix was constructed.
Table 2: Product-moment correlations between each version of the Legal Language Arts Test.

                                       Version One    Version Two    Version Three
Legal Language Arts Version One           1.00           .757*          .855*
Legal Language Arts Version Two           .757*          1.00           .736*
Legal Language Arts Version Three         .855*          .736*          1.00

*Significant at the 0.01 level.
An overall mean correlation was determined, R (38) = .78, p < .01. This is an acceptable
reliability coefficient, and indicates consistency between different versions of the test.
Subject-matter experts were lastly asked to rate the difficulty level of the language arts
test. Using a simple Likert-type scale ranging from 1 to 3 (1 indicating that the test was too
easy, 2 indicating that the test had the appropriate level of difficulty, and 3 indicating that
the test was too difficult), subject-matter experts rated the overall difficulty of the test.
Subject-matter experts rated the tests with a mean difficulty level of M = 2.17, SD = 0.63,
indicating that the tests are set at an appropriate, if slightly high, level of difficulty.
Angoff Scores
To determine the appropriate cutoff score for each test, the modified Angoff method was
utilized. The United States Supreme Court (U.S. v. South Carolina) has upheld this method
of determining test cutoff scores (Biddle, 1993). Subject-matter experts were asked what
they believed the score on each test for a minimally qualified applicant should be; this score
is designed to represent how a minimally qualified job applicant would perform on the
test. Subject-matter experts provided these Angoff scores for all versions of each test.
Angoff scores were then averaged across alternate versions of each test, yielding a mean
Angoff score of 54.20 for the legal keyboarding test, and 56.88 for the legal language arts
tests. Based on these Angoff scores, cutoff scores using each test’s standard error of
measurement could be derived. The cutoff score for each test was set at one standard error
of measurement unit below the test’s mean Angoff. This process led to the following
modified Angoff cutoff score for each test.
Table 3: Summary Statistics and Modified Angoff Cutoff Scores for the Legal Keyboarding and
Legal Language Arts Tests.

                                        Legal Keyboarding    Legal Language Arts
Mean Angoff Score                             54.20                 56.88
Standard Deviation                            14.97                  9.66
R                                               .85                   .78
Mean Standard Error of Measurement             5.81                  4.50
Modified Angoff Cutoff Score                     48                    52
Appendix 5 contains full summary statistics for each test, as well as raw candidate scores
and feedback from each of the selection tests.
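
Under the common assumption that the standard error of measurement is estimated as SD * sqrt(1 - r), the figures in Table 3 reproduce the published cutoffs almost exactly. The Python sketch below shows the arithmetic; treating the final rounding-down step as part of the method is our assumption.

# Reproducing the Table 3 cutoffs under the stated assumptions: SEM = SD * sqrt(1 - r),
# cutoff = mean Angoff score minus one SEM, rounded down to a whole score.
import math

def modified_angoff_cutoff(mean_angoff: float, sd: float, reliability: float) -> int:
    sem = sd * math.sqrt(1 - reliability)
    return math.floor(mean_angoff - sem)

print(modified_angoff_cutoff(54.20, 14.97, 0.85))   # Legal Keyboarding -> 48
print(modified_angoff_cutoff(56.88, 9.66, 0.78))    # Legal Language Arts -> 52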
Performance Differentiation
Lastly, subject-matter experts were polled to determine how strongly they believed that
higher levels of mastery in the skill being assessed distinguished candidates with higher
levels of performance in a particular job duty from candidates with lower levels of
performance in this job duty. Using a Likert-type scale ranging from 1 to 4 (1 indicating
little or no performance differentiation, 2 indicating some performance differentiation, 3
indicating significant performance differentiation, and 4 indicating very significant
performance differentiation), subject-matter experts were asked to rate how performance
differentiating the skills being assessed by the new tests were. Subject-matter experts gave
the legal keyboarding test a mean performance differentiation rating of 2.59, and the legal
language arts test a mean performance differentiation rating of 2.75, suggesting that higher
levels of these skills may be performance differentiating.
Job Duty/KSA Linkage
The Uniform Guidelines (1978) require that tested knowledge, skills, and abilities (KSAs) be
linked to established job duties. Responses from subject-matter experts almost universally
agreed that keyboarding and language arts skills were essential components of major job
duties. Subject-matter experts were asked to list the two most important job duties that
link to the tested KSAs, and to rank the importance and frequency of each job duty. Job
duties such as “processing of documents,” “transcription,” and “drafting correspondence”
were linked to both keyboarding and language arts skills by subject-matter experts. See
Appendix 5 for full descriptions. On a Likert-type scale of 1 to 5 (1 being not important, 5
being extremely critical), subject-matter experts rated the overall importance of listed job
duties with a mean rating of M = 3.68, indicating that the linked job duties were essential to
successful job performance. Subject-matter experts also assigned a frequency rating to the
listed job duties, using a Likert-type scale of 1 to 5 (1 indicating daily to weekly
performance of the job duty, 5 indicating less than annual performance of the job duty).
Subject-matter experts’ mean frequency rating was M = 1.11, indicating that the listed job
duties were frequently performed.
Discussion
The results of this development study indicate that the legal keyboarding and language arts
tests successfully measure the skills that they were designed to assess. Additionally, it
appears that the use of these tests is likely appropriate to the selection process of the legal
secretary and legal assistant job classifications. However, it is important to note that this
development report does not constitute a full content validation study. Such a study would
have to account for regional differences, differences in legal specialty, differences in job
positions, and differences in specific job work environment. All that can be extrapolated
from the present study is that the evaluated legal tests are appropriate to the selection
process for the law office in which the testing site was held. The Principles for the Validation
and Use of Personnel Selection Procedures (1987) state that full content validation
procedures should allow for test administrators to be able to generalize the content
validation results to different population samples, something that the current development
study does only if it is confirmed through a validation transportability process. Biddle
Consulting Group recommends that individuals wishing to use these tests as a selection
device conduct an in-house content validation study. Such a study would ensure that the
selection process is fair and applicable to the job environment where the selection process
would take place. Coupled with the current development study, which demonstrates basic
ability of the instruments to measure the skills that they were designed to measure,
administrators of the legal keyboarding and language arts tests will aid many employers in
selecting applicants who possess the skill levels needed for acceptable job proficiency.
Development Report for OPAC® System 5.0
Medical Keyboarding and Language Arts Tests
October 1998
Disclaimer
Though the research conducted for this report is thorough and complete, it should in no
way be construed as a final validation study. Rather, it is a good faith effort on the part
of Biddle Consulting Group, Inc., to demonstrate that the tests described in this report
have been pilot tested, and that they do provide a meaningful measurement of the
skill(s) being tested. Because this study was conducted at only one employer, its
results and applications may or may not be relevant in other geographical areas,
employers, specific areas of practice, or job positions. Biddle Consulting Group
recommends conducting an in-house validation study of all tests before using them as a
selection device, as such a study would help establish that the skills measured by the
tests in this report are essential to the specific job environment in which the in-house
validation study was conducted.
Abstract
Medical keyboarding and language arts tests were developed to aid in the selection of
properly qualified candidates in the medical assistant and medical secretary job
classifications. Three alternative versions of each test were developed. The medical
keyboarding test was designed to measure the speed and accuracy of applicants typing
medical text. The medical language arts test was designed to measure an applicant’s ability
to proofread and spot various grammatical errors in documents that medical assistants and
secretaries would typically be expected to analyze and proofread. Three medical industry
experts assisted in the development of both tests, and over 20 subject-matter experts
participated in the evaluation of the new tests. Eighty-nine percent of medical subject-
matter experts who examined the keyboarding test agreed that the test appropriately
measured the skill being assessed, and 84% of subject-matter experts who examined the
language arts test also agreed that the test appropriately measured the skills being
assessed. Medical subject-matter experts were administered all alternative forms of both
tests, and their input established alternate form reliability coefficients for each test. Cutoff
scores were also derived, based upon the scores of job incumbent subject-matter experts.
Background
The following is a report describing the development process of the OPAC System medical
keyboarding and language arts tests. The reason for developing these tests was twofold.
First, a product development decision had been made to orient the OPAC System towards
the medical industry, as there is a high need for clerical skills in this field and a perceived
high demand for skills testing in the medical industry. Second, informal feedback from
representatives of the medical industry (solicited mainly from tradeshow conventions and
telephone interviews) suggested that medical keyboarding and language arts tests might be
the most needed tests for the industry, and thus the most likely tests to develop.
Additionally, the OPAC System already contained general versions of these tests, so there
was both a product history and test format from which to develop the new instruments.
Early Development
Although an informal review of the job occupations of medical assistant and secretary
revealed that both keyboarding and language arts skills were important to successful
performance in these job classifications, more quantifiable evidence needed to be obtained.
To that end, 295 health organizations (hospitals, doctor’s offices, etc.) were contacted via
facsimile and asked to provide job descriptions for the positions of medical assistant and
secretary. Out of the 295 offices contacted, 12 provided complete job descriptions for these
positions. All of the received job descriptions indicated that at least some level of minimum
competency in the skills of keyboarding and language arts was needed for successful
performance in these job classifications. This information provided enough evidence to
justify the full development of selection tests measuring keyboarding and language arts
skills. Appendix 1 contains all received medical assistant and secretary job descriptions.
Industry Experts
Industry experts were recruited to provide guidance and direction in the test development
process. Three industry experts participated in constructing the tests. All industry experts
were required to have at least five years of experience in a job classification above the level
of medical assistant or secretary (the qualifications of these experts are provided in
Appendix 2). It was the duty of the industry experts to first provide materials from which to
develop the tests, and then to provide feedback and advice on how to develop the tests.
Based on the material provided by industry experts, three alternate versions of each test
were developed. Once completed, the tests were shown to industry experts, who then
evaluated them as to their content and provided recommended changes. The tests were
revised and again presented to industry experts for final approval. Industry experts were
compensated for their participation in the test development process.
Test Descriptions
The medical keyboarding test was designed to measure typing speed and accuracy specific
to medical documents frequently typed by medical assistants and secretaries. Three
alternate versions of the test were constructed. Each version had between 616 and 643
words of text. The text material was selected from actual documents that had been used in
a large, Northern California hospital, and the tests were similarly formatted to take into
account form, content, and layout of the presented text. All tests were constructed to have
roughly the same overall level of difficulty. To distinguish it from regular typing tests, the
medical keyboarding test contains frequently used medical terminology and other such
medical-specific content (cc, b.i.d., Levothroid, etc.). Because of the technical and numeric information that appears frequently in the test, it was expected that performance on the medical keyboarding test might differ from performance on a non-specific keyboarding test, with test takers performing better on a non-specific test (one that does not contain the highly technical information found in the medical keyboarding test). In its final format, the medical
keyboarding test will be presented to candidates either on a computer screen, or on a
hardcopy printout. Appendix 3 contains all three versions of the test.
The medical language arts tests were designed to measure grammar and proofreading
skills. As with the medical keyboarding test, three alternate versions of the test were
constructed, and these versions were constructed with the intention of being similar in both
structure and difficulty level. The test was designed to simulate an actual medical
document, such as an insurance claim or doctor's report. A series of errors was embedded in the text, and the test taker's goal was to locate and correct these errors. The errors were divided into the classifications of spelling, grammar, punctuation, number usage,
possessives, and capitalization. To successfully complete the test, candidates must not only
identify the errors (demonstrating proofreading skill), but also have the knowledge to
correct the uncovered errors. Each alternate version of the test had between 78 and 82 errors embedded in the text document, which was between 320 and 348 words long. Error-to-total-word ratios ranged from .23 to .24, a level similar to that found in current OPAC System language arts tests. Appendix 4 contains all three versions of this test.
Testing Site
After construction of the tests was complete, it became necessary to locate a suitable
testing site from which to pilot test the new instruments. For the medical keyboarding and
language arts tests, a large Health Maintenance Organization located in Roseville, California
was selected as the testing site for the new instruments. This test site offered a large pool
of subject-matter experts from which to draw, and it also provided subject-matter experts
who had some diversity in their particular area of medical expertise. Subject-matter experts
from several medical specialties were able to participate in the study.
Method
Participants
Twenty-three subject-matter experts took part in the beta testing of the medical keyboarding and language arts tests (N = 23). All subject-matter experts were either medical assistants or medical secretaries (or of similar classification) and had at least one year of experience working in that job occupation. The overall mean years of job experience for the subject-matter experts was 7.43 (M = 7.43, SD = 6.90). Subject-matter experts
spent approximately one hour taking and evaluating all three versions of both tests. Upon
completing the test evaluation, subject-matter experts were thanked for their participation
and compensated for their time with gift certificates from a local department store.
Materials
Medical Keyboarding and Language Arts Tests.
Final beta versions of the medical keyboarding and language arts tests were administered to
subject-matter experts. The tests were contained in a special beta version of OPAC 5.0 skills
testing software that had been installed onto a single computer located in the main office of the hospital that served as the test site. Candidates were able to open the program by selecting an icon located on the desktop of the computer. Once the program was opened, it automatically launched the tests, and candidates completed all three versions of each test.
Validation Survey.
The validation survey was used to evaluate the quality and content validity of each test
being examined. The survey was constructed based on a validation report included in OPAC
5.0, and addresses the content validation requirements described in the Uniform Guidelines
(1978). The survey gathered data on each of the following topics:
Whether or not the test measured the skill it was designed to measure
Whether or not the skill being measured is required at job entry
The importance of the skill
The difficulty level of the test
The subject-matter expert’s score on the test
The subject-matter expert’s opinion as to what a minimally qualified candidate’s
score on the test should be to be considered for employment/promotion
The survey was also designed to capture subject-matter expert demographic information
such as name, gender, ethnicity, job title, and years of work experience. All versions of both
tests were examined separately, and subject-matter experts completed validation surveys
for all versions of each test.
Procedure
An office supervisor at the hospital was placed in charge of the test site. This test proctor
arranged individual appointments with each of the subject-matter experts to examine the
new medical tests at times that would not interfere with their regular work hours. These
individual appointments were staggered over a two-week period to allow sufficient time for each subject-matter expert to participate. Subject-matter
experts were seated at the computer which had the beta version of the OPAC software
installed. Once seated, subject-matter experts were given the validation survey, which
contained full instructions on how the testing process was to proceed. In order to keep track
of their scores on the computer, subject-matter experts entered their social security number
when prompted to do so by the computer. The computer then administered each version of
both tests to subject-matter experts, who had five minutes to complete each keyboarding
test, and 13 minutes to complete each language arts test. The order in which the tests were
presented was randomized so as to lessen any carry-over or practice effects. Between each
test, the computer was paused, allowing subject-matter experts to answer validation
questions about each of the tests in the survey.
After all six tests were completed, the subject-matter experts were asked to attest that they gave each test their best effort, which they did by checking a box to that effect on the last page of the survey. Subject-matter experts were thanked for their time and
escorted from the test site.
Results
In order to establish basic content validity for each test, at least 50% of subject-matter
experts must agree that proficiency in the skill which the test measures is essential for
successful performance of the job being selected for. Eighty-four percent of subject-matter
experts agreed that proficiency in language arts was essential to successful performance in
the job of medical assistant or secretary, and 89% agreed that keyboarding skills were
necessary for successful performance of the job of medical assistant or secretary.
It is also essential to demonstrate that a skill being tested for is required at the time of job
entry, and cannot be learned during a brief orientation. To that end, subject-matter experts
were asked whether or not keyboarding and language arts skills were required at time of
job entry or if they could be learned while on the job. Seventy-nine percent of subject-matter experts agreed that language arts skills were essential at time of job entry, and 83 percent agreed that keyboarding skills were essential at time of job entry.
Medical Keyboarding
Each alternate version of the medical keyboarding test was examined to determine mean
scores and difficulty levels for each. Mean scores and standard deviations for the medical
keyboarding test versions one, two, and three were highly comparable (M = 35.78, SD = 11.46; M = 35.09, SD = 13.52; M = 36.72, SD = 12.34), suggesting that the tests
contained similar content and had a similar level of difficulty. The overall mean standard
error of measurement was 4.12. In order to determine consistency between the different
versions of the test, an alternate form reliability analysis was conducted. The Pearson
product-moment correlation coefficient was used to determine the reliability of each version
of the test. From this analysis, the following matrix was developed.
Table 1: Product-moment correlations between each version of the Medical Keyboarding Test.

                 Version One    Version Two    Version Three
Version One         1.00           .853*          .830*
Version Two         .853*          1.00           .968*
Version Three       .830*          .968*          1.00

*Significant at the 0.01 level.
Based upon the correlations between each version of the test, an overall mean correlation
was determined, R (22) = .88, p < .01. This is a strong reliability coefficient, and indicates
consistency between different versions of the test. Lastly, subject-matter experts were asked to rate the overall difficulty of the test using a simple Likert-type scale ranging from 1 to 3 (1 indicating that the test was too easy, 2 indicating that the test had the appropriate level of difficulty, and 3 indicating that the test was too difficult). Subject-matter experts rated the tests with a mean difficulty level of M = 2.20, SD = .53, suggesting that the tests are set at an appropriate, if slightly high, difficulty level.
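For readers who want to see the mechanics behind the alternate-form reliability figures, the following is a minimal illustrative sketch (not part of the original analysis); it computes the pairwise Pearson product-moment correlations among the three versions and averages them into a single reliability estimate. The score lists shown are hypothetical placeholders, not the actual subject-matter expert data.

    # Illustrative only: pairwise Pearson correlations among alternate forms,
    # averaged into a single alternate-form reliability estimate.
    # The score lists are hypothetical placeholders, not the study data.
    from itertools import combinations
    from statistics import correlation, mean  # statistics.correlation requires Python 3.10+

    scores = {
        "Version One":   [28, 41, 35, 22, 47, 39, 31],
        "Version Two":   [27, 43, 33, 20, 49, 38, 30],
        "Version Three": [30, 42, 36, 23, 48, 40, 32],
    }

    # Correlate every pair of versions (the off-diagonal cells of Table 1).
    pairwise = {
        (a, b): correlation(scores[a], scores[b])
        for a, b in combinations(scores, 2)
    }
    for pair, r in pairwise.items():
        print(pair, round(r, 3))

    # Report the mean of the pairwise correlations as the overall reliability.
    print("mean r =", round(mean(pairwise.values()), 3))

The same procedure, applied to the actual score data, would yield the mean correlations reported in this section (for example, R = .88 for the medical keyboarding test).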
Medical Language Arts
As with the medical keyboarding tests, each alternate version of the medical language arts
test was examined to determine mean scores and difficulty levels. Mean scores and
standard deviations for the medical language arts test versions one, two, and three were
consistent (M = 56.59, SD = 13.21; M = 59.80, SD = 8.29; M = 57.67, SD = 11.37),
meaning that the tests contained similar content and had a similar level of difficulty.
Overall, the mean standard error of measurement was 5.36. As with the medical
keyboarding tests, a reliability analysis was conducted. The Pearson product-moment
correlation coefficient was again used to determine the reliability of each version of the test.
From this analysis, the following matrix was constructed.
Table 2: Product-moment correlations between each version of the Medical Language Arts Test.

                 Version One    Version Two    Version Three
Version One         1.00           .743*          .745*
Version Two         .743*          1.00           .814*
Version Three       .745*          .814*          1.00

*Significant at the 0.01 level.
An overall mean correlation was determined, R (22) = .77, p < .01. This is an acceptable
reliability coefficient, and indicates consistency between different versions of the test.
Subject-matter experts were lastly asked to rate the overall difficulty of the language arts test using a simple Likert-type scale ranging from 1 to 3 (1 indicating that the test was too easy, 2 indicating that the test had the appropriate level of difficulty, and 3 indicating that the test was too difficult). Subject-matter experts rated the tests with a mean difficulty level of M = 1.99, SD = 0.50, indicating that the tests are set at an appropriate level of difficulty.
Angoff Scores
To determine the appropriate cutoff score for each test, the modified Angoff method was
utilized. The United States Supreme Court (U.S. v. South Carolina) has upheld this method
of determining test cutoff scores (Biddle, 1993). Subject-matter experts were asked
what they believed the score on each test for a minimally qualified applicant should be, a judgment that represents how such an applicant would be expected to perform on the
test. Subject-matter experts provided these Angoff scores for all versions of each test.
Angoff scores were then averaged across alternate versions of each test, yielding a mean
Angoff score of 33.92 for the medical keyboarding test, and 56.88 for the medical language
arts tests. Based on these Angoff scores, cutoff scores using each test’s standard error of
measurement could be derived. The cutoff score for each test was set at one standard error
of measurement unit below the test’s mean Angoff. This process led to the following
modified Angoff cutoff score for each test.
Table 3: Summary Statistics and Modified Angoff Cutoff Scores for the Medical Keyboarding and Medical Language Arts Tests.

                                     Medical Keyboarding    Medical Language Arts
Mean Angoff Score                           33.92                   56.88
Standard Deviation                          12.09                   11.12
R                                             .88                     .76
Mean Standard Error of Measurement           4.12                    5.36
Modified Angoff Cutoff Score                   29                      51
Appendix 5 contains full summary statistics for each test, as well as raw candidate scores
and feedback from each of the selection tests.
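The cutoff arithmetic just described can be illustrated with the minimal sketch below (an illustration only, not part of the original study). It subtracts one standard error of measurement from each test's mean Angoff score and, consistent with the reported cutoffs of 29 and 51, rounds the result down to a whole score; the rounding rule is an assumption inferred from Table 3.

    import math

    # Mean Angoff scores and mean standard errors of measurement as reported in Table 3.
    tests = {
        "Medical Keyboarding":   {"mean_angoff": 33.92, "sem": 4.12},
        "Medical Language Arts": {"mean_angoff": 56.88, "sem": 5.36},
    }

    for name, t in tests.items():
        # Modified Angoff cutoff = mean Angoff score minus one SEM,
        # rounded down to a whole score (assumed rounding rule).
        cutoff = math.floor(t["mean_angoff"] - t["sem"])
        print(f"{name}: cutoff = {cutoff}")  # prints 29 and 51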
Performance Differentiation
Lastly, subject-matter experts were polled to determine how strongly they believed that
higher levels of mastery in the skill being assessed distinguished candidates with higher
levels of performance in a particular job duty from candidates with lower levels of
performance in this job duty. Using a Likert-type scale ranging from 1 to 4 (1 indicating
little or no performance differentiation, 2 indicating some performance differentiation, 3
indicating significant performance differentiation, and 4, indicating very significant
performance differentiation) subject-matter experts were asked to rate how performance
differentiating the skills being assessed by the new tests were. Subject-matter experts gave
the medical keyboarding test a mean performance differentiation rating of 2.43, and the
medical language arts test a mean performance differentiation rating of 2.41, suggesting
that higher levels of these skills may be performance differentiating.
Job Duty/KSA Linkage
The Uniform Guidelines (1978) require that tested knowledge, skills, and abilities (KSAs) be
linked to established job duties. Subject-matter experts almost universally agreed that keyboarding and language arts skills were essential components of major job
duties. Subject-matter experts were asked to list the two most important job duties that
link to the tested KSAs, and to rank the importance and frequency of each job duty. Job
duties such as “typing reports,” “proofreading claims,” and “correct recording of
information” were linked to both keyboarding and language arts skills by subject-matter
experts. See Appendix 5 for full descriptions. On a Likert-type scale of 1 to 5 (1 being not
important, 5 being extremely critical), subject-matter experts rated the overall importance
of listed job duties with a mean rating of M = 3.38, indicating that the linked job duties
were essential to successful job performance. Subject-matter experts also assigned a
frequency rating to the listed job duties, using a Likert-type scale of 1 to 5 (1 indicating daily to weekly performance of the job duty, 5 indicating less than annual performance of the job duty). Subject-matter experts’ mean frequency rating was M = 1.20, indicating that
the listed job duties were frequently performed.
Discussion
The results of this development study indicate that the medical keyboarding and language
arts tests successfully measure the skills that they were designed to assess. Additionally, it
appears that the use of these tests is likely appropriate to the selection process of the
medical secretary and medical assistant job classifications. However, it is important to note
that this development report does not constitute a full content validation study. Such a
study would have to account for regional differences, differences in medical specialty,
differences in job positions, and differences in specific job work environment. All that can be
extrapolated from the present study is that the evaluated medical tests are appropriate to
the selection process for the medical office in which the testing site was held. The Principles
for the Validation and Use of Personnel Selection Procedures (1987) state that full content
validation procedures should allow for test administrators to be able to generalize the
content validation results to different population samples, something that the current
development study does only if it is confirmed through a validation transportability process.
Biddle Consulting Group recommends that individuals wishing to use these tests as a
selection device conduct an in-house content validation study. Such a study would ensure
that the selection process is fair and applicable to the job environment where the selection
process would take place. Coupled with the current development study, which demonstrates the basic ability of the instruments to measure the skills they were designed to measure, such in-house research should help administrators of the medical keyboarding and language arts tests aid employers in selecting applicants who possess the skill levels needed for acceptable job proficiency.
Development Report for OPAC® System 5.3
Legal and Medical Transcription Tests
March 1999
Disclaimer
Though the research conducted for this report is thorough and complete, it should in no way
be construed as a final validation study. Rather, it is a good faith effort on the part of Biddle
Consulting Group, Inc., to demonstrate that the tests described in this report have been
pilot tested, and that they do provide a meaningful measurement of the skill(s) being
tested. Because this study was conducted at only one employer, its results and applications
may or may not be relevant in other geographical areas, employers, specific areas of
practice, or job positions. Biddle Consulting Group recommends conducting an in-house
validation study of all tests before using them as a selection device, as such a study would
help establish that the skills measured by the tests in this report are essential to the specific
job environment in which the in-house validation study was conducted.
Abstract
Legal and medical transcription tests were developed to aid in the selection of properly
qualified candidates in the legal secretary and medical transcriptionist job classifications.
Three alternative versions of each test were developed. The legal transcription test was
designed to measure the ability to accurately transcribe the dictation of a legal document, while
the medical transcription test measures medical transcription ability. Two legal industry
experts and two medical transcription experts assisted in the development of the respective tests.
Thirty-seven subject-matter experts participated in the evaluation of the legal transcription
test, while seventeen subject-matter experts evaluated the medical transcription test.
Ninety-four percent of legal subject-matter experts who examined the transcription test
agreed that the test appropriately measured the skill being assessed, and 100 percent of
the medical subject-matter experts who examined the medical transcription test agreed that
the test appropriately measured the skills being assessed. All subject-matter experts were
administered all alternative forms of both tests, and cutoff scores were derived based upon
the performance of job incumbent subject-matter experts.
Background
The following is a report describing the development process of the OPAC System legal and
medical transcription tests. The reason for developing these tests was twofold. First, a
product development decision had been made to orient the OPAC System towards the legal
and medical industries, as there is a high need for clerical skills in these professions and a
perceived high demand for skills testing in both the legal and medical industries. Second,
informal feedback from representatives of the legal and medical industries (solicited mainly
from tradeshow conventions and telephone interviews) suggested that transcription tests
might be highly needed skills tests for both industries, and thus the most likely tests to
develop. Additionally, the OPAC System already contained a general version of a
transcription test, so there was both a product history and test format from which to
develop the new instruments.
Early Development
Although an informal review of the job occupations of legal secretary/assistant and
medical transcriptionist revealed that transcription skills were important to successful
performance in these job classifications, more quantifiable evidence needed to be obtained.
To that end, 241 law offices and 295 medical offices throughout the United States were
contacted via facsimile and asked to provide job descriptions for the positions of legal
secretary/assistant and medical transcriptionist. Out of the 241 law offices contacted, 11
provided complete job descriptions for these positions. Out of the 295 medical offices, 12
provided job descriptions for the positions of medical transcriptionist and/or medical secretary.
All of the received job descriptions indicated that at least some level of minimum
competency in the skill of transcription was needed for successful performance in these job
classifications. This information provided enough evidence to justify the full development of
selection tests measuring legal and medical transcription ability. Appendix 1 contains all
received legal and medical job descriptions.
Industry Experts
Industry experts from both the legal and medical professions were recruited to provide
guidance and direction in the test development process. Two industry experts from each
profession participated in the construction of the industry-specific transcription tests. All
industry experts were required to have at least five years of experience in a job
classification at or above the level of legal secretary/assistant or medical transcriptionist
(the qualifications of these experts are provided in Appendix 2). It was the duty of the
industry experts to first provide materials from which to develop the tests, and then to
provide feedback and advice on how to develop the tests. Based on the material provided
by industry experts, three alternate versions of each test were developed. Once completed,
the tests were shown to industry experts, who then evaluated them as to their content and
provided recommended changes. The tests were revised and again presented to industry
experts for final approval. Industry experts were compensated for their participation in the
test development process.
Test Descriptions
The legal transcription test was designed to assess a candidate’s ability to transcribe
dictated legal material that would typically be transcribed by legal secretaries or legal
assistants. Three alternate versions of the test were constructed. Each version had a similar
format and structure, and contained between 163 and 194 words embedded in the text. (The tests could not be constructed to be the exact same word length because of the need to have each text make structural and grammatical sense.) The text material was selected from actual documents that had been used in several law offices, and the tests
were similarly formatted to take into account form, content, and layout of the presented
text. All tests were constructed to have roughly the same overall level of difficulty. To
distinguish the text from regular transcription material, the legal transcription test contains
frequently used legal terminology and other such legal-specific content. In its final format,
candidates will transcribe the legal dictation in OPAC using whatever native word processing
application is installed on the computer on which they are testing. Appendix 3 contains all
three versions of the test.
The medical transcription test was designed to measure skill in medical transcription.
Although steps were taken to ensure that the test measures the basic skill of medical
transcription, candidates taking this test are also required to have a certain knowledge of
medical terminology, as medical transcription often requires transcriptionists to have a
rudimentary familiarity with surgical procedures, physiology, pharmacology, etc. As with the
legal transcription test, three alternate versions of the test were constructed, and these
versions were constructed with the intention of being similar in both structure and difficulty
level. The test was designed to simulate actual medical transcriptions, such as a History
and Exam Report, or an Operating Report. When taking the test, candidates transcribe from
a dictation that is recorded on a cassette tape. As with the legal transcription test,
candidates transcribe the dictation into whatever native word processing application is
installed on the computer on which they are testing. Appendix 4 contains all three versions
of this test.
Testing Site
After construction of the tests was complete, it became necessary to locate a suitable
testing site from which to pilot test the new instruments. For the legal transcription test, a
large law office located in the San Francisco Bay Area was selected as the testing site for
the new instrument. This test site offered a large pool of subject-matter experts from which
to draw, and it also provided subject-matter experts who had some diversity in their
particular area of law practice. Subject-matter experts from several fields of law were able
to participate in the study.
Medical transcriptionists were gathered from two medical transcription offices located in
Sacramento, California. Both of these offices had transcriptionists who specialized in various
types of medical transcription, and all of the transcriptionists used had at least several years
of experience in the medical transcription profession.
Method
Legal Participants
Thirty-seven subject-matter experts took part in the beta testing of the legal transcription test (N = 37). All subject-matter experts were either legal assistants or legal secretaries (or of similar classification) and had at least one year of experience working in that job occupation. The overall mean years of job experience for the subject-matter experts was 10.86 (M = 10.86, SD = 8.19). Subject-matter experts spent approximately 1/2 hour taking and evaluating all three versions of the test. Upon completing the test evaluation, subject-
matter experts were thanked for their participation and compensated for their time with gift
certificates from a local department store.
Medical Participants
Seventeen subject-matter experts took part in the beta testing of the medical transcription
test (N = 17). All subject-matter experts were medical transcriptionists (or of similar
classification) and had at least one year of experience working in that job occupation.
Subject-matter experts spent approximately 1/2 hour taking and evaluating all three
versions of the test. Upon completing the test evaluation, subject-matter experts were
thanked for their participation and compensated for their time with gift certificates from a
local department store.
Materials
Legal and Medical Transcription Tests.
Final beta versions of the legal and medical transcription tests were administered to
subject-matter experts. The tests were contained in a special beta-version of the OPAC 5.3
skills testing software that had been installed onto computers in the law office’s training
room, offices of the medical transcription service, and at the OPAC office. Candidates were
able to open the program by selecting an icon located on the desktop of the computer. Once the program was opened, it automatically launched the tests, and candidates completed all three versions of the test.
Dictation Tapes.
Candidates transcribed audio information contained in legal and medical dictation cassettes.
The legal dictation cassette contained three audio dictations that legal secretaries would
typically be expected to transcribe in a typical law office work environment. The medical
dictation cassette contained three dictations that medical transcriptionists frequently
transcribe in the course of working in the medical transcription industry. Both tapes were
recorded in a professional recording studio to ensure the highest possible quality. The legal
dictation tape was recorded with the voice of a professional actor, while the medical
transcription dictation was recorded using the voice of a professional medical transcriptionist
who was familiar with the verbiage and terminology typically found in medical transcription.
Validation Survey.
The validation survey was used to evaluate the quality and content validity of each test
being examined. The survey was constructed based on a validation report included in OPAC
5.0, and addresses the content validation requirements described in the Uniform Guidelines
(1978). The survey gathered data on each of the following topics:
Whether or not the test measured the skill it was designed to measure
Whether or not the skill being measured is required at job entry
The importance of the skill
The difficulty level of the test
The subject-matter expert’s score on the test
The subject-matter expert’s opinion as to what a minimally qualified candidate’s
score on the test should be to be considered for employment/promotion
The survey was also designed to capture subject-matter expert demographic information
such as name, gender, ethnicity, job title, and years of work experience. All versions of both
tests were examined separately, and subject-matter experts completed validation surveys
for all versions of each test.
Procedure
For the legal transcription test site, a training supervisor at the law office was placed in
charge of the test. Subject-matter experts were tested in groups of five or six during their
lunch hour. These testing sessions were staggered over a one-week period to allow sufficient time for each subject-matter expert to participate. For the
medical transcription test site, subject-matter experts were tested at both the offices of one
of the medical transcription services from which subject-matter experts were recruited, and
the offices of OPAC Testing Software. Subject-matter experts were seated at the computer
which had the beta version of the OPAC software installed. Once seated, subject-matter
experts were given the validation survey, which contained full instructions on how the
testing process was to proceed. In order to keep track of their scores on the computer,
subject-matter experts entered their social security number when prompted to do so by the
computer. The computer then administered all versions of the transcription test to subject-
matter experts. Between each test, the computer was paused, allowing subject-matter
experts to answer validation questions about each of the tests in the survey.
After all three tests were completed, the subject-matter experts were asked to attest that they gave each test their best effort, which they did by checking a box to that effect on the last page of the survey. Subject-matter experts were thanked for their time and
escorted from the test site. Subject-matter experts were compensated for their input.
Results
In order to establish basic content validity for each test, at least 50% of subject-matter
experts must agree that proficiency in the skill which the test measures is essential for
successful performance of the job being selected for. Ninety-four percent of legal subject-
matter experts and 100 percent of medical subject-matter experts agreed that proficiency in
transcription was essential to successful performance in the job of legal secretary or medical
transcriptionist.
It is also essential to demonstrate that a skill being tested for is required at the time of job
entry, and cannot be learned during a brief orientation. To that end, subject-matter experts
were asked whether or not transcription skills were required at time of job entry or if they
could be learned while on the job. Seventy-six percent of legal subject-matter experts and
100 percent of medical subject-matter experts agreed that transcription skills were essential
at time of job entry.
Legal Transcription
Each alternate version of the legal transcription test was examined to determine mean
scores and difficulty levels for each. Mean scores and standard deviations for the legal
transcription test versions one, two, and three were highly comparable (M = 147.63, SD = 26.84; M = 182.22, SD = 28.54; M = 142.47, SD = 22.32), suggesting that the tests
contained similar content and had a similar level of difficulty. (The majority of variance
around the test means can be attributed to slight differences in the length of each test). The
overall mean standard error of measurement was 13.52. In order to determine consistency
between the different versions of the test, an alternate form reliability analysis was
conducted. The Pearson product-moment correlation coefficient was used to determine the
reliability of each version of the test. From this analysis, the following matrix was
developed.
Table 1: Product-moment correlations between each version of the Legal Transcription Test.

                 Version One    Version Two    Version Three
Version One         1.00           .935*          .920*
Version Two         .935*          1.00           .983*
Version Three       .920*          .983*          1.00

*Significant at the 0.01 level.
Based upon the correlations between each version of the test, an overall mean correlation was
determined, R (36) = .946, p < .01. This is a strong reliability coefficient, and indicates
consistency between different versions of the test. However, in order to produce an internal
reliability coefficient for each test (to be used for purposes of constructing a cutoff score) the
reliability of the standard OPAC transcription test was applied to the legal transcription test.
This is a valid procedure due to the similarity in test design, administration and development
between the two tests. The internal reliability coefficient used for the legal transcription test
was .7276.
Subject-matter experts were also asked to rate the difficulty level of the test. Using a simple
Likert-type scale ranging from 1 to 3 (1 indicating that the test was too easy, 2 indicating that
the test had the appropriate level of difficulty, and 3, indicating that the test was too difficult)
subject-matter experts rated the overall difficulty of the test. Subject-matter experts rated the
tests with a mean difficulty level of M = 1.99, SD = 0.55, indicating that the tests are set at an
appropriate level of difficulty.
Medical Transcription
As with the legal transcription test, each alternate version of the medical transcription test
was examined to determine mean scores and difficulty levels. Mean scores and standard
deviations for the medical transcription test versions one, two, and three were consistent (M
= 69.44, SD
= 2.45, M = 63.88, SD = 3.42, M = 59.38, SD = 7.27), meaning that the tests
contained similar content and had a similar level of difficulty. Overall, the mean standard
error of measurement was 2.287. As with the legal transcription test, a reliability analysis
was conducted. The Pearson product-moment correlation coefficient was again used to
determine the reliability of each version of the test. From this analysis, the following matrix
was constructed.
Table 2: Product-moment correlations between each version of the Medical Transcription Test.

                 Version One    Version Two    Version Three
Version One         1.00           .487*          .387
Version Two         .487*          1.00           .571*
Version Three       .387           .571*          1.00

*Significant at the 0.01 level.
An overall mean correlation was determined, R (16) = .482, p < .05. This is an acceptable
reliability coefficient, and indicates consistency between different versions of the test. In order
to produce an internal reliability coefficient for each test (to be used for purposes of
constructing a cutoff score), the reliability of the standard OPAC transcription test was applied
to the medical transcription test. This is a valid procedure due to the similarity in test design,
administration and development between the two tests. The internal reliability coefficient used
for the medical transcription test was .7276.
Medical subject-matter experts were lastly asked to rate the difficulty level of the medical transcription test. Using a simple Likert-type scale ranging from 1 to 3 (1 indicating that the test was
too easy, 2 indicating that the test had the appropriate level of difficulty, and 3, indicating that
the test was too difficult) subject-matter experts rated the overall difficulty of the test.
Subject-matter experts rated the tests with a mean difficulty level of M = 1.89, SD = .503,
indicating that the tests are set at an appropriate level of difficulty.
Angoff Scores
To determine the appropriate cutoff score for each test, the modified Angoff method was
utilized. The United States Supreme Court (U.S. v. South Carolina) has upheld this method
of determining test cutoff scores (Biddle, 1993). Subject-matter experts were asked what they believed the score on each test for a minimally qualified applicant should be, a judgment that represents how such an applicant would be expected to perform on the
test. Subject-matter experts provided these Angoff scores for all versions of each test.
Because the alternate versions of the legal transcription test contained a different number of
test items, an Angoff percentage had to be derived in order for it to be applicable to all three
versions of the test. By dividing the Angoff score of each test version by the number of items
in each test version, Angoff percentages were derived. The Angoff percentage for test
version one was 64.62 percent, and test version two had a percentage of 62.42 percent.
Test version three had an Angoff of 66.69 percent. Because of the close similarity between
these percentages, they were collapsed into a general Angoff percentage of 64 percent. The
alternate versions of the medical transcription test had the same number of test items, and
thus did not require an Angoff percentage.
Angoff scores were averaged across alternate versions of each test, yielding a mean Angoff
percentage of 64 percent for the legal transcription test, and an Angoff score of 51 for the
medical transcription test. Based on these Angoff scores, cutoff scores using each test’s
standard error of measurement could be derived. The cutoff score for each test was set at
one standard error of measurement unit below the test’s mean Angoff. This process led to
the following modified Angoff cutoff score for each test.
Table 3: Summary Statistics and Modified Angoff Cutoff Scores for the Legal and Medical Transcription Tests.

                                          Legal Transcription    Medical Transcription
Mean Angoff Score                               157.817                 53.804
Standard Deviation                               31.39                   4.382
R                                                 .728                    .728
Mean Standard Error of Measurement               13.52                   2.287
Modified Angoff Cutoff Score/Percentage            64%                     51
Appendix 5 contains full summary statistics for each test, as well as raw candidate scores
and feedback from each of the selection tests.
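The percentage conversion described above amounts to dividing each version's Angoff score by that version's item count. The sketch below illustrates the arithmetic only; the raw Angoff scores and item counts shown are hypothetical placeholders chosen to land near the reported percentages, not the actual study values.

    # Hypothetical per-version values, used purely to illustrate the conversion;
    # the report gives only the resulting percentages (64.62%, 62.42%, 66.69%).
    versions = {
        "Version One":   {"mean_angoff": 105.3, "items": 163},
        "Version Two":   {"mean_angoff": 114.9, "items": 184},
        "Version Three": {"mean_angoff": 129.4, "items": 194},
    }

    # Convert each version's Angoff score into a percentage of its item count.
    percentages = {
        name: 100 * v["mean_angoff"] / v["items"] for name, v in versions.items()
    }
    for name, pct in percentages.items():
        print(f"{name}: {pct:.2f}%")

    # Because the percentages are so similar, they can be collapsed into one overall
    # figure (about 64-65% here; the report uses a general Angoff percentage of 64%).
    overall = sum(percentages.values()) / len(percentages)
    print(f"Overall Angoff percentage: {overall:.1f}%")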
Performance Differentiation
Lastly, subject-matter experts were polled to determine how strongly they believed that
higher levels of mastery in the skill being assessed distinguished candidates with higher
levels of performance in a particular job duty from candidates with lower levels of
performance in the job duty. Using a Likert-type scale ranging from 1 to 4 (1 indicating little
or no performance differentiation, 2 indicating some performance differentiation, 3
indicating significant performance differentiation, and 4, indicating very significant
performance differentiation) subject-matter experts were asked to rate how performance-
differentiating the skills being assessed by the new tests were. Legal subject-matter experts
gave the legal transcription test a mean performance differentiation rating of 2.54, and
medical subject-matter experts gave the medical transcription test a mean performance
differentiation rating of 3.46, suggesting that higher levels of these skills may be
performance differentiating.
Job Duty/KSA Linkage
The Uniform Guidelines (1978) require that tested knowledge, skills, and abilities (KSAs) be
linked to established job duties. Responses from subject-matter experts almost universally
agreed that transcription skills were essential components of the major job duties of legal
and medical transcription. Subject-matter experts were asked to list the two most important
job duties that link to the tested KSAs, and to rank the importance and frequency of each
job duty. Job duties such as “processing of documents,” “transcription,” and “drafting
correspondence” were linked to both transcription tests by subject-matter experts. See
Appendix 5 for full descriptions. On a Likert-type scale of 1 to 5 (1 being not important, 5
being extremely critical), subject-matter experts rated the overall importance of listed job
duties for legal transcription with a mean rating of M = 3.44, indicating that the linked job
duties were essential to successful job performance as a legal secretary. Medical subject-
matter experts also assigned an importance rating to the listed job duties. Using a Likert-
type scale of 1 to 5 (1 being not important, 5 being extremely critical), subject-matter
experts’ mean importance rating was M = 4.56, indicating that the listed job duties were
critical for successful performance as a medical transcriptionist.
Discussion
The results of this development study indicate that the legal and medical transcription tests
successfully measure the skills that they were designed to assess. Additionally, it appears
that the use of these tests is likely appropriate to the selection process of the legal
secretary and medical transcriptionist job classifications. However, it is important to note
that this development report does not constitute a full content validation study. Such a
study would have to account for regional differences, differences in job specialty,
differences in job positions, and differences in specific job work environment. All that can be
extrapolated from the present study is that the evaluated tests are appropriate to the
selection process for the office in which the testing site was held. The Principles for the
Validation and Use of Personnel Selection Procedures (1987) state that full content
validation procedures should allow for test administrators to be able to generalize the
content validation results to different population samples, something that the current
development study does only if it is confirmed through a validation transportability process.
Biddle Consulting Group recommends that individuals wishing to use these tests as a
selection device conduct an in-house content validation study. Such a study would ensure
that the selection process is fair and applicable to the job environment where the selection
process would take place. Coupled with the current development study, which demonstrates the basic ability of the instruments to measure the skills they were designed to measure, such in-house research should help administrators of the legal and medical transcription tests aid employers in selecting applicants who possess the skill levels needed for acceptable job proficiency.
References
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational Measurement (pp. 508-600). Washington, DC: American Council on Education.

Baldus, D. C., & Cole, J. W. L. (1980). Statistical Proof of Discrimination (pp. 508-600). New York: McGraw-Hill.

Biddle, R. E. (1996). Guidelines Oriented Job Analysis. (Available from Biddle Consulting Group, Inc., 193 Blue Ravine Road, Suite 270, Folsom, CA 95630).

Biddle, R. E. (1993). How to set cutoff scores for knowledge tests used in promotion, training, certification, and licensing. Public Personnel Management, 22(1), 63-79.

Biddle, R. E. (1992). How to analyze data for age discrimination in layoff situations. The Human Resources Professional (Summer).

Contreras v. City of Los Angeles, 656 F.2d 1267 (9th Cir. 1981).

Haber, M. (1980). A comparison of some continuity corrections for the chi-square test on 2x2 tables. Journal of the American Statistical Association, 75, 371.

Osterlind, S. J. (1989). Constructing Test Items. Norwell, MA: Kluwer Academic Publishers.

Society for Industrial and Organizational Psychology, Inc. (1987). Principles for the Validation and Use of Personnel Selection Procedures (3rd ed.). College Park, MD: Author.

Uniform Guidelines on Employee Selection Procedures. (1978). Federal Register, 43, 38290-38315.

U.S. v. South Carolina, 434 U.S. 1026 (1978).

Waisome v. Port Authority of New York, 948 F.2d 1370 (2nd Cir. 1991).
Development and Research Report
for OPAC® System 7.0
PowerPoint Test
April 2002
OPAC® PowerPoint Test
In March of 2002, an internal consistency reliability study of a beta version of the
OPAC® PowerPoint Test was conducted. The reliability was found to be .72. The United
States Department of Labor’s general guidelines for interpreting reliability coefficients
indicate that this level of reliability is interpreted as being “adequate.” Also, according to
these guidelines the OPAC Intermediate PowerPoint Test has sufficient reliability to be used
as a selection device if it is also valid for the position being tested.
About the Test
The OPAC® PowerPoint Test measures a person’s ability to correctly use important
features found in the Microsoft® PowerPoint program. The areas measured during the test
are based upon input received from an Industrial and Organizational Psychologist, who has
an extensive background in training others how to use Microsoft Office products at the
basic, intermediate, and advanced levels. It is also inspired by objectives set forth by
Microsoft for Microsoft Office Specialist Skill Standards: PowerPoint 2000. Test
takers must be familiar with terminology specifically associated with the Microsoft®
PowerPoint program to be able to accurately respond to the test items.
The Industrial and Organizational Psychologist who assisted in the development of
this test is an Assistant Professor of Administration at a major university. This psychologist
is certified by Microsoft as a Microsoft Office User Specialist Authorized Instructor and has
worked as a subject-matter expert to NIVO International (the manufacturer of the Microsoft
Certification exams). In addition, she is also certified by Microsoft at the Master Level in
Microsoft Office products.
According to the Microsoft web site, the Microsoft Office Specialist PowerPoint 2000
exam was created and validated by industry experts, and Microsoft’s exam development
process is accredited by the Buros Institute for Assessment Consultation and Outreach. A
complete listing of the PowerPoint exam-skill standards published by Microsoft can be found
at http://www.microsoft.com/learning/mcp/officespecialist/objectives/excel2000.asp.
Test Reliability Study Participants
Twenty-one office workers who indicated that they possessed at least a basic
knowledge and understanding of the PowerPoint program took a beta version of the OPAC
PowerPoint test in March of 2002 at Biddle Consulting Group, Inc.’s corporate offices in
Rancho Cordova, CA.
Descriptive Statistics, Including Reliability, for OPAC
Intermediate PowerPoint Test [Section 15C(5)]
The following are the descriptive statistics for the OPAC PowerPoint Test (beta
version) including measures of central tendency (i.e., means), dispersion (i.e., standard
deviations), and estimates of reliability as specified by Section 14[C](5) of the federal
Guidelines. Alpha (internal consistency
; Cronbach, 1951) reliability analysis was conducted
for this test using SPSS 13.0. The mean and standard deviation of these three test modules
are provided in percentage score.
                              Sample    # of Test              Standard     Internal Consistency
                               Size       Items       Mean     Deviation    (alpha) Reliability
Microsoft PowerPoint Test       21         25        73.14%     12.78%            .7199
The U.S. Department of Labor indicates that reliability coefficients of .70 to .79 are to be interpreted as being “adequate.”
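The report notes that the alpha coefficient was computed in SPSS. For readers unfamiliar with the statistic, the following is a minimal sketch of how Cronbach's alpha is calculated from an examinee-by-item score matrix; the item responses shown are made up for illustration and are not the study data.

    from statistics import pvariance  # population variance, as in the standard alpha formula

    # Hypothetical examinee-by-item matrix (1 = correct, 0 = incorrect); rows are
    # examinees, columns are items. The actual study used 21 examinees and 25 items.
    item_scores = [
        [1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0],
        [1, 1, 1, 0, 1],
        [1, 1, 0, 1, 1],
    ]

    k = len(item_scores[0])                 # number of items
    items = list(zip(*item_scores))         # item-wise score vectors
    sum_item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in item_scores])

    # Cronbach's alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals)
    alpha = (k / (k - 1)) * (1 - sum_item_var / total_var)
    print(round(alpha, 4))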
Validity
The Validation Wizard, which is included with the OPAC software, is designed to
help users who are not testing experts address minimum standards of job relatedness for
tests which are anticipated or known to produce little, if any, adverse impact on protected
groups. However, even for tests without adverse impact, it makes good business sense to
establish their job relatedness in order to be fair to candidates and to obtain employees who
have adequate levels of knowledge, skills, and abilities actually needed on the job. If you
find that this test adversely impacts a protected group of test takers, it is important to
ensure it meets validity standards. Call Biddle Consulting Group, Inc. toll free at 800-999-
0438 for consulting advice about Job Analyses, Validation, and Test Fairness.
Accuracy and Completeness [Section 15C(9)]
Biddle Consulting Group, Inc. consultants and staff conducted the study from which
the reliability findings reported in this document were collected. The data collected was
entered by administrative staff employees (or contractors) and then independently checked
for accuracy by trained BCG employees. Analyses were also independently double-checked
and verified. We invite any comments you might have about this report.
Though the research conducted for this report is accurate and complete, it should in
no way be construed as a final study. Rather, it is a good faith effort on the part of Biddle
Consulting Group, Inc., to demonstrate that the tests described in this report have been
pilot tested, and that they do provide a meaningful measurement of the abilities and skill(s)
being tested. Because this study was conducted as part of an on-going test development
process, and included participants from positions that may be dissimilar to those in other
organizations, its results and applications may or may not be relevant in other geographical
areas, employers, specific areas of practice, or job positions. Biddle Consulting Group
recommends conducting an in-house validation study of all tests before using them as a
selection device, as such a study would help establish that the abilities and skills measured
by the test in this report are essential to the specific job environment in which the in-house
validation study was conducted. Conducting an in-house study will also evaluate whether
the use of the scores (i.e., pass/fail, banding, or ranking) is appropriate for the position(s)
in your organization. Test administrators can conduct in-house validation studies using a
service provided by Biddle Consulting Group, Inc.
Contact Person [Section 15C(8)]
To receive further information about this study, contact:
Biddle Consulting Group, Inc.
Attention: James E. Kuthy
193 Blue Ravine Road, Suite 270
Folsom, CA 95630
(916) 294-4250
OPAC® and Office Proficiency Assessment and Certification® are registered trademarks of
Biddle Consulting Group, Inc.
Microsoft® is a registered trademark or trademark of the Microsoft Corporation in the
United States and/or other countries. The OPAC® Intermediate PowerPoint Test is an
independent publication of Biddle Consulting Group, Inc. and is not affiliated with, nor has it
been authorized, sponsored, or otherwise approved by Microsoft Corporation.
References
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.

Section 60-3, Uniform Guidelines on Employee Selection Procedure (1978); 43 FR 38295 (August 25, 1978).

U.S. Department of Labor. (1999). Testing and Assessment: An Employer's Guide to Good Practices. Washington, DC: Author.
Development and Research Report
for OPAC® 7.0
Intermediate Excel Test
April 2002
OPAC® Intermediate Excel Test
In March of 2002, an internal consistency reliability study of a beta version of the
OPAC® Intermediate Excel Test was conducted. The reliability was found to be .75. The
United States Department of Labor’s general guidelines for interpreting reliability
coefficients indicate that this level of reliability is interpreted as being “adequate.” Also,
according to these guidelines the OPAC Intermediate Excel Test has sufficient reliability to
be used as a selection device if it is also valid for the position being tested.
About the Test
The OPAC® Intermediate Excel Test measures a person’s ability to correctly use
important intermediate-level features found in the Microsoft® Excel program. The areas
measured during the test are based upon input received from an Industrial and
Organizational Psychologist, who has an extensive background in training others how to use
Microsoft Office products at the basic, intermediate, and advanced levels. It is also inspired
by objectives set forth by Microsoft for Microsoft Office Specialist Skill Standards:
Excel 2000 Expert. Test takers must be familiar with terminology specifically associated
with the Microsoft® Excel program to be able to accurately respond to the test items.
The Industrial and Organizational Psychologist who assisted in the development of
this test is an Assistant Professor of Administration at a major university. This psychologist
is certified by Microsoft as a Microsoft Office User Specialist Authorized Instructor and has
worked as a subject-matter expert to NIVO International (the manufacturer of the Microsoft
Certification exams). In addition, she is also certified by Microsoft at the Master Level in
Microsoft Office products and at the Expert Level in Microsoft Excel.
According to the Microsoft web site, the Microsoft Office Specialist Excel 2000 Expert
exam was created and validated by industry experts, and Microsoft’s exam development
process is accredited by the Buros Institute for Assessment Consultation and Outreach. A
complete listing of the Excel exam-skill standards published by Microsoft can be found at
http://www.microsoft.com/learning/mcp/officespecialist/objectives/excel2000.asp.
Test Reliability Study Participants
Twenty-seven office workers who indicated that they possessed at least a basic
knowledge and understanding of the Excel program took a beta version of the OPAC
Intermediate Excel test in March of 2002 at Biddle Consulting Group, Inc.’s corporate offices
in Rancho Cordova, CA.
Descriptive Statistics, Including Reliability, for OPAC
Intermediate Excel Test [Section 15C(5)]
The following are the descriptive statistics for the OPAC Intermediate Excel Test
(beta version) including measures of central tendency (i.e., means), dispersion (i.e.,
standard deviations), and estimates of reliability as specified by Section 14[C](5) of the
federal Guidelines. Alpha (internal consistency; Cronbach, 1951) reliability analysis was
conducted for this test using SPSS 13.0. The mean and standard deviation of the test are
provided as percentage scores.
                                     Sample    # of Test               Standard     Internal Consistency
                                      Size       Items       Mean      Deviation    (alpha) Reliability
Intermediate Microsoft Excel Test      27          25       77.48%      14.64%            .7481
The U. S. Department of Labor indicates that reliability coefficients of 0.70 – 0.79 are to be
interpreted as being "adequate."
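For reference, the alpha coefficient reported above is the internal-consistency statistic defined by Cronbach (1951). A sketch of the formula, stated in generic symbols rather than values taken from the OPAC data, is:

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_{i}^{2}}{\sigma_{X}^{2}}\right)

where k is the number of test items (25 for this test), \sigma_{i}^{2} is the variance of scores on item i, and \sigma_{X}^{2} is the variance of total test scores. Values closer to 1 indicate that the items tend to vary together, which is taken as evidence that they measure the same underlying skill.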
Validity
The Validation Wizard, which is included with the OPAC software, is designed to
help users who are not testing experts address minimum standards of job relatedness for
tests which are anticipated or known to produce little, if any, adverse impact on protected
groups. However, even for tests without adverse impact, it makes good business sense to
establish their job relatedness in order to be fair to candidates and to obtain employees who
have adequate levels of knowledge, skills, and abilities actually needed on the job. If you
find that this test adversely impacts a protected group of test takers, it is important to
ensure it meets validity standards. Call Biddle Consulting Group, Inc. toll free at 800-999-
0438 for consulting advice about Job Analyses, Validation, and Test Fairness.
Accuracy and Completeness [Section 15C(9)]
Biddle Consulting Group, Inc. consultants and staff conducted the study from which
the reliability findings reported in this document were collected. The data collected was
entered by administrative staff employees (or contractors) and then independently checked
for accuracy by trained BCG employees. Analyses were also independently double-checked
and verified. We invite any comments you might have about this report.
Though the research conducted for this report is accurate and complete, it should in
no way be construed as a final study. Rather, it is a good faith effort on the part of Biddle
Consulting Group, Inc., to demonstrate that the tests described in this report have been
pilot tested, and that they do provide a meaningful measurement of the abilities and skill(s)
being tested. Because this study was conducted as part of an on-going test development
process, and included participants from positions that may be dissimilar to those in other
organizations, its results and applications may or may not be relevant to other geographical
areas, employers, specific areas of practice, or job positions. Biddle Consulting Group
recommends conducting an in-house validation study of all tests before using them as a
selection device, as such a study would help establish that the abilities and skills measured
by the test in this report are essential to the specific job environment in which the in-house
validation study was conducted. Conducting an in-house study will also evaluate whether
the use of the scores (i.e., pass/fail, banding, or ranking) is appropriate for the position(s)
in your organization. Test administrators can conduct in-house validation studies using a
service provided by Biddle Consulting Group, Inc.
Contact Person [Section 15C(8)]
To receive further information about this study, contact:
Biddle Consulting Group, Inc.
Attention: James E. Kuthy
193 Blue Ravine Road, Suite 270
Folsom, CA 95630
(916) 294-4250
OPAC® and Office Proficiency Assessment and Certification® are registered trademarks of
Biddle Consulting Group, Inc.
Microsoft® is a registered trademark or trademark of the Microsoft Corporation in the
United States and/or other countries. The OPAC® Intermediate Excel Test is an
independent publication of Biddle Consulting Group, Inc. and is not affiliated with, nor has it
been authorized, sponsored, or otherwise approved by Microsoft Corporation.
References
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Section 60-3, Uniform Guidelines on Employee Selection Procedures (1978); 43 FR 38295 (August 25, 1978).
U. S. Department of Labor. (1999). Testing and Assessment: An Employer's Guide to Good Practices. Washington, DC: Author.
Development and Research Report
for OPAC® System 7.5
Intermediate Word Test
April 2003
OPAC® Intermediate Word Test
In March of 2003, an internal consistency reliability study of a beta version of the
OPAC® Intermediate Word Test was conducted. The reliability was found to be .82. The
United States Department of Labor’s general guidelines for interpreting reliability
coefficients indicate that this level of reliability is interpreted as being “Good.” Also,
according to these guidelines the OPAC Intermediate Word Test has sufficient reliability to
be used as a selection device if it is also valid for the position being tested.
About the Test
The OPAC® Intermediate Word Test measures a person’s ability to correctly use
important intermediate-level features found in the Microsoft® Word program. The areas
measured during the test are based upon input received from an Industrial and
Organizational Psychologist, who has an extensive background in training others how to use
Microsoft Office products at the basic, intermediate, and advanced levels. It is also inspired
by objectives set forth by Microsoft for Microsoft Office Specialist Skill Standards:
Word 2000 Expert. Test takers must be familiar with terminology specifically associated
with the Microsoft® Word program to be able to accurately respond to the test items.
The Industrial and Organizational Psychologist who assisted in the development of
this test is an Assistant Professor of Administration at a major university. This psychologist
is certified by Microsoft as a Microsoft Office User Specialist Authorized Instructor and has
worked as a subject-matter expert to NIVO International (the manufacturer of the Microsoft
Certification exams). In addition, she is also certified by Microsoft at the Master Level in
Microsoft Office products and at the Expert Level in Microsoft Word.
According to the Microsoft web site, the Microsoft Office Specialist Word 2000 Expert
exam was created and validated by industry experts, and Microsoft’s exam development
process is accredited by the Buros Institute for Assessment Consultation and Outreach. A
complete listing of the Word exam-skill standards published by Microsoft can be found at
http://www.microsoft.com/learning/mcp/officespecialist/objectives/excel2000.asp.
Test Reliability Study Participants
Forty-three office workers who indicated that they possessed at least a basic
knowledge and understanding of the Word program took a beta version of the OPAC
Intermediate Word test in March of 2003 at Biddle Consulting Group, Inc.’s corporate offices
in Rancho Cordova, CA.
Descriptive Statistics, Including Reliability, for OPAC
Intermediate Word Test [Section 15C(5)]
The following are the descriptive statistics for the OPAC Intermediate Word Test
(beta version) including measures of central tendency (i.e., means), dispersion (i.e.,
standard deviations), and estimates of reliability as specified by Section 14[C](5) of the
federal Guidelines. Alpha (internal consistency; Cronbach, 1951) reliability analysis was
conducted for this test using SPSS 13.0. The mean and standard deviation of the test are
provided as percentage scores.
                                     Sample    # of Test               Standard     Internal Consistency
                                      Size       Items       Mean      Deviation    (alpha) Reliability
Intermediate Microsoft Word Test       43          25       61.02%      19.52%            .8182
The U. S. Department of Labor indicates that reliability coefficients of 0.80 – 0.89 are to be
interpreted as being "good."
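As an illustration of how this statistic is obtained from item-level data, the following is a minimal sketch of the same alpha calculation in Python; it is not the SPSS procedure used in the study, and the response matrix shown is hypothetical:

import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a matrix with one row per test taker and one column per item."""
    items = np.asarray(item_scores, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)         # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)     # variance of total test scores
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical right/wrong (1/0) responses for five test takers on four items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(round(cronbach_alpha(responses), 3))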
Validity
The Validation Wizard, which is included with the OPAC software, is designed to
help users who are not testing experts address minimum standards of job relatedness for
tests which are anticipated or known to produce little, if any, adverse impact on protected
groups. However, even for tests without adverse impact, it makes good business sense to
establish their job relatedness in order to be fair to candidates and to obtain employees who
have adequate levels of knowledge, skills, and abilities actually needed on the job. If you
find that this test adversely impacts a protected group of test takers, it is important to
ensure it meets validity standards. Call Biddle Consulting Group, Inc. toll free at 800-999-
0438 for consulting advice about Job Analyses, Validation, and Test Fairness.
Accuracy and Completeness [Section 15C(9)]
Biddle Consulting Group, Inc. consultants and staff conducted the study from which
the reliability findings reported in this document were collected. The data collected was
entered by administrative staff employees (or contractors) and then independently checked
for accuracy by trained BCG employees. Analyses were also independently double-checked
and verified. We invite any comments you might have about this report.
Though the research conducted for this report is accurate and complete, it should in
no way be construed as a final study. Rather, it is a good faith effort on the part of Biddle
Consulting Group, Inc., to demonstrate that the tests described in this report have been
pilot tested, and that they do provide a meaningful measurement of the abilities and skill(s)
being tested. Because this study was conducted as part of an on-going test development
process, and included participants from positions that may be dissimilar to those in other
organizations, its results and applications may or may not be relevant to other geographical
areas, employers, specific areas of practice, or job positions. Biddle Consulting Group
recommends conducting an in-house validation study of all tests before using them as a
selection device, as such a study would help establish that the abilities and skills measured
by the test in this report are essential to the specific job environment in which the in-house
validation study was conducted. Conducting an in-house study will also evaluate whether
the use of the scores (i.e., pass/fail, banding, or ranking) is appropriate for the position(s)
in your organization. Test administrators can conduct in-house validation studies using a
service provided by Biddle Consulting Group, Inc.
Contact Person [Section 15C(8)]
To receive further information about this study, contact:
Biddle Consulting Group, Inc.
Attention: James E. Kuthy
193 Blue Ravine Road, Suite 270
Folsom, CA 95630
(916) 294-4250
OPAC® and Office Proficiency Assessment and Certification® are registered trademarks of
Biddle Consulting Group, Inc.
Microsoft® is a registered trademark or trademark of the Microsoft Corporation in the
United States and/or other countries. The OPAC® Intermediate Word Test is an
independent publication of Biddle Consulting Group, Inc. and is not affiliated with, nor has it
been authorized, sponsored, or otherwise approved by Microsoft Corporation.
References
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Section 60-3, Uniform Guidelines on Employee Selection Procedures (1978); 43 FR 38295 (August 25, 1978).
U. S. Department of Labor. (1999). Testing and Assessment: An Employer's Guide to Good Practices. Washington, DC: Author.
Development and Research Report
For OPAC® System 8.0
Basic Word Test
May 2005
OPAC® Basic Word Test
In April of 2005, an internal consistency reliability study of a beta version of the
OPAC® Basic Word Test was conducted. The reliability was found to be .916. The United
States Department of Labor’s general guidelines for interpreting reliability coefficients
indicate that this level of reliability is interpreted as being “excellent.” Also, according to
these guidelines the OPAC Basic Word Test has sufficient reliability to be
used as a selection device if it is also valid for the position being tested. Based on findings
from the present study, the final version of the test was improved and is likely to have an even
higher reliability coefficient than was found in this study.
About the Test
The OPAC® Basic Word Test measures a person's ability to correctly use important
features of the Microsoft® Word program at a basic level. The areas measured
during the test are based upon input received from an Industrial and Organizational
Psychologist, who has an extensive background in training others how to use Microsoft
Office products at both the basic and advanced levels. It is also inspired by objectives set
forth by Microsoft for their Word 2000 “Office Specialist” examination. Test takers must be
familiar with terminology specifically associated with the Microsoft® Word program to be
able to accurately respond to the test items.
The Industrial and Organizational Psychologist who assisted in the development of
this test is an Assistant Professor of Administration at a major university. This psychologist
is certified by Microsoft as a Microsoft Office User Specialist Authorized Instructor and has
worked as a subject-matter expert to NIVO International (the manufacturer of the Microsoft
Certification exams). In addition, she is also certified by Microsoft at the Master Level in
Microsoft Office products and at the Expert Level in Microsoft Word.
According to the Microsoft web site, the Microsoft Office Specialist Word 2000 exam
was created and validated by industry experts, and Microsoft’s exam development process
is accredited by the Buros Institute for Assessment Consultation and Outreach. A complete
listing of the Word exam-skill standards published by Microsoft can be found at
http://www.microsoft.com/learning/mcp/officespecialist/objectives/word2000.asp.
Test Reliability Study Demographics
Thirty-eight people who indicated that they possessed at least a basic knowledge and
understanding of the Word program took a beta version of the OPAC Basic Word test in April
2005. Twenty-five of those were administered the computerized test at an Adult Learning
Center run by a local school district in Sacramento, California. The remaining 12 were
administered the same test at Biddle Consulting Group, Inc.’s corporate offices in Rancho
Cordova, CA.
The gender of the persons who participated in the current study was:
Male: 5
Female: 33
The race/ethnicity of the persons who participated in the current study was:
White: 26
Black/African American: 1
Hispanic/Latino: 6
Asian/Pacific Islander: 4
Native American/Alaska Native: 1
The age of the persons that participated in the current study was:
Less than 20 years of age: 10
20-29 years of age: 9
30-39 years of age: 5
40-49 years of age: 10
50 and over: 4
The education level of the persons who participated in the current study was:
Less than High School Graduate: 0
High School Graduate: 12
GED Certificate: 1
Less than 2 Years of College: 7
2-Year College: 5
4-Year College: 11
Graduate Degree: 2
Descriptive Statistics, Including Reliability, for OPAC Basic
Word Test [Section 15C(5)]
The following are the descriptive statistics for the OPAC Basic Word Test (beta
version) including measures of central tendency (i.e., means), dispersion (i.e., standard
deviations), and estimates of reliability as specified by Section 14[C](5) of the federal
Guidelines. Alpha (internal consistency; Cronbach, 1951) reliability analysis was conducted
for this test using SPSS 13.0. The mean and standard deviation of the test are provided as
raw scores, overall and for each administration site.
Basic Microsoft Word Test            Sample    # of Test               Standard     Internal Consistency
                                      Size       Items       Mean      Deviation    (alpha) Reliability
Total                                  38          25        12.89        6.38              .916
Office Administration                  13          25        17.92        4.07              - - -
Learning Center Administration         25          25        10.28        5.81              - - -
Note: The “Learning Center” participants appear to have had extremely limited ability to
properly use the Microsoft Word program. For example, only 36% from this venue were able
to correctly insert a page break during the test, while 100% of the participants from the
“office administration” group correctly performed this function.
A relatively strong, significant relationship was found between the self-reported level
of keyboarding speed and test scores (r = .762). In other words, those who reported being
able to type more quickly typically scored higher on the test.
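As a purely illustrative sketch (the study's analyses were run in SPSS, and the numbers below are invented rather than the data collected here), a correlation of this kind can be computed as follows:

from scipy.stats import pearsonr

# Hypothetical values: self-reported typing speed (WPM) and raw Basic Word scores (out of 25).
self_reported_wpm = [25, 30, 35, 40, 45, 50, 55, 60]
basic_word_score = [8, 11, 10, 15, 17, 16, 21, 22]

r, p = pearsonr(self_reported_wpm, basic_word_score)
print(f"Pearson r = {r:.3f} (p = {p:.4f})")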
After the reliability study was conducted, modifications were made to the wording of
some of the test items based upon the feedback received from those who took the beta
version of the test. Follow-up testing of several of the test takers from the "office
administration" group revealed that these modifications typically raised their scores by two to
three points. However, their relatively high initial scores might have limited the possible
improvement to their scores. Based on this finding, it is anticipated that both the mean and
the reliability of the final version of the test will be somewhat higher than was found during
this study and that the standard deviation will likely be reduced.
Validity
The Validation Wizard, which is included with the OPAC software, is designed to
help users who are not testing experts address minimum standards of job relatedness for
tests which are anticipated or known to produce little, if any, adverse impact on protected
groups. However, even for tests without adverse impact, it makes good business sense to
establish their job relatedness in order to be fair to candidates and to obtain employees who
have adequate levels of knowledge, skills, and abilities actually needed on the job. If you
find that this test adversely impacts a protected group of test takers, it is important to
ensure it meets validity standards. Call Biddle Consulting Group, Inc. toll free at 800-999-
0438 for consulting advice about Job Analyses, Validation, and Test Fairness.
Accuracy and Completeness [Section 15C(9)]
Biddle Consulting Group, Inc. consultants and staff conducted the study from which
the reliability findings reported in this document were collected. The data collected was
entered by administrative staff employees (or contractors) and then independently checked
for accuracy by trained BCG employees. Analyses were also independently double-checked
and verified. We invite any comments you might have about this report.
Though the research conducted for this report is accurate and complete, it should in
no way be construed as a final study. Rather, it is a good faith effort on the part of Biddle
Consulting Group, Inc., to demonstrate that the tests described in this report have been
pilot tested, and that they do provide a meaningful measurement of the abilities and skill(s)
being tested. Because this study was conducted as part of an on-going test development
process, and included participants from positions that may be dissimilar to those in other
organizations, its results and applications may or may not be relevant to other geographical
areas, employers, specific areas of practice, or job positions. Biddle Consulting Group
recommends conducting an in-house validation study of all tests before using them as a
selection device, as such a study would help establish that the abilities and skills measured
by the test in this report are essential to the specific job environment in which the in-house
validation study was conducted. Conducting an in-house study will also evaluate whether
the use of the scores (i.e., pass/fail, banding, or ranking) is appropriate for the position(s)
in your organization. Test administrators can conduct in-house validation studies using a
service provided by Biddle Consulting Group, Inc.
Contact Person [Section 15C(8)]
To receive further information about this study, contact:
Biddle Consulting Group, Inc.
Attention: James E. Kuthy
193 Blue Ravine Road, Suite 270
Folsom, CA 95630
Voice (916) 294-4250 · Fax (916) 294-4255
OPAC® and Office Proficiency Assessment and Certification® are registered trademarks of
Biddle Consulting Group, Inc.
Microsoft® is a registered trademark or trademark of the Microsoft Corporation in the
United States and/or other countries. The OPAC® Basic Word Test is an independent
publication of Biddle Consulting Group, Inc. and is not affiliated with, nor has it been
authorized, sponsored, or otherwise approved by Microsoft Corporation.
References
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Section 60-3, Uniform Guidelines on Employee Selection Procedures (1978); 43 FR 38295 (August 25, 1978).
U. S. Department of Labor. (1999). Testing and Assessment: An Employer's Guide to Good Practices. Washington, DC: Author.
Development and Research Report
for OPAC® System 8.0
Basic Excel Test
May 2005
OPAC® Basic Excel Test
In April of 2005, an internal consistency reliability study of a beta version of the
OPAC® Basic Excel Test was conducted. The reliability was found to be .950. The United
States Department of Labor’s general guidelines for interpreting reliability coefficients
indicate that this level of reliability is interpreted as being “Excellent.” Also, according to
these guidelines the OPAC Basic Excel Test has sufficient reliability to be used as a selection
device if it is also valid for the position being tested. Based on findings from the present
study, the final version of the test was improved and is likely to have an even higher
reliability coefficient than was found in this study.
About the Test
The OPAC® Basic Excel Test measures a person's ability to correctly use important
features of the Microsoft® Excel program at a basic level. The areas measured
during the test are based upon input received from an Industrial and Organizational
Psychologist, who has an extensive background in training others how to use Microsoft
Office products at both the basic and advanced levels. It is also inspired by objectives set
forth by Microsoft for their Excel 2000 "Office Specialist" examination. Test takers must be
familiar with terminology specifically associated with the Microsoft® Excel program to be
able to accurately respond to the test items.
The Industrial and Organizational Psychologist who assisted in the development of
this test is an Assistant Professor of Administration at a major university. This psychologist
is certified by Microsoft as a Microsoft Office User Specialist Authorized Instructor and has
worked as a subject-matter expert to NIVO International (the manufacturer of the Microsoft
Certification exams). In addition, she is also certified by Microsoft at the Master Level in
Microsoft Office products and at the Expert Level in Microsoft Excel.
According to the Microsoft web site, the Microsoft Office Specialist Excel 2000 exam
was created and validated by industry experts, and Microsoft’s exam development process
is accredited by the Buros Institute for Assessment Consultation and Outreach. A complete
listing of the Excel exam-skill standards published by Microsoft can be found at
http://www.microsoft.com/learning/mcp/officespecialist/objectives/excel2000.asp.
Test Reliability Study Demographics
Thirty-five people who indicated that they possessed at least a basic knowledge and
understanding of the Excel program took a beta version of the OPAC Basic Excel test in April
2005. Twenty-two of those were administered the computerized test at an Adult Learning
Center run by a local school district in Sacramento, California. The remaining 13 were
administered the same test at Biddle Consulting Group, Inc.’s corporate offices in Rancho
Cordova, CA.
The gender of the persons who participated in the current study was:
Male: 3
Female: 32
The race/ethnicity of the persons who participated in the current study was:
White: 9
Black/African American: 10
Hispanic/Latino: 6
Asian/Pacific Islander: 9
Native American/Alaska Native: 1
The age of the persons that participated in the current study was:
Less than 20 years of age: 22
20-29 years of age: 1
30-39 years of age: 6
40-49 years of age: 5
50 and over: 1
The education level of the persons who participated in the current study was:
Less than High School Graduate: 0
High School Graduate: 10
GED Certificate: 2
Less than 2 Years of College: 7
2-Year College: 5
4-Year College: 8
Graduate Degree: 2
Descriptive Statistics, Including Reliability, for OPAC Basic
Excel Test [Section 15C(5)]
The following are the descriptive statistics for the OPAC Basic Excel Test (beta
version) including measures of central tendency (i.e., means), dispersion (i.e., standard
deviations), and estimates of reliability as specified by Section 14[C](5) of the federal
Guidelines. Alpha (internal consistency; Cronbach, 1951) reliability analysis was conducted
for this test using SPSS 13.0. The mean and standard deviation of the test are provided as
raw scores, overall and for each administration site.
Basic Microsoft Excel Test           Sample    # of Test               Standard     Internal Consistency
                                      Size       Items       Mean      Deviation    (alpha) Reliability
Total                                  35          25        12.371       7.674             .950
Office Administration                  13          25        19.308       2.463             - - -
Learning Center Administration         22          25         8.273       6.670             - - -
Note: The “Learning Center” participants appear to have had limited ability to properly use
the Microsoft Excel program. For example, only 50% from this venue were able to correctly
re-name a worksheet, while 100% of the participants from the “office administration” group
correctly performed this function.
The U. S. Department of Labor indicates that reliability coefficients of 0.90 and higher are
interpreted as being “excellent.”
Validity
The Validation Wizard, which is included with the OPAC software, is designed to
help users who are not testing experts address minimum standards of job relatedness for
tests which are anticipated or known to produce little, if any, adverse impact on protected
groups. However, even for tests without adverse impact, it makes good business sense to
establish their job relatedness in order to be fair to candidates and to obtain employees who
have adequate levels of knowledge, skills, and abilities actually needed on the job. If you
find that this test adversely impacts a protected group of test takers, it is important to
ensure it meets validity standards. Call Biddle Consulting Group, Inc. toll free at 800-999-
0438 for consulting advice about Job Analyses, Validation, and Test Fairness.
Accuracy and Completeness [Section 15C(9)]
Biddle Consulting Group, Inc. consultants and staff conducted the study from which
the reliability findings reported in this document were collected. The data collected was
entered by administrative staff employees (or contractors) and then independently checked
for accuracy by trained BCG employees. Analyses were also independently double-checked
and verified. We invite any comments you might have about this report.
Though the research conducted for this report is accurate and complete, it should in
no way be construed as a final study. Rather, it is a good faith effort on the part of Biddle
Consulting Group, Inc., to demonstrate that the tests described in this report have been
pilot tested, and that they do provide a meaningful measurement of the abilities and skill(s)
being tested. Because this study was conducted as part of an on-going test development
process, and included participants from positions that may be dissimilar to those in other
organizations, its results and applications may or may not be relevant to other geographical
areas, employers, specific areas of practice, or job positions. Biddle Consulting Group
recommends conducting an in-house validation study of all tests before using them as a
selection device, as such a study would help establish that the abilities and skills measured
by the test in this report are essential to the specific job environment in which the in-house
validation study was conducted. Conducting an in-house study will also evaluate whether
the use of the scores (i.e., pass/fail, banding, or ranking) is appropriate for the position(s)
in your organization. Test administrators can conduct in-house validation studies using a
service provided by Biddle Consulting Group, Inc.
Contact Person [Section 15C(8)]
To receive further information about this study, contact:
Biddle Consulting Group, Inc.
Attention: James E. Kuthy
193 Blue Ravine Road, Suite 270
Folsom, CA 95630
Voice (916) 294-4250 · Fax (916) 294-4255
OPAC® and Office Proficiency Assessment and Certification® are registered trademarks of
Biddle Consulting Group, Inc.
Microsoft® is a registered trademark or trademark of the Microsoft Corporation in the
United States and/or other countries. The OPAC® Basic Excel Test is an independent
publication of Biddle Consulting Group, Inc. and is not affiliated with, nor has it been
authorized, sponsored, or otherwise approved by Microsoft Corporation.
References
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Section 60-3, Uniform Guidelines on Employee Selection Procedures (1978); 43 FR 38295 (August 25, 1978).
U. S. Department of Labor. (1999). Testing and Assessment: An Employer's Guide to Good Practices. Washington, DC: Author.
Development and Research Report
for OPAC System 8.5
Contemporary Keyboarding Test
(Keyboarding 2)
October 2005
OPAC® Contemporary Keyboarding Test
(Keyboarding 2)
The OPAC Contemporary Keyboarding Test is an extremely realistic, cutting-edge
measure of a person’s ability to quickly and accurately copy and enter often-difficult text
using a keyboard.[16] According to the U. S. Department of Labor's Guidelines, it has
excellent reliability. It contains many unique features that differentiate it from other
keyboarding tests currently being offered. For example, the letter frequency on each of the
test versions is, on average, within 98% of the frequency as indicated in the letters
occurring in the words listed in the main entries of the Concise Oxford Dictionary (9th
edition, 1995). In addition, the content of each of the three test versions also contains:
• 35 numerals
• 10 symbols that are found above the numbers on a keyboard
• Instances of consecutive all-capital letters
• An assortment of punctuation marks
• Words with repeated letters
• Mixtures of long and short sentences
• Mixtures of long and short words
• Title-case words forcing typists to capitalize the first letter of several consecutive words
• Words that should be unfamiliar to the typist, making the test a better measure of letter-processing speed than of spelling ability
• Grammatically correct phrases
• Content realistic to a modern business setting
• Cutting-edge content, such as:
  o Website address(es)
  o Email address(es)
  o Package tracking alphanumeric code(s)
  o Business-appropriate terms such as "per diem"
Test Description
The Contemporary Keyboarding Test is a timed test of a person’s ability to read and
enter information into a computer using a keyboard or other input device. Each version of
this test has the test-taker read and enter information for five (5) minutes, following a one
(1) minute warm-up practice test of similar difficulty.
Test scoring is computed using the following calculations:
Gross WPM = (Gross Keystrokes / 5 Keystrokes per Word) / # of Minutes in Test
Net WPM = (Net Keystrokes / 5 Keystrokes per Word) / # of Minutes in Test
Accuracy Rate = Net Keystrokes / Gross Keystrokes
When calculating Net Keystrokes, the OPAC System subtracts the error keystrokes from the
gross keystrokes. Each incorrect word (misspelled word, missing word, or added word)
counts five error keystrokes, and five keystrokes constitute one word.
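The sketch below illustrates these scoring rules in code. It is a simplified illustration of the formulas described above rather than the OPAC System's actual implementation, and the function and variable names are invented for this example:

def score_keyboarding(gross_keystrokes, incorrect_words, minutes=5):
    """Apply the Gross WPM, Net WPM, and Accuracy Rate formulas described above."""
    keystrokes_per_word = 5
    # Each incorrect word (misspelled, missing, or added) counts as five error keystrokes.
    error_keystrokes = incorrect_words * keystrokes_per_word
    net_keystrokes = gross_keystrokes - error_keystrokes
    gross_wpm = (gross_keystrokes / keystrokes_per_word) / minutes
    net_wpm = (net_keystrokes / keystrokes_per_word) / minutes
    accuracy_rate = net_keystrokes / gross_keystrokes
    return gross_wpm, net_wpm, accuracy_rate

# Example: 1,200 gross keystrokes with 4 incorrect words in a 5-minute test
# gives 48.0 gross WPM, 47.2 net WPM, and an accuracy rate of about 98.3%.
print(score_keyboarding(1200, 4))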
[16] Test-takers' reported keyboarding speed is likely to be slower on this test than on more traditional-style keyboarding tests, since this test has been rated as being more difficult. The average Flesch-Kincaid readability grade level of this test is 11.8.
Test Administration for Reliability Study
The Contemporary Keyboarding Test (Keyboarding 2) was administered twice to
each of the participants in the current study during August and September 2005. Seventeen
of the tests were administered at Biddle Consulting Group, Inc.’s corporate offices in
Folsom, California. The test was also administered to ten participants at an adult learning
center in Milan, Ohio, and to another ten participants at an Adult Learning Center in San
Jose, California.
Descriptive Statistics, Including Reliability, for the OPAC
Contemporary Keyboarding Test (Keyboarding 2) [Uniform
Guidelines Section 15C(5)] [17]
The following are the descriptive statistics for the OPAC Contemporary Keyboarding
Test (Keyboarding 2) including measures of central tendency (i.e., means/averages),
dispersion (i.e., standard deviations), and estimates of reliability as specified by Section
14[C](5) of the federal Guidelines, along with the standard error of measurement. The
mean and standard deviation of these three test versions are provided in a Word-Per-Minute
(WPM) metric.
OPAC Keyboarding 2 Test                   Version 1    Version 2    Version 3
Flesch-Kincaid Reading Grade Level           12.0         12.0         11.6
Average Net WPM [18]                        46.900       46.450       45.233
Standard Deviation                          14.797       15.156       13.069
Reliability                                  0.976        0.958        0.985
Standard Error of Measurement [19]           2.295        3.093        1.597
Sample Size                                    35           27           26
The U. S. Department of Labor indicates that reliability coefficients of 0.90 and higher are
interpreted as being “excellent.” Finally, the three versions of the test appear to be
extremely parallel in difficulty since there was less than a two Word-Per-Minute average
net-score difference between the test versions (i.e., 46.900, 46.450, and 45.233 Words-
Per-Minute).
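As a worked check, the standard error of measurement reported in the table can be reproduced from the reported reliability and standard deviation using the standard classical-test-theory formula; for Version 1 (small differences from the tabled 2.295 reflect rounding of the inputs):

SEM = SD \sqrt{1 - r_{xx}} = 14.797 \times \sqrt{1 - 0.976} \approx 2.29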
The following chart shows the inter-correlation between the three versions of the
Contemporary Keyboarding Test. As can be seen here, all of these tests are strongly
correlated with one another (i.e., significant at the p < .01 level).
[17] The section numbers listed in this document refer to sections of the federal Uniform Guidelines on Employee Selection Procedures.
[18] The average net WPM score and standard deviation were calculated using scores from the first of the two test administrations.
[19] The SEM was calculated using the test-retest reliability of Version 1 and the standard deviation from Version 1 of the first of the two test administrations.
Correlations (Pearson)
                  Version 1           Version 2           Version 3
Version 1            1               .950** (N = 40)     .963** (N = 30)
Version 2       .950** (N = 40)          1               .952** (N = 30)
Version 3       .963** (N = 30)     .952** (N = 30)          1
**. Correlation is significant at the 0.01 level (2-tailed).
Validity
The Validation Wizard, which is included with the OPAC software, is designed for
conducting a very basic content-validity analysis of an OPAC test. Using this feature, a test
administrator can determine whether a test measures specific job-related knowledge, skills, or
abilities for particular job classifications. It is designed to help users who are not testing
experts to address minimum standards of job relatedness for tests which are anticipated or
known to produce little, if any, adverse impact on protected groups. However, even for
tests without adverse impact, it makes good business sense to establish their job
relatedness in order to be fair to candidates and to obtain employees who have adequate
levels of knowledge, skills, and abilities actually needed on the job. Biddle Consulting Group,
Inc. can help those users who wish to conduct more in-depth validity or reliability analyses
of their pre-employment testing.
Accuracy and Completeness [Uniform Guidelines Section
15C(9)]
Biddle Consulting Group, Inc. consultants and staff conducted the study from which
the reliability findings reported in this document were collected. The data collected was
entered by administrative staff employees and then independently checked for accuracy.
Analyses were also independently double-checked and verified. We invite any comments
you might have about this report.
Potential Limitations of this Study
Though the research conducted for this report is accurate and complete, it should in
no way be construed as a final study. Rather, it is a good faith effort on the part of Biddle
Consulting Group, Inc., to demonstrate that the tests described in this report have been
“pilot tested,” and that they do provide a meaningful measurement of the abilities and
skill(s) being tested. Because this study was conducted as part of an on-going test
development process, and included participants from positions that may be dissimilar to
those in other organizations, its results and applications may or may not be relevant to
other geographical areas, employers, specific areas of practice, or job positions. Biddle
Consulting Group recommends conducting an in-house validation study of all tests before
using them as a selection device, as such a study would help establish that the abilities and
skills measured by the test are essential to the specific job environment in which the in-
house validation study was conducted. Conducting an in-house study will also evaluate
whether the use of the scores (i.e., pass/fail, banding, or ranking) is appropriate for the
position(s) in your organization. Biddle Consulting Group, Inc. can assist organizations in
conducting job analysis and test validation studies.
Contact Person [Uniform Guidelines Section 15C(8)]
To receive further information about this study, contact:
Biddle Consulting Group, Inc.
Attention: James E. Kuthy
193 Blue Ravine Road, Suite 270
Folsom, CA 95630
Voice (916) 294-4250 · Fax (916) 294-4255
References
Section 60-3, Uniform Guidelines on Employee Selection Procedures (1978); 43 FR 38295 (August 25, 1978).
U. S. Department of Labor. (1999). Testing and Assessment: An Employer's Guide to Good Practices. Washington, DC: Author.