Frequently Asked Questions about How Online HSA Assessments Adapt to Each Student across Opportunities

If we are measuring ability to achieve specific performance expectations, why is there a difference in difficulty attached to the items that are given to some students and not others?

The adaptive online HSA Science (NGSS) Testing System selects items for each student that most accurately align with his or her performance on the test to that point. In general, students who are doing well on the test will see more difficult items, and students who are struggling will see easier items. Regardless of the difficulty of the items, all students are tested on the breadth of the content (elementary school, middle school, and high school life science respectively), and all students get an opportunity to demonstrate their higher-order thinking skills.
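The idea that items track a student's performance can be illustrated with a minimal Python sketch. It assumes a hypothetical selection rule (choose the unused item whose difficulty is closest to the current ability estimate). The actual HSA selection algorithm is not described in this FAQ, and a real system would also have to cover the breadth of the content. The names next_item, ability, and item_difficulties are illustrative only.

```python
def next_item(ability, item_difficulties, administered):
    """Pick the index of the unused item whose difficulty is closest to ability.

    Hypothetical rule for illustration; not the documented HSA algorithm.
    """
    candidates = [i for i in range(len(item_difficulties)) if i not in administered]
    return min(candidates, key=lambda i: abs(item_difficulties[i] - ability))


# Illustrative item pool, with difficulties on an arbitrary scale.
pool = [-1.0, 0.0, 1.0, 2.0]

# A student who is doing well (high estimate) is routed to a harder item.
strong = next_item(1.6, pool, administered=set())

# A student who is struggling (low estimate) is routed to an easier item.
struggling = next_item(-0.8, pool, administered=set())
```

Under this rule the stronger student receives the item with difficulty 2.0 and the struggling student receives the item with difficulty -1.0.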

How is a student’s achievement measured from one test to another if there is also the factor of difference in difficulty of items given to students at different times?

Each item has a measured difficulty, so the items can be arranged along a scale. Student scores lie along that same scale. Imagine two students, one getting difficult items and the other receiving easier items. Suppose they both answer half of their items correctly. The student with the more difficult items will get a higher score. This is made possible through a statistical process known as equating, and it is used on virtually all adaptive tests.
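The two-student example can be sketched in Python. This sketch assumes a Rasch (one-parameter logistic) model and a maximum-likelihood ability estimate; the FAQ does not state which model the HSA uses, so the model, the function name estimate_ability, and the item difficulties are assumptions chosen for illustration. The resulting values are on an arbitrary logit scale, not the HSA reporting scale.

```python
import math


def estimate_ability(difficulties, responses, iterations=25):
    """Maximum-likelihood ability estimate under an assumed Rasch model.

    difficulties: item difficulties on a common scale.
    responses: 1 for a correct answer, 0 for an incorrect answer.
    Requires at least one correct and one incorrect response.
    """
    theta = 0.0
    for _ in range(iterations):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        theta += gradient / information  # Newton-Raphson step
    return theta


# Both students answer half of their items correctly.
easy_items = [-2.0, -1.5, -1.0, -0.5]
hard_items = [0.5, 1.0, 1.5, 2.0]
half_correct = [1, 1, 0, 0]

easy_score = estimate_ability(easy_items, half_correct)
hard_score = estimate_ability(hard_items, half_correct)
```

With these illustrative difficulties, the student who saw the harder items ends up with the higher estimate even though both students answered the same number of items correctly.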

Since the HSA Science (NGSS) Assessment may be given at two different times to students, how do we address concerns from students who may receive a lower score on their second test opportunity?

During the administration of each opportunity, students see items aligned to their performance on that assessment. The initial item selection is based on performance by the student on earlier assessment opportunities, but item difficulty quickly adjusts to current performance. Students receiving more difficult items get higher scores when they answer the same number of items correctly. Some student scores drop during the second opportunity because of distractions, a bad testing day, or other reasons. The adaptive nature of the test does not lead to this phenomenon, and in fact, generally reduces it. A student’s highest score across all opportunities is used for official reporting purposes, regardless of when that score was received during the testing window.

The standard error of measurement (SEM) also needs to be considered when reviewing a student’s scores for each opportunity administered for an HSA Science (NGSS) Assessment. The observed score on any test is an estimate of the true score. If a student took a similar test several times, the resulting scale score would vary across administrations, sometimes being a little higher, sometimes a little lower, and sometimes the same. The SEM represents the precision of the scale score, or the range in which the student would likely score if a similar test were administered several times. The “+/–” next to the student’s scale score provides information about the certainty, or confidence, of the score’s interpretation. The boundaries of the score band are one standard error of measurement above and below the student’s observed score, representing a range of score values that is likely to contain the true score. For example, 310 ± 10 indicates that if the student were tested again, the student’s score would likely fall between 300 and 320 about two out of three times. Because students are administered different sets of items of varying difficulty in each computer-adaptive content area assessment, the SEM can be different for the same scale score depending on how closely the administered items match the student’s ability.
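The score band arithmetic can be sketched in a few lines of Python. The function names score_band and coverage are hypothetical, and the coverage calculation assumes normally distributed measurement error, which this FAQ does not state explicitly; under that assumption a band of one SEM on either side covers roughly 68 percent, or about two out of three.

```python
import math


def score_band(scale_score, sem, k=1.0):
    """Return (low, high): the score band k SEMs below and above the scale score."""
    return (scale_score - k * sem, scale_score + k * sem)


def coverage(k=1.0):
    """Share of a normal distribution within k standard deviations of its mean.

    Assumes normally distributed measurement error (an assumption of this sketch).
    """
    return math.erf(k / math.sqrt(2))


low, high = score_band(310, 10)   # the 310 +/- 10 example from the text
share = coverage(1.0)             # roughly 0.68 under the normality assumption
```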

Appropriate Uses

A student’s scale score should be evaluated after the SEM is added to or subtracted from the scale score. This provides a score range that includes the student’s true score with 68 percent certainty (i.e., across repeated administrations, the student’s test score would fall in this range about 68 percent of the time).

Inappropriate Uses

A small difference between scale scores (e.g., within one SEM) should not be interpreted as a significant difference. Measurement error should be taken into account when comparing scores. For example, scores of 301 and 303 are not reliably different when they differ by less than one SEM. In addition, the score band should not be read as a guarantee: it contains a student’s true score with about 68 percent certainty, so the true score can lie outside the band.
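The comparison rule can be written as a small Python check. The function name within_one_sem is hypothetical, and the SEM of 10 is borrowed from the 310 ± 10 example purely for illustration, since the text does not give an SEM for the scores of 301 and 303.

```python
def within_one_sem(score_a, score_b, sem):
    """True when two scale scores differ by less than one SEM.

    Such a difference should not be interpreted as significant.
    """
    return abs(score_a - score_b) < sem


# Illustrative SEM of 10 (an assumption, not a published value).
too_close_to_call = within_one_sem(301, 303, sem=10)
larger_gap = within_one_sem(301, 325, sem=10)
```

Here the scores of 301 and 303 fall within one SEM of each other, while a gap of 24 points does not. Exceeding one SEM is not by itself proof of a reliable difference; the check only flags differences that are clearly too small to interpret.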

Does the adaptive nature of the test mean that questions will go above or below a student’s grade level?

No. A requirement of the federal Every Student Succeeds Act (ESSA) is that the questions on the HSA Science (NGSS) Assessment must be based on the Next Generation Science Standards identified for the respective assessments.

Is there a purpose in giving students the test more than once if they pass the first time? It is taking valuable instructional time that could be used for collaborative projects, art, music, social studies, etc., but instead continues to be used for 'preparing to do better on the next test.'

The Hawai’i Department of Education requires schools to administer the online HSA Science (NGSS) Assessments to the students in the identified grades only once. Students may be offered a second opportunity if a school wishes to do so.

How does marking a question for review during the test affect the next question? Does it lower the level and value? Does it remain at the same difficulty level?

Marking an item for review does not in any way affect the selection of subsequent items. It is simply a way for a student to make a note to himself or herself to review the answer to an item. The student’s response to an item, whether or not the item is marked for review, is used to update the student’s ability estimate, and that estimate drives the selection of subsequent items. If a student later changes a response to a marked or unmarked item, the ability estimate is updated again, which affects the selection of any items not yet administered.