ILO Classification for B Readers


The International Labour Organization (ILO) is a tripartite U.N. agency that brings together governments, employers, and workers to set labor standards, develop policies, and devise programs that promote decent work for women and men. The ILO International Classification of Radiographs of Pneumoconioses is a global tool used to:

Improve surveillance of worker health
Conduct epidemiological research
Make comparisons of statistical data

Classification Scheme

The Classification System includes the Guidelines and two sets of standard images. The standard images represent different types and severity of lung abnormalities and are used to compare a person’s images during the classification process. The system is oriented towards describing the nature and extent of features associated with the different pneumoconioses, such as:

Coal workers’ pneumoconiosis
Silicosis
Asbestosis

It deals with:

Parenchymal abnormalities (small and large opacities).
Pleural changes.

Other findings

In the ILO system, the reader is asked to:

Grade radiograph quality.
Categorize small opacities according to shape, size, and profusion.
- The size of small round opacities is characterized as:
  - p (up to 1.5 mm)
  - q (1.5-3 mm)
  - r (3-10 mm)
- Irregular small opacities are classified by their short axis width as s, t, or u (same sizes as for small, rounded opacities).
- Profusion (frequency) of small opacities is classified on a 4-point major category scale (0 – 3), with each major category divided into three, giving a 12-point scale between 0/- and 3/+.
- Large opacities are defined as any opacity greater than 1 cm that is present in the image. These are classified as:
  - Category A (for one or more large opacities whose combined dimension does not exceed 5 cm)
  - Category B (for one or more large opacities whose combined dimension exceeds 5 cm but does not exceed the equivalent area of the right upper lung zone)
  - Category C (size is greater than the equivalent area of the right upper lung zone)
- Pleural abnormalities are also classified with respect to location, width, extent, and degree of calcification.
- Other abnormal features of chest radiographs can be commented upon (ILO 2022).
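
The scales described above can be written down as simple lookups. The following Python sketch is illustrative only and is not part of the ILO Guidelines: the function names are hypothetical, the subcategory labels follow the usual ILO notation for the 12-point scale, and the handling of values that fall exactly on a size boundary is an assumption.

```python
# Illustrative sketch; names are hypothetical, not taken from the ILO Guidelines.

# The 12 profusion subcategories, ordered from least to most profuse.
PROFUSION_SCALE = [
    "0/-", "0/0", "0/1",
    "1/0", "1/1", "1/2",
    "2/1", "2/2", "2/3",
    "3/2", "3/3", "3/+",
]


def profusion_index(subcategory: str) -> int:
    """Return the 0-based position of a subcategory on the 12-point scale."""
    return PROFUSION_SCALE.index(subcategory)


def major_category(subcategory: str) -> int:
    """Return the major category (0 to 3), the number before the slash."""
    return int(subcategory.split("/")[0])


def rounded_size_letter(diameter_mm: float) -> str:
    """Map the diameter of a small rounded opacity to p, q, or r.

    Assumption: a diameter exactly on a boundary (1.5 or 3 mm) is
    assigned to the smaller size class.
    """
    if diameter_mm <= 0 or diameter_mm > 10:
        raise ValueError("small opacities are at most 10 mm in diameter")
    if diameter_mm <= 1.5:
        return "p"
    if diameter_mm <= 3:
        return "q"
    return "r"
```

Storing the subcategories as an ordered list makes it possible to compare or summarize classifications by position on the scale rather than by label.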

See the Guidelines for the Use of the ILO International Classification of Radiographs of Pneumoconioses for a full description and exact definition of terms and entities.

The Chest Radiograph Classification Form is used by the NIOSH Coal Workers’ Health Surveillance Program to record characteristics of the radiograph and abnormalities.

Classification Considerations

Accuracy and precision are important considerations when radiographic classifications are made.

Accuracy is defined as the ability for a measurement to reflect the true degree of underlying abnormality.
Precision reflects the extent a measurement is consistent across repeated determinations.
A measurement technique can be precise but inaccurate, or accurate but imprecise. It is preferable for a measurement to be both accurate and precise to optimize validity.
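
The distinction can be made concrete with a small numerical sketch. In the hypothetical Python example below, repeated determinations are expressed as positions on a numeric scale, accuracy is summarized as bias (mean error against a known true value), and precision as the spread (sample standard deviation) of the determinations. The function name and the data are assumptions for illustration; in practice the true value is rarely known.

```python
from statistics import mean, stdev


def bias_and_spread(measurements, true_value):
    """Return (bias, spread) for repeated measurements of one quantity.

    bias   = mean of the measurements minus the true value (accuracy)
    spread = sample standard deviation of the measurements (precision)
    """
    bias = mean(measurements) - true_value
    spread = stdev(measurements)
    return bias, spread


# Hypothetical repeated determinations; the true value is assumed to be 4.
precise_but_inaccurate = [6, 6, 7, 6, 6]   # tightly clustered, but offset
accurate_but_imprecise = [1, 7, 4, 2, 6]   # centered on 4, widely scattered

print(bias_and_spread(precise_but_inaccurate, 4))
print(bias_and_spread(accurate_but_imprecise, 4))
```

The first set has a small spread but a clear bias; the second has no bias on average but a much larger spread.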

Inter- and intra-reader variability in chest radiography has existed since chest radiography was first used to identify and classify pneumoconiosis.¹

Inter-reader variability occurs when readers disagree amongst themselves on a classification. Inter-reader variation consists of two components:
- Systematic differences: one reader consistently classifies images as having a higher, or consistently a lower, profusion than another reader.
- Random variability: the differences between readers may be either higher or lower, in a random fashion.
Intra-reader variability occurs when the same reader classifies a radiograph differently on different occasions.
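
One simple way to separate the two components of inter-reader variation is to look at the paired differences between two readers on the same radiographs. The Python sketch below is an illustration under stated assumptions, not a method prescribed by the source: classifications are assumed to be coded as positions on a numeric scale, the mean signed difference is taken as the systematic component, and the standard deviation of the differences as the random component. The function name is hypothetical.

```python
from statistics import mean, stdev


def compare_readers(reader_a, reader_b):
    """Compare two readers' classifications of the same radiographs.

    Both arguments are lists of scale positions in the same radiograph order.
    Returns (systematic, random_component):
      systematic       = mean of the signed differences (a minus b)
      random_component = sample standard deviation of those differences
    """
    if len(reader_a) != len(reader_b):
        raise ValueError("both readers must classify the same radiographs")
    diffs = [a - b for a, b in zip(reader_a, reader_b)]
    return mean(diffs), stdev(diffs)


# Hypothetical data: reader A is always one step above reader B.
print(compare_readers([3, 4, 5, 6, 2], [2, 3, 4, 5, 1]))
# Hypothetical data: differences go both ways and cancel on average.
print(compare_readers([3, 4, 5, 6, 2], [4, 3, 6, 5, 2]))
```

The same calculation applied to one reader's classifications on two occasions gives a rough picture of intra-reader variability.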

Reader variability prompted the ILO to develop the ILO Classification scheme for pneumoconioses and has driven continued updates since then.² It was also a catalyst for development of the NIOSH B Reader Program.

Some reader variability is inherent in classifying radiographs for pneumoconioses. When excessive, reader variability is undesirable because it severely reduces the quality and usefulness of the classification data. Extreme differences can skew study results and can cause negative impacts. Examples of negative impacts include inappropriate denial of eligibility for compensation programs and inappropriate award of compensation.

Disagreement among classifications from multiple readers in epidemiological or surveillance studies can be minimized using the methods described on this website. However, radiographic classification in contested settings often results in polarized opinions that are extremely difficult to reconcile.³⁻⁴

The persistence of reader differences despite intensive measures to assess and correct them is demonstrated by findings for British coal miners. The British National Coal Board had a rigorous quality assurance process for minimizing both inter- and intra-reader variability. Despite these efforts, reader variability was not eliminated.⁵⁻⁶ Given this, it may often be prudent to use multiple readers to obtain independent classifications and use an unbiased summary measure, such as the median classification, as the final determination. In this way, the final determination would reflect mainstream classification tendencies as much as possible.

Accuracy in radiographic classification is gained through careful and rigorous reader training. It is also obtained by applying specific conditions designed to eliminate bias during the classification process.

The following sections describe important measures that can be applied to help ensure accuracy.

Note: The same degree of accuracy is not required in all settings where ILO classifications are obtained. Applying these procedures should help provide unbiased classifications. When they are ignored, bias should be suspected.

Selecting Readers

Procedures that give rise to unbiased classifications include:

Selection based on pre-existing evidence of mainstream classification tendencies
Random selection from a pool of available readers.

Selecting readers based on other criteria leaves the process open to accusations of bias. Proper procedures for selecting readers are not alone sufficient to ensure accuracy. Procedures should be accompanied by an appropriate quality assurance program.
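
Random selection from a pool is straightforward to document and reproduce. The Python sketch below is a minimal illustration, assuming the pool is available as a list of reader identifiers; the function name, the identifiers, and the use of a recorded seed are assumptions, not procedures stated in the source.

```python
import random


def select_readers(pool, k, seed=None):
    """Randomly select k distinct readers from a pool of available readers.

    Passing a seed makes the draw reproducible, so the selection can be
    audited later. random.Random.sample raises ValueError if k exceeds
    the size of the pool.
    """
    rng = random.Random(seed)
    return rng.sample(list(pool), k)


# Hypothetical pool of reader identifiers.
pool = ["reader_01", "reader_02", "reader_03", "reader_04", "reader_05"]
print(select_readers(pool, 3, seed=2024))
```

Recording the seed and the pool used for the draw is one way to show afterwards that readers were not chosen for their known tendencies.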

Classification Blinding

Overall bias can occur when readers have information about radiographs being classified that can consciously or unconsciously influence their classifications. Knowing about worker exposures can bias readers toward recording more or fewer abnormalities depending on the extent of an exposure. It can also cause preferential selection of certain types of abnormality depending on the nature of the exposure, for example, small rounded opacities for silica-exposed workers versus small irregular opacities for asbestos-exposed workers.

Blinding readers allows a classification to be made absent of preconceived knowledge and concepts. To minimize bias, remove or obscure identifying information before sending radiographs. Identifying information includes:

Age
Occupation
Work site information
Medical history

Withholding information on the source of the radiographs and who the classification has been requested for (e.g., the plaintiff or defendant in contested proceedings) will also help prevent bias.
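
When radiograph records are held electronically, blinding can be approached by passing the reader only an allow-list of fields. The Python sketch below is a hypothetical illustration: the record layout, field names, and function name are all assumptions, and it does not address identifying information embedded in the image itself or in its file metadata.

```python
def blind_record(record: dict, study_id: str) -> dict:
    """Return a blinded copy of a radiograph record.

    Only an opaque study ID and the image reference are passed on.
    Everything else (age, occupation, work site, medical history,
    requesting party, and so on) is dropped because it is not listed here.
    """
    return {"study_id": study_id, "image": record["image"]}


# Hypothetical record; all field names and values are illustrative.
record = {
    "image": "radiograph_0001",
    "age": 58,
    "occupation": "example occupation",
    "work_site": "example site",
    "medical_history": "example history",
    "requested_by": "example party",
}
print(blind_record(record, study_id="S-0427"))
```

An allow-list is used rather than a list of fields to remove, so that a field added to the record later is withheld by default.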

Blinding is not appropriate for medical diagnosis and worker medical monitoring. However, if radiographs from worker monitoring programs are used for epidemiologic studies, they should be re-read by readers blinded to information that might influence a reader’s classification (e.g., industry, occupation, tenure, etc.).

When assessing epidemiologic temporal trends in disease development or progression using sequential radiographs, knowing the order in which the radiographs were taken influences a reader’s classifications.⁷

An Environment that Does not Reward Extreme Determinations

Rewarding a reader for reporting disease leads to bias. Payment or compensation should not be linked with the outcome reported by the reader. Those seeking classifications should not knowingly select readers whose classifications are likely to be biased in a direction that suits their preference.⁴

Quality Assurance

There are various approaches to quality assurance, some better than others. Monitoring classification levels concurrently can be accomplished by adding quality assurance or calibration radiographs to a set of classifications without the reader being aware which are the calibration radiographs. For example, a National Institutes of Health-sponsored workshop suggested including chest images of unexposed workers in epidemiologic studies for purposes of quality control.⁸

Optimally, quality assurance radiographs should include a range of abnormality levels and types previously classified by expert readers. The benefits to this approach include:

The reader is under pressure to conform to standard classification practices. This is because the reader is unaware of which are the quality control radiographs but knows that they exist within the study.
The results for the quality assurance radiographs can be used to assess the accuracy of the reader’s classifications. Based on this assessment, it may be necessary to disregard or adjust the reader’s classifications.

Results of quality control classifications can also be used to provide feedback to readers to maintain and improve their performance.⁹ This approach avoids a defect of other quality control approaches, particularly those undertaken independently of and externally to the study, in which a reader may consciously modify their behavior to appear more mainstream. Although using unknown calibration radiographs cannot eliminate all variation between readers, it should eliminate excesses.
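
The calibration approach described above has two mechanical parts: mixing calibration radiographs into the reading set so that they cannot be told apart, and comparing the reader's results on them with the expert classifications. The Python sketch below illustrates both under stated assumptions: radiographs carry opaque identifiers, classifications are coded as positions on a numeric scale, and "agreement" is defined as falling within a chosen tolerance of the expert value. The function names, the identifiers, and the tolerance of one scale step are hypothetical choices, not values from the source.

```python
import random


def build_reading_set(study_ids, calibration_ids, seed=None):
    """Mix study and calibration radiograph IDs into one shuffled list.

    IDs should be opaque so that a calibration radiograph cannot be
    recognized from its identifier or its position in the list.
    """
    rng = random.Random(seed)
    reading_set = list(study_ids) + list(calibration_ids)
    rng.shuffle(reading_set)
    return reading_set


def calibration_agreement(reader_results, expert_results, tolerance=1):
    """Fraction of calibration radiographs on which the reader's scale
    position is within `tolerance` steps of the expert classification.

    Both arguments map radiograph ID to scale position; only IDs present
    in expert_results (the calibration radiographs) are assessed.
    """
    within = [
        abs(reader_results[rid] - expert_pos) <= tolerance
        for rid, expert_pos in expert_results.items()
    ]
    return sum(within) / len(within)


# Hypothetical example: three calibration radiographs among the study set.
expert = {"x17": 3, "x42": 6, "x58": 0}
reader = {"x17": 4, "x42": 9, "x58": 0, "x03": 2, "x29": 5}
print(calibration_agreement(reader, expert))  # 2 of 3 within one step
```

A low agreement fraction would be one signal that a reader's classifications may need to be adjusted or disregarded, as noted above.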

Multiple Readers

Rigorous reader training is preferred to ensure accuracy before evaluating a reader. Use a pilot reading trial and quality control (calibration) readings with the candidate image readings. Keep in mind, despite careful training, evaluation, and feedback, systematic reader differences can continue.

Multiple readings coupled with appropriate summary measures (e.g., the median reading) can help minimize the impact of any one reader on the final determinations. Multiple readings also help improve precision of the data.

Using reader panels, in which groups of readers jointly classify radiographs and together come to a consensus or unanimous decision, is not usually recommended. Apart from the logistical difficulties of convening such panels, decisions that are made may fail to represent the true range of opinions in the group. Instead, joint classifications may reflect those of the most dominant or experienced reader or readers in the group.

Experience, careful training, and feedback to readers can help maximize precision of classifications among readers. Precision is also gained by obtaining multiple determinations and employing a summary index that reflects the central tendency (average) of those determinations.

Therefore, precision in image classification is achieved by using summary (e.g., median) scores derived from multiple classifications made by different readers working independently (without other readers being present and without knowledge of other readers’ classifications). The number of independent classifications obtained depends on the setting and monetary costs involved.

Summary classifications developed from independent classifications are more precise than single individual classifications. However, care should be taken not to introduce bias when deriving summary classifications. Valid summarization methods include using median classifications or properly-designed consensus measures.
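
A median classification can be computed directly on the ordered profusion scale. The Python sketch below is a minimal illustration: the function name is hypothetical, the subcategory labels follow the usual ILO notation for the 12-point scale, and the rule for an even number of readers (taking the lower of the two middle values) is an assumption that a real protocol would need to specify in advance.

```python
from statistics import median_low

# The 12 profusion subcategories, ordered from least to most profuse.
PROFUSION_SCALE = [
    "0/-", "0/0", "0/1",
    "1/0", "1/1", "1/2",
    "2/1", "2/2", "2/3",
    "3/2", "3/3", "3/+",
]


def median_classification(classifications):
    """Return the median of independent profusion classifications
    for a single radiograph.

    The scale is ordinal, so the median is taken over scale positions
    and mapped back to a label. With an even number of readers the lower
    middle value is used (an assumption made for this sketch).
    """
    if not classifications:
        raise ValueError("at least one classification is required")
    positions = [PROFUSION_SCALE.index(c) for c in classifications]
    return PROFUSION_SCALE[median_low(positions)]


# Three hypothetical independent readings of the same radiograph.
print(median_classification(["1/0", "1/1", "2/2"]))  # 1/1
```

Because the median depends only on the middle of the ordered readings, a single extreme classification does not move the summary, which is the property the text relies on.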

Inter-reader Comparisons

Passive Quality Control

In some settings, it may be beneficial to start preliminary classification activities where the same radiographs are classified independently by multiple readers and the findings reported back to the readers. This information may reveal a reader’s differences from the mainstream. It will also provide an opportunity for further education and self-correction.

Active Quality Control

Information from preliminary procedures is employed in the final selection of readers by removing extreme readers at each end of the scale. Similar quality assurance exercises can also be done during any classification process involving multiple readers and radiographs. This will provide continuing feedback and maintenance of standards. Active quality control provides a final check on reader consistency. However, such efforts provide only a form of relative quality assurance; the readers are compared only to each other and not to objective, external classifications. The only way to ensure true accuracy is to concurrently evaluate calibration radiographs as noted above.
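
Identifying readers at the extremes of a group can be sketched as follows. This Python example is an illustration under stated assumptions, not a procedure given in the source: each reader's tendency is summarized as the mean signed deviation from the per-radiograph median of all readers, and exactly one reader is dropped from each end. The function names and data are hypothetical. As the text notes, this is only relative quality assurance, because readers are compared with each other rather than with external classifications.

```python
from statistics import mean, median


def reader_tendencies(readings):
    """Summarize each reader's tendency relative to the group.

    `readings` maps reader ID to a list of scale positions for the same
    radiographs in the same order. Returns reader ID -> mean signed
    deviation from the per-radiograph median across all readers.
    """
    readers = list(readings)
    n_images = len(readings[readers[0]])
    medians = [
        median([readings[r][i] for r in readers]) for i in range(n_images)
    ]
    return {
        r: mean([readings[r][i] - medians[i] for i in range(n_images)])
        for r in readers
    }


def drop_extremes(tendencies):
    """Drop the single lowest and single highest reader by tendency."""
    ranked = sorted(tendencies, key=tendencies.get)
    return ranked[1:-1]


# Hypothetical preliminary readings of three radiographs by five readers.
readings = {
    "A": [3, 4, 5],
    "B": [4, 5, 6],
    "C": [2, 3, 4],
    "D": [3, 4, 6],
    "E": [6, 7, 8],
}
print(drop_extremes(reader_tendencies(readings)))  # ['A', 'D', 'B']
```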

Reader Selection

Readers should be both proficient and experienced in classifying chest radiographs for pneumoconioses. Readers should be:

Current B Readers
Highly experienced in classifying radiographs of dust-exposed workers
Representative of general classification practices among readers (i.e., not falling at either end of the extremes of the range of inter-reader variability).

One strategy to ensure that classifications fall within the mainstream is to select readers randomly from the largest pool of B Readers.

References

1. Fletcher CM, Oldham PD. The problem of consistent radiological diagnosis in coalminers’ pneumoconiosis. An experimental study. Br J Ind Med 1949; 6:168-183.

2. Bohlig H, Bristol LJ, Cartier PH, et al. UICC/Cincinnati classification of the radiographic appearances of pneumoconiosis. Chest 1970; 58:57-67.

3. Jacobsen M. Part 5. Radiologic Abnormalities: Epidemiologic Utilization: The International Labour Office Classification: Use and Misuse. Ann NY Acad Sci 1991; 643:100-107.

4. Friedman LS, De S, Almberg KS, Cohen RA. Association Between Financial Conflicts of Interest and ILO Classifications for Black Lung Disease. Ann Am Thorac Soc 2021; doi:10.1513/AnnalsATS.202010-1350OC.

5. Fay JWJ, Rae S. The Pneumoconiosis Field Research of the National Coal Board. Ann Occup Hyg 1959; 1:149-61.

6. Hurley JF, Burns J, Copland L, et al. Coalworkers’ simple pneumoconiosis and exposure to dust at 10 British coalmines. Br J Ind Med 1982; 39:120-7.

7. Reger RB, Amandus HE, Morgan WKC. On the diagnosis of coalworkers’ pneumoconiosis – Anglo-American disharmony. Am Rev Respir Dis 1973; 108:1186-91.

8. Weill H, Jones R. The chest roentgenogram as an epidemiologic tool. Report of a workshop. Arch Environ Health 1975; 30:435-9.

9. Sheers G, Rossiter CE, Gilson JC, et al. UK naval dockyards asbestos study: radiological methods in the surveillance of workers exposed to asbestos. Br J Ind Med 1978; 35:195-203.