Fast! How to Calculate IOA + Examples

Interobserver agreement (IOA) quantifies the extent to which independent observers’ data match. Computation of this metric involves comparing the recordings of two or more observers who have independently observed and recorded the same event or behavior. For example, if two observers are tracking the frequency of a specific student behavior in a classroom, a calculation of this type provides a numerical index of their consistency in identifying and recording those behaviors.

Establishing acceptable levels of agreement is crucial for research validity and the reliability of data collected in applied settings. High levels of agreement strengthen confidence that the data accurately reflect the phenomenon being observed, minimizing observer bias and measurement error. The use of this type of measurement has a long history in observational research, particularly in fields like psychology, education, and behavior analysis, where direct observation is a primary method of data collection. Its adoption contributes to the scientific rigor of the research process.

Several different formulas and methods are available to determine the level of agreement between observers. Selection of the appropriate method is contingent upon the nature of the data being collected and the specific research question being addressed. Common methods, calculation steps, and considerations for various data types will be discussed in the following sections.

1. Formula Selection

The procedure to assess agreement hinges significantly on the formula employed. The choice of formula is not arbitrary; it is dictated by the type of data collected and the nature of the observational study. Selecting an inappropriate formula yields a misleading representation of the true level of agreement between observers, thereby jeopardizing the validity of the findings. For instance, when observers record the duration of a behavior, such as the time a student spends engaged in a task, a formula suitable for continuous data, like the Pearson correlation coefficient or Intraclass Correlation Coefficient (ICC), is required. Conversely, when observers record whether a behavior occurred within specific intervals, percentage agreement or Cohen’s Kappa, which are suitable for categorical data, become applicable.

A common pitfall arises when researchers mistakenly apply percentage agreement to interval data without accounting for agreement by chance. This overestimates the true agreement level, potentially leading to erroneous conclusions about the reliability of the observational data. Cohen’s Kappa corrects for chance agreement, yielding a more conservative and accurate estimate. Similarly, when dealing with event-based data, such as the frequency of specific actions, metrics like total count IOA or exact agreement IOA are used, each providing distinct information about the degree of correspondence between observers. Understanding that formula choice affects the interpretation of agreement is critical for responsible research practice.
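
To make this concrete, below is a minimal Python sketch comparing simple percentage agreement with Cohen’s Kappa on the same interval records. The ten interval codes and the two-category (occurred/did not occur) scheme are hypothetical illustrations, not data from any particular study.

    # Percentage agreement vs. Cohen's Kappa for hypothetical interval data.
    # Each observer coded the same 10 intervals as 1 (behavior occurred) or 0 (did not).
    observer_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
    observer_b = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
    n = len(observer_a)

    # Simple percentage agreement: proportion of intervals with identical codes.
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    percent_agreement = agreements / n  # 0.80 for these records

    # Cohen's Kappa: adjusts observed agreement for the agreement expected by chance.
    p_observed = percent_agreement
    p_a1 = sum(observer_a) / n          # P(observer A codes "occurred")
    p_b1 = sum(observer_b) / n          # P(observer B codes "occurred")
    p_chance = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    kappa = (p_observed - p_chance) / (1 - p_chance)

    print(f"Percentage agreement: {percent_agreement:.2f}")  # 0.80
    print(f"Cohen's Kappa:        {kappa:.2f}")              # 0.60

The chance-corrected coefficient (0.60) is noticeably lower than the raw percentage (0.80), which is precisely the overestimation problem described above.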

In summary, the appropriate formula is a foundational element. Its correct selection ensures an accurate depiction of the consistency among observers’ records. Failure to align the formula with the data type introduces a systematic bias, thereby undermining the conclusions drawn from the observational study. Selection should be a deliberate process, guided by the characteristics of the data and the specific aims of the investigation, reflecting that this is a critical aspect of calculating agreement accurately.

2. Data Type

The procedure to assess agreement is fundamentally linked to the nature of the data collected. Data type dictates the appropriate method for quantifying the degree to which independent observers’ records correspond. Different data types call for different agreement metrics. For instance, when quantifying the duration of a behavior, which yields continuous data, correlation-based measures are relevant. Conversely, when recording whether a behavior occurs within predefined intervals, generating categorical data, percentage agreement or Kappa statistics are more appropriate. Failure to align the calculation with the specific data type compromises the validity of the resulting agreement coefficient, because different formulas are designed to capture distinct aspects of agreement, and applying a mismatched formula yields a misleading representation of observer consistency.

Consider the practical implications of this connection. Suppose a researcher aims to evaluate the reliability of two observers recording instances of disruptive behavior in a classroom. If observers record the frequency of specific behaviors (e.g., number of times a student calls out), the data is event-based, and metrics like total count agreement are suitable. However, if observers instead record whether disruptive behavior occurred within consecutive 10-second intervals, the data becomes interval-based, requiring the use of percentage agreement or Cohen’s Kappa. Incorrectly applying total count agreement to interval data, or vice versa, would generate an inaccurate representation of the observers’ level of consistency, thereby affecting the conclusions drawn about the reliability of the observation protocol.
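
A brief sketch, assuming hypothetical call-out counts across five observation sessions, shows how two event-based formulas treat the same records very differently.

    # Two event-based IOA formulas applied to the same hypothetical frequency data.
    # Each observer recorded the number of call-outs in each of five sessions.
    counts_a = [4, 7, 2, 5, 6]
    counts_b = [5, 7, 3, 4, 4]

    # Total count IOA: smaller overall total divided by the larger, as a percentage.
    total_a, total_b = sum(counts_a), sum(counts_b)
    total_count_ioa = min(total_a, total_b) / max(total_a, total_b) * 100   # ~95.8%

    # Exact count-per-session IOA: proportion of sessions where counts match exactly.
    exact_matches = sum(a == b for a, b in zip(counts_a, counts_b))
    exact_agreement_ioa = exact_matches / len(counts_a) * 100               # 20.0%

    print(f"Total count IOA:     {total_count_ioa:.1f}%")
    print(f"Exact agreement IOA: {exact_agreement_ioa:.1f}%")

The same raw counts yield a near-perfect total count IOA but a very low exact agreement IOA, which is why the data type and the question being asked must drive the choice of metric.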

In sum, an appreciation of data types is paramount for appropriately assessing agreement. Selecting the appropriate formula hinges directly on whether the data is continuous, interval, event-based, or time series. Choosing a mismatched approach generates misleading indices of observer reliability and undermines the internal validity of research findings. An effective calculation of interobserver agreement requires careful matching of the formula to the fundamental nature of the collected data, ensuring that the resulting metric accurately reflects the degree of consistency between observers.

3. Agreement Definition

The method to quantify the degree to which independent observers’ records correspond hinges on a clear articulation of “agreement.” The definition of “agreement” is not universal; it is contingent on the specific research question, observational protocol, and data characteristics. This definition directly influences the selection of the appropriate formula and, subsequently, the calculation of interobserver agreement (IOA).

  • Exact Agreement

    Exact agreement refers to situations where observers’ records are identical. For example, if two observers independently record the number of times a student raises their hand in a 15-minute period, and both record “5,” this represents exact agreement. However, if one observer records “5” and the other “6,” there is no exact agreement. Calculating IOA based on this strict definition will yield a lower agreement coefficient compared to more lenient definitions, particularly when observing complex or nuanced behaviors. This approach prioritizes precision and minimizes potential for error.

  • Proximity-Based Agreement

    Proximity-based agreement acknowledges a degree of acceptable variance between observers’ records. This is particularly relevant when dealing with continuous data, such as duration measures. For instance, if observers record the length of time a student is engaged in a task, agreement may be defined as falling within a specified range. If one observer records 60 seconds and another records 62 seconds, and the predetermined agreement range is +/- 5 seconds, this would be considered agreement. Proximity-based definitions require a rationale for the chosen threshold, considering the measurement precision and acceptable level of measurement error.

  • Event-Based Agreement

    In event-based observations, agreement can be defined based on the occurrence or non-occurrence of specific events within a specified timeframe. For instance, observers may record whether a particular behavior occurred within a 10-second interval. Agreement is recorded if both observers indicate the behavior occurred, or if both observers indicate it did not occur. This approach is common in interval recording methods. The challenge lies in ensuring observers have clearly defined events and consistent criteria for judging their occurrence.

  • Qualitative Agreement

    For qualitative data, agreement can be defined based on the categorization of observations into predefined codes or themes. Observers independently code segments of text or video, and agreement is assessed based on the consistency of their coding. The degree of agreement may be measured using Cohen’s Kappa, which accounts for the possibility of agreement occurring by chance. The clarity and explicitness of the coding scheme is paramount for achieving high levels of qualitative agreement.

These various definitions of agreement underscore the importance of clearly specifying what constitutes “agreement” in a given study. The calculation of IOA is fundamentally dependent on this definition. A lack of clarity in the definition introduces ambiguity and undermines the validity of the IOA coefficient. Therefore, researchers must explicitly define agreement in their observational protocols and select the calculation method accordingly to ensure the reliability and interpretability of their findings. The definition should be aligned with both the specific context of the study and the nature of the data being collected. The brief sketch below contrasts how an exact and a proximity-based definition evaluate the same duration records.
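
The following is a minimal sketch, assuming hypothetical duration records and a +/- 5-second tolerance; both the values and the tolerance are placeholders, and any real tolerance should be justified as discussed under proximity-based agreement.

    # How the chosen definition of "agreement" changes the result for the same records.
    # Hypothetical paired duration records (seconds) from two observers across six sessions.
    durations_a = [60, 45, 120, 30, 75, 90]
    durations_b = [62, 45, 110, 29, 80, 91]

    TOLERANCE = 5  # hypothetical proximity window of +/- 5 seconds
    n = len(durations_a)

    exact = sum(a == b for a, b in zip(durations_a, durations_b))
    proximate = sum(abs(a - b) <= TOLERANCE for a, b in zip(durations_a, durations_b))

    print(f"Exact agreement:           {exact / n:.0%}")      # 17% - identical values only
    print(f"Proximity-based (+/- 5 s): {proximate / n:.0%}")  # 83% - within tolerance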

4. Observer Independence

Observer independence is a prerequisite for valid calculations of interobserver agreement (IOA). This condition ensures that observed agreements accurately reflect the extent to which observers independently perceive and record the same phenomena, rather than resulting from mutual influence or knowledge of each other’s observations.

  • Procedural Safeguards

    Maintaining observer independence often necessitates implementing procedural safeguards. Observers must be physically separated during data collection to prevent visual or auditory cues from influencing their judgments. Training protocols should emphasize the importance of independent observation and discourage discussion of observations until after data collection is complete. For instance, in classroom observations, observers would be stationed at different locations within the room and explicitly instructed not to confer with each other during the observation period. Failure to implement such safeguards introduces the potential for artificial inflation of the IOA coefficient.

  • Blind Observation

    Ideally, observers should be blind to the hypotheses of the study and any experimental manipulations that might influence their perceptions. Knowledge of the expected outcomes can bias observations, leading observers to unconsciously record data that supports the hypotheses. Blind observation minimizes this bias, ensuring that recorded agreements are based on objective perceptions. For example, in a study evaluating the effectiveness of a behavioral intervention, observers would be unaware of which participants received the intervention. This blinding procedure minimizes the risk that observers unintentionally record more positive behavior changes for the intervention group.

  • Data Handling Protocols

    Protocols for handling collected data are critical in preserving observer independence. Data should be recorded using standardized forms or electronic systems that prevent observers from accessing or modifying each other’s entries. After the data collection phase, observers should not have access to the data until the IOA calculation is completed. This prevents the possibility of observers retrospectively adjusting their recordings to improve agreement with other observers. Establishing secure data management practices ensures that IOA calculations accurately reflect the initial, independent observations.

  • Implications for Interpretation

    Violations of observer independence compromise the interpretability of interobserver agreement. Artificially inflated IOA coefficients provide a false sense of confidence in the reliability of the data. If observers have influenced each other’s observations, the calculated agreement does not reflect the true level of objectivity in the data collection process. This can lead to inaccurate conclusions about the phenomenon under study. In cases where observer independence cannot be fully guaranteed, researchers must acknowledge this limitation and interpret the IOA coefficient with caution.

In summary, ensuring observer independence is fundamental to the validity of IOA calculations. Implementing procedural safeguards, adopting blind observation techniques, and establishing robust data handling protocols are essential steps in minimizing observer bias and maximizing the accuracy of agreement assessments. When observer independence is maintained, the resulting IOA coefficient provides a meaningful index of the reliability and objectivity of the observational data, bolstering the integrity of the research findings.

5. Calculation Unit

The determination of the specific calculation unit represents a critical decision point in the quantification of agreement. The chosen unit of analysis directly affects the sensitivity and interpretability of interobserver agreement metrics. The unit defines the scope within which agreement is assessed, influencing the observed level of consistency between observers. Therefore, careful consideration of this factor is essential for accurate quantification of the reliability of observational data.

  • Time Intervals

    When employing time intervals as the calculation unit, agreement is assessed within discrete segments of time. This approach is commonly used in interval recording methods, where observers record whether a target behavior occurred within predefined intervals (e.g., every 10 seconds). The level of agreement is then calculated as the proportion of intervals in which observers recorded the same occurrence or non-occurrence of the behavior. The selection of the interval duration is crucial. Shorter intervals increase sensitivity to brief behavioral events but also increase the likelihood of chance disagreements. Longer intervals reduce sensitivity but may be more appropriate for observing behaviors that occur over extended periods. For instance, in observing classroom engagement, a 10-second interval may be suitable for capturing brief instances of off-task behavior, while a 5-minute interval might be more appropriate for assessing sustained engagement in academic tasks.

  • Events or Instances

    Defining the calculation unit as discrete events or instances involves assessing agreement on a per-occurrence basis. This is particularly relevant when observing behaviors that have a clear beginning and end. For example, observers might record the number of times a student raises their hand or initiates a conversation. Agreement is then calculated based on the correspondence of the total counts or the exact matching of individual event recordings. This approach requires precise operational definitions of the target behavior to ensure that observers consistently identify and record the same events. For example, a clear definition of “aggression” is needed before observers can count the aggressive incidents in a playground. Otherwise, observers’ interpretation differences may invalidate any measure of agreement.

  • Participants or Subjects

    In some research designs, the calculation unit is defined as individual participants or subjects. This approach is used when observers are rating or classifying participants based on their overall behavior or characteristics. Agreement is then calculated based on the consistency of the ratings or classifications assigned to each participant. For example, observers might rate students’ levels of anxiety or classify them into diagnostic categories. This type of agreement assessment requires clear and well-defined rating scales or classification systems to minimize subjectivity and ensure that observers are applying the same criteria. In the context of diagnostic classification, discrepancies in diagnoses can have serious consequences, underscoring the need for high levels of interobserver agreement.

  • Sessions or Trials

    When observations are conducted across multiple sessions or trials, the calculation unit can be defined as individual sessions or trials. This approach is used to assess the consistency of observers’ recordings across repeated observations of the same behavior or phenomenon. Agreement is then calculated based on the correspondence of the data collected within each session or trial. For example, observers might record the number of errors a participant makes on a series of learning trials. By assessing agreement on a per-trial basis, researchers can evaluate the reliability of the observational data over time and identify any systematic biases or inconsistencies in observers’ recordings. This is particularly important in longitudinal studies where observational data is collected over extended periods, as it allows researchers to monitor the stability of observer agreement and address any issues that may arise.

In conclusion, the determination of the calculation unit is a critical component of interobserver agreement assessment. Whether defined as time intervals, events, participants, or sessions, the selected unit directly influences the sensitivity and interpretability of agreement metrics. Careful consideration of the nature of the observed behavior, the research design, and the specific research question is essential for selecting the most appropriate calculation unit and ensuring the accuracy and validity of the interobserver agreement assessment. The goal is to choose a unit of analysis that maximizes the meaningfulness of the agreement coefficient, providing a clear and informative index of the reliability of the observational data.
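
To illustrate the interval-length trade-off described above, the sketch below collapses hypothetical second-by-second records into intervals of two widths and recomputes interval-by-interval agreement. The per-second records, the event times, and the partial-interval scoring rule (an interval counts as an occurrence if the behavior appears at any point within it) are all illustrative assumptions.

    # How the chosen calculation unit (interval length) affects interval IOA.

    def to_intervals(per_second, width):
        """Collapse a per-second 0/1 record into interval scores of the given width (seconds)."""
        return [int(any(per_second[i:i + width])) for i in range(0, len(per_second), width)]

    def interval_ioa(rec_a, rec_b):
        """Interval-by-interval agreement: proportion of intervals scored identically."""
        matches = sum(a == b for a, b in zip(rec_a, rec_b))
        return matches / len(rec_a)

    # Hypothetical 120-second records (1 = behavior observed during that second).
    seconds_a = [0] * 120
    seconds_b = [0] * 120
    for s in (5, 17, 43, 44, 71, 98):       # observer A sees six brief events
        seconds_a[s] = 1
    for s in (5, 18, 43, 44, 70, 99, 110):  # observer B sees seven, slightly shifted
        seconds_b[s] = 1

    for width in (10, 60):
        ioa = interval_ioa(to_intervals(seconds_a, width), to_intervals(seconds_b, width))
        print(f"{width:>2}-second intervals: IOA = {ioa:.0%}")

With these particular records, the single disagreement visible at 10-second resolution (92%) disappears entirely at 60-second resolution (100%), showing how longer intervals can mask inconsistency.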

6. Total Agreements

The quantity of instances where observers’ records align is a primary component in the methods used to compute interobserver agreement. This count serves as a numerator in many formulas designed to quantify the reliability of observational data. Without accurately determining this number, a meaningful assessment of agreement is not possible.

  • Direct Proportion to Agreement Coefficient

    The calculated agreement coefficient is directly proportional to the total agreements identified. An increase in the number of instances where observers record the same observation results in a higher agreement coefficient, suggesting greater reliability. Conversely, a lower count of agreements yields a reduced coefficient, indicating less consistency. For example, if two observers independently record the occurrence of a behavior in 8 out of 10 intervals, this yields higher agreement than if they only agreed in 5 out of 10 intervals. The direct relationship underscores the importance of meticulous data collection and coding practices to maximize the number of consistent observations.

  • Influence on Statistical Power

    The number of agreements indirectly influences the statistical power of studies using observational data. Higher agreement rates contribute to reduced measurement error, increasing the likelihood of detecting statistically significant relationships between variables. Conversely, low agreement rates inflate measurement error, potentially masking true relationships. Studies relying on observational data with low rates of agreement often require larger sample sizes to achieve adequate statistical power. Therefore, maximizing agreements enhances the efficiency and interpretability of research findings. This is especially relevant when measuring sensitive behaviors, where accurate and consistent recording carries added weight.

  • Relationship to Different Formula Types

    The role of total agreements varies depending on the specific formula used to calculate agreement. In simple percentage agreement formulas, total agreements are directly divided by the total number of observations. Formulas that account for chance agreement, such as Cohen’s Kappa, consider both agreements and disagreements, adjusting the agreement coefficient based on the expected level of agreement that would occur by chance. Because the count of agreements enters the kappa calculation alongside the agreement expected by chance, an exact tally of agreements is essential for an accurate coefficient.

In summary, the total number of instances where observers’ records align is a fundamental element in determining agreement. Its influence permeates through various formulas and directly affects the interpretation of research findings. Accurately quantifying agreements is not merely a procedural step, but a critical element in ensuring the reliability and validity of observational data.

7. Total Disagreements

The count of instances where independent observers’ records diverge constitutes a crucial element in calculating interobserver agreement (IOA). These disagreements, when considered alongside total agreements, provide a comprehensive understanding of the consistency and reliability of observational data.

  • Inverse Relationship with Agreement Coefficients

    The magnitude of an agreement coefficient is inversely proportional to the number of disagreements. An increase in the number of disagreements inevitably leads to a reduction in the agreement coefficient, indicating a lower degree of reliability. For instance, if two observers independently code a series of behavioral events, and the number of disagreements increases due to ambiguous coding definitions, the calculated IOA will diminish. This inverse relationship underscores the importance of minimizing disagreements through rigorous training and well-defined observational protocols.

  • Influence on Specific IOA Formulas

    Various IOA formulas incorporate disagreements in distinct ways. Simple percentage agreement formulas often focus primarily on agreements, but more sophisticated measures, such as Cohen’s Kappa, explicitly account for disagreements. Cohen’s Kappa penalizes disagreements, adjusting the agreement coefficient to reflect the degree to which the observed agreement exceeds what would be expected by chance. Therefore, the impact of disagreements on the final IOA score varies depending on the specific formula employed.

  • Diagnostic Value of Disagreements

    Analyzing the nature and sources of disagreements provides valuable insights into the observational process. Identifying patterns in disagreements can reveal ambiguities in coding definitions, inconsistencies in observer application of the coding scheme, or difficulties in observing specific behaviors. For example, if observers consistently disagree on the classification of certain behaviors, this suggests that the operational definition of those behaviors needs to be refined. Disagreements serve as diagnostic indicators, guiding improvements in observational procedures and enhancing the reliability of subsequent data collection efforts.

  • Impact on Statistical Power and Validity

    A high number of disagreements compromises the statistical power and validity of research findings. Increased measurement error, resulting from observer inconsistencies, reduces the likelihood of detecting true effects or relationships between variables. Furthermore, a substantial number of disagreements raises concerns about the accuracy and credibility of the observational data, potentially undermining the validity of the conclusions drawn from the study. Consequently, minimizing disagreements is essential for ensuring that research findings are both statistically sound and conceptually meaningful.

The accurate quantification and analysis of total disagreements are indispensable for calculating IOA and interpreting observational data. Disagreements not only affect the magnitude of agreement coefficients but also provide valuable diagnostic information for improving observational protocols and enhancing the reliability and validity of research findings. A comprehensive understanding of disagreements, in conjunction with agreements, allows researchers to assess the quality of their observational data and draw more robust conclusions.
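
One way to act on the diagnostic value of disagreements noted above is to tally which code pairs observers confuse most often. The sketch below uses hypothetical codes ("aggression", "off-task", "on-task") purely for illustration.

    # Tallying disagreements by code pair to locate ambiguous operational definitions.
    from collections import Counter

    codes_a = ["aggression", "off-task", "off-task", "on-task", "aggression",
               "on-task", "off-task", "aggression", "on-task", "off-task"]
    codes_b = ["aggression", "on-task", "off-task", "on-task", "off-task",
               "on-task", "off-task", "aggression", "on-task", "on-task"]

    # Count each (observer A code, observer B code) pair that did not match.
    confusions = Counter((a, b) for a, b in zip(codes_a, codes_b) if a != b)

    print(f"Total disagreements: {sum(confusions.values())} of {len(codes_a)} events")
    for (code_a, code_b), count in confusions.most_common():
        print(f"  A coded '{code_a}' where B coded '{code_b}': {count}")

A pattern such as repeated off-task/on-task confusions would point directly at the operational definition most in need of refinement.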

8. Interpretation Thresholds

Established benchmarks against which calculated interobserver agreement (IOA) coefficients are evaluated are critical to the interpretation of observational data. These benchmarks, or thresholds, provide a frame of reference for determining whether the obtained level of agreement is adequate for supporting the reliability and validity of research findings.

  • Acceptable Agreement Levels

    Predetermined agreement levels function as minimum standards for data acceptability. Widely cited guidelines suggest that IOA coefficients of 0.80 or above indicate acceptable agreement, signifying that the data are sufficiently reliable for research purposes. However, the specific threshold may vary depending on the nature of the study, the complexity of the observational coding scheme, and the consequences of measurement error. For instance, in clinical settings where diagnostic decisions are based on observational data, more stringent thresholds (e.g., 0.90 or above) may be necessary to ensure accuracy and minimize the risk of misclassification. These thresholds are defined, in most cases, based on commonly adopted research guidelines.

  • Context-Specific Considerations

    The interpretation of IOA coefficients should consider the specific context of the research. Factors such as the training and experience of observers, the clarity and complexity of the coding system, and the prevalence of the target behavior can influence the observed level of agreement. In studies involving complex observational coding schemes or novice observers, lower IOA coefficients may be deemed acceptable if efforts have been made to minimize observer bias and measurement error. Conversely, in studies with experienced observers and well-defined coding systems, higher IOA coefficients may be expected. The degree to which the observational data will directly inform consequential decisions should also be weighed; interpretation of agreement must remain relative to the circumstances of the study.

  • Implications for Data Interpretation

    The interpretation of IOA coefficients directly impacts the conclusions drawn from observational data. If the calculated IOA falls below the established threshold, it raises concerns about the reliability and validity of the data. Researchers may need to re-evaluate the coding system, provide additional training to observers, or collect additional data to improve agreement. In some cases, data with unacceptably low IOA may need to be excluded from analysis. Conversely, if the IOA exceeds the threshold, it provides support for the reliability of the data, increasing confidence in the validity of the research findings. Reaching appropriate agreement levels is essential before making strong claims based on the data.

  • Statistical vs. Practical Significance

    While statistical significance is one factor in interpreting research data, practical significance deserves equal emphasis when weighing IOA scores. An IOA score above 0.80 indicates good interobserver reliability. Scores between 0.60 and 0.80 suggest moderate reliability, warranting careful data interpretation. Scores below 0.60 indicate poor reliability, potentially invalidating the data. This distinction ensures a nuanced understanding of research findings, acknowledging that statistical significance alone does not guarantee practical applicability or meaningful results. Emphasizing practical significance encourages researchers to consider the real-world implications of their findings and make informed decisions based on the quality of their data.

Consideration of interpretation thresholds is an essential component of appropriately assessing the reliability and validity of research. These thresholds help determine whether agreement between independent observers is adequate, facilitating informed decisions about observational data. Researchers must consider these thresholds within the context of the specific study and research question, and ensure that their study design and procedures are well described. Interobserver agreement should be applied and interpreted with the strengths and limitations of the selected approach in mind.
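
As a convenience, the bands described above can be wrapped in a small helper; the cut-offs (0.80 and 0.60) simply restate the commonly cited guidelines from this section, and stricter thresholds may be warranted for high-stakes decisions.

    # Mapping a calculated IOA coefficient onto the interpretation bands described above.
    def interpret_ioa(coefficient: float) -> str:
        """Return a qualitative reliability band for an IOA coefficient between 0 and 1."""
        if not 0.0 <= coefficient <= 1.0:
            raise ValueError("IOA coefficient must fall between 0 and 1.")
        if coefficient >= 0.80:
            return "acceptable reliability"
        if coefficient >= 0.60:
            return "moderate reliability - interpret with caution"
        return "poor reliability - revisit definitions, training, or procedures"

    for score in (0.93, 0.72, 0.55):
        print(f"IOA = {score:.2f}: {interpret_ioa(score)}")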

Frequently Asked Questions

The following addresses common inquiries regarding the methods to assess the level of consistency between independent observers.

Question 1: What is the minimum acceptable percentage agreement score for research data?

While a definitive threshold is absent, a percentage agreement score of 80% or higher is generally considered acceptable for research purposes. However, the suitability of this threshold depends on the specific context of the study and the nature of the observed behavior.

Question 2: How does the complexity of the observational coding system affect the interpretation of coefficients?

More complex coding systems with numerous categories often result in lower coefficients due to the increased potential for observer error. In such cases, slightly lower coefficients may still be deemed acceptable if observers receive thorough training and the coding system is well-defined.

Question 3: What steps can be taken to improve agreement among observers?

To enhance consistency, researchers should provide comprehensive training to observers, clearly define the observational coding system, conduct regular reliability checks, and address any ambiguities or inconsistencies that arise during the data collection process.

Question 4: Is it possible to have high agreement by chance?

Yes, high agreement can occur by chance, particularly when observing behaviors with a high frequency or when using a limited number of coding categories. Formulas such as Cohen’s Kappa account for chance agreement, providing a more accurate estimate of true observer reliability.

Question 5: What are the consequences of low interobserver agreement?

Low consistency raises concerns about the reliability and validity of the data. It can lead to inaccurate conclusions, reduce the statistical power of the study, and compromise the credibility of the research findings. Remedial actions are necessary to improve agreement before proceeding with data analysis.

Question 6: How does observer drift impact the accuracy of calculated agreement?

Observer drift, the tendency for observers to deviate from the established coding criteria over time, can lead to a decline in interobserver agreement. Regular refresher training and periodic reliability checks are essential to mitigate the effects of observer drift and maintain data quality.

Understanding these key points ensures that the metrics accurately reflect the consistency between independent observers. This understanding contributes to the validity of conclusions drawn from collected data.

Now that frequently asked questions have been clarified, let’s proceed to a practical demonstration.

Guidance for Computing Interobserver Agreement

The following directives aim to enhance the precision and rigor of assessments pertaining to the degree of consistency between independent observers. Adherence to these principles facilitates the generation of reliable and valid data.

Tip 1: Employ Precise Operational Definitions: Ambiguous or vaguely defined coding criteria introduce variability in observer interpretation. The utilization of clear, detailed, and measurable operational definitions for all target behaviors is essential. For example, instead of using a general term like “disruptive behavior,” define it as “any instance of calling out without raising a hand, leaving one’s seat without permission, or engaging in physical aggression.”

Tip 2: Ensure Comprehensive Observer Training: Prior to data collection, observers must undergo rigorous training to ensure a thorough understanding of the coding system and observational procedures. Training should include didactic instruction, practice observations, and feedback sessions to address any inconsistencies or uncertainties. The attainment of a pre-determined level of agreement during training is a prerequisite for participation in data collection.

Tip 3: Maintain Observer Independence: To prevent observer bias, all observations must be conducted independently. Observers should be physically separated during data collection and prohibited from discussing their observations until after the data collection phase is complete. The implementation of blind observation procedures, where observers are unaware of the study’s hypotheses or experimental conditions, further minimizes the potential for bias.

Tip 4: Select the Appropriate Formula: The choice of formula is contingent upon the type of data being collected and the research question being addressed. Simple percentage agreement is suitable for nominal data, while Cohen’s Kappa is appropriate for accounting for chance agreement. For continuous data, consider using intraclass correlation coefficients (ICCs) or Pearson correlations. The selection of an inappropriate formula undermines the accuracy and interpretability of the IOA assessment.
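
For the continuous-data case mentioned in this tip, a minimal sketch might look like the following. The duration values are hypothetical, Pearson’s r captures only linear association, and an absolute-agreement index such as an intraclass correlation (available, for example, through the pingouin package) would be the more rigorous choice.

    # Agreement index for hypothetical continuous (duration) data using Pearson's r.
    from scipy.stats import pearsonr

    durations_a = [61, 45, 118, 30, 76, 92, 54, 23]  # Observer A, seconds per session
    durations_b = [63, 44, 110, 28, 80, 95, 50, 25]  # Observer B, seconds per session

    r, p_value = pearsonr(durations_a, durations_b)
    print(f"Pearson r = {r:.2f} (p = {p_value:.4f})")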

Tip 5: Establish Pre-Determined Agreement Thresholds: Prior to data collection, establish clear agreement thresholds that define the minimum acceptable level of interobserver reliability. These thresholds should be based on established guidelines and the specific requirements of the research study. Data that fails to meet the pre-determined threshold should be subjected to further scrutiny or excluded from analysis.

Tip 6: Conduct Regular Reliability Checks: Throughout the data collection period, conduct regular reliability checks to monitor observer consistency and identify any instances of observer drift. These checks should involve having observers independently code a subset of the data and calculating IOA to ensure that agreement remains within acceptable limits. Implement corrective actions, such as refresher training or revisions to the coding system, if agreement falls below the threshold.

Tip 7: Document All Procedures: Comprehensive documentation of all observational procedures, including observer training protocols, coding definitions, data collection methods, and IOA calculations, is essential for ensuring transparency and replicability. This documentation should be readily available for review by other researchers and should include details about any deviations from the planned procedures.

Adhering to these recommendations enhances the rigor and validity of research findings. Prioritizing meticulous computation of agreement bolsters the integrity of the research.

With these key elements outlined, it is fitting to conclude this discourse.

Conclusion

The accurate determination of interobserver agreement, a metric quantifying the correspondence between independent observers’ records, is paramount in observational research. The preceding discussion has highlighted the essential elements for assessing this agreement, including formula selection, data type consideration, agreement definition, maintenance of observer independence, and the appropriate selection of the calculation unit. Meticulous attention to these aspects ensures that the calculated index provides a valid and reliable indication of the consistency of observational data.

The implementation of robust procedures for interobserver agreement calculation serves as a cornerstone for establishing the credibility of research findings across various disciplines. Continued emphasis on refining observational methodologies and promoting rigorous application of IOA techniques will enhance the rigor and trustworthiness of scientific inquiry.