Another approach seen in the literature is to first conduct a statistical test to determine whether the clustering is important and, if it is not, to carry out an analysis that ignores the clustering [37]. This approach is not recommended, because even if the clustering test is not statistically significant, the clustering in the data may still be sufficient to distort the agreement index. Figure 2 shows a scatter plot of the systolic blood pressure measurements, and we can see that, for individual subjects, the differences between the two measurement methods are large at both extremes. However, the average difference across all individuals is small, at 1.32, and a paired t-test gives a p value of 0.70, suggesting no difference between the two methods, even though the plot indicates that the two measurement methods do not agree. Moreover, when we compare two measurement methods, it is unlikely that they will agree exactly, giving identical results for every subject. We generally want to know how much the methods differ and, if that difference is not large enough to cause problems, we can replace the old method with the new one. With sufficiently large sample sizes, even small differences between the measurements would lead to small p values.

The concordance correlation coefficient (CCC) was developed by Lin in 1989 [3], with longitudinal, repeated-measures versions of the CCC developed by King et al. [4], Carrasco et al. [17], and Carrasco and Jover [18]. The CCC is a standardized coefficient that takes values from -1 to 1, with 1 indicating perfect agreement and -1 perfect disagreement.
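To make the definition concrete, the following sketch computes Lin's CCC from two paired measurement vectors using the standard sample estimator (the function name and example data are illustrative, not taken from the paper):

```python
import numpy as np

def lins_ccc(x, y):
    """Lin's concordance correlation coefficient (Lin, 1989).

    Combines precision (Pearson correlation) and accuracy
    (closeness of the means and variances) in a single index.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    # Biased (1/n) variances and covariance, as in Lin's original estimator.
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (vx + vy + (mx - my) ** 2)

# Identical measurements give perfect agreement:
print(lins_ccc([120, 135, 150], [120, 135, 150]))  # 1.0
# A strong Pearson correlation with a systematic shift lowers the CCC:
print(lins_ccc([120, 135, 150], [130, 145, 160]))
```

Note that a constant bias between the methods reduces the CCC even when the Pearson correlation is exactly 1, which is why the CCC is preferred over plain correlation for agreement studies.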

For the CCC model, individual measurements are modeled with a combination of random and fixed effects, and interaction terms are often included. For our COPD example in particular, we start from the following linear mixed-effects model. Agreement can also be judged by assessing whether the limits of agreement lie within the range of clinically acceptable differences. The coverage probability (CP) proposed by Lin et al. [6] answers the same question more directly by calculating the probability that the differences between the devices fall within a tolerance interval, that is, within what Bland and Altman call the range of clinically acceptable differences. Higher probabilities indicate closer agreement. In practice, the researcher must decide whether the CP value is large enough for the two devices to be considered interchangeable.
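As a minimal sketch of the coverage probability idea, assuming the paired differences are approximately normally distributed (the function name and tolerance value below are illustrative, not from the paper):

```python
import math

def coverage_probability(diffs, delta):
    """Normal-theory coverage probability P(|D| < delta).

    Estimates the probability that a paired difference D falls inside
    the tolerance interval (-delta, delta), assuming D ~ N(mu, sigma^2)
    with mu and sigma estimated from the observed differences.
    """
    n = len(diffs)
    mu = sum(diffs) / n
    sd = (sum((d - mu) ** 2 for d in diffs) / (n - 1)) ** 0.5
    # Standard normal CDF via the error function.
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return phi((delta - mu) / sd) - phi((-delta - mu) / sd)

# Differences centred near zero with small spread: high coverage.
print(coverage_probability([-1.0, 0.5, 0.2, -0.4, 0.7], delta=5.0))
```

A simple nonparametric alternative is the observed proportion of differences inside the tolerance interval; the normal-theory version above is just one common choice.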

Bland-Altman plot showing the differences between the devices against the pairwise device means.
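The quantities underlying such a plot are straightforward to compute; the sketch below derives the per-pair means and differences, the mean difference (bias), and the conventional 95% limits of agreement (bias ± 1.96 SD). The function name and sample data are illustrative:

```python
import numpy as np

def bland_altman_stats(a, b):
    """Quantities plotted in a Bland-Altman plot.

    Returns the per-pair means (x-axis), per-pair differences (y-axis),
    the mean difference (bias), and the 95% limits of agreement
    computed as bias +/- 1.96 * SD of the differences.
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    means = (a + b) / 2.0
    diffs = a - b
    bias = diffs.mean()
    sd = diffs.std(ddof=1)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits

# Hypothetical paired systolic blood pressure readings:
means, diffs, bias, (lo, hi) = bland_altman_stats(
    [122, 131, 144, 150], [120, 133, 140, 149])
print(bias, lo, hi)
```

Plotting `diffs` against `means` with horizontal lines at `bias`, `lo`, and `hi` reproduces the figure; the limits of agreement are then compared against the range of clinically acceptable differences.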