benchmarks

Utilities for computing statistics on benchmark data.

Translated from https://github.com/jupyterlab/jupyterlab/blob/82df0b635dae2c1a70a7c41fe7ee7af1c1caefb2/galata/src/benchmarkReporter.ts#L150-L244, which was originally added in https://github.com/jupyterlab/benchmarks/blob/f55db969bf4d988f9d627ba187e28823a50153ba/src/compare.ts#L136-L213.

`Distribution` `dataclass`

Statistical description of a distribution

Source code in lineapy/utils/benchmarks.py

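```python
from __future__ import annotations  # lets `-> Distribution` refer to the class being defined

from dataclasses import dataclass
from statistics import mean, variance  # assumed source of mean/variance; the module's imports are not shown
from typing import List


@dataclass
class Distribution:
    """
    Statistical description of a distribution
    """

    mean: float
    variance: float

    @classmethod
    def from_data(cls, data: List[float]) -> Distribution:
        # Summarize raw measurements by their sample mean and sample variance
        return cls(mean(data), variance(data))
```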
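`from_data` keeps only two summary numbers. Assuming `mean` and `variance` are the standard-library `statistics` functions (the module's imports are not shown on this page), `variance` is the *sample* variance with an n - 1 denominator, not the population variance `pvariance`. A small illustration using the first group of measurements from the Table V test example:

```python
from statistics import mean, pvariance, variance

# First group of measurements from the Table V test example
timings = [9, 11, 5, 6]

sample_mean = mean(timings)          # 31 / 4
sample_var = variance(timings)       # sum of squared deviations / (n - 1)
population_var = pvariance(timings)  # sum of squared deviations / n, shown for contrast

print(sample_mean, sample_var, population_var)
```

The distinction matters for small benchmark runs: with only a handful of repetitions, the n - 1 denominator gives a noticeably larger spread estimate.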

`DistributionChange` `dataclass`

Change between two distributions

Source code in lineapy/utils/benchmarks.py

```python
@dataclass
class DistributionChange:
    """
    Change between two distributions
    """

    # Mean value
    mean: float
    # Spread around the mean value
    confidence_interval: float
    # The confidence interval level, e.g. 0.95 for a 95% confidence interval
    confidence_interval_level: float

    def __str__(self):
        """
        Format a performance change like `between 20.1% slower and 30.3% faster (95% CI)`.
        """
        # format_percent is a module-level helper that is not shown on this page
        return (
            f"between {format_percent(self.mean + self.confidence_interval)} "
            f"and {format_percent(self.mean - self.confidence_interval)} "
            f"({self.confidence_interval_level * 100}% CI)"
        )
```
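`format_percent` is not shown on this page. The sketch below uses a hypothetical stand-in, `format_percent_sketch`, inferred only from the docstring example: it treats the value as a new/old timing ratio, so ratios above 1 read as "slower" and ratios below 1 as "faster". Under that assumption, `mean + confidence_interval` gives the "slower" end and `mean - confidence_interval` the "faster" end, matching `between 20.1% slower and 30.3% faster`.

```python
def format_percent_sketch(ratio: float) -> str:
    """Hypothetical formatter: render a new/old timing ratio as a percent change."""
    if ratio < 1:
        return f"{(1 - ratio) * 100:.1f}% faster"
    return f"{(ratio - 1) * 100:.1f}% slower"


def describe_change(mean: float, confidence_interval: float, level: float) -> str:
    """Mirror the shape of DistributionChange.__str__ using the hypothetical formatter."""
    upper = mean + confidence_interval  # slow end of the interval
    lower = mean - confidence_interval  # fast end of the interval
    return (
        f"between {format_percent_sketch(upper)} "
        f"and {format_percent_sketch(lower)} "
        f"({level:.0%} CI)"  # the % format spec renders 0.95 as "95%"
    )


print(describe_change(0.949, 0.252, 0.95))
```

Formatting the level with the `%` specifier prints `95%`; interpolating `self.confidence_interval_level * 100` directly, as the listing above does, prints `95.0%` for a level of 0.95.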

`__str__()`

Format a performance change like `between 20.1% slower and 30.3% faster (95% CI)`.


`distribution_change(old_measures, new_measures, confidence_interval=0.95)`

Compute the performance change based on a number of old and new measurements.

Based on the work by Tomas Kalibera and Richard Jones. See their paper "Quantifying Performance Changes with Effect Size Confidence Intervals", section 6.2, formula "Quantifying Performance Change".
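The formula is not reproduced on this page. Assuming it is Fieller's confidence interval for the ratio of the new mean to the old mean (the Table V test values below are consistent with this reading), let $\bar{y}_o$ and $\bar{y}_n$ be the sample means, $s_o^2$ and $s_n^2$ the sample variances, $n$ the number of measurements per system, and $t$ the two-sided Student's $t$ critical value with $n - 1$ degrees of freedom. The interval for $\bar{y}_n / \bar{y}_o$ is then:

$$
\frac{\bar{y}_o \bar{y}_n \pm \sqrt{\left(\bar{y}_o \bar{y}_n\right)^2 - \left(\bar{y}_o^2 - \frac{t^2 s_o^2}{n}\right)\left(\bar{y}_n^2 - \frac{t^2 s_n^2}{n}\right)}}{\bar{y}_o^2 - \frac{t^2 s_o^2}{n}}
$$

Under that assumption, `DistributionChange.mean` would be the first numerator term over the denominator, and `confidence_interval` the square-root term over the same denominator.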

Note: The measurements must have the same length. As a fallback, you could truncate both sets to the size of the smaller one.
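A minimal sketch of that fallback, using a hypothetical helper `truncate_to_common_length` that is not part of the module: both lists are cut to the length of the shorter one before calling `distribution_change`. Which measurements to drop is a judgment call; this sketch keeps the earliest ones.

```python
from typing import List, Tuple


def truncate_to_common_length(
    old_measures: List[float], new_measures: List[float]
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: trim both measurement lists to the shorter length."""
    n = min(len(old_measures), len(new_measures))
    return old_measures[:n], new_measures[:n]


old, new = truncate_to_common_length([1.2, 1.1, 1.3, 1.25], [0.9, 1.0, 0.95])
# old and new now have 3 entries each and satisfy the equal-length requirement
```

Truncation discards data, so it widens the resulting interval; collecting the same number of runs for both systems is preferable when possible.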

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `old_measures` | `List[float]` | The list of timings from the old system | required |
| `new_measures` | `List[float]` | The list of timings from the new system | required |
| `confidence_interval` | `float` | The confidence level for the result. The default of 0.95 gives a 95% confidence interval (95% of the time, the true mean lies within the resulting mean ± the resulting CI). | `0.95` |

Test against the example from Table V of the paper (pages 18-19):

```python
from math import isclose
from statistics import mean

from lineapy.utils.benchmarks import distribution_change

res = distribution_change(
    old_measures=[
        round(mean([9, 11, 5, 6]), 1),
        round(mean([16, 13, 12, 8]), 1),
        round(mean([15, 7, 10, 14]), 1),
    ],
    new_measures=[
        round(mean([10, 12, 6, 7]), 1),
        round(mean([9, 1, 11, 4]), 1),
        round(mean([8, 5, 3, 2]), 1),
    ],
    confidence_interval=0.95,
)
assert isclose(res.mean, 68.3 / 74.5, rel_tol=0.05)
assert isclose(res.confidence_interval, 60.2 / 74.5, rel_tol=0.05)
```

Source code in lineapy/utils/benchmarks.py

````python
def distribution_change(
    old_measures: List[float],
    new_measures: List[float],
    confidence_interval: float = 0.95,
) -> DistributionChange:
    """
    Compute the performance change based on a number of old and new measurements.

    Based on the work by Tomas Kalibera and Richard Jones. See their paper
    "Quantifying Performance Changes with Effect Size Confidence Intervals", section 6.2,
    formula "Quantifying Performance Change".

    Note: The measurements must have the same length. As a fallback, you could truncate
    both sets to the size of the smaller one.

    Parameters
    ----------
    old_measures: List[float]
        The list of timings from the old system
    new_measures: List[float]
        The list of timings from the new system
    confidence_interval: float
        The confidence level for the results.
        The default is a 95% confidence interval (95% of the time the true mean will be
        between the resulting mean +- the resulting CI)

    Test against the example in the paper, from Table V, on pages 18-19

    ```python
    res = distribution_change(
        old_measures=[
            round(mean([9, 11, 5, 6]), 1),
            round(mean([16, 13, 12, 8]), 1),
            round(mean([15, 7, 10, 14]), 1),
        ],
        new_measures=[
            round(mean([10, 12, 6, 7]), 1),
            round(mean([9, 1, 11, 4]), 1),
            round(mean([8, 5, 3, 2]), 1),
        ],
        confidence_interval=0.95
    )
    from math import isclose
    assert isclose(res.mean, 68.3 / 74.5, rel_tol=0.05)
    assert isclose(res.confidence_interval, 60.2 / 74.5, rel_tol=0.05)
    ```
    """
    n = len(old_measures)
    if n != len(new_measures):
        raise ValueError("Data have different length")
    # performance_change is defined elsewhere in the module (not shown on this page)
    return performance_change(
        Distribution.from_data(old_measures),
        Distribution.from_data(new_measures),
        n,
        confidence_interval,
    )
````
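`performance_change` is not shown on this page. Below is a self-contained sketch of what it could compute, assuming it implements Fieller's interval for the new/old ratio of means; this assumption reproduces the Table V test values within the stated 5% tolerance. The hypothetical `fieller_change` takes the Student's t critical value as an argument so that only the standard library is needed; in practice that value could come from `scipy.stats.t.ppf(1 - (1 - level) / 2, n - 1)`.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Tuple


def fieller_change(
    old_measures: List[float], new_measures: List[float], t: float
) -> Tuple[float, float]:
    """Hypothetical sketch: (center, half-width) of Fieller's interval for new/old."""
    n = len(old_measures)
    if n != len(new_measures):
        raise ValueError("Data have different length")
    y_old, y_new = mean(old_measures), mean(new_measures)
    s2_old, s2_new = variance(old_measures), variance(new_measures)
    old_factor = y_old**2 - t**2 * s2_old / n
    new_factor = y_new**2 - t**2 * s2_new / n
    # The interval is only bounded when old_factor > 0, i.e. when the old mean
    # is clearly distinguishable from zero at this confidence level.
    if old_factor <= 0:
        raise ValueError("Old measurements too noisy for a bounded ratio interval")
    product = y_old * y_new
    half_width = sqrt(product**2 - old_factor * new_factor)
    return product / old_factor, half_width / old_factor


# Two-sided 95% critical value for n - 1 = 2 degrees of freedom. With 2 degrees of
# freedom the t quantile has a closed form: t = sqrt(2 p^2 / (1 - p^2)), p = 0.95.
t_95_dof2 = sqrt(2 * 0.95**2 / (1 - 0.95**2))

# Rounded group means from the Table V test example
change_mean, change_ci = fieller_change([7.8, 12.2, 11.5], [8.8, 6.2, 4.5], t_95_dof2)
```

Dividing both terms by the same `old_factor` is what allows the result to be stored as a center plus a symmetric half-width, which is the shape `DistributionChange` expects.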
