annotation_spec
Annotation specs.
Note
You can find higher-level documentation about how library annotations in lineapy work, and how to contribute, here.
Note
Developer Note:
- All the classes in ValuePointer follow this weird structure where their field entries duplicate the class name. This is so that when we load the YAMLs, the classes can be differentiated just by their field names.
- Also, the string values for AllPositionalArgs, BoundSelfOfFunction, and Result are useless as well. They are there only so that we abide by the YAML structure. It's not very elegant, and we can refactor this later.
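The field-name trick described in the note can be sketched with the standard library alone. This is an illustration, not lineapy's loader (which relies on pydantic): the dataclasses below are stand-ins, the field names positional_argument_index and argument_keyword are assumptions, and load_pointer is a hypothetical helper.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PositionalArg:
    positional_argument_index: int  # assumed field name


@dataclass(frozen=True)
class KeywordArgument:
    argument_keyword: str  # assumed field name


@dataclass(frozen=True)
class Result:
    result: str  # placeholder string, expected to be "RESULT"


POINTER_CLASSES = (PositionalArg, KeywordArgument, Result)


def load_pointer(entry: dict):
    """Pick the class whose field names exactly match the keys of `entry`."""
    for cls in POINTER_CLASSES:
        if {f.name for f in fields(cls)} == set(entry):
            return cls(**entry)
    # A typo in a key matches no class, similar in spirit to forbidding extras.
    raise ValueError(f"no pointer class matches keys {sorted(entry)}")
```

Because each class has a distinct field name, a YAML entry such as result: RESULT identifies its class without an explicit type tag.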
AllPositionalArgs
Bases: BaseModel
References all positional arguments. E.g., in foo(a, b), a and b.
Expecting the string to be set to the value "ALL_POSITIONAL_ARGUMENTS"; see Result for an explanation.
Source code in lineapy/instrumentation/annotation_spec.py
Annotation
Bases: BaseModel
An annotation contains a single criteria for the function call and the corresponding side_effects of the function call.
There are currently six types of criteria, all of which are explained in their respective classes:
There are currently three types of side_effects:
- ViewOfValues
- MutatedValue
- ImplicitDependencyValue
Source code in lineapy/instrumentation/annotation_spec.py
BaseModel
Bases: pydantic.BaseModel
Forbids extra fields on the base class, so that typos raise an error.
Source code in lineapy/instrumentation/annotation_spec.py
BoundSelfOfFunction
Bases: BaseModel
References the bound self of a function. E.g., in foo.test(a, b), foo would be the bound self.
If the function is a bound method, this refers to the instance that the method is bound to.
We are expecting "SELF_REF"; see Result for an explanation.
Source code in lineapy/instrumentation/annotation_spec.py
ClassMethodName
Bases: BaseModel
Specifies a class method name (as opposed to a function). An example is df.to_sql:
- criteria:
    class_method_name: to_sql
    class_instance: DataFrame
Source code in lineapy/instrumentation/annotation_spec.py
ClassMethodNames
Bases: BaseModel
A shorthand for a list of class method names (vs. a single one as in ClassMethodName).
- criteria:
    class_method_names:
      - to_csv
      - to_parquet
    class_instance: DataFrame
Source code in lineapy/instrumentation/annotation_spec.py
ExternalState
Bases: BaseModel
Represents a reference to some state outside of the Python program. The two types of external state supported are DB and file_system.
If we ever make a reference to an external state instance, we assume that it depends on any mutations of previous references.
Source code in lineapy/instrumentation/annotation_spec.py
__hash__()
Elsewhere we need ExternalState to be hashable. This was pretty easy with a dataclass (frozen option), but with Pydantic we have to add an extra hash function; see https://github.com/samuelcolvin/pydantic/issues/1303.
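The difference between the two approaches can be sketched with the standard library. The sketch deliberately avoids pydantic: FrozenState and ManualState are hypothetical stand-ins, and the point is only that a class with value equality needs an explicit __hash__ unless something generates one for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenState:
    """Frozen dataclass: __eq__ and __hash__ are both generated."""

    external_state: str


class ManualState:
    """Plain class with value equality; needs an explicit __hash__."""

    def __init__(self, external_state: str) -> None:
        self.external_state = external_state

    def __eq__(self, other: object) -> bool:
        return (
            isinstance(other, ManualState)
            and self.external_state == other.external_state
        )

    def __hash__(self) -> int:
        # Defining __eq__ without __hash__ would make instances unhashable.
        return hash(("ManualState", self.external_state))


# Equal instances collapse to one entry when used in a set.
frozen_states = {FrozenState("DB"), FrozenState("DB"), FrozenState("file_system")}
manual_states = {ManualState("DB"), ManualState("DB")}
```

Hashing on the same fields that equality compares keeps set and dict lookups consistent.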
Source code in lineapy/instrumentation/annotation_spec.py
FunctionName
Bases: BaseModel
A single function name (vs. a list in FunctionNames).
Source code in lineapy/instrumentation/annotation_spec.py
FunctionNames
Bases: BaseModel
References a list of function names (vs. a single one in FunctionName).
One example is for the module boto3 (you can find all the annotations at https://github.com/LineaLabs/lineapy/blob/main/lineapy/annotations/external):
- criteria:
    function_names:
      - upload_file
      - upload_fileobj
Source code in lineapy/instrumentation/annotation_spec.py
ImplicitDependencyValue
Bases: BaseModel
References state that the function implicitly depends on. Currently it is used for external state such as the DB and the file system.
Source code in lineapy/instrumentation/annotation_spec.py
KeywordArgument
Bases: BaseModel
References a keyword argument. E.g., in foo(a=1, b=2), a would have a keyword argument of a.
Source code in lineapy/instrumentation/annotation_spec.py
KeywordArgumentCriteria
Bases: BaseModel
Currently only used for the pandas in-place argument. We might need to augment how we parse it in the future for other inputs.
Source code in lineapy/instrumentation/annotation_spec.py
ModuleAnnotation
Bases: BaseModel
An annotation YAML file is composed of a list of ModuleAnnotation entries; that is, the annotations are organized hierarchically by the module they are associated with, such as pandas and boto3.
Source code in lineapy/instrumentation/annotation_spec.py
MutatedValue
Bases: BaseModel
A value that is mutated when the function is called. Consider the example of the dump function in joblib. It mutates the file_system, which is represented by ExternalState.
- module: joblib
  annotations:
    - criteria:
        function_name: dump
      side_effects:
        - mutated_value:
            external_state: file_system
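To show how this nested structure reads once loaded, here is a minimal sketch that writes the joblib example above as plain Python data and looks up the side effects of a call. MODULE_ANNOTATIONS and side_effects_for are hypothetical names for illustration; lineapy's own lookup goes through its pydantic models and is not shown here.

```python
# The joblib example, written as the data a YAML loader would produce.
MODULE_ANNOTATIONS = [
    {
        "module": "joblib",
        "annotations": [
            {
                "criteria": {"function_name": "dump"},
                "side_effects": [
                    {"mutated_value": {"external_state": "file_system"}}
                ],
            }
        ],
    }
]


def side_effects_for(module: str, function_name: str) -> list:
    """Return the side effects declared for module.function_name, if any."""
    for module_annotation in MODULE_ANNOTATIONS:
        if module_annotation["module"] != module:
            continue
        for annotation in module_annotation["annotations"]:
            criteria = annotation["criteria"]
            # Accept both the single-name and the list-of-names criteria forms.
            single = criteria.get("function_name")
            many = criteria.get("function_names", [])
            if single == function_name or function_name in many:
                return annotation["side_effects"]
    return []


dump_effects = side_effects_for("joblib", "dump")
```

A function with no matching annotation, such as a hypothetical joblib.load lookup here, yields an empty list.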
Source code in lineapy/instrumentation/annotation_spec.py
PositionalArg
Bases: BaseModel
References a positional argument. E.g., in foo(a, b), a would have a positional argument of 0.
Source code in lineapy/instrumentation/annotation_spec.py
Result
Bases: BaseModel
References the result of a function. E.g., in bar = foo(a, b), bar would be the result of the function call.
We are expecting "RESULT" for the field result. Though it's not needed for the Python class, it is needed for YAML, and setting a default value makes the loader we use, pydantic, confused.
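As a rough illustration of what the pointer classes on this page refer to at call time, the sketch below resolves a single-key pointer entry against a recorded call. Call and resolve are hypothetical, and the keys positional_argument_index, argument_keyword, and all_positional_arguments are assumed field names; only self_ref and result appear in the YAML examples on this page.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Call:
    """A recorded call such as bar = foo.test(a, b, c=3) (hypothetical type)."""

    bound_self: Any
    args: tuple
    kwargs: dict = field(default_factory=dict)
    result: Any = None


def resolve(pointer: dict, call: Call) -> list:
    """Return the runtime values that a single-key pointer entry refers to."""
    ((key, value),) = pointer.items()  # exactly one key is expected
    if key == "positional_argument_index":  # assumed field name
        return [call.args[value]]
    if key == "argument_keyword":  # assumed field name
        return [call.kwargs[value]]
    if key == "all_positional_arguments":  # assumed field name
        return list(call.args)
    if key == "self_ref":
        return [call.bound_self]
    if key == "result":
        return [call.result]
    raise ValueError(f"unknown pointer key: {key}")


call = Call(bound_self="foo", args=("a", "b"), kwargs={"c": 3}, result="bar")
first_arg = resolve({"positional_argument_index": 0}, call)
```

The placeholder strings ("SELF_REF", "RESULT", "ALL_POSITIONAL_ARGUMENTS") are ignored by resolve, which mirrors the note that they exist only to satisfy the YAML structure.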
Source code in lineapy/instrumentation/annotation_spec.py
ViewOfValues
Bases: BaseModel
A set of values which all potentially refer to shared pointers, so that if one is mutated, the rest might be as well. They are unique, like a set, but ordered for deterministic behavior, hence a list.
Take the fit function in scikit-learn: if its result is assigned to a new variable, then the new variable is aliased to the original variable.
So we have the following annotation:
- base_module: sklearn.base
  annotations:
    - criteria:
        base_class: BaseEstimator
        class_method_name: fit
      side_effects:
        - mutated_value:
            self_ref: SELF_REF # self is a keyword...
        - views:
            - self_ref: SELF_REF
            - result: RESULT
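The aliasing that this annotation records can be seen in a small sketch that does not need scikit-learn. TinyEstimator is a hypothetical stand-in whose fit method mutates the instance and returns it, which is the behavior the mutated_value and views entries above describe.

```python
class TinyEstimator:
    """Hypothetical stand-in for an estimator whose fit() returns self."""

    def __init__(self) -> None:
        self.coef = None

    def fit(self, xs: list) -> "TinyEstimator":
        self.coef = sum(xs) / len(xs)  # mutates the bound self
        return self  # the result aliases the bound self


model = TinyEstimator()
fitted = model.fit([1.0, 2.0, 3.0])

# Both names point at one object, so a later mutation through either
# name is visible through the other.
fitted.coef = 10.0

# "Unique like a set, but ordered": dict.fromkeys drops duplicates
# while keeping first-seen order.
view_keys = list(dict.fromkeys(["self_ref", "result", "self_ref"]))
```

Keeping the view as an ordered list rather than a set means the same annotation always yields its entries in the same order.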
Source code in lineapy/instrumentation/annotation_spec.py