Jack Gerrits edited this page Aug 4, 2020 · 6 revisions

Learner

A learner, more commonly called a reduction (I will use the terms interchangeably), represents one stage in the reduction stack.

It is defined by a set of types, functions and a couple of fields.

Types

There are several explicit types, as well as several implicit ones; these must match, otherwise there will be bugs.

A learner is defined explicitly (in template parameters) by:

  • T - The type of the data object of this reduction, referred to hereafter as DataT
  • E - The type of example this reduction expects, either example or multi_ex. Referred to hereafter as ExampleT

A learner is defined implicitly by:

  • The base learner (or next learner) that this reduction references in its predict/learn/update/multipredict functions. I'll call this BaseT
  • The label type that this learner expects examples to have, let's call it LabelT
  • The prediction type that this learner produces, let's call this one PredictionT

Fields

  • prediction_type_t pred_type - An enum that corresponds to PredictionT
  • size_t weights - Describes how many weight vectors are required by this learner. In effect, a single learner can reference several models.
  • size_t increment - Used along with the per-call increment to reference different weight vectors
  • bool is_multiline - true if the expected ExampleT is multi_ex, otherwise false

Functions

For the overwhelming majority of reductions, only learn, predict and finish_example are important.

Auto-recursion means that, for a given function, each reduction in the stack is invoked in sequence automatically, without the called function knowing about it. Functions that do not auto-recurse may instead call the base (next) reduction in the stack explicitly.

Init

This is called once by the driver when it starts up. It does not auto-recurse; the topmost definition will be used.

void(DataT* data);

Learn/Predict/Update

These three functions are perhaps the most important; they define the core learning process. update is not commonly used and by default simply forwards to learn.

These functions do not auto-recurse. However, in nearly all cases you want the result of the next reduction, so they usually do recurse; it is up to the reduction to implement this.

void(DataT* data, BaseT* base_learner, ExampleT* example);

Multipredict

Multipredict does not need to be defined; the default implementation uses predict. It makes several predictions from a single example: each call increments the offset, so each prediction effectively uses a different weight vector. This is often used internally by reductions but rarely used externally.

void(DataT* data, BaseT& base, ExampleT* ex, size_t count, size_t step, polyprediction* pred, bool finalize_predictions);
  • pred is an array of count polyprediction objects.
  • step is the weight increment applied per prediction

Sensitivity

Does not auto-recurse.

float(DataT* data, BaseT* base, ExampleT* example);

Finish Example

Finish example is called after learn/predict. It is where the reduction must calculate and report loss, and free any resources that were allocated for that example. Additionally, the example's label and prediction must be returned to a clean slate.

void(vw&, DataT* data, ExampleT* ex);

End Pass

Called at the end of a learning pass. This function is auto-recursive.

void(DataT* data);

End Examples

Called once all examples have been parsed and processed by the reduction stack. This function is auto-recursive.

void(DataT* data);

Finish

Called as the reduction is being destroyed. Note that the destructor of DataT WILL be called regardless, so this function is often unnecessary. This function is auto-recursive.

void(DataT* data);