Keraflow
Deep Learning for Python.
A Layer is a one-to-one or many-to-one tensor transformer
Both Theano and TensorFlow are graph-based deep learning frameworks. Let X, W, and b be three symbolic tensors. Then Y = WX + b makes Y a new symbolic tensor that equals W dot X plus b. When there are many tensors in your model, things get messy. The purpose of Layer is to simplify the process of building new tensors from existing tensors. For example:
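A minimal sketch, assuming Keras-like Input and Dense layers (the shape and argument values are illustrative; imports are omitted so the line numbers referenced below line up):

```python
X = Input(100)                                        # line 1: an Input layer behaves as a symbolic tensor

dense = Dense(50, init='uniform', activation='relu')
Y = dense(X)                                          # line 4: feeding X to the layer yields a new tensor Y
```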
In Keraflow, an Input layer (line 1) is a special type of layer that can be treated as a symbolic tensor. By feeding a tensor X to a layer (line 4), we get a new symbolic tensor Y, which can then be fed to another layer.
Each layer takes different arguments to initialize. As seen above, a dense layer takes three arguments. However, there are some common keyword arguments you can pass to every layer, which are defined in Layer's __init__:
- Set name for easy debugging.
- Set trainable to False if you don't want the layer parameters to be updated during the training process.
- Set initial_weights to directly assign the initial values of the layer parameters (this will override the init argument). See Initializations.
- Set regularizers to apply regularization penalties on the layer parameters. See Regularizers.
- Set constraints to restrict the values of the layer parameters when updating them during the training process. See Constraints.

For details on setting initial_weights, regularizers, and constraints, please see Argument Passing Summarization and the related pages.
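For instance, a sketch of passing these keyword arguments to a Dense layer (the values are illustrative, and the [W, b] ordering for initial_weights is an assumption):

```python
import numpy as np

# Hypothetical initial values, assuming a 100-dimensional input
W0 = np.random.rand(100, 50)
b0 = np.zeros(50)

dense = Dense(50,
              name='dense1',              # for easy debugging
              trainable=False,            # freeze W and b during training
              initial_weights=[W0, b0])   # overrides the init argument
```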
A layer can be fed multiple times by different input tensors. Consider the following case:
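A sketch of such layer sharing (shapes are illustrative):

```python
X1 = Input(100)
X2 = Input(100)

dense = Dense(50)
Y1 = dense(X1)   # first feed
Y2 = dense(X2)   # second feed: reuses the same W and b
```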
By feeding the same layer two times, we keep only one W and one b in our tensor graph.
A layer might take more than one tensor as input. For example:
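For instance, a sketch with a concatenating layer (the layer name Concatenate is an assumption; any multiple-input layer works the same way):

```python
X1 = Input(100)
X2 = Input(100)

# A multiple-input layer is fed a list of tensors instead of a single tensor.
Y = Concatenate()([X1, X2])
```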
The only difference with a multiple-input layer is that it takes a list of input tensors as input instead of a single tensor. Users could define their own layers to take either a single or multiple input tensors. See Implementing Layer Functions.
We have already seen how to obtain a new tensor by feeding a tensor to a layer. However, it would be cumbersome to name all the intermediate tensors if we just want to perform a series of operations. Sequential thus provides syntactic sugar for us:
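A sketch of the sugar, assuming Sequential accepts a list of layers:

```python
seq = Sequential([
    Input(100),
    Dense(50, activation='relu'),
    Dense(10, activation='softmax')
])
```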
Note that when the first layer of Sequential is an Input layer, it is treated as a tensor (the tensor output by the last layer), i.e. you could feed it to another layer but you cannot feed other tensors to it.
When the first layer of Sequential is not an Input layer, it is treated as a normal layer; whether it takes a single tensor or a list of tensors depends on its first layer.
We've covered what users need to know to use existing layers in Keraflow. For advanced information such as writing customized layers, please refer to the Developer-Guide.
A typical deep learning model does the following:

1. Computes the output Y from the input tensor X such that Y = f(X), where f is the model with some trainable parameters (e.g. Y = WX + b).
2. Measures how different Y is from the gold answer Target according to a loss function: loss = L(Y, Target).
3. Calculates the gradients of loss with respect to the trainable parameters: Gw, Gb = grad(loss, W), grad(loss, b).
4. Updates the parameters according to some optimizing rule to reduce loss (e.g. subtracting the gradients from the parameters: W, b = W - Gw, b - Gb).

We now introduce Model and Sequential to cover these steps.
Let's start with a simple model with one input tensor:
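A sketch, assuming Model takes the input and output tensors positionally (the argument form is an assumption):

```python
X = Input(100)
Y = Dense(10)(X)

model = Model(X, Y)
```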
We tell the model that the input tensor is X and the output tensor is Y. In the one-input-one-output case, we could simply use Sequential to avoid naming all the tensors.
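For example, an equivalent sketch (again assuming Sequential accepts a list of layers):

```python
model = Sequential([Input(100), Dense(10)])
```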
Note that Sequential can only be used as a model (able to call fit, predict) when its first layer is an Input layer. Moreover, if there are multiple input tensors or multiple output tensors, we can only use Model:
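A sketch of a two-input, two-output model (the Concatenate layer name and the Model argument form are assumptions):

```python
X1 = Input(100, name='input1')
X2 = Input(100, name='input2')

H = Concatenate()([X1, X2])
Y1 = Dense(10, name='output1')(H)
Y2 = Dense(1, name='output2')(H)

model = Model([X1, X2], [Y1, Y2])
```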
Now we can specify the loss function and the optimizer. For a single-output model (including Sequential):
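A sketch, assuming a Keras-like compile method:

```python
model.compile(optimizer='sgd', loss='categorical_crossentropy')
```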
For a model with multiple outputs, we need to specify a loss function for each output channel:
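For the two-output model above, a sketch with one loss per output channel (the mean_squared_error alias is an assumption):

```python
model.compile(optimizer='sgd',
              loss={'output1': 'categorical_crossentropy',
                    'output2': 'mean_squared_error'})
```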
Note that the name of each output channel is the name of the corresponding output layer (output1, output2 in the example). If you feel that writing these names is unnecessary, you could also pass a list to loss (see Objectives).
In the examples, we pass strings (e.g. sgd, categorical_crossentropy). These strings are actually aliases of predefined optimizers and loss functions. You could also pass customized optimizer/loss function instances (see Optimizers, Objectives).
Both Model and Sequential have the following methods:

- fit: fits X to Target by iteratively updating the model parameters.
- evaluate: computes loss given X and Target.
- predict: computes Y given X.

Both fit and evaluate take X and Target as inputs. As for predict, only X is required.
For Sequential (single input, single output), X and Target each take a single numpy array (or list). For Model, X and Target each take a dictionary/list of numpy array(s) (see Argument Passing Summarization).
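A sketch of the three methods on a single-input, single-output Sequential model (the data shapes are illustrative; extra options such as batch size are omitted):

```python
import numpy as np

x = np.random.rand(32, 100)       # 32 samples of dimension 100
target = np.random.rand(32, 10)   # gold answers for the 32 samples

model.fit(x, target)              # iteratively updates the model parameters
loss = model.evaluate(x, target)  # computes the loss
y = model.predict(x)              # computes the outputs
```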
You could save a (trained) model to disk and restore it for later use.
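For example, a sketch of saving with save_to_file (the file names, including the weight file format, are illustrative; the corresponding loading utility is not shown here):

```python
model.save_to_file(arch_fname='model_arch.json',
                   weight_fname='model_weights.hkl',
                   indent=2)   # forwarded to json.dump to beautify the output
```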
Note that the architecture and the model parameters are stored separately. If you don't want to save the model weights, set weight_fname to None.
Keraflow supports two file formats for storing the model architecture: json and yaml. Please specify the file extension to switch between these two formats. Note that in addition to arch_fname and weight_fname, save_to_file takes **kwargs and passes them to json.dump or yaml.dump. So, in the above example, the indent argument is there to beautify the architecture output.
To bring both flexibility and convenience to argument passing in Keraflow, for some arguments of some functions, users can pass a single value, a dictionary, or a list. We summarize them as follows. For more examples, please refer to Initial Weights, Regularizers, and Constraints.
Related arguments:
- initial_weights
- x, y, sample_weights

Related arguments:
- regularizers, constraints
- loss, metrics

Related arguments:
- initial_weights, regularizers, constraints
- sample_weights
- metrics