TARGET functions provide a way to combine values from two different rows of the activity table
into the same row, e.g. for calculating the throughput time between consecutive events inside a case.
In process mining applications it is often required to relate an event to another event which directly or eventually follows. For instance, this is required to compute the throughput time between two events by calculating the difference between the corresponding timestamps. Due to the relational data model, the timestamp values to subtract are stored in different rows. However, the operators (e.g. arithmetic operations) usually can only combine values from the same row. Therefore, we need a way to combine values from two different rows into the same row for performing such computations.
To overcome this issue, PQL relies on the
Adds the values from a column, which correspond to a source activity, to a temporary table based on the optional filter column and the optional edge configuration:
Adds the values from a column, which correspond to a target activity, to a temporary table based on the optional filter column and the optional edge configuration:
- activity_table.column: Column of the activity table. Its values are mapped to the referred events and returned as result column. The result column is stored in a temporary result table which can be joined with the case table.
- activity_table.filter_column: Optional filter column to skip certain events. Events with a NULL value in the related entry of the filter column are ignored. Usually, the filter column is created using REMAP_VALUES. See below for detailed explanation.
- edge_configuration: Describes which edges between the activities of a case are considered. Must be one of:
- ANY_OCCURRENCE TO ANY_OCCURRENCE (DEFAULT)
- FIRST_OCCURRENCE TO ANY_OCCURRENCE
- FIRST_OCCURRENCE TO ANY_OCCURRENCE_WITH_SELF
- ANY_OCCURRENCE TO LAST_OCCURRENCE
- FIRST_OCCURRENCE TO LAST_OCCURRENCE
The following example illustrates how
TARGET can be used to compute the throughput time
between an event and its direct successor. While
SOURCE always refers to the actual event,
TARGET refers to its following
event inside the case. Such a connection is called an edge.
TARGET can be used to combine an event with its following event in the same row of a table.
Both operators accept a column of the activity table as input and return the respective value of the referred event,
as illustrated in the following example.
Besides the computation of custom process KPIs, like the throughput time between certain activities,
also enable more advanced use cases, like the Segregation of Duties.
Rows from activity_table.column, which are NULL, are also NULL in the output column. The SOURCE / TARGET operators ignore rows in which the value of the filter column are NULL.
To skip certain events, the
TARGET operators accept an optional filter column as a parameter.
This column must be of the same size as the activity table. The
TARGET operators ignore all events that
have a NULL value in the related entry of the filter column. Usually, the filter column is created using the
To define which relationships between the events should be considered, the operators offer the optional edge configuration parameter. The following edge configuration options are available:
(a) ANY_OCCURRENCE TO ANY_OCCURRENCE (DEFAULT)
(b) FIRST_OCCURRENCE TO ANY_OCCURRENCE
(c) FIRST_OCCURRENCE TO ANY_OCCURRENCE_WITH_SELF
(d) ANY_OCCURRENCE TO LAST_OCCURRENCE
(e) FIRST_OCCURRENCE TO LAST_OCCURRENCE
The first option (a) is the default and only considers the direct follow relationships between the events, while option (b) only considers relationships from the first event to all subsequent events. Option (c) is similar to option (b) but also considers self-loops of the first event. Option (d) is the opposite of option (b) and only considers relationships going from any event to the last event. Finally, option (e) only considers the relationship between the first and the last event. The different options enable the user to compute KPIs between different activities of the process. For example, you can use option (b) to compute how many minutes after the start of the process (indicated by the first activity 'A') an activity was executed. This is illustrated in the following example:
To simplify the query, the optional edge configuration and the filter column need to be defined in only one occurrence
TARGET per query. The settings are implicitly propagated to all other operators
in the same query. This can be seen in the examples above, where all
inherit the filter column or the edge configuration from the first
SOURCE operator of the query.
TARGET generate temporary edge tables. A separate edge table is created for each unique edge and filter
column configuration inside a query.
TARGET operators with identical edge configuration and filter column
add columns to the same edge table. The edge tables are joined to the corresponding case table.
In this example several calls to the
TARGET operators are done. After each call the status of the edge tables is shown.
SOURCEis called with the
ANY_OCCURRENCE TO ANY_OCCURRENCEedge configuration. A temporary edge table is created and the
SOURCEcolumn is added to it:
TARGETis also called with the
ANY_OCCURRENCE TO ANY_OCCURRENCEedge configuration. As this is the same configuration as in the previous
TARGETcolumn is added to the existing edge table:
- This column is added to a new temporary table, because the configuration is different to the previous ones:
SOURCEis called with a filter column configuration. The
Filtercould be a call to REMAP_VALUES as shown above. Although the edge configuration
ANY_OCCURRENCE TO ANY_OCCURRENCEis the same as in the calls before, the overall configuration differs because of the additional filter column. Therefore, a new temporary edge table is created, and the
SOURCEcolumn is added to it:
Multiple temporary edge tables can lead to problems if used within the same query. This happens if
with different edge and filter configurations are used within the same query. The edge tables cannot be joined
together by default because they do not have a common parent. This behavior is described in
Join functionality in more detail.
The only way of having multiple different configurations inside the same query is to aggregate values of the temporary edge tables to the case table using PU functions, as shown in the following example:
As described above, the temporary edge tables are joined to the case table. This means that if a case is filtered out in the case table, all activities related to that case in the temporary edge tables are also filtered out. This also holds vice versa: if all activities related to a case are filtered out in the edge table, the case itself is also filtered out in the case table. If not all edges of a case are filtered out, the case stays in the case table. Filter Propagation describes the filter propagation behavior in more detail.