Applies to:  CELONIS 4.6 

Description

This example shows how to identify and perform calculations on activity sequences containing more than two activities with PQL.

Identifying recurrent sequences containing e.g. unnecessary rework is an important requirement to ensure a smooth process handling. For example, the smooth processing of the purchase approval workflow is crucial to ensure a prompt delivery of ordered materials. As purchase orders often have to pass extensive approval procedures, the avoidance of unnecessary rework steps associated with the release process are therefore essential.

Challenge

For each case, show the occurring activity sequences that contain the activities 'Approval requested', 'Change PO for approval requested' and 'PO changed'.

The number of occurrences per case should be shown in a separate column.

The average throughput time for the activity sequence per case should be shown in a separate column.

Solution


[1] The activity sequences can be identified using the following query. In this example, case 1 and case 3 contain the specific activity sequence ('Approval requested' followed by 'Change PO for approval request' followed by 'PO changed'), and case 2 does not:

Query
Filter
FILTER "Activities"."Activity" = 'Approval requested' AND ACTIVITY_LEAD("Activities"."Activity") = 'Change of PO for approval requested' AND ACTIVITY_LEAD("Activities"."Activity",2) = 'PO changed'
Column1
"Activities"."Case"
Column2
"Activities"."Activity" || ' - ' || ACTIVITY_LEAD("Activities"."Activity") || ' - ' || ACTIVITY_LEAD("Activities"."Activity",2)
Column3
COUNT("Activities"."Activity")
Column4
AVG(DATEDIFF(hh, "Activities"."Timestamp", ACTIVITY_LEAD("Activities"."Timestamp",2)))
Input
Activities
Case : INTActivity : STRINGTimestamp : DATE
1
'PO created'
Mon Feb 01 2016 01:00:00.000
1
'Approval requested'
Mon Feb 01 2016 02:00:00.000
1
'Change of PO for approval requested'
Mon Feb 01 2016 03:00:00.000
1
'PO changed'
Mon Feb 01 2016 04:00:00.000
1
'Approval requested'
Mon Feb 01 2016 05:00:00.000
1
'Change of PO for approval requested'
Mon Feb 01 2016 06:00:00.000
1
'PO changed'
Mon Feb 01 2016 07:00:00.000
1
'Approval requested'
Mon Feb 01 2016 08:00:00.000
1
'Approval granted'
Mon Feb 01 2016 09:00:00.000
2
'PO created'
Mon Feb 01 2016 01:00:00.000
2
'Approval requested'
Mon Feb 01 2016 02:00:00.000
2
'Approval granted'
Mon Feb 01 2016 03:00:00.000
3
'PO created'
Mon Feb 01 2016 01:00:00.000
3
'Approval requested'
Mon Feb 01 2016 02:00:00.000
3
'Change of PO for approval requested'
Mon Feb 01 2016 03:00:00.000
3
'PO changed'
Mon Feb 01 2016 04:00:00.000
3
'Approval requested'
Mon Feb 01 2016 05:00:00.000
3
'Approval granted'
Mon Feb 01 2016 06:00:00.000
Output
Result
Column1 : INTColumn2 : STRINGColumn3 : INTColumn4 : FLOAT
1
'Approval requested - Change of PO for approval requested - PO changed'
2
2.0
3
'Approval requested - Change of PO for approval requested - PO changed'
1
2.0


Step-by-Step Solution

In order to find (specific) activity sequences, we need to compare each activity with the following activities. This can be done with the ACTIVITY_LEAD operator. Now let's take a closer look on this operator.


[2] To be able to compare the current activity with the next and the second next activity, we need to have them in one line. We can do so by using ACTIVITY_LEAD:

Query
Column1
"Activities"."Case"
Column2
"Activities"."Activity"
Column3
ACTIVITY_LEAD("Activities"."Activity")
Column4
ACTIVITY_LEAD("Activities"."Activity",2)
Input
Activities
Case : INTActivity : STRINGTimestamp : DATE
1
'PO created'
Mon Feb 01 2016 01:00:00.000
1
'Approval requested'
Mon Feb 01 2016 02:00:00.000
1
'Change of PO for approval requested'
Mon Feb 01 2016 03:00:00.000
1
'PO changed'
Mon Feb 01 2016 04:00:00.000
1
'Approval requested'
Mon Feb 01 2016 05:00:00.000
1
'Change of PO for approval requested'
Mon Feb 01 2016 06:00:00.000
1
'PO changed'
Mon Feb 01 2016 07:00:00.000
1
'Approval requested'
Mon Feb 01 2016 08:00:00.000
1
'Approval granted'
Mon Feb 01 2016 09:00:00.000
2
'PO created'
Mon Feb 01 2016 01:00:00.000
2
'Approval requested'
Mon Feb 01 2016 02:00:00.000
2
'Approval granted'
Mon Feb 01 2016 03:00:00.000
3
'PO created'
Mon Feb 01 2016 01:00:00.000
3
'Approval requested'
Mon Feb 01 2016 02:00:00.000
3
'Change of PO for approval requested'
Mon Feb 01 2016 03:00:00.000
3
'PO changed'
Mon Feb 01 2016 04:00:00.000
3
'Approval requested'
Mon Feb 01 2016 05:00:00.000
3
'Approval granted'
Mon Feb 01 2016 06:00:00.000
Output
Result
Column1 : INTColumn2 : STRINGColumn3 : STRINGColumn4 : STRING
1
'PO created''Approval requested''Change of PO for approval requested'
1
'Approval requested''Change of PO for approval requested''PO changed'
1
'Change of PO for approval requested''PO changed''Approval requested'
1
'PO changed''Approval requested''Change of PO for approval requested'
1
'Approval requested''Change of PO for approval requested''PO changed'
1
'Change of PO for approval requested''PO changed''Approval requested'
1
'PO changed''Approval requested''Approval granted'
1
'Approval requested''Approval granted'null
1
'Approval granted'nullnull
2
'PO created''Approval requested''Approval granted'
2
'Approval requested''Approval granted'null
2
'Approval granted'nullnull
3
'PO created''Approval requested''Change of PO for approval requested'
3
'Approval requested''Change of PO for approval requested''PO changed'
3
'Change of PO for approval requested''PO changed''Approval requested'
3
'PO changed''Approval requested''Approval granted'
3
'Approval requested''Approval granted'null
3
'Approval granted'nullnull



[3] In order to get the specific activity sequences starting with activity 'Approval requested', we are applying a filter to the table:

Query
Filter
FILTER "Activities"."Activity" = 'Approval requested'
Column1
"Activities"."Case"
Column2
"Activities"."Activity"
Column3
ACTIVITY_LEAD("Activities"."Activity")
Column4
ACTIVITY_LEAD("Activities"."Activity",2)
Input
Activities
Case : INTActivity : STRINGTimestamp : DATE
1
'PO created'
Mon Feb 01 2016 01:00:00.000
1
'Approval requested'
Mon Feb 01 2016 02:00:00.000
1
'Change of PO for approval requested'
Mon Feb 01 2016 03:00:00.000
1
'PO changed'
Mon Feb 01 2016 04:00:00.000
1
'Approval requested'
Mon Feb 01 2016 05:00:00.000
1
'Change of PO for approval requested'
Mon Feb 01 2016 06:00:00.000
1
'PO changed'
Mon Feb 01 2016 07:00:00.000
1
'Approval requested'
Mon Feb 01 2016 08:00:00.000
1
'Approval granted'
Mon Feb 01 2016 09:00:00.000
2
'PO created'
Mon Feb 01 2016 01:00:00.000
2
'Approval requested'
Mon Feb 01 2016 02:00:00.000
2
'Approval granted'
Mon Feb 01 2016 03:00:00.000
3
'PO created'
Mon Feb 01 2016 01:00:00.000
3
'Approval requested'
Mon Feb 01 2016 02:00:00.000
3
'Change of PO for approval requested'
Mon Feb 01 2016 03:00:00.000
3
'PO changed'
Mon Feb 01 2016 04:00:00.000
3
'Approval requested'
Mon Feb 01 2016 05:00:00.000
3
'Approval granted'
Mon Feb 01 2016 06:00:00.000
Output
Result
Column1 : INTColumn2 : STRINGColumn3 : STRINGColumn4 : STRING
1
'Approval requested''Change of PO for approval requested''PO changed'
1
'Approval requested''Change of PO for approval requested''PO changed'
1
'Approval requested''Approval granted'null
2
'Approval requested''Approval granted'null
3
'Approval requested''Change of PO for approval requested''PO changed'
3
'Approval requested''Approval granted'null



[4] In the next step, we want only activity sequences starting with activity 'Approval requested' followed by activity 'Change of PO for approval requested' and ending with activity 'PO changed' in the second next activity. Therefore, we are applying an additional filter to the table:

Query
Filter
FILTER "Activities"."Activity" = 'Approval requested' AND ACTIVITY_LEAD("Activities"."Activity") = 'Change of PO for approval requested' AND ACTIVITY_LEAD("Activities"."Activity",2) = 'PO changed'
Column1
"Activities"."Case"
Column2
"Activities"."Activity"
Column3
ACTIVITY_LEAD("Activities"."Activity")
Column4
ACTIVITY_LEAD("Activities"."Activity",2)
Input
Activities
Case : INTActivity : STRINGTimestamp : DATE
1
'PO created'
Mon Feb 01 2016 01:00:00.000
1
'Approval requested'
Mon Feb 01 2016 02:00:00.000
1
'Change of PO for approval requested'
Mon Feb 01 2016 03:00:00.000
1
'PO changed'
Mon Feb 01 2016 04:00:00.000
1
'Approval requested'
Mon Feb 01 2016 05:00:00.000
1
'Change of PO for approval requested'
Mon Feb 01 2016 06:00:00.000
1
'PO changed'
Mon Feb 01 2016 07:00:00.000
1
'Approval requested'
Mon Feb 01 2016 08:00:00.000
1
'Approval granted'
Mon Feb 01 2016 09:00:00.000
2
'PO created'
Mon Feb 01 2016 01:00:00.000
2
'Approval requested'
Mon Feb 01 2016 02:00:00.000
2
'Approval granted'
Mon Feb 01 2016 03:00:00.000
3
'PO created'
Mon Feb 01 2016 01:00:00.000
3
'Approval requested'
Mon Feb 01 2016 02:00:00.000
3
'Change of PO for approval requested'
Mon Feb 01 2016 03:00:00.000
3
'PO changed'
Mon Feb 01 2016 04:00:00.000
3
'Approval requested'
Mon Feb 01 2016 05:00:00.000
3
'Approval granted'
Mon Feb 01 2016 06:00:00.000
Output
Result
Column1 : INTColumn2 : STRINGColumn3 : STRINGColumn4 : STRING
1
'Approval requested''Change of PO for approval requested''PO changed'
1
'Approval requested''Change of PO for approval requested''PO changed'
3
'Approval requested''Change of PO for approval requested''PO changed'



[5] Next, we want to count the occurrences of the specific activity sequence in the table. To do so, we are using a COUNT aggregation and additionally concatenate the columns with activity names:

Query
Filter
FILTER "Activities"."Activity" = 'Approval requested' AND ACTIVITY_LEAD("Activities"."Activity") = 'Change of PO for approval requested' AND ACTIVITY_LEAD("Activities"."Activity",2) = 'PO changed'
Column1
"Activities"."Case"
Column2
"Activities"."Activity" || ' - ' || ACTIVITY_LEAD("Activities"."Activity") || ' - ' || ACTIVITY_LEAD("Activities"."Activity",2)
Column3
COUNT("Activities"."Activity")
Input
Activities
Case : INTActivity : STRINGTimestamp : DATE
1
'PO created'
Mon Feb 01 2016 01:00:00.000
1
'Approval requested'
Mon Feb 01 2016 02:00:00.000
1
'Change of PO for approval requested'
Mon Feb 01 2016 03:00:00.000
1
'PO changed'
Mon Feb 01 2016 04:00:00.000
1
'Approval requested'
Mon Feb 01 2016 05:00:00.000
1
'Change of PO for approval requested'
Mon Feb 01 2016 06:00:00.000
1
'PO changed'
Mon Feb 01 2016 07:00:00.000
1
'Approval requested'
Mon Feb 01 2016 08:00:00.000
1
'Approval granted'
Mon Feb 01 2016 09:00:00.000
2
'PO created'
Mon Feb 01 2016 01:00:00.000
2
'Approval requested'
Mon Feb 01 2016 02:00:00.000
2
'Approval granted'
Mon Feb 01 2016 03:00:00.000
3
'PO created'
Mon Feb 01 2016 01:00:00.000
3
'Approval requested'
Mon Feb 01 2016 02:00:00.000
3
'Change of PO for approval requested'
Mon Feb 01 2016 03:00:00.000
3
'PO changed'
Mon Feb 01 2016 04:00:00.000
3
'Approval requested'
Mon Feb 01 2016 05:00:00.000
3
'Approval granted'
Mon Feb 01 2016 06:00:00.000
Output
Result
Column1 : INTColumn2 : STRINGColumn3 : INT
1
'Approval requested - Change of PO for approval requested - PO changed'
2
3
'Approval requested - Change of PO for approval requested - PO changed'
1



[6] In the last step, we additionally want to calculate the average throughput time for the activity sequences per case. For this purpose, we are using the AVG aggregation around the DATEDIFF function in a separate column:

Query
Filter
FILTER "Activities"."Activity" = 'Approval requested' AND ACTIVITY_LEAD("Activities"."Activity") = 'Change of PO for approval requested' AND ACTIVITY_LEAD("Activities"."Activity",2) = 'PO changed'
Column1
"Activities"."Case"
Column2
"Activities"."Activity" || ' - ' || ACTIVITY_LEAD("Activities"."Activity") || ' - ' || ACTIVITY_LEAD("Activities"."Activity",2)
Column3
COUNT("Activities"."Activity")
Column4
AVG(DATEDIFF(hh, "Activities"."Timestamp", ACTIVITY_LEAD("Activities"."Timestamp",2)))
Input
Activities
Case : INTActivity : STRINGTimestamp : DATE
1
'PO created'
Mon Feb 01 2016 01:00:00.000
1
'Approval requested'
Mon Feb 01 2016 02:00:00.000
1
'Change of PO for approval requested'
Mon Feb 01 2016 03:00:00.000
1
'PO changed'
Mon Feb 01 2016 04:00:00.000
1
'Approval requested'
Mon Feb 01 2016 05:00:00.000
1
'Change of PO for approval requested'
Mon Feb 01 2016 06:00:00.000
1
'PO changed'
Mon Feb 01 2016 07:00:00.000
1
'Approval requested'
Mon Feb 01 2016 08:00:00.000
1
'Approval granted'
Mon Feb 01 2016 09:00:00.000
2
'PO created'
Mon Feb 01 2016 01:00:00.000
2
'Approval requested'
Mon Feb 01 2016 02:00:00.000
2
'Approval granted'
Mon Feb 01 2016 03:00:00.000
3
'PO created'
Mon Feb 01 2016 01:00:00.000
3
'Approval requested'
Mon Feb 01 2016 02:00:00.000
3
'Change of PO for approval requested'
Mon Feb 01 2016 03:00:00.000
3
'PO changed'
Mon Feb 01 2016 04:00:00.000
3
'Approval requested'
Mon Feb 01 2016 05:00:00.000
3
'Approval granted'
Mon Feb 01 2016 06:00:00.000
Output
Result
Column1 : INTColumn2 : STRINGColumn3 : INTColumn4 : FLOAT
1
'Approval requested - Change of PO for approval requested - PO changed'
2
2.0
3
'Approval requested - Change of PO for approval requested - PO changed'
1
2.0


  • No labels