{"id":502,"date":"2024-11-21T06:28:06","date_gmt":"2024-11-21T06:28:06","guid":{"rendered":"https:\/\/datadandies.nl\/?p=502"},"modified":"2025-01-12T06:13:39","modified_gmt":"2025-01-12T06:13:39","slug":"data-modeling-problem-events-in-progress-in-human-resources-data-dealing-with-employees-who-change-teams","status":"publish","type":"post","link":"https:\/\/datadandies.nl\/index.php\/2024\/11\/21\/data-modeling-problem-events-in-progress-in-human-resources-data-dealing-with-employees-who-change-teams\/","title":{"rendered":"Data modeling problem: Events in Progress in Human Resources data &#8211; Dealing with employees who change teams"},"content":{"rendered":"\n<p>I have a data modeling conundrum. It concerns records in a dimension that have start- and enddates.<\/p>\n\n\n\n<p>I\u2019ve heard people refer to this type of problem as the \u201cevents in progress\u201d-problem. My specific version of this problem goes like this:<\/p>\n\n\n\n<p>Imagine you have a sales fact table called FactSales with a date, an employee who made the sale and the quantity that was sold. It would look like this:<\/p>\n\n\n\n<p>Date | Employee | Sales<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"445\" height=\"204\" src=\"https:\/\/datadandies.nl\/wp-content\/uploads\/2024\/11\/image-7.png\" alt=\"\" class=\"wp-image-505\" srcset=\"https:\/\/datadandies.nl\/wp-content\/uploads\/2024\/11\/image-7.png 445w, https:\/\/datadandies.nl\/wp-content\/uploads\/2024\/11\/image-7-300x138.png 300w\" sizes=\"auto, (max-width: 445px) 100vw, 445px\" \/><\/figure>\n\n\n\n<p>The fact table above can be joined with a employee dimension called DimEmployee. This dimension looks like this:<\/p>\n\n\n\n<p>Employee | Team | TeamStartDate | TeamEndDate<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"718\" height=\"151\" src=\"https:\/\/datadandies.nl\/wp-content\/uploads\/2024\/11\/image-6.png\" alt=\"\" class=\"wp-image-504\" srcset=\"https:\/\/datadandies.nl\/wp-content\/uploads\/2024\/11\/image-6.png 718w, https:\/\/datadandies.nl\/wp-content\/uploads\/2024\/11\/image-6-300x63.png 300w\" sizes=\"auto, (max-width: 718px) 100vw, 718px\" \/><\/figure>\n\n\n\n<p>Using SQL, you could join FactSales with DimEmployee table using the column \u201cEmployee\u201d.<\/p>\n\n\n\n<p>However, sometimes employees have more than one record in DimEmployee. This would lead to unwanted doubling of sales records. You can bypass this problem in SQL by using a multi-column join like below:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" data-code=\"SELECT * FROM FactSales\n\nLEFT JOIN DimEmployee ON FactSales.Employee = DimEmployee.Employee\n\n                      AND FactSales.Date BETWEEN DimEmployee.TeamStartDate AND DimEmployee.TeamEndDate\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Kopieer\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #81A1C1\">SELECT<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">*<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">FROM<\/span><span style=\"color: #D8DEE9FF\"> FactSales<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">LEFT JOIN<\/span><span style=\"color: #D8DEE9FF\"> DimEmployee <\/span><span style=\"color: #81A1C1\">ON<\/span><span style=\"color: #D8DEE9FF\"> FactSales.Employee <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> DimEmployee.Employee<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">                      <\/span><span style=\"color: #81A1C1\">AND<\/span><span style=\"color: #D8DEE9FF\"> FactSales.Date <\/span><span style=\"color: #81A1C1\">BETWEEN<\/span><span style=\"color: #D8DEE9FF\"> DimEmployee.TeamStartDate <\/span><span style=\"color: #81A1C1\">AND<\/span><span style=\"color: #D8DEE9FF\"> DimEmployee.TeamEndDate<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>This would lead to One Big Table with the correct team in each record.<\/p>\n\n\n\n<p>However, imagine you are not using SQL, but you are using Power BI. You want to create a report showing the amount of sales per employee, including in which team the employee was at the time of the sale. A matrix visual in Power BI could look like this:<\/p>\n\n\n\n<p>Team | Employee | Sales<\/p>\n\n\n\n<p>1. Using Power BI however, we run into a problem. If we were to load the tables as they are, a many-to-many relationship would be needed. I try to avoid many-to-many relationships in my semantic models because they could lead to ambiguity. Now, as I see it, there are several approaches to solve this problem:<\/p>\n\n\n\n<p>You filter the DimEmployee table: you only take the most recent records. This solution is far from ideal because now, only the most recent team will be present for each employee; NOT the team that the employee was a part of at the time of the sale. Not a big fan of this solution, but I\u2019ve seen it too many times in the wild to not mention it.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"606\" height=\"453\" src=\"https:\/\/datadandies.nl\/wp-content\/uploads\/2024\/11\/image-8.png\" alt=\"\" class=\"wp-image-506\" srcset=\"https:\/\/datadandies.nl\/wp-content\/uploads\/2024\/11\/image-8.png 606w, https:\/\/datadandies.nl\/wp-content\/uploads\/2024\/11\/image-8-300x224.png 300w, https:\/\/datadandies.nl\/wp-content\/uploads\/2024\/11\/image-8-380x285.png 380w, https:\/\/datadandies.nl\/wp-content\/uploads\/2024\/11\/image-8-285x214.png 285w\" sizes=\"auto, (max-width: 606px) 100vw, 606px\" \/><\/figure>\n\n\n\n<p>2. You create a composite key. First you extend DimEmployee by creating a record for each day between start- and enddate that an employee was a part of a certain team. Then you create a new column containing a concatenation of the new column and the column \u201cEmployee\u201d. You create the same new composite key column (concatenation of columns Date and Employee) in the FactSales table. Then, you create a relationship between the two newly minted columns. This approach could work, but you table will start exploding when there are a lot dates between the start- and enddate. Especially when you use dates like 9999-12-31 as enddates for current statuses.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"606\" height=\"530\" src=\"https:\/\/datadandies.nl\/wp-content\/uploads\/2024\/11\/image-10.png\" alt=\"\" class=\"wp-image-508\" srcset=\"https:\/\/datadandies.nl\/wp-content\/uploads\/2024\/11\/image-10.png 606w, https:\/\/datadandies.nl\/wp-content\/uploads\/2024\/11\/image-10-300x262.png 300w\" sizes=\"auto, (max-width: 606px) 100vw, 606px\" \/><\/figure>\n\n\n\n<p>3. You denormalize the data. You basically do the same thing in Power Query as we did using the SQL statement above.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"533\" height=\"520\" src=\"https:\/\/datadandies.nl\/wp-content\/uploads\/2024\/11\/image-11.png\" alt=\"\" class=\"wp-image-509\" srcset=\"https:\/\/datadandies.nl\/wp-content\/uploads\/2024\/11\/image-11.png 533w, https:\/\/datadandies.nl\/wp-content\/uploads\/2024\/11\/image-11-300x293.png 300w\" sizes=\"auto, (max-width: 533px) 100vw, 533px\" \/><\/figure>\n\n\n\n<p>Now if you are using another approach than the ones listed above, I would love to hear about it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have a data modeling conundrum. It concerns records in a dimension that have start- and enddates. I\u2019ve heard people refer to this type of problem as the \u201cevents in progress\u201d-problem. My specific version of this problem goes like this: Imagine you have a sales fact table called FactSales with a date, an employee who&hellip;<\/p>\n<p class=\"more-link\"><a href=\"https:\/\/datadandies.nl\/index.php\/2024\/11\/21\/data-modeling-problem-events-in-progress-in-human-resources-data-dealing-with-employees-who-change-teams\/\" class=\"themebutton\">Read More<\/a><\/p>\n","protected":false},"author":1,"featured_media":504,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[16,55,3,4],"class_list":["post-502","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog","tag-datamodeling","tag-onebigtable","tag-powerbi","tag-sql"],"_links":{"self":[{"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/posts\/502","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/comments?post=502"}],"version-history":[{"count":2,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/posts\/502\/revisions"}],"predecessor-version":[{"id":586,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/posts\/502\/revisions\/586"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/media\/504"}],"wp:attachment":[{"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/media?parent=502"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/categories?post=502"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/tags?post=502"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}