After publishing my previous blog about making a lookup case insensitive, I received a question from my most loyal reader, my wife. Just like me, she is a BI developer, so that makes her the perfect proofreader. After reading my post, she wondered what would happen if you used a partial cache mode, and the input data didn’t match the casing of the lookup data. So I started investigating…
For this test, our source data will be the following dataset (only showing top 27 records):
Figuur 1- Product Data
We will try to lookup the correct color ID for each product. Using the lookup dataset below:
Figuur 2 – Lookup data
To test the lookup we will be using a very simple SSIS package. The screenshot below shows the layout, and the total number of rows we are processing.
As a baseline, we will use a lookup using “No Cache”. This will force SSIS to Query the SQL server instance for every record in the input. A trace on the SQL server instance shows us that this is indeed the case. Below you can see the bottom rows returned by the trace, but in total there were 2517 incoming calls. So as expected, we are querying the DB once for each input row.
For this test we will perform the lookup on column “ColorName”. This way, the input data and lookup data will match 100% (including casing). If everything works as expected, SSIS would look up each color once, and then store it in the cache. Once again, the trace proves our point. Each color is queried once, and from then on the lookup uses the cache. This results in only 19 calls to the database instance.
And the Log Events (OnInformation) show us that the cache behaved exactly like we expected. 16 records were sent to the Server, the other 2501 rows were matched using the cache. Perfect!
So on to our main event. In this test we use “LowerColorName”, so the input data won’t match the case of the lookup data. So what will the partial lookup do now? The trace shows us a depressing result. Just like with the “No Cache” setting enabled, the database receives 2517 calls.
And the log events confirm this terrible result.
What happened? Shouldn’t the partial cache remember that it already looked up “Blue” and use the cache instead of querying the database? Well yes it should, and it does. What really happened is the following. The first record that arrives, tries to lookup “Silver” in the cache. At that moment the cache is empty, so it can’t find a match there. So SSIS sends the query to the database. The database does a (case insensitive) lookup and returns the value pair (“silver”,7) which is stored in the cache. The real cache uses HASHES to store its data, the table below is a human interpretation of the content of the cache.
Color ID
silver 7
Next we do a lookup for “Blue” and the same thing occurs.
Color ID
silver 7
blue 3
So far, so good. But then we receive another lookup for “Silver”. We already did a lookup for that, so we should be able to find it in the cache. However when SSIS checks the cache for “Silver”, it finds no match. And that is because the lookup in the cache IS case sensitive. So not finding a match in the cache SSIS hands the lookup to the database. And this will be repeated for all 2517 records. So in this instance, the partial cache mode became the worst version of the no cache mode. Both will query the server for each input row and the partial cache will first waste time by looking for it in his cache.
And things might even be worse than that. If we go back to the simplified cache used above, what would happen if we continue our example? So we have “silver” and “blue” already present in the cache.
Color ID
silver 7
blue 3
The next record that arrives is another “Silver” one. SSIS looks for “Silver” in the cache, but can’t find it (remember, the lookup is case sensitive). It queries the server which returns “silver”, 7. That value, however, is already present in the cache. So what will SSIS do? Is it smart enough to realize the duplicate record, or will it add the value again. Unfortunately, the event log doesn’t return the number of records of the size of the cache for partial cache mode. Rather strange, since it does return those values when running in full cache mode.
© 2023 Kohera
Crafted by
© 2022 Kohera
Crafted by
Cookie | Duration | Description |
---|---|---|
ARRAffinity | session | ARRAffinity cookie is set by Azure app service, and allows the service to choose the right instance established by a user to deliver subsequent requests made by that user. |
ARRAffinitySameSite | session | This cookie is set by Windows Azure cloud, and is used for load balancing to make sure the visitor page requests are routed to the same server in any browsing session. |
cookielawinfo-checkbox-advertisement | 1 year | Set by the GDPR Cookie Consent plugin, this cookie records the user consent for the cookies in the "Advertisement" category. |
cookielawinfo-checkbox-analytics | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics". |
cookielawinfo-checkbox-functional | 11 months | The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional". |
cookielawinfo-checkbox-necessary | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary". |
cookielawinfo-checkbox-others | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other. |
cookielawinfo-checkbox-performance | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance". |
CookieLawInfoConsent | 1 year | CookieYes sets this cookie to record the default button state of the corresponding category and the status of CCPA. It works only in coordination with the primary cookie. |
elementor | never | The website's WordPress theme uses this cookie. It allows the website owner to implement or change the website's content in real-time. |
viewed_cookie_policy | 11 months | The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data. |
Cookie | Duration | Description |
---|---|---|
__cf_bm | 30 minutes | Cloudflare set the cookie to support Cloudflare Bot Management. |
pll_language | 1 year | Polylang sets this cookie to remember the language the user selects when returning to the website and get the language information when unavailable in another way. |
Cookie | Duration | Description |
---|---|---|
_ga | 1 year 1 month 4 days | Google Analytics sets this cookie to calculate visitor, session and campaign data and track site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognise unique visitors. |
_ga_* | 1 year 1 month 4 days | Google Analytics sets this cookie to store and count page views. |
_gat_gtag_UA_* | 1 minute | Google Analytics sets this cookie to store a unique user ID. |
_gid | 1 day | Google Analytics sets this cookie to store information on how visitors use a website while also creating an analytics report of the website's performance. Some of the collected data includes the number of visitors, their source, and the pages they visit anonymously. |
ai_session | 30 minutes | This is a unique anonymous session identifier cookie set by Microsoft Application Insights software to gather statistical usage and telemetry data for apps built on the Azure cloud platform. |
CONSENT | 2 years | YouTube sets this cookie via embedded YouTube videos and registers anonymous statistical data. |
vuid | 1 year 1 month 4 days | Vimeo installs this cookie to collect tracking information by setting a unique ID to embed videos on the website. |
Cookie | Duration | Description |
---|---|---|
ai_user | 1 year | Microsoft Azure sets this cookie as a unique user identifier cookie, enabling counting of the number of users accessing the application over time. |
VISITOR_INFO1_LIVE | 5 months 27 days | YouTube sets this cookie to measure bandwidth, determining whether the user gets the new or old player interface. |
YSC | session | Youtube sets this cookie to track the views of embedded videos on Youtube pages. |
yt-remote-connected-devices | never | YouTube sets this cookie to store the user's video preferences using embedded YouTube videos. |
yt-remote-device-id | never | YouTube sets this cookie to store the user's video preferences using embedded YouTube videos. |
yt.innertube::nextId | never | YouTube sets this cookie to register a unique ID to store data on what videos from YouTube the user has seen. |
yt.innertube::requests | never | YouTube sets this cookie to register a unique ID to store data on what videos from YouTube the user has seen. |
Cookie | Duration | Description |
---|---|---|
WFESessionId | session | No description available. |