内容简介:The way you create and access your data can affect your app’s performance. Let’s see how.JavaScript is a very high-level language. As such, most developers don’t tend to think a lot about how data is stored in memory. In this article, we will go over how d
The way you create and access your data can affect your app’s performance. Let’s see how.
Intro
JavaScript is a very high-level language. As such, most developers don’t tend to think a lot about how data is stored in memory. In this article, we will go over how data is stored in the memory, how it affects the CPU and memory and how the way you distribute and access your data in JS affects performance.
The Romantic Triangle
When the computer needs to do some computation, the processing unit (the CPU) needs data to process. So, according to the task at hand, it sends a request to the memory to fetch the data via the bus.
It looks like this:
So this is our romantic triangle -
CPU -> Bus -> Memory
Keeping a Healthy Relationship in a Trio is Hard…
The CPU is much faster than the memory. So this process of CPU -> BUS -> Memory -> BUS -> CPU “wastes” computation time. While memory is being looked up, the CPU sits idle and cannot do anything else.
In order to prevent idle time, a cache was added to the system. We won’t go into much detail about cache or cache types, but suffice to say that the cache is the CPU’s internal memory.
When the CPU gets orders to run, it first searches the cache for the data, and if the data isn’t there, it sends a request via the bus.
The bus, in turn, brings the requested data plus a contingent portion of memory and stores it in the cache for quick reference.
This way, the CPU will have fewer complaints about how slow the memory is and hence less CPU idle time.
Quarrels are a Part of Every Relationship
A problem that might arise — especially when we are dealing with processing massive amounts of data — is a phenomenon called “ cache miss” .
Cache missmeans that during a computation, the CPU finds out it doesn’t have the necessary data in the cache, and needs to request this data via the regular channels — you know, that sloooow memory.
An illustration of a cache miss. Data from an array is being processed, but data outside of cache limit is also requested for the computation — and this creates a “cache miss”
Ok — but I’m a JavaScript developer, why should I care?
Good question. In most cases, you wouldn’t. Today though, as more and more data is pouring into Node.js servers and even unfortunate rich clients , you have more chance of getting into a cache miss issue while iterating over large data sets.
I’ll believe it when I see it!!!
Fair enough. Let’s view an example.
Here’s a class called Boom .
This class ( Boom ) has just 3 properties — id , x, and y.
Now, let’s create a method that populates x and y.
Let’s build the setup:
Now we will use this setup in a method:
What local access does is go over the array linearly and sets x to be 0.
If we repeat this function 100 times (look at the repeats constant in the setup), we can measure how long it takes to run:
The log output is this:
The Price to Pay for Missing Cache
Now, according to what we learned above, we can cause cache misses if we process data that is far away during the iteration. Far away data is data that is not in adjacent index. Here’s the function for this:
What happens here is, that on every iteration we address the index that is ROWS away from the last iteration. So if ROWS is 1000 (as in our case), we get the following iteration: [0, 1000, 2000, … , 1, 1001, 2001, …].
Let’s add it to our speed test:
And here’s the end result:
The non-local iteration was almost 4 times slower. This difference will grow with bigger data. It happens because of the time that the CPU sits idle due to cache misses.
So what is the price you pay? It all depends on the size of your data.
Ok, I swear I will never do that!
You might not think so, but… there are cases in which you’d like to access an array with some logic that is not linear (e.g. 1,2,3,4,5) or not contingent (e.g. for (let i = 0; i < n; i+=1000) ).
For instance, you get data from a service or the DB, and need to process it sorted or filtered by some complex logic. This might cause the data to be accessed similarly to what was shown in the farAccess function.
Here’s an illustration that will (hopefully) make it clearer:
Looking at the image above, we see the data as it is stored in memory (top gray bar). Below, we see the array that was created when the data arrived from the server. Finally, we see the sorted array that holds references to objects stored in various positions in memory.
This way, iterating over the sorted array, can cause the multiple cache misses seen in the example above.
Note that this example is for a small array. Cache misses are relevant for much bigger data.
In a world where you need slick animations in the frontend and where you can be charged for every millisecond of CPU time in the backend (serverless or other) — this can become critical.
Oh no! All is lost!!!
No, not really.
There are various solutions to this problem, but now that you know the reason for this performance hit, you can probably think of a solution yourself. You just need to store data that is processed together closer in memory.
Such techniques are called the Data Locality design pattern .
Let’s continue our example. Assume that in our app the most common process is going over the data with the logic shown in the farAccess function. We’d like to optimize the data so that it will run fast in the most common for loop .
We will arrange the same data like this:
So now, in diffArr , the objects that are in indices [0,1,2, …] in the original array, are now set like this [0,1000,2000, … , 1, 1001, 2001, …, 2, 1002, 2002, …]. The numbers denote the index of the object. This simulates sorting the array — which is one way to implement the Data Locality design pattern.
In order to easily test this, we will change our farAccess function a bit to get a custom array:
And now add the scenario to our test:
We run this, and we get:
And voila! — we’ve optimized our data to fit the more common way we need to look at it.
The full example is live at https://yonatankra.com/performance/memory/liveExamples/objectPool2/index.html
Summary
In this article, we demonstrated the Data Locality design pattern. In essence, we showed that accessing a data structure in a way it is not optimized for, may cost us in performance. We then optimized the data to fit our way of processing it — and we saw how it improved performance.
The Data Locality design pattern used to be the domain of game developers, who had to deal with a lot of entities that are iterated over a lot of times. In the results above we see that data locality still matters even in high-level languages like JavaScript — and not just in gaming.
Today, with the amount of data being transferred between servers or even pushed into browsers, design patterns that used to be the day-to-day of game developers , should be considered by app developers — whether server-side or client-side.
以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
维多利亚时代的互联网
[英] 汤姆·斯丹迪奇 / 多绥婷 / 后浪丨江西人民出版社 / 2017-8 / 38.00元
人类历史上的第一次大连接 回顾互联网的前世 预言互联网的未来 ……………… ※编辑推荐※ ☆《财富》杂志推荐的75本商务人士必读书之一 ☆ 回顾互联网的前世,颠覆你的思维,升级你对互联网的认知 ☆ 人类历史上一次全球大连接是维多利亚时期的电报时代,那时候也有疯狂的资本、 巨大的泡沫、网络新型犯罪、网络亚文化崛起……现在的互联网时代就是电报时代的重演;回顾那......一起来看看 《维多利亚时代的互联网》 这本书的介绍吧!