Wednesday, January 11, 2012

Understanding selectAll, data, enter, append sequence in D3.js

If you are new to D3.js and have been studying the D3.js examples on the web, you have most likely come across a sequence of selectAll(), data(), enter() and append() calls like the one in Example 1 below. To a newcomer, it is not obvious how these methods work together; I certainly did not find it easy at first. If you are having the same trouble, I think the examples and explanations below will help.

The key to understanding how these methods work lies in the following paragraph from the Data Keys section on http://mbostock.github.com/d3/

"The key function also determines the enter and exit selections: the new data for which there is no corresponding key in the old data become the enter selection, and the old data for which there is no corresponding key in the new data become the exit selection. The remaining data become the default update selection."
and in Mike Bostock's response in this thread.

Don't worry if the above excerpt makes little sense at this point. Once you look at the examples and the explanation below, the meaning will become clearer. All the examples below use D3.js to create HTML paragraph elements, but the same principles apply to any other elements you create with D3.js.

Carefully study the following two examples and their outputs. Try to spot the difference between these examples and their outputs.




Example 1
 <!DOCTYPE html>
 <html>
 <head>
     <script type="text/javascript"
                src="http://mbostock.github.com/d3/d3.js">
     </script>
     <style type="text/css">
         p {
             font-size: 20px;
             color: orangered;
         }
     </style>
 </head>
 <body>
     <div id="example1"></div>
     <script type="text/javascript">
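         var pdata = [10,12,6,8,15];
 
         // Select the container div.
         var selectDIV = d3.select("#example1");
 
         // Join pdata to the <p> elements in the div (none exist yet) and
         // append one <p> for each datum in the enter selection.
         selectDIV.selectAll("p")
             .data(pdata)
             .enter()
             .append("p")
             .text(function(d){return d;});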
     </script>
 </body>
 </html>
Output of Example 1

10

12

6

8

15

Example 2
 <!DOCTYPE html>
 <html>
 <head>
     <script type="text/javascript" 
                src="http://mbostock.github.com/d3/d3.js"
     ></script>
     <style type="text/css">
         p {
             font-size: 20px;
             color: orangered;
         }
     </style>
 </head>
 <body>
     <div id="example2">
        <p>Already existing paragraph 1</p>
        <p>Already existing paragraph 2</p>
     </div>
     <script type="text/javascript">
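         var pdata = [10,12,6,8,15];
 
         // Select the container div.
         var selectDIV = d3.select("#example2");
 
         // Join pdata to the <p> elements in the div (two already exist) and
         // append one <p> for each datum in the enter selection.
         selectDIV.selectAll("p")
             .data(pdata)
             .enter()
             .append("p")
             .text(function(d){return d;});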
     </script>
 </body>
 </html>
Output of Example 2

Already existing paragraph 1

Already existing paragraph 2

6

8

15


The only difference between the code in the two examples is the presence of two paragraph elements in Example 2 ("Already existing paragraph 1" and "Already existing paragraph 2") before the D3.js script runs. In the output of Example 2, you can see that the numbers 10 and 12 do not show up, even though they are present in the data array pdata. Why? This is what we look at now. Let us go through the script.

  1. The line selectDIV.selectAll("p") returns a selection (in effect, an array) of all <p>...</p> elements in selectDIV.

    In Example 1 this array is empty. In Example 2, this array contains the two already existing paragraph elements.

  2. The line .data(pdata) specifies two things:
    (i) the data array and,
    (ii) the key function which assigns keys to the elements of the data array. These keys determine the enter selection — you can think of the enter selection as the elements in the data array that will be bound to nodes that you specify with the .enter().append() methods.

    You are probably thinking, "wait! I do not see any key function. And, how do the keys determine the enter selection?" Let us address these thoughts.

    As the creator of D3.js has said in this post, when a key function is not explicitly specified as an argument to the .data() method, it takes the default value of the index function. That is, the default key of an element in a data array is its index in the data array. For instance, since a key function has not been explicitly specified in Example 1, keys of elements 10, 12, 6, 8 and 15 are 0, 1, 2, 3, and 4 respectively.

    The keys determine the enter selection in the following manner: the keys of the elements in the specified data array (which, in our examples, is pdata) are compared with the keys of the elements in the existing selection (the selection returned by the .selectAll("p") method). Any element in the specified data array whose key is different from the keys of all the existing elements becomes a part of the enter selection. If the key of a new element matches the key of one of the existing elements, then it is NOT a part of the enter selection.

    Neither example above specifies an explicit key function. Hence, each element's key is its index in the array. In Example 1, selectAll("p") returns an empty array; there are no existing elements and consequently the set of keys of existing elements is empty. The keys of the elements in pdata, i.e., elements 10, 12, 6, 8 and 15, are 0, 1, 2, 3, and 4 respectively. Since all these keys are different from the keys of all the existing elements, all five elements become part of the enter selection.

    On the other hand, in Example 2, .selectAll("p") returns an array with two elements with keys 0 and 1 respectively. The keys of elements in pdata, i.e., elements 10, 12, 6, 8 and 15 are 0, 1, 2, 3, and 4 respectively.

    Since elements 10 and 12 have the same keys as keys of elements in the existing selection, 10 and 12 do not become part of the enter selection.
  3. The methods .enter().append("p") create as many <p>...</p> elements as there are elements in the enter selection (see the previous step). The argument of .append() specifies the type of element to be created. Finally, the .text() line sets the text of each paragraph element created by .enter().append("p") to the corresponding member of the enter selection.

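    In Example 2, since elements 10 and 12 are not part of the enter selection, no paragraph elements are created for them and they do not show up in the output. They are instead bound to the two existing paragraphs, whose text stays unchanged because .text() is applied only to the enter selection.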
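
The bookkeeping in the three steps above can be sketched in plain JavaScript, without D3 and without a browser. The helper joinByIndex below is hypothetical (it is not a D3 function) and is only a simplified model of the default index-based join: it takes the number of already existing elements and the data array, and reports which data would land in the enter selection.

```javascript
// Hypothetical helper, not part of D3: a simplified model of how the
// default (index) key splits a data array into update and enter parts.
function joinByIndex(existingCount, data) {
    var update = [];
    var enter = [];
    data.forEach(function (d, i) {
        // The default key of a datum is its index i. The existing
        // elements have keys 0 .. existingCount - 1.
        if (i < existingCount) {
            update.push(d);   // key matches an existing element
        } else {
            enter.push(d);    // no existing element has this key
        }
    });
    return { update: update, enter: enter };
}

var pdata = [10, 12, 6, 8, 15];

// Example 1: the div contains no <p> elements.
var example1 = joinByIndex(0, pdata);
console.log(example1.enter);   // [10, 12, 6, 8, 15]

// Example 2: the div already contains two <p> elements.
var example2 = joinByIndex(2, pdata);
console.log(example2.update);  // [10, 12]
console.log(example2.enter);   // [6, 8, 15]
```

In this model, the "update" part for Example 2 holds 10 and 12, which is why no new paragraphs are appended for them.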

Example 3 below is similar to Example 2. The only difference is that a key function is explicitly specified. Due to this, all elements in the pdata array show up in the output.


Example 3
 <!DOCTYPE html>
 <html>
 <head>
     <script type="text/javascript" 
                src="http://mbostock.github.com/d3/d3.js"
     ></script>
     <style type="text/css">
         p {
             font-size: 20px;
             color: orangered;
         }
     </style>
 </head>
 <body>
     <div id="example3">
        <p>Already existing paragraph 1</p>
        <p>Already existing paragraph 2</p>
     </div>
     <script type="text/javascript">
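         var pdata = [10,12,6,8,15];
 
         // Select the container div.
         var selectDIV = d3.select("#example3");
 
         // The second argument to .data() is the key function; here it is
         // the identity function, so each datum is its own key.
         selectDIV.selectAll("p")
             .data(pdata, function(d){return d;})
             .enter()
             .append("p")
             .text(function(d){return d;});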
     </script>
 </body>
 </html>
Output of Example 3

Already existing paragraph 1

Already existing paragraph 2

10

12

6

8

15


Note the second argument to the .data() method in Example 3. We have specified the identity function as the key function. As a result, the keys associated with the elements are the values of the elements: element 10 has a key of 10, 12 has a key of 12, 6 has a key of 6 and so on. None of these keys match the keys of the elements in the existing selection, which consists of the paragraph elements <p>Already existing paragraph 1</p> and <p>Already existing paragraph 2</p>. Hence, in Example 3, unlike Example 2, every new data element has a key different from those of the existing elements, so every one of them is part of the enter selection and is added to the document when the .enter().append("p").text(function(d){return d;}) methods are called.
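
A natural question about Example 3 is what the keys of the two existing paragraphs actually are. The sketch below is a simplified model in plain JavaScript, not D3 itself; joinByKey and identity are hypothetical names. It assumes, as far as I understand D3's behavior, that the key function is also evaluated on the datum bound to each existing element, and that paragraphs written directly in the HTML have no bound datum, so their key comes out as undefined.

```javascript
// Hypothetical helper, not part of D3: a simplified model of a keyed join.
// existingData lists the datum bound to each existing element; elements
// written directly in the HTML are assumed to have no datum (undefined).
function joinByKey(existingData, newData, keyFn) {
    var existingKeys = {};
    existingData.forEach(function (d, i) {
        // Prefix the key so it cannot clash with built-in property names.
        existingKeys["k" + keyFn(d, i)] = true;
    });

    var result = { update: [], enter: [] };
    newData.forEach(function (d, i) {
        if (existingKeys["k" + keyFn(d, i)] === true) {
            result.update.push(d);   // key matches an existing element
        } else {
            result.enter.push(d);    // no existing element has this key
        }
    });
    return result;
}

function identity(d) { return d; }

// Example 3: two existing paragraphs with no bound data.
var example3 = joinByKey([undefined, undefined], [10, 12, 6, 8, 15], identity);
console.log(example3.enter);    // [10, 12, 6, 8, 15]

// Changing the data to start with 0 and 1 makes no difference under this
// model, because the existing keys are "undefined", not 0 and 1.
var changed = joinByKey([undefined, undefined], [0, 1, 6, 8, 15], identity);
console.log(changed.enter);     // [0, 1, 6, 8, 15]

// If the existing elements did carry the data 10 and 12, those two values
// would match and only the rest would enter.
var bound = joinByKey([10, 12], [10, 12, 6, 8, 15], identity);
console.log(bound.enter);       // [6, 8, 15]
```

Under these assumptions, a value in pdata would match an existing paragraph only if that paragraph already had the same value bound to it, for instance by an earlier .data() call.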

In a follow-up to this post I will explain how the selectAll().data().exit() sequence works. I have yet to write that. When I do, I will put a link here.

30 comments:

  1. Hi, I just want to say Thank you!
    All your post about D3 helped me a lot.

  2. thanks very much.
    more clear now

  3. Hi, thanks for the article. I have a question: what does .enter()? I mean why we have to use it always even when it's empty?

  4. Thanks a lot!!! I have been debugging a problem about exit for hours and your post totally saved me from despair! I didn't realize that the default key function is the array position, and after I supplied my own key function, it worked like charm!! Thanks a lot!

  5. Great post and superb insight! Shouldn't this be added to the D3 documentation list ?

  6. I am glad people are finding this helpful. Thanks for your comments.

    Replies
    1. Really helpful, thanks! D3 looks awesome, but it's not obvious to the newcomer, like you say.

  7. This was very helpful!
    Thank you very much for posting this.

  8. This is really great! Thanks

  9. Extremely helpful article. Especially the part about indices being interpreted as keys implicitly.
    Thanks,

  10. Thanks a ton. Bookmarking this post for future reference too.

  11. What a great post this is !! It has cleared lots of my concepts regarding d3.js.
    Thank you so much.
    And keep posting.

  12. Very helpfull article expecially the part about the identity function parameter for the data method. :)

  13. Thank you, this was very useful, but one question. In example #3 what would the value index 0 in pdata need to be set in order to have the same key as one of the already existing p elements and thus not be appended? It is still appended if I change it from 10 to "Already existing paragraph 1" or 0. In other words, what are the key values for the already existing p elements?

  14. Where is the follow up post? This was really helpful in understanding, would like to go through the next one too!

  15. Great article - but I have the same quesiton as Johnny D. I tried setting
    pdata = [0,1,6,8,15] - and still get 5 elements appended.

  16. You made things crystal clear
    Thank you

  17. Thank you! This was very enlightening.

  18. Wonderful. Thanks

  19. Clear and concise, thanks! Third example really cleared things up.

  20. This is so helpful! Thank you SO VERY much!

  21. Thanks a LoT !!
    it's clear and helpful !!

  22. This is an amazing post. Thank you.

  23. This is a such a well written and informative article, thank you!
