COL8: Non-Sequential Indexing in Associative Arrays

Hi, I'm Steven Feuerstein, and I write practically perfect PL/SQL.

Greetings, and welcome to the PL/SQL Channel, a series of video trainings on the Oracle PL/SQL language. My name is Steven Feuerstein, and I'm a PL/SQL developer just like you. This lesson is part of my series on programming with collections, the array-like structures of PL/SQL, which I consider to be one of the most important foundational features of the modern PL/SQL language. You can't take advantage of most of the great stuff in PL/SQL unless you know about and can work with collections. The focus of this lesson is non-sequential indexing in associative arrays, a relatively advanced topic, but one that can give you lots of incredible benefits as you write your applications.

Let's start with a quick review of associative arrays. An associative array is one of the three types of collections in PL/SQL: associative arrays, nested tables, and varrays. The type that you use to define and work with collections of type associative array follows this form: you provide the name of the type, then IS TABLE OF, meaning it is a collection, a list of elements. Each element's type here is the last name of the employees table; it could be a record, another collection, just about anything you want. And then comes the INDEX BY clause, and this is specific to associative arrays: only associative arrays specify an INDEX BY clause. In this case I'm saying INDEX BY integer. I can also index by string, which is the subject of another lesson, on string indexing of associative arrays. We're going to be focusing on using this index inside the collection, what some of the challenges are in doing so, and how you can take full advantage of this indexing.

So, a list of elements: a collection is a list of elements accessible via the index value. If I know the index value is 16, I can go directly to that element in my collection. Indexes can be integers or strings, which means you can essentially index by
anything you want. But index values can never be null, and, most importantly for the topic of non-sequential indexing, associative arrays can be sparse.

What does this mean? Let's take a look at this example. Suppose I have a list of strings; they happen to be the names of fruit. Apple is at index value 1, Pear is at index 22, Orange is at index value 100, and Apricot is at 10,023. Notice I have not filled it sequentially: 1, 2, 3, 4. I can do that, but I don't have to. I can populate elements at non-consecutive index values, and that's the focus of this lesson. One of the reasons that it's so useful is that I can use it to easily emulate primary keys and unique indexes and, more generally, any time I have an index value, a mechanism by which I want to look up elements in my collection, that is not sequentially allocated.

Just a couple of comments on this INDEX BY. With associative arrays you specify the indexing, and the idea behind that index is the same as an index on a relational table: it gives you the ability to very quickly look up elements in the collection without having to scan through all the elements. Just as with a relational table you want to avoid full table scans, the same thing is true in collections: you want to use the index to quickly find the element in your list.

Let's take a look at non-sequential indexing and why you might want to use it. Certainly, sometimes you simply want to add items to the end of your list, in other words, add them sequentially and consecutively. This makes sense if the order in which items were added is significant: I need to know that the third item chosen is ABC, so I can go to location 3, get ABC, and that tells me that was the third thing chosen. But that's not always the case. Sometimes, even if the order is significant, I need to find a specific element in the list not by the index value, not by the order in which it was added, but perhaps by a string or a date. I've got
all my employees in my list, and I want to find the employee with a certain last name. Well, with sequential indexing I have to scan through the list to find a match. And I might well want to find elements of my collection by more than one index value: I want to find an employee by employee ID, I want to find an employee by last name and first name, I want to find an employee by their Social Security number. The challenge there is that collections can have just one index. It's not like an Oracle relational table, where I can simply create multiple indexes on my table, thereby enabling faster access to specific rows using different pieces of data to find them. And there's no way you can get around that: the collection has just one index.

Let's take a look at what is involved when you do a sequential allocation and then have to scan through a collection. Then we'll take a look at using non-sequential indexing to avoid this.

So: string tracker, version 0. String tracker is a package I use to demonstrate the value of string indexing in collections, so you'll see a lot more about this package in that lesson. Essentially, I want to keep track of the list of strings that I've used; it's a resource tracker. I want to know: is a string in use? Pass in that string, and it returns true or false. Mark a string as being used: add it to my list. I've got a list of string names indexed by PLS_INTEGER. To mark a string as being used, I add it to the list, and I simply add it sequentially: I call COUNT, add 1 to it, and that adds it to the end of the list. So I build up my list of elements.

Then, if I want to find out whether a specific name is in use, I pass in that name, and I have to iterate through my entire collection looking for a match. Get the total number of elements in my collection. Get the first element of my collection. Do I have a match? Not yet. So while I haven't found a match and my index is not at the end, see if the next element in my collection matches my value. If so, stop; otherwise go to the next index. Then return true or false.

This is a very small program, but it demonstrates the fact that I have to do all this work to find an element by the element value, because in this case the sequential indexing doesn't help. This is the kind of processing that can make your applications run quite slowly as the collection gets bigger and bigger. So: the sequential filling of my collection, then iterating through my collection looking for a match. That's the problem that can occur in our code.

So, taking advantage of non-sequential indexing. Associative arrays can be sparse, and certainly any string-indexed collection is not sequentially filled, so generally speaking, when we talk about the sparseness of associative arrays, we're talking about integer-indexed collections. The neat thing about associative arrays is that the valid range for an index value is a very wide range of integers: essentially about 4.3 billion values.

The reason I know that, and the reason it works that way, is that the associative array INDEX BY type is BINARY_INTEGER. I'm going to show you the STANDARD package, which is one of the building blocks of the PL/SQL language. It has within it lots of the base data types of the PL/SQL language, and much else. For example, NUMBER. INTEGER is a number with nothing to the right of the decimal point. BINARY_INTEGER is an integer with a specific range of values: -2 to the 31st plus 1, up to 2 to the 31st minus 1. The index of an associative array has to fall inside this range of about 4.3 billion values. Well, that's an awful lot of values. It's not unlimited, but it's an awful lot.

The main thing to notice here is that oftentimes primary keys in our tables are sequence-generated integers. That's pretty common in the Oracle world, and usually (not always, but usually) they will fall within this range, in other words, primary keys not exceeding 2 to the 31st minus 1. You can take a look at your own tables and ask: do I have tables that match this criterion? If you have a table with 25 billion rows in it, well, this probably won't work. But for smaller tables, still large but smaller, the primary key could fit within this range. And primary keys are often not sequentially allocated; of course, rows can be deleted, and so forth.

So, combining these features, the fact that associative arrays can be sparse and that you have a very wide range of integer values to use for your collection, you have a relatively powerful, meaning fast and simple, mechanism for
emulating relational table keys, and that's what I want to focus on for the rest of this lesson.

But first of all, let's take a look at the process of populating a collection of records sequentially, just to reinforce how that works. This is a script that I showed you when we were looking at associative arrays in the Working with Associative Arrays lesson. What I've done is create a collection type in which every single element holds an employees record, indexed by the primary key, the integer value. Well, it's indexed by integer, anyway. This one collection can be used for sequential filling and for non-sequential filling.

Here's an example of sequential filling: get me every row in the employees table, assign that record to the employees collection, and fill it sequentially. Start with a count of zero; plus one is one. Count is one; plus one is two. Three, four, five, six. This kind of code will sequentially fill the collection from index value one, and in this case LAST is always equal to COUNT.

Here's an example of non-sequentially filling my collection: for every employee in my table, take that record, put it in my collection, and use the primary key as the index value. Now, it could be that my index values are sequentially allocated, it could be that they're not; it doesn't matter. The point is that I'm using the primary key as my index value, which means that if I have the primary key, I can go right to that location in my collection and return the record. So that's the difference between sequential allocation and non-sequential allocation, which is essentially emulating the primary key.

Let's see the point of this. What good does it do me? For those tables in which you have a sequence-generated integer value, or any kind of integer primary key, and the value does not exceed 2 to the 31st minus 1 (sorry, that should be minus one), you have the opportunity to emulate that primary key inside your collection.

Let's take a look. In the emulate_primary_key1 script, what I've done is create two different packages that provide different ways of retrieving the contents of the employees table. My first package does basically a traditional lookup. Every time I ask for the row of information for a given primary key, it looks up all the data in that row and returns the single record. So if I call this function 25 times in a minute, it'll execute the query 25 times.

Now suppose that for my application the
employees table is static. Suppose, for example, it's a materialized view: the data doesn't change while the users are accessing it. In this case I have an alternative. Since the data is not changing, what I could do is take a snapshot of it, load it up in a collection, and then access it that way.

Here's how I might go about doing so. Here's my employee lookup 2. I again have a one-row function: pass in the primary key, return the record of information for that primary key. But this one-row function doesn't look up the data from the database and return it. Instead, it looks up the data from a collection. What collection is that? Well, this collection is a table of rows from the table: a collection in which each element has the same structure as a row of the table. It's a collection of records indexed by integer.

What I'm going to do is use the initialization section of my package to load up that collection: get me all the rows in the employees table, take each record, put it in my cache, and use the primary key as the row number, the index value. So, just like you saw in the collection of records, this is an example of non-sequential filling, essentially the emulation of the primary key. And then my one-row function doesn't query the database over and over again. It simply goes to that location in my cache and returns the record. Since I've indexed by the primary key, if I have the primary key I just go to that location directly and retrieve the data. So clearly it's a lot less code than running the query over and over again.

Let's explore the benefits from a performance standpoint. I'm going to create these two packages. Let's make sure they compile without any errors; you never know. Looks good.

OK, emulate_primary_key2. The second script runs a test of the performance of these two different approaches. I'm going to use the sf_timer package, which allows me to start up a timer, run some code, and then show the elapsed time down to the
hundredths of a second. I'm going to do 100,000 retrievals of the same row of data, the employee data for employee ID 138, using my associative array cache. Then I'm going to do the same thing, 100,000 lookups, using my traditional query-every-time approach: the database table lookup. Then, just to get a sense of the raw efficiency of working with collections, I'm going to populate a collection with a million rows of data, and then retrieve all the elements one by one, and we'll see how long that takes.

So I run my script. It should take about four-something seconds, on 11.2 anyway. And here are the results. The results are pretty impressive, actually. Let's just look at the last run; I've done this more than once. It took 4 seconds, over 4 seconds, to do 100,000 database table lookups. That's pretty quick. But check this out: the associative array cache completed 100,000 lookups in 0.06 seconds. The same number of lookups, the same data returned, but because I had the data sitting inside my collection, I was able to access it much, much faster than querying it from the database. That's essentially the power of PGA versus SGA caching. And notice I was able to do a million associative array lookups, a million iterations through the table getting data, in 0.12 seconds.

So working with collections is generally very fast. If you can find a way to have an alternative index, a way of indexing the data in your collection so you can retrieve it using your primary key, for example, rather than querying the data with every single retrieval, you can get much, much better performance. That's an example of the power of emulating a primary key inside a collection. It's useful only if the table is static for the duration of time that you're accessing the data and also, of course, only if you have an integer key.

Now let's take a look at multiple indexes on a collection. Going back to the model of relational tables, most of these tables do have multiple indexes, and in fact sometimes multiple unique indexes, used to optimize query performance. What if I need to do the same thing with a collection? What if I have a collection filled with all the employee information, but I need to access it in different ways: by name, by employee ID, and so forth? You can only have a single index on a PL/SQL table, on an associative array; there's only one INDEX BY clause. But what you can do is create other collections that serve as indexes into the original collection. It's the same concept as saying CREATE INDEX ON table for a relational table, but in this case you're going to create other collections, and populate those collections, to serve as indexes into that original collection.

Let's take a look. Now in this case what I've got
is a table called books, and I basically keep track of all the books that I'm planning on reading: my summer reading list. A book is indexed by the primary key, the book ID. The table also has a unique index on the ISBN, which is a unique string that identifies every book published in the world. And also, for my purposes, author and title together are unique. So I have two unique indexes and one primary key, and I want to be able to look up information by any of these approaches. And I want to do it in a PL/SQL array, because I want to look up this information super fast.

So I've created three functions to look up my data: look up a book by its primary key, look up a book by its ISBN, look up a book by its author-title combination. In this case, what I'm going to do is create three collections. My first collection is a table of records: all the books in my table are loaded up into my collection, indexed by the primary key. This is essentially the same thing you just saw with my employee primary key emulation: load up each employee, indexed by primary key. So that's my first collection of records. My second collection is the ISBN indexing, and in this case what I'm going to do is create a table of, or collection of, book IDs (primary keys) indexed by the ISBN. It's a string index. And my third collection type is a collection of book IDs (primary keys) indexed by the author-title combination.

Now let's take a look at my load process, my load arrays. What you saw in the previous example is loading up a single collection of employees indexed by the primary key. What I'm going to do here is, for every row in my books table, take that record and put it in the books list using the primary key. That's the same thing you just saw; that's emulating the primary key. Now let's take a look at emulating the unique indexes. For by-ISBN and by-author-title, my two other collections, I'm going to take just the book ID, the primary key, and I'm going to
store it in my by-ISBN collection, using the ISBN as my index value. Same thing here: book ID, and the index value is going to be the author-title combination, which is constructed by a function that always puts a delimiter between the author and the title to guarantee uniqueness. So this collection has every full record in it; these two collections have just the primary key.

Now let's take a look at my lookup programs. Here's my one-book function: it gets passed the primary key and returns the book record. As you saw before with employees, I can pass in the book ID, go to that location in my array, and return the record. Boom, very quick, much quicker than querying the data from the database.

Great, but what if I have the ISBN and not the primary key? (We'll skip this reload stuff; you can take a look at all the additional nuances of this package when you'd like.) In this case I pass in the ISBN, and I want to return the entire book record. But the ISBN list doesn't have the entire record; it has the book ID. So first I get the book ID by going to that location in my ISBN collection, which returns the primary key. Then I use the primary key to look up the full record. And of course I could just take this and put it right here. So: look up by the index value, use the index value to look up the primary key, and use the primary key to return the full record. Same thing with author-title: go to the by-author-title collection, look up the book ID, the primary key, by the author-title combination, and then use that to retrieve the full record.

So hopefully you can see that what I've done is create two other collections that emulate the unique indexes on my relational table, allowing me to look up information very, very quickly, either by primary key or by unique index value, out of my collections rather than out of the relational table. That's the general idea, and that's one example of it.

Take a look at the genaa script if you ever want to do something like this. Let's suppose you get excited about this idea: you do want to create multiple indexes on your collection, and you want to create them for every unique index in your relational table. You don't have to write all that code yourself. The genaa procedure is a program you can run that will generate such a package for you. It's a great example of a code generator: it takes all the information out of the data dictionary for your primary keys and your unique indexes, and it generates a package (somewhere out here) that creates the
functions to retrieve the data and creates the collection types. Here's the one-row-by function for each of my unique indexes. It will create a package for you that will do everything you saw in my little book manager right here. So if you like the idea, if you need to be able to access your collections in multiple ways, if you want to emulate your primary keys and your unique indexes, do not write all this code yourself. Call the genaa program and it will do all the work for you. That's in the demo zip.

So, key lessons and a look ahead. Remember that associative arrays are PL/SQL-only constructs; they can only be used in PL/SQL blocks. But partially because of that, they have a lot of flexibility that's not available to nested tables and varrays. You have an extremely large range of index values that you can use, both positive and negative. You can choose integer or string indexing. And associative arrays can be sparse, which means that you can easily emulate primary keys and unique indexes, construct different ways of getting at the same data in that core collection, and get an incredible boost in performance.

Next I'll be talking about taking full advantage of string indexing of associative arrays, another relatively advanced topic. I hope that you found this lesson of interest and can apply it in your own code to greatly improve the performance of lookups of data in underlying relational tables. Happy PL/SQL coding!
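
The transcript describes code shown on screen without reproducing it. What follows is a minimal sketch of the sparse fruit list from the start of the lesson, not the code from the video. The type and variable names (fruit_list_t, l_fruits, l_index) are assumptions chosen for illustration.

```plsql
DECLARE
   -- Associative array: a list of strings indexed by integer
   TYPE fruit_list_t IS TABLE OF VARCHAR2 (30)
      INDEX BY PLS_INTEGER;

   l_fruits   fruit_list_t;
   l_index    PLS_INTEGER;
BEGIN
   -- Populate non-consecutive index values; no EXTEND call is needed
   l_fruits (1) := 'Apple';
   l_fruits (22) := 'Pear';
   l_fruits (100) := 'Orange';
   l_fruits (10023) := 'Apricot';

   -- Direct access by index value, with no scan
   DBMS_OUTPUT.put_line (l_fruits (100));

   -- COUNT is the number of defined elements, LAST is the highest index;
   -- in a sparse collection the two differ
   DBMS_OUTPUT.put_line (
      'Count = ' || l_fruits.COUNT || ', Last = ' || l_fruits.LAST);

   -- Traverse a sparse collection with FIRST and NEXT. A loop of the form
   -- FOR i IN 1 .. l_fruits.COUNT would raise NO_DATA_FOUND at index 2.
   l_index := l_fruits.FIRST;

   WHILE l_index IS NOT NULL
   LOOP
      DBMS_OUTPUT.put_line (l_index || ' = ' || l_fruits (l_index));
      l_index := l_fruits.NEXT (l_index);
   END LOOP;
END;
/
```

Run it with server output enabled. The FIRST/NEXT loop is the design point: it visits only the defined elements, whatever gaps lie between them.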
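
The string tracker, version 0, walkthrough can be sketched along the following lines. This is a reconstruction under stated assumptions, written as an anonymous block rather than the package shown in the video. All names (used_aat, g_names_used, mark_as_used, string_in_use) are hypothetical.

```plsql
DECLARE
   TYPE used_aat IS TABLE OF VARCHAR2 (32767)
      INDEX BY PLS_INTEGER;

   g_names_used   used_aat;

   PROCEDURE mark_as_used (value_in IN VARCHAR2)
   IS
   BEGIN
      -- Sequential fill: COUNT + 1 is always the next free slot at the end
      g_names_used (g_names_used.COUNT + 1) := value_in;
   END mark_as_used;

   FUNCTION string_in_use (value_in IN VARCHAR2)
      RETURN BOOLEAN
   IS
      c_count   CONSTANT PLS_INTEGER := g_names_used.COUNT;
      l_index   PLS_INTEGER := g_names_used.FIRST;
      l_found   BOOLEAN := FALSE;
   BEGIN
      -- Linear scan: the integer index says nothing about the string value,
      -- so every element may have to be compared. If the list is empty,
      -- l_index is NULL and the loop body never runs.
      WHILE NOT l_found AND l_index <= c_count
      LOOP
         IF g_names_used (l_index) = value_in
         THEN
            l_found := TRUE;
         END IF;

         l_index := l_index + 1;
      END LOOP;

      RETURN l_found;
   END string_in_use;
BEGIN
   mark_as_used ('Steven');
   mark_as_used ('Veva');

   DBMS_OUTPUT.put_line (
      CASE WHEN string_in_use ('Veva') THEN 'In use' ELSE 'Not in use' END);
   DBMS_OUTPUT.put_line (
      CASE WHEN string_in_use ('Chris') THEN 'In use' ELSE 'Not in use' END);
END;
/
```

The cost of string_in_use grows with the size of the list, which is the slowdown the lesson warns about.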
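
The primary key emulation described for the employees table might look roughly like this. It is a sketch, not the emulate_primary_key1 script itself. It assumes an employees table with an integer employee_id primary key that fits in the PLS_INTEGER range, and the package name employee_cache, the function name onerow, and the variable g_cache are hypothetical.

```plsql
CREATE OR REPLACE PACKAGE employee_cache
IS
   -- Return the cached row for one primary key value
   FUNCTION onerow (employee_id_in IN employees.employee_id%TYPE)
      RETURN employees%ROWTYPE;
END employee_cache;
/

CREATE OR REPLACE PACKAGE BODY employee_cache
IS
   -- Each element has the same structure as a row of the table
   TYPE employees_aat IS TABLE OF employees%ROWTYPE
      INDEX BY PLS_INTEGER;

   g_cache   employees_aat;

   FUNCTION onerow (employee_id_in IN employees.employee_id%TYPE)
      RETURN employees%ROWTYPE
   IS
   BEGIN
      -- No query: go straight to the element whose index is the key.
      -- Raises NO_DATA_FOUND if no row with that key was loaded.
      RETURN g_cache (employee_id_in);
   END onerow;
BEGIN
   -- Initialization section: runs once per session, on first use.
   -- Non-sequential fill: the primary key is the index value.
   -- A sequential fill would instead assign to g_cache (g_cache.COUNT + 1).
   FOR rec IN (SELECT * FROM employees)
   LOOP
      g_cache (rec.employee_id) := rec;
   END LOOP;
END employee_cache;
/
```

A caller would then write something like `l_employee := employee_cache.onerow (138);`. As the lesson stresses, the cache is a snapshot, so this approach only suits data that stays static while the session uses it.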
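
The multiple-index technique from the books example can be sketched without a table at all. The block below is an illustration under assumptions, not the package from the video: the record type stands in for a row of the books table, the data is invented placeholder data, and every identifier (book_rt, g_books, g_by_isbn, g_by_author_title, add_book, and the lookup functions) is hypothetical.

```plsql
DECLARE
   -- Hypothetical stand-in for a row of the books table
   TYPE book_rt IS RECORD (
      book_id   PLS_INTEGER,
      isbn      VARCHAR2 (20),
      author    VARCHAR2 (100),
      title     VARCHAR2 (200)
   );

   -- Core collection: full records indexed by primary key
   TYPE books_aat IS TABLE OF book_rt
      INDEX BY PLS_INTEGER;

   -- "Index" collections: just the primary key, indexed by a unique string
   TYPE book_id_aat IS TABLE OF PLS_INTEGER
      INDEX BY VARCHAR2 (400);

   g_books             books_aat;
   g_by_isbn           book_id_aat;
   g_by_author_title   book_id_aat;
   l_book              book_rt;

   FUNCTION author_title (author_in IN VARCHAR2, title_in IN VARCHAR2)
      RETURN VARCHAR2
   IS
   BEGIN
      -- A delimiter keeps ('ab', 'c') distinct from ('a', 'bc')
      RETURN author_in || '^' || title_in;
   END author_title;

   PROCEDURE add_book (book_id_in   IN PLS_INTEGER,
                       isbn_in      IN VARCHAR2,
                       author_in    IN VARCHAR2,
                       title_in     IN VARCHAR2)
   IS
      l_new   book_rt;
   BEGIN
      l_new.book_id := book_id_in;
      l_new.isbn := isbn_in;
      l_new.author := author_in;
      l_new.title := title_in;

      g_books (book_id_in) := l_new;
      g_by_isbn (isbn_in) := book_id_in;
      g_by_author_title (author_title (author_in, title_in)) := book_id_in;
   END add_book;

   FUNCTION onebook (book_id_in IN PLS_INTEGER)
      RETURN book_rt
   IS
   BEGIN
      RETURN g_books (book_id_in);
   END onebook;

   FUNCTION onebook_by_isbn (isbn_in IN VARCHAR2)
      RETURN book_rt
   IS
   BEGIN
      -- Two hops: unique string to primary key, primary key to record
      RETURN g_books (g_by_isbn (isbn_in));
   END onebook_by_isbn;

   FUNCTION onebook_by_author_title (author_in   IN VARCHAR2,
                                     title_in    IN VARCHAR2)
      RETURN book_rt
   IS
   BEGIN
      RETURN g_books (g_by_author_title (author_title (author_in, title_in)));
   END onebook_by_author_title;
BEGIN
   -- Invented placeholder data; the book IDs are deliberately non-consecutive
   add_book (1007, 'ISBN-A', 'Author One', 'First Title');
   add_book (2450, 'ISBN-B', 'Author Two', 'Second Title');

   l_book := onebook_by_isbn ('ISBN-B');
   DBMS_OUTPUT.put_line (l_book.book_id || ': ' || l_book.title);

   l_book := onebook_by_author_title ('Author One', 'First Title');
   DBMS_OUTPUT.put_line (l_book.book_id || ': ' || l_book.title);
END;
/
```

Storing only the primary key in the two secondary collections keeps a single copy of each record, at the price of a second lookup. A lookup with an unknown ISBN or author-title pair raises NO_DATA_FOUND, which a fuller version would handle.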