Nikita Popov – What changed internally?

So, yeah, hi everyone. My name is Nikita, and I'm one of the people who work on the PHP interpreter. Today I'd like to show you some of the recent changes that have been implemented in PHP internals.

In that regard, I'd first like to mention that we have an upcoming new major PHP version: PHP 7 is coming in about two weeks, at least if everything goes as planned, and I think chances are actually pretty good that you're going to see a release on that date. I think there is a lot of excitement about PHP 7, more than new PHP versions usually get. After all, this is the first major version in over a decade, and it has many cool new features. Personally, I'm mostly excited about the typing improvements, so scalar types and return types, but really the main selling point of PHP 7 is the performance improvements.

Here are some benchmarks which were made by the HHVM team. These benchmarks test the performance of different PHP runtimes against a couple of popular applications: here we have Drupal, MediaWiki and WordPress. In blue is PHP 5, in green is PHP 7, and in yellow is HHVM. What you should be seeing here is that PHP 7 is generally something like two, or maybe even three, times faster than PHP 5. This improvement is not just in terms of throughput, so requests per second, but also in terms of response times. I think this is a pretty significant change; essentially it means you need only half as many servers to run the same application as before.

What this talk is about is explaining where this performance comes from, because it didn't come out of the blue: a lot of changes have happened internally to get this kind of result. Before I start talking about the particulars of PHP, I'd like to give you a general idea of how you optimize low-level applications. If you optimize PHP code, you will mostly be looking into making things like database queries faster, and generally at inputs and outputs. Optimizing a low-level application is actually pretty similar. You are not very interested in optimizing computations, because CPUs are very good at adding integers together very fast. What you're interested in is optimizing memory access patterns, and from that point of view there are basically three things you want to look at.

The first one is reducing the number of allocations. The allocator is the piece of code which you tell, "Well, I need 50 bytes of memory", and it goes looking through the memory space and says, "Oh yes, here is a piece of memory you can use." It turns out this is a very expensive operation: finding a suitable block of memory costs a lot of CPU cycles, at least unless you're willing to waste a lot of memory. This is so expensive that PHP 5 spends approximately 20% of its CPU time doing nothing but managing memory, and we'd like to change this and use that time to do something more useful.

The second one is reducing memory usage. People commonly think that optimizing for performance and optimizing for memory usage are totally separate goals. There is the term "performance/memory trade-off", where you can improve performance by increasing memory usage. That can happen, but in many cases the two goals actually go hand in hand, and the reason is that memory access has very high latency. In the time it takes the CPU to send one request to main memory and get the result back, it can do hundreds and hundreds of integer operations. For this reason the CPU contains a cache hierarchy. It has a very small level 1 data cache, which is small but has a short access latency. Then there is a level 2 cache, which is larger but also has higher latency, and then a level 3 cache, same thing. If your data is not in any of these caches, you have to bite the bullet and do a main memory access, and as you can see, these are really, really expensive. This is why optimizing memory usage also improves performance: if you have less data, more of it is going to fit into the caches.

A related point is that you want to reduce the amount of indirection. Indirection is basically the case where you have a pointer, and this pointer points to another pointer, which again points to a pointer pointing to a pointer pointing to something you actually want. You have to go through a lot of intermediate steps to get to the data you want, and this is bad for fairly obvious reasons: every time you follow a pointer you are doing a memory access, and that can be slow. You'd like to go from here right to there. And just so you know, this picture is not a joke; this is how PHP 5 works in a number of places, and we'd like to improve on that.

This brings us to the actual topic: how do we apply all of this to PHP? If you talk about PHP internals, the first thing you always have to mention is zvals, because this is pretty much the most important data structure we have. It represents all values in PHP, from booleans and integers to strings and objects. At its core, a zval is really nothing more than a combination of a type and a value. On the right-hand side you can see the in-memory representation. It has a small type tag, some space for a value and a couple of fields that I've left empty for now, so this is probably not the whole story yet.

Now let's consider a more concrete example. Here we have very simple code: we simply say $a = 42. The variable $a is going to be represented as a pointer to its zval, and in this case the zval has the value 42 and the type integer. There are many other value types as well, and they can be split into two categories. The first is simple values, which are small and fit directly into the zval. The second is complex values, like strings, arrays and objects, which are too large to fit directly into the value. Instead they use a separate data structure, and the zval only stores a pointer to it.

With complex data structures there is an additional consideration, namely what happens when you do an assignment. Here we have extended the code a little bit with an additional assignment, $b = $a. As you should know, in PHP everything uses by-value semantics, at least by default: every time you do an assignment, every time you pass an argument, every time you return a value, this is by value, not by reference. If you take by-value semantics literally, it means you have to make a copy every time you perform an assignment or pass an argument. That's going to be a problem if your value is an array with a million elements, because copying that kind of data structure is really expensive. PHP 5 solves this by allowing values to be shared. This is done by adding a reference count, which just says how often this value is shared, and then the assignment can be implemented very easily by adding another pointer. So now two variables point to one zval, and we avoid all kinds of expensive copies by doing that.

That's the basics of how value management works in PHP 5. In PHP 7 the idea of sharing values stays, but we add one additional consideration. Sharing an array makes a lot of sense, because arrays are huge structures and copying them is very expensive. But integers are very small, and it's much cheaper to simply copy an integer than to try to share it through this actually quite complex mechanism. So we move the reference count from the zval into the complex data structures, which can now be shared directly. This makes the zval structure itself much simpler: now it's really only a type and a value, with no more reference counting. And because the zval is no longer reference counted, it is no longer shared, so each variable gets its own zval. The end result is that each variable has its own separate zval, but the variables can share complex data structures. That's really the main change in PHP 7.

So why is this a good thing? Here is a side-by-side comparison, which I've split into two parts: first for simple types like integers and booleans, and then for complex types. For the simple types the end result in PHP 7 is very compact. You simply have this zval slot with just the type and the value, nothing else, and you can't really make it much simpler than that. In PHP 5 you had to do more. First of all, the zval had to be allocated, which is slow, and it also had to be freed again, which is also slow. You have the arrow in there, so that's one level of indirection. And of course it carries all this additional information for managing memory: the reference count, but also a garbage collection buffer. The GC buffer is used for collecting cyclic references; that's not very important for this talk, but we have to store it. Now we no longer have to do any of this, at least for the simple types.

For the complex types we can't make it quite that optimal, because the complex data structure itself stays basically the same; you can't really change anything about that. Still, we remove one intermediate level by dropping the container in the middle. This means we go from two allocations down to one and from two levels of indirection down to one, and the overall size of these structures also goes down.

So that's values in general. Now I'd like to talk about some specific types as well, and I'll start with strings, because strings are the simplest of the complex types. In PHP 5 a string zval basically contains two pieces of information: the length, and a pointer to the actual characters of the string. In PHP 7 this changes a bit: we now use a pointer to a custom string structure. Previously this was a normal C string; now it's a custom structure that we have implemented specifically for PHP. It still contains the character data at the bottom, and it also has the length. Additionally, it stores a hash cache. PHP uses hash tables a lot, I will get to that in a moment, and we want to avoid recomputing this hash value all the time, so we just store it in here. And of course we have the reference count header, which is now present in all complex data structures.
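
To make the PHP 7 value model concrete, here is a small Python toy model. It is not PHP's actual C implementation. `Zval`, `ZendString`, `assign` and `use_as_array_key` are hypothetical names chosen for illustration, and the "times 33" hash is an assumption standing in for PHP's real string hash. The model shows the ideas from this part of the talk. A zval is only a type plus a value. The reference count and the cached hash live inside the shared string, so one string can back two variables and also be reused as an array key without being copied.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

MASK64 = (1 << 64) - 1


def times33_hash(data: bytes) -> int:
    """Assumed DJB-style "times 33" hash; PHP's real function differs in details."""
    h = 5381
    for byte in data:
        h = (h * 33 + byte) & MASK64
    return h


@dataclass
class ZendString:
    """Toy model of the PHP 7 string structure: refcount header, cached
    hash, length and character data all live in one object."""
    data: bytes
    refcount: int = 1
    cached_hash: Optional[int] = None
    hash_computations: int = 0  # instrumentation for this sketch only

    @property
    def length(self) -> int:
        return len(self.data)

    def hash(self) -> int:
        # Compute the hash once, then reuse the cached value on later lookups.
        if self.cached_hash is None:
            self.cached_hash = times33_hash(self.data)
            self.hash_computations += 1
        return self.cached_hash


@dataclass
class Zval:
    """Toy PHP 7 zval: a type tag plus a value, with no refcount of its own."""
    type: str
    value: Union[None, bool, int, float, ZendString]


def assign(src: Zval) -> Zval:
    """Models `$b = $a`: the small zval is copied; a complex payload is
    shared by bumping the refcount stored inside the payload itself."""
    if isinstance(src.value, ZendString):
        src.value.refcount += 1
    return Zval(src.type, src.value)


Bucket = Tuple[int, ZendString, Zval]  # (hash, key, value)


def use_as_array_key(buckets: List[Bucket], key: ZendString, value: Zval) -> None:
    """Models `$arr[$key] = $value`: the bucket references the same string
    object (refcount + 1) instead of copying the key's bytes."""
    key.refcount += 1
    buckets.append((key.hash(), key, value))


s = Zval("string", ZendString(b"hello"))  # $s = "hello";
t = assign(s)                             # $t = $s;
buckets: List[Bucket] = []
use_as_array_key(buckets, s.value, Zval("int", 1))
use_as_array_key(buckets, s.value, Zval("int", 2))
n = assign(Zval("int", 42))               # integers are simply copied

print(s.value.refcount)           # 4: $s, $t and two array keys
print(s.value.hash_computations)  # 1: the hash was cached after first use
```

In this model the string bytes are never duplicated; only the two-field `Zval` is copied, which mirrors the point that PHP 7 copies the tiny zval and shares just the complex structure.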

The nice thing about this is that we can share strings independently of zvals. For example, it's now possible to share a string between a value and an array key. Previously that wasn't possible: we always had to copy the key when adding an element to an array. Now we can simply reuse the same structure.

That also brings us to arrays. Arrays in PHP are implemented as hash tables, so here is some short info on what that means. The basic idea is that you can only index memory using integers. If you want to use a string key, you first have to convert it into an integer, and this is what the hash function does. Here we have a sample key, xyz, which happens to hash to index 1 for whatever reason, so we insert the bucket at this position. The bucket contains the array key and also the data, i.e. the array value, which again is a pointer to a zval. If we insert another key, we add another bucket with a different key; here it's foo.

The interesting case is what happens if you have a collision. Why do you have collisions? The space of all possible strings is essentially infinite, while the output space of the hash function is very small; in this particular example we have only four possible output values. So at some point you're bound to have a collision, where two strings hash to the same index. For example, here the key bar also hashes to index 3, and now we kind of need two buckets at one position. We solve this by replacing the individual buckets with a linked list of buckets. If you now do an array lookup on this structure for the key bar, you compute the index, go to the first bucket and check the key. You see it's not the right key, so you follow the list to the next bucket, check the key again, and then you have your result.

That's the basic hash table structure. In PHP there is one additional requirement, namely that arrays are ordered. In other languages, if you iterate over a dictionary, you get the keys back in essentially random order. If you iterate over an array in PHP, you get the keys back in exactly the order in which you inserted them. To support this, PHP 5 stores an additional doubly linked list which maintains this order. You can simply walk this list from head to tail to do a forward iteration, or from tail to head to do a backward iteration.

This is the main thing that PHP 7 tries to improve. We want to avoid storing this doubly linked list, because it means storing two additional pointers for every bucket. Instead, if you look at this image, you see that the buckets are already in the right order. If we put them into memory in this order, we can simply use the memory order to determine the iteration order. To make sure they are actually laid out like this, we replace the individual buckets with an array of buckets. So we no longer have one allocation for each element. Instead we have one large array which contains all elements, and we can walk this array from top to bottom to do a forward iteration, or from bottom to top to do a backward iteration. This solves the problem of storing the order, and it has the additional advantage of not requiring an extra allocation for each element, which is one of our goals.

It also has a side effect: you can no longer use pointers into the structure, so you can no longer have a pointer to a bucket. The array isn't fixed in size. If you add more elements it has to be extended, and when you do this extension, all pointers that previously pointed into it become invalid. So we can no longer do that, and what we use instead is simple indexes. We don't say "here is a pointer to this element"; we simply say this is element 0 and the next one is element 1. This is just a necessity of the new design, but it has the nice side effect that indexes are only half as large as pointers, so we save additional memory there.

I think that's the most important change in this area. Here is a table with some more specific numbers. The interesting parts are the per-element counts, because the rest is pretty much irrelevant once you have large arrays. The main difference is that in PHP 5 you have to do an allocation for every element, and in PHP 7 you no longer have to do that. Similarly, the size of each element goes down from 80 bytes to 36, so arrays basically become half as large.

Well, that's still kind of a lie. It makes PHP 5 look a lot better than it actually is, because we're doing an unfair comparison here: the numbers for PHP 5 do not include the array values, while the numbers for PHP 7 do include them. The reason is that in PHP 5 it makes a big difference whether you have an array where all elements share one value, or an array where each element has its own unique value. If you have unique values, which is probably the more common case, the numbers change a bit: you get two allocations per element and 112 bytes. So in practice PHP 7 gets a lot better at managing arrays, and memory usage goes down a lot.

That was arrays in general. Now I'd like to highlight one particular optimization, namely immutable arrays. The sample code here creates a nested array: an outer array with 1 million elements, where each element is an inner array with the elements 1, 2, 3, 4, 5, 6, 7, 8. If we do some basic measurements on this code, in PHP 5 it takes up about one and a half gigabytes of memory, so this structure is huge, and it takes over 2 seconds to run. Interestingly, most of the execution time is spent destroying the arrays, not creating them. In PHP 7 the numbers go down quite a bit. The whole structure takes up 400 megabytes, which is still pretty large but better, and the runtime drops to about 350 milliseconds. That's a lot better, but these are just the generic array optimizations.

What I want to highlight here is what happens if you enable opcache: both the memory usage and the runtime go down by another factor of 10. This is the immutable array optimization at work. Opcache looks at the code, and if it finds any kind of constant array, one that occurs literally in the code, it makes it immutable. This means it is stored in shared memory and can be shared freely without doing any reference counting operations at all. So the difference between the two columns, PHP 7 and PHP 7 with opcache, is essentially that in the former case we have to copy the inner array all the time, and in the latter case we simply share it.

The last type of interest is objects. I'm not going to talk about objects much; I have simply dumped the PHP 5 object layout up here, and I'm not going to explain the details. Basically, the important part is that you have a handle. The handle is used to look up an object store bucket, which is responsible for managing the memory of the object; it's really a rather useless structure. It contains a pointer to the actual object. That object then holds things like the class it belongs to and what properties it has. The properties are stored in a separate table, where each property has one slot, and the slots are simply pointers to the property values.

In PHP 7 we cut down on this a lot. The object store bucket gets dropped entirely, and these two values are inlined here. We also combine the object with the properties table by simply concatenating them. The end result looks like this: the upper half of the object stays pretty similar, and you still have a class entry and so on, but the properties are now appended directly to the object. So we no longer have a separate allocation for them; it's all combined into one. And of course, as always in PHP 7, the property values, the zvals, are embedded, so you no longer have a pointer to the zval but the zval itself.

If you do a comparison again, I think the significant result is the number of indirections: we go from four down to one. Previously you had to follow a bunch of different structures to get to a property value; now you only have to do a single memory access. Another interesting point is that while the size of the object itself goes down, the per-element size (which for objects means per property) goes up, which is kind of weird. But the only reason it goes up is that this is once again an unfair comparison: the left column doesn't include the values, and the right column does. Still, it's one of the rare situations where you can construct artificial PHP code that ends up using more memory in PHP 7.

The last type I'd like to mention is integers. There is really not much you can change about an integer; the only thing you can change is its size, and that is what has been done. What has changed is integer support on the Windows platform. On Linux and Mac OS everything is the same: integers are 64-bit if you are on a 64-bit operating system. But on Windows you always only got 32-bit integers. It didn't matter whether your system and your CPU supported 64-bit numbers; you could only use 32 bits. This has changed now, so Windows is also a first-class platform, with the same integer size as everything else.

Well, that was it about values, and now I'd like to move on to a completely different topic, namely the PHP compiler. The PHP compiler is the piece of code which turns source code into executable opcodes, which can then be run by our virtual machine. In PHP 5 this happens in two steps. The first step is lexical analysis, which inspects the source code and groups it into basic tokens. For example, it detects that $a is a variable, 42 is a number and echo is a reserved keyword. But it doesn't understand any of the semantics of the language: it doesn't know that something like $a = 42 is an assignment, it only detects the individual parts of it. The second step combines parsing and compilation. It directly turns this stream of tokens into the final instructions, which are runnable in our virtual machine, such as an assignment instruction or an addition instruction.

So that's the PHP 5 view of things. In PHP 7 we have introduced an additional intermediate step, the abstract syntax tree. This tree captures the semantics of the language. It knows what an assignment is and what an echo operation is, without going down to the specific details of our virtual machine, so it doesn't care about instruction scheduling, allocating temporary variables and so on. For once, the reason we now have this intermediate structure is actually not performance. It can be used to generate better instructions, but the main idea is to improve the maintainability of the compiler. We also don't want to constrain the kind of language features we can implement by having a shitty compiler pipeline from the Stone Age.

Here I have added a couple of annotations on how you can get at these intermediate outputs. The tokens are directly exposed by the token_get_all() function, which should be part of every sane PHP build. The syntax tree isn't directly available yet; you have to use an extension for that, for example the one on the slide. The opcodes can be dumped using phpdbg -p, so that's also directly available in PHP 7, whereas in PHP 5 you had to install the separate VLD extension.

After compilation is done, we have to actually run the instructions we have generated, and that's the next topic: the virtual machine. The first thing I'd like to talk about is stack management. Here is a very simple piece of sample code. It defines a function foo, which takes two arguments, adds them together and returns the result, and then we call this function with the arguments 1 and 2. This is the code that will be generated for these operations: at the top you have the instructions for the call, and at the bottom the instructions for the function body. Now I'd like to walk through what happens at runtime when this code gets executed.

Here is the virtual machine stack, which is essentially the place where all the current execution state is managed. The first thing we do is execute the SEND_VAL instructions, which push the arguments of the function onto the argument stack. The next operation is DO_FCALL, which creates a call frame for the function call. This call frame has three parts. In the middle, the execute_data holds general information about the currently executed function. Before it there is space for temporary variables, so T0 is a temporary. At the bottom we have a couple of slots for real PHP variables, like $a and $b. When we now run the function, the first thing that happens is that we receive the arguments. The RECV instructions copy the arguments from the argument stack into the corresponding variable slots, so we have now duplicated this information. Finally, we perform the actual addition, which writes its result into our temporary T0. When we return, all of this gets dropped from the stack again, and we're back at the beginning.

So this is PHP 5, and what you should mainly see here is that the arguments are duplicated. We have them on the argument stack, and then again down here in the variable slots. We want to avoid that; we don't want to duplicate information. In PHP 7 the bytecode looks very similar, with only one small change: we now start with an INIT_FCALL, which pushes the call frame right at the start of the call. The frame still has the same structure; only the order has changed a little, with the temporaries now at the very end. When we now send the arguments, they are written directly into the correct positions for their variables. This means that when we actually call into the function, we can skip the RECV operations, because the arguments are already where they are supposed to be and we don't have to copy them again. Finally, the addition instruction computes the result.

So here we have avoided duplicating the arguments. This is a nice optimization, but it's not completely transparent: it does change the way PHP behaves. Say you call debug_backtrace() in your function after modifying the variables which correspond to your arguments. The backtrace will then make it look like you called the function with totally different arguments. That's one of the things you may want to look out for when you debug PHP 7 code: don't trust the arguments in backtraces, because they can be wrong.

So that's one virtual machine optimization. Another one is improvements to the instructions we support. There are many of these, and I have picked one particular example, namely inlining internal functions. There are a couple of commonly used internal functions, like strlen(), the type checks such as is_bool() and is_int(), and call_user_func() and call_user_func_array(). We have now added instructions to the virtual machine which perform these operations directly, so we no longer have to generate function calls for them: instead of generating a call to the strlen() function, we simply have a STRLEN instruction. That sounds nice, but there is one problem: in practice this only works in the global scope, i.e. if your code is not namespaced. If you are in a namespace, like here, we don't know whether this strlen() is really the global strlen() or a function of the same name in the namespace, so we can't apply the optimization. If you want the optimization, you have to use a fully qualified name, like \strlen().

And this shows you what kind of applications we use to measure the results of optimizations. We use applications like WordPress, which are not exactly the most modern type of code. For WordPress these optimizations work really great, but if you have Symfony code, they won't apply at all. I find it a bit sad that our baseline is old code, but anyway.

The last virtual machine optimization I'd like to mention is global registers. There are two pieces of information which we use all over the place in the interpreter. The first one is the execute_data, which is the call frame, and the second one is the opline, which is the currently executed instruction. We now keep both permanently in CPU registers, which is a really simple change. We no longer have to reload these values from memory, so we don't have to access the executor globals to get at the call frame; it's simply always in that register. It also means we don't have to save and restore these values whenever we do a function call at the C level. In theory this is a very simple change, just two lines, but it has a pretty significant impact on some platforms. On x86 there are not many registers, so this is pretty expensive, because those two registers can no longer be used for any other computation. But on other platforms, like PowerPC, this two-line change gives something like a 30% execution time improvement.

Okay, allocators. Allocators are very interesting, but I'm going to skip all of them and talk about opcache instead. There are two new features in opcache that I'd like to highlight. The first one is a new file-based caching mechanism. Normally opcache uses shared memory, and now you can additionally or alternatively use a file-based cache. It's probably not useful for the kinds of applications that you develop, but it might be beneficial for shared hosting, where using shared memory is problematic for security reasons: if you use something like suPHP, you can't use shared memory. It might also be useful to avoid really catastrophic response times after a PHP restart or a cache reset, or maybe for deployment. We're not quite sure yet what this is going to be used for, but one of the main ideas is that it might improve performance for shared hosting. I did some very crude benchmarks, just running sequential WordPress requests. The file cache turned out to be 2 times slower than shared memory, but still 4 times faster than having no cache at all. So if you can't use shared memory, this is going to be a very large change.

That's one of the features. The second one, and also the last thing I'm going to talk about, is support for huge pages. Opcache provides a new ini setting. If you enable it, the code segment (the text segment) of the PHP binary gets remapped into huge pages in order to reduce TLB misses. What does this mean? Here is a picture from the start of the presentation, the CPU cache hierarchy. This cache hierarchy doesn't only apply to data, it also applies to instructions. I mean CPU instructions here, not PHP instructions: in the end instructions are just a different kind of data, so they also go through the CPU caches. But that's not the whole story yet. Applications always use virtual memory addresses, while the CPU caches use physical addresses, at least on most CPUs. So before we can actually do a lookup in one of these caches, we first have to translate from virtual to physical addresses, and this is done through page tables. These page tables once again live in memory, so accessing them directly all the time would be really slow. For this reason there is an additional caching mechanism, which we call the translation lookaside buffer, the TLB, which is just a very specialized cache for these page tables.

The problem is that, for historic reasons (in software everything is for historic reasons), each page is 4 kilobytes in size, while the PHP binary is something like, I don't know, maybe 20 or 40 megabytes. If you try to map the whole PHP binary using 4 kilobyte pages, you're going to need something like 5,000 pages or more. Five thousand pages definitely don't fit into the TLB; they don't even come close to fitting into the second-level TLB. The solution to this problem is to use huge pages, which are 2 megabytes or 4 megabytes in size. With 2 megabyte pages you can map the whole PHP binary using something like 10 page entries, and 10 page entries are something you can fit into the TLB. So this feature avoids misses not in the normal caching pipeline but in this special TLB caching pipeline. In the end that really has the same effect: we avoid latency due to main memory accesses. The feature is not enabled by default, because not all operating systems support it, and sometimes you have to install special kernel modules. As I said, historic reasons. Especially at the CPU level these things go away even more slowly than they do in something like PHP, and you all know how much legacy PHP has.

Well, anyway, that's it from me. Here's my contact information and the joind.in link; I'd really appreciate some feedback. Do we have any questions? Do we have time for questions?
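
The page-count argument from the huge-pages discussion is easy to check. The binary sizes below are the talk's rough guesses of 20 or 40 megabytes. The helper name `pages_needed` is made up for this sketch. In PHP 7 the feature itself is enabled through the `opcache.huge_code_pages` ini setting.

```python
KIB = 1024
MIB = 1024 * KIB


def pages_needed(size_bytes: int, page_size: int) -> int:
    """Page-table entries needed to map `size_bytes` contiguous bytes.
    Ceiling division: a partially used page still needs its own entry."""
    return -(-size_bytes // page_size)


# Binary sizes are the talk's rough guesses ("maybe 20 or 40 megabytes").
for size_mib in (20, 40):
    size = size_mib * MIB
    small = pages_needed(size, 4 * KIB)  # regular 4 KiB pages
    huge = pages_needed(size, 2 * MIB)   # 2 MiB huge pages
    print(f"{size_mib} MiB: {small} x 4 KiB pages vs {huge} x 2 MiB pages")
```

For a 20 MiB binary this gives 5120 regular pages versus 10 huge pages, matching the "5,000 pages" versus "about 10 entries" estimate. How many entries a TLB actually holds varies between CPUs, so the sketch stops at the entry counts.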

On those first slides there were some benchmarks, and I would like to know if the calls were done in the same way in PHP 5 and PHP 7, or whether it was using best practices, so was it exactly the same call? And also, was there input and output? Because as far as I know, PHP 7 has support for asynchronous calls, so that could also cause the calls to be faster.

So first of all, PHP 7 doesn't have support for asynchronous calls, just to say that right away. It has improved support for coroutines, which can be used for asynchronous code, but there is nothing native in it. And these performance benchmarks used the very same code for PHP 5 and PHP 7, and also the same configuration, so no code changes and no optimization from that point of view.

Hi, on one slide you showed that OPcache reduces memory usage ten times when large arrays are used. Does this work in PHP 5 as well, or only in PHP 7?

No, this is a new thing in PHP 7.

OK, thank you. I'm using Xdebug sometimes, and now, as I understand it, it's not supported anymore? Or is it already in the core? How can we debug? Because I used to use the Xdebug extension for that, so now what are the options?

No, debugging is still the same. I think Xdebug even supports PHP 7, experimentally. What I was getting at is that, due to these call stack optimizations, the things that Xdebug reports might not actually match how you did the calls. So if you have a call stack and you see that you are calling this function with arguments A and B, it might actually be that those weren't the arguments you passed; you might have passed different arguments to the function.

It's fine.

So how do you avoid it? Tricks?

Oh, great. So it's not even a problem. I think it works fine, I think all the tests work, so if you can find me an example, I'm happy to have a look at it.

Okay, I think Derick has done some magic in Xdebug and avoids this problem, so maybe you don't have to worry after all. The problem does exist if you're using something like func_get_args() to implement variadic functions, but that's rarely the case.

Hi, a question about immutable arrays: will it also resolve somewhat more complex values, instead of just integers and strings? For example, if we put an array with values from constants or class constants, will it also be treated as an immutable array in the sense of caching?

So it depends on whether the constant is defined in the same class. If you're using something like self::CONSTANT and it's in the same class, then the value will get inlined, we run constant expression evaluation, and in the end we end up with a literal array, which we can make immutable. But if the constant is defined in a different file, in a different class, then this isn't possible, and it goes back to how it was before.

Hi, it's about your slide where you advise using the backslash to get optimized opcodes for some functions. I'm part of the Begbie CMS team, it's a PHP CMS project, and we use a lot of these functions, and I just want to know if you have some benchmark to see whether we get a real advantage from updating our code following your advice, or if it's just cosmetic.

So just to clarify, you're asking if it's worth writing this backslash, whether there's a big advantage to using it or not. So I hope the takeaway here is not that you should start prefixing all your internal function calls with a backslash. That's a micro-optimization, and as usual we don't recommend doing that. But if you have really hot code, it might of course be worth doing. I don't know the exact numbers for how much it improves things.

Because maybe we use thousands of these functions.

If you have some kind of commonly used string manipulation code, it might be worthwhile, but really, if profiling shows that it's the hot function, you can do it, but otherwise not.

Thank you.

Any more questions? Okay, I think that's it.