Scaling Stupidity: How to Get Ahead on the Testing and Innovation Curve, by Craig Sullivan

So, good morning everyone, good morning. What's my talk about today? Well, it's a little bit about a bridge between the seemingly disparate worlds of UX, analytics and A/B testing. For me they're not distant cousins; they all go together. They met, they fell in love, and they lived happily ever after, at least in my mind. So I'll talk a little bit about that, but mainly a lot about the humongous amount of bloody stupid testing that people do on their websites. We're wasting endless thousands of hours on this stuff.

So what's a good way of illustrating this? Well, the Gartner hype cycle is the classic example. So what happens with A/B testing if it's put into this kind of hype cycle? Well, the first thing that happens is someone goes, "You need an A/B testing tool," right, and they get one installed. Okay, this is where the problem gets worse, because then people start to do stupid testing: let's test this, let's try that. Random testing. Stupid testing is like burning rubber, doing donuts in the car park of user experience: lots of light, lots of smoke, lots of sound, but nobody's going anywhere. This is the problem with this stuff. Then they say, "Let's do more of the stupid stuff. Let's hire more people, get a bigger team." So the whole thing scales up, and you've just got massively scaled stupidity. And at some point in a company's evolution you will reach the peak stupid moment. Okay, you can get stuck there; I know some people who are still waiting there for something to happen. But often what happens is people begin to question things. So they'll say, "You know, we're not seeing the ROI you promised. All these lifts, five hundred percent, two hundred percent: where the hell are they? I don't see them in my pocket." So what's happened to them? Well, the statistics get debunked. Somebody actually works out that the company is not testing properly. There's a faith crisis: "This A/B testing stuff is all wasting our time." And then you're in the trough of testing. There's really only one way of
crawling out of this hole, or getting past this part of the testing cycle, and that's to figure out where you're going to test, how, and, most importantly, the reason. Don't test because you think you should change something; there's got to be a why. "Because we saw users struggling with the fact that they can't understand our delivery charges, we thought that by showing these, people will buy stuff": that's a why. Changing an orange button to a green button just to see is not a why in A/B testing. So that's the first thing. The second thing is a little bit of data science: statistics, samples, understanding purchase cycles. These are important. And the most important is that A/B testing is not about lift or money or a conversion rate or anything else like that. It's about learning useful things that you can then use in the rest of your product. Understanding things from running tests is a valuable piece of information to your business. That's true A/B testing. Once you master those kinds of things you can get into the whole thing of testing to innovate, and I'll talk about that later on.

But while you're in this part of the diagram, you're burning huge amounts of cash. All this stuff you're spending on testing is not actually doing anything, because it's not teaching you anything useful. This is what happens, right, this is what it feels like: you know, lots of friction but not much action, baby. A/B testing is a bit like this. You know, we shouldn't be surprised that these things have gone wrong, because YouTube was never going to be one hundred percent quality content, was it? So you've got 100 million funny cat videos. You've got 35 million videos of people videoing themselves playing video games; what's that all about? And this has gone up a few million since I last looked, last month. There are too many of these for me to watch before I die; it's impossible. But this is the problem with A/B testing itself: it's a long tail of absolute bollocks and some very
good stuff at the front, right? So it's just a bit like YouTube: some really good stuff, a small amount of okay stuff, and then a long tail of absolute total utter rubbish. It's that we've given lightsabers to people. We've said, "Here's some lightsabers, go and play with them," right, and then we're
surprised when this stuff happens, right? So you can hand out advanced A/B testing tools, and the people who are not ready for them will effectively be chopping off limbs in your company. So here's the ten-point checklist I'm going to cover today that will solve this problem of stupid testing.

First thing: your analytics is broken. Yours or your client's, your analytics is broken. You know, it's a fact: nearly every configuration I've looked at is broken, horribly and badly. But the problem is that if you think, "Oh, we'll just install Google Analytics and it'll all kind of work," this is what you will get. You'll get a collection system a bit like a retail store's, but it's going to be a Mickey Mouse one; it's going to be a toy. Google Analytics is free, but that's like saying, "I don't need to put oil in my car, do I?" You know, if you do not invest in analytics and put the right amount of setup time into it, then just like not putting oil in your car, the whole thing will break down. And if you do invest and then leave it, you end up with an analytics setup that's antiquated. It's not varying, it's not iterating, it's not improving. So the analytics has to iterate with the product, right? The two things are coupled: making changes and measuring them. And of all the sites I've looked at in the last three years, almost all of them have been broken. So when people say to me, "There's nothing wrong with our stuff," I say to them, "Well, we need to go and check."

So this is like the tills in a retail store. If you don't collect the right stuff at the tills, like money and gift certificates, then what happens is this. This is the collection layer. Now imagine if John Lewis had the wrong stuff being collected at the tills. All of this flows upwards in the organization: from the collection layer there are some metrics that are taken, and those metrics go into reports, which go into a nice pastel-colored brochure or executive dashboard, right? And the flow of that stuff is like a sewage pipe coming up through your
organization from the collection layer, from the tills. So get a health check and figure out where your analytics isn't working.

Second tip: you're testing in the wrong place. For me it's like a plumbing problem. I'm trying to find out where the flow of delight and money is temporarily or totally choked in the product, you know? I can almost visualize these flows in sites like this when I'm thinking about it. There's a million places that you can test, but there's only going to be a couple that will actually work. And this is what normally happens: "Yeah, let's test the homepage." "Yeah, I want to test that page." "Me personally, I hate the product page." So it's almost all ego: rampant, ego-driven crap in testing. It's also there in design, and don't think that you're going to get rid of it when you start to do A/B testing seriously. But the problem with this is you're wasting time testing in places that aren't going to give you payback early on. You haven't prioritized this stuff.

A second problem is people copying competitors: "Oh, look what they are doing, we need to copy that." It's really stupid, right? If you copy someone else's site or A/B test, you could be copying over the stupid things they're doing. You just can't discriminate. The problem is that there isn't best-practice testing. You can't say, "Oh, I saw this test on a website, that will work on our customers," because the customers, the marketing, the emotional state, they're all completely different. Don't think you can just take a test where you've seen the visual layer, without any of the detail inside, and just replicate it on your site and hope that it all works. So these are nice patterns, but you haven't proved it yet. Okay, show me the proof.

You need to do things like this kind of modeling to figure out where to test. This one example here is from a site that does eye tests and hearing, right? So these guys are a large retailer, and basically, if you look at all the traffic into their site, and you don't know
better, go and figure it out: nearly seventy percent of these people want to go to the stores. "I want to see the store, I want to get directions, I want to get the map, or I want to call them." Okay, and the rest of the site is basically eye stuff and hearing stuff. So you can then look at these intent groups of people and know how many of them go through the funnel. So you've got smarter: you're not looking at overall conversion rate, you're looking at the conversion rate of committed people who have the potential to enter that funnel. It's a different ball game. And if you have a
B2B site that maybe has lots of lead gen, or a non-ecommerce site, then you can have a model like this, with a number of intent groups or series of influence pages grouped together that are shoving traffic into those funnels. So do this kind of modeling, and do these stage drop-off diagrams. These are great. They're in Google Analytics if you switch them on: you can see how many people do a search, get to a category page, get to a product page and so on. So you create a funnel diagram all the way to the edge of your site, not just inside the funnel itself. This is called a horizontal funnel; check it out, it's Enhanced Ecommerce in Google Analytics. So do your research and analytics before you decide where to test. Don't go off half-cocked.

A lot of people think responsive solves everything. You know, it can, but it really screws up A/B testing in some strange and curious ways, and you need to think about how you're actually testing. Here's a classic example. So, a desktop site, right, and it's got these... please stop putting inline field labels inside forms, right? Stop putting them in there. It's really dumb. It doesn't work in usability tests, it doesn't work in conversion. Stop doing it. It looks nice but it's bad for users. But in this case, if I was wanting to run an A/B test against this, you know, I'm thinking that's the design. But that isn't the design; that's the design. It's two different layouts that break on different devices. So if you're going to be testing on a responsive site, you need to know what the design breakpoints look like, because otherwise you don't have a clue what the A/B test will look like. You may even have to target your A/B test to those devices or that breakpoint. You have to figure this stuff out. Oh, here's another example: got a nice form like this on mobile, but on desktop, wow, it's like a super, super wide field. You know, I've got plenty of space to fit my questions and comments in there.
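The point about knowing your breakpoints before testing on a responsive site can be sketched in code. This is a minimal illustration, not anything shown in the talk: the breakpoint widths, the layout names and the session tuple format are all assumptions made up for the example. It groups test sessions by the layout each visitor actually saw, so a variant's conversion rate can be read per layout rather than blended across all of them.

```python
from collections import defaultdict

# Hypothetical breakpoints in CSS pixels (minimum width, layout name).
# Replace these with the breakpoints your own stylesheet really uses.
BREAKPOINTS = [(0, "mobile"), (600, "tablet"), (1024, "desktop")]


def layout_for(viewport_width):
    """Return the name of the layout a visitor at this viewport width would see."""
    name = BREAKPOINTS[0][1]
    for min_width, label in BREAKPOINTS:
        if viewport_width >= min_width:
            name = label
    return name


def conversion_by_layout(sessions):
    """Compute conversion rate per (layout, variant).

    sessions: iterable of (viewport_width, variant, converted) tuples,
    an assumed export format for this sketch.
    """
    visits = defaultdict(int)
    conversions = defaultdict(int)
    for width, variant, converted in sessions:
        key = (layout_for(width), variant)
        visits[key] += 1
        if converted:
            conversions[key] += 1
    return {key: conversions[key] / visits[key] for key in visits}
```

Reading results this way makes it visible when a variant wins on one layout and loses on another, which a single blended conversion rate would hide.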
It's a general problem: the responsive design looks great on mobile, shite on everything else, right? "Okay, yeah, we'll do our form mobile-first," right, and then you get stuff like this. "Great, we'll put the search box at the top of the screen," and bam, on my 1920-pixel monitor it freaking disappears, so nobody can see the search box anymore; hardly anyone used it on desktop. And then this kind of stuff on responsive, I just have to call it out. Can anyone tell me exactly what these are? No? Nobody else knows either, right? Okay, so stop doing it. The second one some of you should have guessed, because it's the Microsoft Word icon. But stop putting mystery meat like this on websites. Luke Wroblewski has talked about this. It's lazy thinking: "Oh, we've got this menu, chuck it in a hamburger," right? But if your audience isn't twenty-somethings who know what a burger is, then they're not going to know that it's the menu, are they? They just see some lines, right? If you add the word "menu" (here's an A/B test with all the detail), people get it, they use the menu, they then buy stuff. Amazing. Affordance: it's an important thing, right? Use it. And there are different types of these available for your website: kebabs, and there's also one with a little grid, which is a big toolbox, right?

Also, everybody thinks, "All our customers use iPhones." There's just a huge number of devices out there, and you need to work out what your device mix is, and know it really intimately if you're going to be running A/B tests against them. It's really important. All this data is available in Google Analytics, and there are three reports down here at the bottom; the deck is downloadable, so grab it later. Here's an example on iPhone: you can go into Google Analytics and it will tell you all the device models of iPhones, so you can work out
why your website doesn't convert well on iPhone 4s, because the button is cut off or the menu doesn't work. So this data is there. This is really useful data when you're designing products to go on the internet, right? But figure out what your device mix really is. Don't make assumptions or say it's all iPhones, because it's not true.

If you don't do any research before you actually go off A/B testing, you're missing a big trick, because this is what drives the quality of tests. We asked people why they weren't doing this, and they're saying time and budget. Oh God, I mean, really kind of lame excuses for
not doing that stuff. People think it takes a long time, but it doesn't. It's really difficult to solve a problem that you don't understand. If you look at a landing page, how are you going to solve it if you don't actually have a problem domain to consider as part of it? And this is the thing: "I think this page needs something," and people just pull this stuff out of their heads. We need evidence. We don't need a crystal ball when it comes to the design of the site. We need these things: we need to know where visitors came from, what they had in their hands, what they did before, what they did after, where they dropped off. All this information, again, is available in Google Analytics. There is a complete plan here for one hour of research you can do on a landing page, on your site or a client's, before you do tests on it. It describes and articulates these kinds of things: how many tablets drop off at this level, how many mobiles make it through, how does this stuff compare to that? So there's no excuse. If you've got the luxury of two hours you can do all the rest of this stuff as well, and if you have four hours you can really get into some serious stuff. So even with half a day of work... there's a complete article explaining all this stuff, and you can download all the Google Analytics reports. If you do this stuff before you actually build your test, you'll be building tests based on real reasons, observations and data, real stuff you've collected, and it will push out the ego and opinion. So do your research. It's lean analytics, right? It's really rapid, lightweight research; you can pull data and chuck it into the design meeting. Very important.

And then there's not prioritizing your tests. If you've got 100 places you want to test, you need to see which ones are going to be worth doing, and that's the stuff that is easy to build but also likely to drive lots of money or an increase in customer delight. So if you score things on opportunity versus cost, then all you need to do is start working on the stuff up in the top
right-hand quadrant, because this is the cheapest stuff that will shift the metric most. So start there. Most people just go and randomly pick stuff from here. Start here and you will make more money earlier in the testing cycle. And if you look at all your tests and quantify what a five or ten percent lift in your metric would get you, you can even put a money figure against each one and say, "This is how much money we think this test is going to make." That's another way of dealing with the priority. But if you don't have prioritization, then you will be doing random testing. That's the problem. So prioritize the targets.

Another problem is that most people launch their A/B tests and they haven't tested them on the devices they're going to be shown to. So it's broken on a tablet, or it's broken on Internet Explorer, or it doesn't work on Android handsets. How do you know? Is someone just going to call you up and tell you? It does not happen, okay? No one's going to report that there's a broken A/B test on your site. So you need to test this shizzle, right? I've found lots of money with this stuff. This is my testing rig. I use this to test all the browsers, all the devices, all the tablets, all the mobiles. And the picture behind is actually a rack of mobile phones that you can connect to over the cloud and remote-control. So if you don't have, you know, an iPhone 5 or 4 hanging around the office, then you can rent one there and run some tests on it. You can go to a phone in Brazil, rent it, go to the Brazilian app store, install an app, and do everything as if you had it there in your hand. So again, there's no excuse. There are even open device labs in London and other cities where you can take your shizzle and test it, because they've got loads of stuff. There's a whole article explaining this; please read it. So do your pre-flight checks.

This is the biggest problem that people have: the ninety-five percent confidence thing. A poll of people I heard about
recently asked them all when they would stop a test, and they said when they had ninety-five percent confidence. It's wrong, okay? It's not going to tell you. What happens is a lot of tests get to ninety-five percent confidence before they're ready. It's a false declaration; it's leading you to believe this test is finished. And there's an article in Nature that explains why this bit of statistics has been debunked. You've got to be careful about this stuff. And here's the problem. You know, if I'm running a test on fluffy kittens versus cute puppy dogs on my clickbait site, then, you know, after 200 tests I think the fluffy kitten here, number
three, you know, is significant as a test result. But if I'd let it run for as long as I was actually planning to, I would see in the end that there's a fifty percent error rate. You know, I've made a bad business call. So I've gone and told everyone to go gangbusters on fluffy kittens: fluffy kitten landing pages, fluffy kittens everywhere. It was the cute puppy dogs that won. But because people imbue A/B testing with some sort of amazing statistical validity that might not actually be there, you've told your whole company to bet its future on the wrong call. It's like coin-flipping with your business; it's really dumb. So don't self-stop. There are guidelines for how long to run your tests, and that's a nice little safety net. It's not perfect, it's got some rough edges, but use it as a baseline. You know, this is how long you want to run a test for: never run it for just a week, always run it for at least a couple of weeks. So anyway, there's a test length calculator there that lets you work out how long it's going to run. You plug the numbers in, you run the test for the time you specified, and you stop it. Don't wait for it to go your way. If B does not win, letting it run a month longer is not going to make it magically start winning. Believe me, I've tried it; I have waited a long time. So understand how long to test for, because that's really going to hit things.

And you need to solve the hypothesis problem. This is what goes into most product design these days, and it all fills a vacuum. You know, if we don't have user research or data, then this is the stuff that tends to fill that vacuum. And this is the real stuff that we want to go in place of that. This data and research, whether it's analytics data or user research, will actually drive the quality of the hypothesis engine in your company. So print this out, and when anyone wants to run an A/B test on your site or your client's site, you say, "We will run it if you can fit it into this sentence." Okay, and most people
will break down at the first part: "I would like to change an orange button because I just felt like it." It's not going to work. So I use this to deflate stupid testing. I listen, and if I'm not laughing by the end of it, then you might be in with a chance. "Because we saw an angry email from the CEO, we expect that changing button colors will cause the office to calm down for a day, right? We'll measure this using some metric that we pluck out of the air." That's what normally happens. So use the hypothesis kit and get that properly going.

This is a big problem: running random tests and running tests in the wrong places is not actually teaching you anything. Changing the color of a button doesn't teach you anything. It might teach you that the prominence of the button plays a part in the conversion. Actually, you're really interested in tests that tell you, "That's really interesting, that one. That tells us something that we can use in our PPC advertising. That tells us something we can use over there." So the testing isn't about lifts; it's actually about the learning that you get. And this is a great quote; this is what it is for me: it's just extra information. I'm running tests to teach me things that I can replay. They go into my playbook. So try and design tests for maximum learning. You've got to think about what will happen in terms of the outcome before you actually run the test, and if you don't know what action you're going to take based on A or B winning, then you haven't really thought it through, have you?

Number ten, and this is the most important thing. If you think that you can go to a siloed organization that passes the product around like a screaming baby, or hurls it around like a screaming baby, you know, working in different silos where you don't talk to each other, and you think you're going to take A/B testing and layer it over the top of that and expect it to work, it will not. So non-agile, non-iterative design just doesn't work with A/B testing. I'm seeing a big theme
develop: organizations are getting fed up of wasting forty, fifty, sixty percent of the development team on innumerable business analysts and project managers who don't seem to glue anything together; they just gum things up or stretch them out for longer. Too many meetings, too-unwieldy systems for coordinating the stuff, you know, and the endless suck of sign-off: you know, three weeks to sign off one piece of text. It's awful. So if you have that in place, your A/B testing will suffer from exactly the same problems. It's the same transport underneath.

The Financial Times is an example: smart teams of six to fifteen people. They can set and get analytics metrics, they have autonomy to publish, and they publish several hundred times a
day. The stakeholder teams have no idea what these guys are actually going to build. They just say, "We want you to increase subscriptions by fifteen percent, and we'll leave it to you to do it." They can't even come to the meetings where the team are discussing all this stuff. So the business just defines an outcome, and the team has to execute a number of strategies around that outcome. What happened before was that every project at the FT took at least 18 months, it was out of date, nobody wanted it at the end, and it cost much more and took far longer than they expected. And after they did this stuff, nothing was late, it was all under budget, it all launched earlier than expected, and they all got really good feedback from customers. So go figure. You know, the FT and other companies I've been talking to recently are getting this, finally. I hope everybody does. And, you know, they're launching alphas and betas and pilots and increasing the fidelity level of their systems. So they're prototyping live on their website, not prototyping in the lab. They're prototyping live against millions of customers and using the live traffic to actually iterate this stuff in a quantitatively sound way. That will be better than iterating it in a lab, you know, from some points of view; we can argue about that one afterwards. The answer is do both.

So the positive attributes where I'm seeing A/B testing really working are these kinds of things. The more of these legs a company has, the more solid it is, like a bar stool. If you have one or two legs you're going to keep falling on your ass, right? Three, four or five, it's just even more solid. And all this stuff, I call this... it's all the same diagram, right? Because, you know, people say to me, "Well, we're going for data-driven iterative UX." Whoa, isn't that kind of just like lean testing? Or isn't that a bit like analytics with a bit of qualitative data in there? Yeah, it's all the same stuff. So I see people who say, "Oh, we're getting really good," and I look
at them and I think, actually, you guys are lighting up all the stuff on here. So although we approach it from different disciplines, a lot of this stuff is essentially the same, and we're trying to solve the same problems. So don't think of these differently; they end up in the same place when you get that advanced. So burn down the silos.

And really, when it comes to the next part of A/B testing, which is innovation: if you get those basic ground rules in, then you can start innovating around the interesting things that are happening at the edge of your product. Product futures. I like to think of this as, you know, I call this my creek diagram, right? These are all the dendritic outcomes that your business can potentially have, based on the choices that you make. How can I make sure that my team aren't going down these creeks at the bottom? How can I make sure that we choose some positive outcomes? There are lots of bad future decisions in here and some good ones. How do I work out what the best path is for my company to take? It's Rumsfeldian space; you need to explore it and get out there, right? It's the unknown unknowns. So it's like: what if we changed our product? What if we put the price down? What if we had five packages rather than three? A/B testing can answer these questions. You can actually run some tests to figure out what your business future is going to be like. McDonald's is a great example. They've got a store where they test new recipes. It's not called McDonald's; it looks like some cool hipster burger store, right? But they pre-test menus in there and then they decide whether they're going to roll them out nationally. Your website is your hipster test store, so use it to transform the business. It's your pilot.

Here's a perfect example of this sort of testing, from The Student Room. These guys were offering too much in the free package. Look at all these ticks here: giving away too much, being far too generous. How can we adjust this to give the maximum amount of revenue here
but also get loads of people to sign up for free? Why don't we take the ticks away? The developers said, "That'll take months to code; it's really complicated." And I said, "We'll just do a Wizard of Oz test. There's nothing behind the curtain. We'll show people different numbers of Xs and measure their propensity to sign up for the free package or the paid package." So we were able to model the future of the business before we did the coding to support that future model. And the answer in this case was that by removing a couple of the Xs we increased the paid sign-ups by one hundred and eighty-five
percent. It's an annual subscription, so this company doubled their cash flow after running this test. That's not a bad A/B test to run. But they were able to predict the best future for their business before writing a single line of code. Very important. And what we said at the end to the people who we'd told they weren't going to get some features was just, "Congratulations, we're giving you some free stuff that you weren't expecting." They were delighted. It was all a lie, but it was meant to answer a fundamental business question.

So if this is the kind of thing that's happening in A/B testing, then you need to work out how you're going to solve the ego problem. This is what I thought I knew back when I started testing, and this is what I actually know. Now I know the true extent and boundary of my knowledge, and I know what I can predict and what I can't predict. A/B testing has taught me that; it destroyed my ego. And essentially a lot of you are winging it. It's easy to see problems in a usability test; it's easy to observe things going wrong. Don't fall into the arrogant trap of thinking that because you can spot them, you know precisely the solution that will solve that problem. That's where testing and validation come in. You have to admit you're kind of winging it a little bit, right? Just be truthful with yourself. Part of that is going to Guessaholics Anonymous, because that is the first step in getting it. You know, you've got to admit you've got a problem: you're spending a lot of your time guessing. So do it in a data-driven way. Remember to get off the stupid testing curve. And if you have any questions or you want to catch me afterwards, please, please drop me a line. You can download the slide deck here. Thank you very much for your time today.
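As a footnote to the advice about fixing the test length up front instead of stopping at ninety-five percent confidence: the sketch below is not the calculator mentioned in the talk. It is a textbook two-proportion sample size approximation, and the function names, the default significance level (0.05) and the default power (0.80) are assumptions chosen for illustration. The only rules taken from the talk are that a test should run at least a couple of weeks and should stop at the planned time; rounding up to whole weeks is an extra assumption so that every weekday is covered equally.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect the given relative lift.

    Standard normal-approximation formula for comparing two proportions
    with a two-sided test; an approximation, not an exact answer.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def planned_test_days(baseline_rate, relative_lift, daily_visitors,
                      variants=2, min_days=14):
    """Decide the run length before launch: at least two weeks, in whole weeks."""
    needed = sample_size_per_variant(baseline_rate, relative_lift) * variants
    days = max(math.ceil(needed / daily_visitors), min_days)
    return math.ceil(days / 7) * 7


if __name__ == "__main__":
    # Example inputs (made up): 5% baseline conversion, 10% relative lift,
    # 2,000 visitors a day split across two variants.
    print(planned_test_days(0.05, 0.10, 2000))
```

The number that comes out is the length to commit to in advance: run for that long, then stop, whichever variant is ahead.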