Hi, everyone. I am really excited to be up here talking to you about browser extensions. We are going to talk about what they can do, the challenges my team faced when we built ours, and most importantly what all this has to do with sandwiches. My name is Shannon Capper and I am a front-end developer here in Seattle at a startup called Textio.

I should tell you a little bit about what I do. This is Textio. We are an augmented writing platform that lets you know how your language is going to perform as you write it. We give your document a score, we use highlights to draw your attention to key phrases, and we suggest replacements for troublesome language. We are writing a recruiting e-mail here, and as you can imagine, this isn't the ideal user experience for writing e-mail. It is disruptive to come over to our website to write the e-mail and then copy and paste it somewhere else to send it. Ideally we would want to bring the platform to you. We wanted to build something that looks like this. This is someone else's page and someone else's e-mail editor, but with all the Textio UI that our users know and love.

What actually is a browser extension? Out of curiosity, how many of you use browser extensions on a regular basis? Basically everyone. Now how many of you know how to actually build one? That is actually a pretty good number. You may be smarter than me, but I used browser extensions all the time and had no idea how they worked until I started building one. It turns out they are pretty straightforward. A browser extension, or plugin, is just a small piece of software that runs inside the browser. Most extensions are built with JavaScript, HTML and CSS. There is a standardized web extension API backed by the W3C. Safari and Internet Explorer use different extension systems, so supporting them is more difficult. The question you should be asking here is what can't a browser extension do, because these things are crazy powerful.
Frankly, it is kind of terrifying knowing all the things they are capable of. For example, extensions get lots of control over your tabs. They can open new ones, read the ones that are currently open, and close them. Pop-up blockers are a good example of this. They monitor open tabs and close the ones where they detect problems. Extensions can also make network requests. A mail checker will ping the server regularly to see if you have new mail, and if it detects any, it shows you how many.

We can read, modify and add to the DOM of any web page, and this can be used in creative ways. This is a video of Google Translate, which reads the selected text from the DOM and adds a tooltip to the page to show you the translation. Another example is my personal favorite, called Millennials to Snake People, and it is awesome. What it does is find any instance of the word "millennials" on the page and replace it with "snake people" to get results like this. Here the plugin is actively modifying text nodes put into the DOM by the host page, and that could be dangerous, but we will get to that later.

To go back to the earlier question: if browser extensions can do all those amazing and powerful things, then what can't they do? The list is pretty small. First, they are tied to the browser, so if the browser isn't open, the extension can't run. Pretty obvious, but worth calling out. Next, you can't go changing your permissions willy-nilly. You have to declare which pages you want to run on and which parts of the web extension API you want to access. If you want to change those permissions later on, the extension is temporarily disabled until the user agrees to the new ones. Probably the most important thing an extension can't do is directly interact with the host page's JavaScript. They don't even share the same window object. While this is sometimes a problem, for the most part it is a good thing. At CascadiaJS two years ago, a talk called out the problem of dealing with polluted objects.
The host page might do bad monkey patching, and the good news is we don't have to deal with any of that because we are in an isolated environment.

Under the hood, a browser extension is made up of four parts. There is a manifest file, which contains metadata and specifies permissions. There is a pop-up, which can be invoked by clicking on the extension's icon in the browser's toolbar. There is a background script, which runs behind the scenes whenever the browser is open, isn't tied to any one tab, and has access to the full web extension API. Then there are the content scripts, which can talk back and forth with the background script. If you are used to seeing server-client architecture diagrams, you will notice this looks similar. The content scripts are the ones that can interact with the DOM, and they are what we will focus on for the rest of the talk.

Let's get back to our story. My team decided we wanted to build a Chrome extension. We decided to start with Chrome since that is what most of our users use, and we knew we could extend to other browsers later. We wanted to bring Textio to e-mail in supported clients, namely LinkedIn Recruiter and Gmail. Now that we know the powerful things an extension can do, it should be straightforward to make this picture a reality, right? No, it wasn't straightforward at all. It turns out writing JavaScript that runs on someone else's page is way harder than writing JavaScript that runs on your own page. As a browser extension, you have no control over the page. You can't control how it's laid out, you can't control the code or the user's actions, and every single page is different. Not only that, but it is also super easy to abuse your power and change things in a way that breaks the host site.

We laid down a set of principles. Principle one is don't write site-specific conditionals, and principle two is don't break the host page. Let's start with the first one. One problem with writing an extension that works across multiple hosts is that each one is different.
What if you need to tweak the layout based on the site? What if you need the selectors to be different on different pages? In the early days of the extension, especially if you start out with a small subset of sites, it can be tempting to let site-specific conditionals creep into the codebase. If I am on Gmail do A, and if I am on LinkedIn do B. This is the path to madness. It will be an absolute nightmare to maintain. Every time you want to support a new site, you have to track down every place scattered through the codebase where you have site-specific logic, and that will be no fun.

Instead, you might try building solutions in a generic way so they work on every page. The key thing here is code reuse. The more code that is reusable across all sites, the better. Where that isn't possible, you can try limiting site-specific code to a single site configuration. There is one config per site, and all the rest of the code can generically consume the site config without worrying about the host site. If you want to add support for a new site, all you have to do is build a single new site config and you will be good to go.

Let's see this in action, shall we? I am going to exit out of here and mirror our screen. OK. Here we are in Gmail. I don't know about you guys, but there is something I really miss about writing on my computer. Something that feels missing. You know what I miss? I miss Clippy. I want to build an extension that brings Clippy back. We have a content script running in Gmail, so let's see if we can get him into Yahoo.

Let's look at some code. Here we are in our code editor. This is the content script running on Gmail. It is really simple and just has three parts. First we get the site config, next we poll for mail windows, and if we find a new mail window, we append Clippy. Let's break each one down. We only support Gmail right now. The Gmail config has a single item in it, which is our mail window selector.
This selector is going to be different on every site, and the site config gives us the chance to use the right selector. Here in getSiteConfig it is simple. We get the domain name off the URL, and if it is a supported domain we return the equivalent config. We have a setInterval here that is also simple. We consume that site config and pull the selector off it to poll the DOM and see if we have new mail windows. Any new one is marked as found, and we append Clippy. Here in appendClippy we create an image element to show the GIF, give it some styling and a size, and append him to the mail window.

I think we can get him working in Yahoo without too many changes, so let's see if we can make that happen. Here, let's add Yahoo as a supported domain. We are going to return a Yahoo config, so let's define that. Here we are going to mirror our config from Gmail. Then I cheated and looked up what the selector is, so it is going to be editor-container. That was it. Let's see if this worked. I am going to refresh our extension, refresh the page, and I don't have any internet. That is unfortunate. Let's connect and try again. There we go! We have got Clippy! [Applause]

We kind of have a problem. He is a little small. In Gmail the mail window is smaller, so he seems like a good size, but here he kind of gets drowned out. Let's see if we can use our site config to fix that. Let's add site-specific styling. Down here in appendClippy, instead of using the hardcoded value, let's consume our site config and define a clippySize value there. Up here in Gmail let's say clippySize is 85 pixels, and over in Yahoo let's make him 120. Now if everything worked according to plan, he should be a little bit bigger. Beautiful.

I think we can do one better. Remember principle number one: if we can avoid site-specific code, we should, because that way we can keep site configs as small as possible and make it super easy to add new sites in the future.
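
The code on screen is not captured in the transcript, so the following is only a minimal sketch of the site-config pattern as described so far. The hostnames, the Gmail selector, the asset path, the poll interval, and the helper name findNewMailWindows are all assumptions for illustration. Only the `editor-container` name (assumed here to be a class) and the 85 and 120 pixel sizes come from the talk.

```javascript
// Hypothetical per-site configuration: one entry per supported host.
const SITE_CONFIGS = {
  "mail.google.com": {
    // Placeholder only. This is NOT Gmail's real compose-window selector.
    mailWindowSelector: ".hypothetical-gmail-compose",
    clippySize: 85, // pixels, value mentioned in the talk
  },
  "mail.yahoo.com": {
    // The talk names "editor-container"; treating it as a class is an assumption.
    mailWindowSelector: ".editor-container",
    clippySize: 120, // pixels, value mentioned in the talk
  },
};

// Return the config for a supported hostname, or null for unsupported sites.
function getSiteConfig(hostname) {
  return Object.prototype.hasOwnProperty.call(SITE_CONFIGS, hostname)
    ? SITE_CONFIGS[hostname]
    : null;
}

// Return the mail windows under `root` that have not been seen yet,
// and mark them as seen so Clippy is appended only once per window.
function findNewMailWindows(root, config, seen) {
  const fresh = [];
  for (const el of root.querySelectorAll(config.mailWindowSelector)) {
    if (!seen.has(el)) {
      seen.add(el);
      fresh.push(el);
    }
  }
  return fresh;
}

// Generic code: everything site-specific is read from `config`.
function appendClippy(doc, mailWindow, config) {
  const img = doc.createElement("img");
  img.src = "clippy.gif"; // hypothetical asset path
  img.style.position = "absolute";
  img.style.height = `${config.clippySize}px`;
  mailWindow.appendChild(img);
  return img;
}

// Wire everything together only when running inside a page.
if (typeof document !== "undefined" && typeof location !== "undefined") {
  const config = getSiteConfig(location.hostname);
  if (config) {
    const seen = new WeakSet();
    setInterval(() => {
      for (const mailWindow of findNewMailWindows(document, config, seen)) {
        appendClippy(document, mailWindow, config);
      }
    }, 1000); // poll interval is an arbitrary choice
  }
}
```

With this shape, supporting another mail client would mean adding one more entry to SITE_CONFIGS, while getSiteConfig, findNewMailWindows and appendClippy stay untouched.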
I think all we want is for Clippy to be a certain percentage of the mail window's size. Let's do that. Go back to the code editor and remove the value from the site config, and down here in appendClippy let's use the mail window's height to dynamically determine how big Clippy should be. I will pull the height property off there, and that is going to be a string, so let's parse it into a float. Then let's make Clippy three tenths the height of the mail window. If everything went according to plan, he should be exactly the same size as he was before, but now we are doing it in a dynamic way that will work on any host site. Awesome. [Applause]

Let's go back to the slides. Now we have talked about how to fulfill principle number one. Let's talk about principle number two: don't break the host page. This one seems obvious, but it is a little tricky to follow if you are doing anything that adds UI to the page. Perhaps "avoid touching the host page's markup" might be a better way to phrase it. Remember we talked about the extension living in an isolated world? The one thing we do share is the DOM. If we go making changes to the DOM, especially mutating or deleting stuff that is already there, the host page won't be expecting it, because it doesn't even know we exist.

Let's use a simple example. Here is a paragraph rendered by a page. A paragraph with a little text. Let's say we as the extension want to highlight the word hello. The naive thing you could try is to delete the text node in the paragraph and insert a highlight span containing the word hello and a new text node containing the word world. But we have a problem. It turns out this page was rendered using React, and frameworks like React really don't like it when you go working with the DOM behind their back. React is keeping a virtual representation of the DOM in memory and still thinks that text node is there. Let's say it goes through another render pass and wants to change the paragraph to say hello, Seattle.
Best case scenario is that React blows away the whole paragraph element and replaces it with a new one. That is kind of a bummer for us because now our highlight is gone. But worst case scenario is that React just tries to replace that one text node, which is no longer there, and it blows up with an error. Now we have broken the page and thrown principle number two out the window.

This seems bad. Highlighting text is kind of a core experience of our product at Textio. If our Chrome extension can't add inline spans around the host markup, then what are we supposed to do? We did like many engineers before us and sketched things out, and eventually we came up with a different highlighting solution entirely, and that is how the sandwich was born. Despite having the stupidest code name ever, it is the highlight of the Chrome extension. It is called the sandwich because it is made up of layers that are positioned one on top of another, and it gives us fine-grained control over how we add to the page while not touching the native editor.

Let's say the text from the native editor looks like this, but because this is a rich text editor, users could change the text color, add a background color, or maybe change font sizes. Let's say we as the extension want to add highlights to the page: a green highlight around "our team" and an orange highlight around "connect with you". For added fun, when you hover over a highlight, we want to turn the text white.

In the sandwich, the top layer is the native editor. We want it on top so it picks up all the mouse and keyboard events like normal, and we will not touch its markup. We only make it transparent, except for the cursor. Directly behind the editor is our text copy. This is an exact duplicate of the native editor with all the background colors hidden. When you type in an editor with sandwich mode on, this is the text you will see. Because we own the copy, it is safe to modify, so we can turn hovered text white.
Below that we have colored divs positioned to sit behind the text they highlight. From the phrases we want to highlight, we can calculate where to position these divs. Last but not least, we have the background color copy. This is the same as the text copy, but with the text colors hidden and the background colors visible. This allows us to draw highlights so they don't get obscured. Put it all together and you get beautifully highlighted text in someone else's rich text editor. We don't have to touch the native markup, and we are much less likely to break the page.

Like with most things, our sandwich highlighter came at a price. It solved a lot of problems but created new ones. When I say we made duplicates, that turned out to be harder than I thought. Copying the contents is easy, but making the copy look the same is not. We used a combination of style overrides to get the copies to look right, but oh, man, we would get one-pixel differences that threw off the layout and caused the highlights to end up in the wrong spot.

Next, we had a problem picking up mouse events properly. They don't propagate down to the other layers. We want you to be able to hover over a phrase and have the styling change. We implemented our own mouse tracking: we put a listener on the topmost element, and from there we can calculate whether the mouse is within the bounds of a highlight. If the mouse moves into a highlight, we can pass that down to the lower layers as if it had happened natively. This felt like a total hack, but we could not find a better way of getting the mouse events to hit both layers at once.

We also hit snags getting the layers positioned on top of one another. In order for this to work, we need a parent node that is positioned as well. Remember principle two: we don't want to break the host page, and if we slap a position relative on a parent node, we risk doing that. To solve this, we pulled in a technology called the shadow DOM. I will be brief about why this helped us.
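
Going back to the mouse tracking for a moment, the hit-testing idea can be sketched roughly as follows. The talk does not show its implementation, so the names findHighlightAt and createHoverTracker, the shape of the highlight objects, and the enter/leave callbacks are assumptions made for illustration.

```javascript
// Return the first highlight whose rectangle contains the point, or null.
// Each highlight is { id, rect }, where rect has left/top/right/bottom in the
// same viewport coordinates as a mouse event's clientX/clientY.
function findHighlightAt(x, y, highlights) {
  for (const h of highlights) {
    const r = h.rect;
    if (x >= r.left && x <= r.right && y >= r.top && y <= r.bottom) {
      return h;
    }
  }
  return null;
}

// Build a mousemove handler that reports when the pointer enters or leaves
// a highlight, so the lower layers can restyle themselves.
function createHoverTracker(getHighlights, onEnter, onLeave) {
  let current = null;
  return function handleMouseMove(event) {
    const hit = findHighlightAt(event.clientX, event.clientY, getHighlights());
    const hitId = hit ? hit.id : null;
    const currentId = current ? current.id : null;
    if (hitId === currentId) return; // still over the same thing, nothing to do
    if (current) onLeave(current);
    if (hit) onEnter(hit);
    current = hit;
  };
}
```

In a page, the returned handler could be registered with `addEventListener("mousemove", ...)` on the topmost layer, and the rectangles could come from `getBoundingClientRect()` on the highlight divs, which reports viewport coordinates that line up with `clientX` and `clientY`.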
Let's say this is a piece of the DOM we got from our host page. This is called the light DOM, as opposed to the shadow DOM. We have a parent node with a child node. The shadow DOM lets us attach a subtree that is hidden from the main application. Within that subtree we can render a div with position relative, and from there we can drop in our original child node. This uses the slot API, which lets you insert the shadow host's light DOM children into the shadow tree. Now we can add the sandwich layers as siblings. This is great. To the application, the DOM tree still looks like the native editor is a direct child of the shadow host, but what is rendered to the page is the shadow tree, and in there we can render to our hearts' desire.

How did it work out? Was our Chrome extension a success? I probably wouldn't be here talking to you if it wasn't, so let me show you. I am going to mirror my screen again and let's come over here. So, here we are in Gmail again. Now we are writing a recruiting mail. Let's turn on Textio for Chrome. Boom! Beautifully highlighted text in someone else's editor. We have highlights in the subject line and the body, and we have a score. An orange phrase means it is causing fewer people to respond to my recruiting mail. Let's replace that, and now it is green, meaning it will attract people, and the score has updated. We can type and it updates in real time. If I type "wow, this is great", not only do highlights show up in the text I just typed, but the ones that were already there move over. Then, because this is a rich text editor, let's do terrible things like making the font big and adding some text color. All of it just kind of works.

Because we wrote this not as a Gmail-specific highlighter but as a generic one, it also works in LinkedIn. Here we are recruiting our fantastic host, and you will notice it is the same experience here as it was in Gmail. We are happy because we are not touching the native editor's markup, so we are less likely to break a page, and because this is extensible: we can drop it into any editor and it will work.
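
The shadow DOM arrangement described at the start of this part can be sketched roughly as follows. This is an illustration under stated assumptions, not Textio's actual code: the function name buildSandwich, the layer names, and the `data-sandwich-layer` attribute are made up, and the positioning, stacking and styling of the layers are deliberately left out.

```javascript
// Attach a shadow tree to `host` and render the host's existing light DOM
// children (the native editor) through a <slot>, inside a wrapper that the
// extension owns and is free to position.
function buildSandwich(doc, host, layerNames) {
  // "open" is an arbitrary choice for this sketch.
  const shadowRoot = host.attachShadow({ mode: "open" });

  // This div lives in the shadow tree, so giving it position: relative
  // does not change any markup or styles the host page knows about.
  const wrapper = doc.createElement("div");
  wrapper.style.position = "relative";

  // An unnamed <slot> renders the host's light DOM children at this spot.
  const slot = doc.createElement("slot");
  wrapper.appendChild(slot);

  // The extra sandwich layers become siblings of the slotted editor.
  for (const name of layerNames) {
    const layer = doc.createElement("div");
    layer.setAttribute("data-sandwich-layer", name); // hypothetical marker
    wrapper.appendChild(layer);
  }

  shadowRoot.appendChild(wrapper);
  return wrapper;
}
```

In a page this might be called as `buildSandwich(document, editorParent, ["text-copy", "highlights", "background-copy"])`, where editorParent is whichever node contains the native editor. Note that `attachShadow` only works on certain element types and throws if the element already hosts a shadow root, so a real implementation would need to handle both cases.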
We are using the site configs, so it is easy to expand out as Textio grows into other domains.

Let's wrap this up. OK. If there is anything I want you guys to get out of this talk, it is these three things. Besides sick memes, obviously. One, a browser extension is an incredibly powerful tool, and really the only limit to what you can accomplish with one is your imagination. Two, writing code that runs on someone else's site is an adjustment. It is a mental shift in how you approach problems, and there is going to be a mess of challenges. Three, keep the principles in mind: avoid site-specific code and don't break someone else's page. Trust me, you will thank yourself later.

That is it for me. I hope you are all inspired to build extensions after today. Here is some good information, along with the slides from today. If you have questions, find me here or online. I would tell you that you could find me on Twitter, but that would be a lie because I don't have one, so e-mail or GitHub works great. Thank you so much, everyone, and happy building.
Augmenting the Internet with Browser Extensions // Shannon Capper // CascadiaJS 2018