FSP0001 – How The Internet Works – Facility Science Podcast #1

By | May 7, 2019
Notes for FSP0001 – How The Internet Works.
  • Layered abstraction
    • fundamental concept in electronics, computer science, and associated engineering disciplines.
    • Abstraction in this context means: conceptually take a function, behavior, or concept (which might be very complex), define its inputs and outputs (which we can call the interface), then put that functionality inside a (literal or conceptual) box. The benefit of this type of abstraction is that it allows us to turn a potentially very complex functionality into something very simple (by hiding how it is done and only being concerned about the inputs and outputs) so we can avoid thinking about the details of how it works when we don’t actually need to care about the details.
    • layer the abstraction by using that abstracted (that simple thing that’s really complex inside) element (along with others) to build something complex. Take that new complex thing, define its inputs and outputs and put it inside a box to make it simple. Repeat until you reach your desired outcome. Doing this you can create things that are so complex they are impossible to understand as a whole.
    • Example from building systems: the air handler abstraction.
      • We can talk about an air handler as an abstract device.
        • It has an interface: Air goes in one end and comes out the other end with some change in its properties (temperature, humidity, CO2, etc), and there is some mechanism to modify its operating parameters (controls).
        • It also has some operational specifications (tonnage/btus, CFM, voltage/current, energy consumption, etc) and mechanical/form factor.
        • And that’s all we need to describe the abstract air handler. Without worrying about what is actually inside, we can understand how air moves through the building, troubleshoot problems, and even design a system with an air handler without even talking about what is inside of it.
        • We can interchange any air handler that has the same interface and relevant operational specifications (so we can pick one from our favorite brand, or one that is more energy efficient, has lower TCO, or whatever is most important to us).
      • A real air handler has some stuff inside.
        • blower (impeller + motor with some kind of drive (direct, belt, gear, single speed, multiple speed, variable speed)), cooling mechanism (chilled water coil, refrigerant coil), heating mechanism (hot water coil, electric heat, refrigerant heat pump), might have dampers, humidifiers, etc. Also has some kind of controls, and obviously a duct or passage (or multiple, who knows) for the air to enter, travel through and exit.
        • We can talk about each of those sub-components as abstract devices also.
          • Example: blower. A blower has an air in side and an air out side, has this many speeds, moves this much air (we might have a CFM curve or something), pulls this many amps, etc.
            • But really if we look into it, the blower is made up of many parts. There is a motor, an impeller, some linkage between the two, some electrical device that drives the motor, etc.
              • Each of those components has its own complexity
                • What is the impeller geometry, is it fixed or variable pitch.
                • linkage between motor and impeller could be direct, gears, belts
                • how is variable speed or CFM achieved, discrete multiple speeds, electronic VFD, gear box, one speed with motorized dampers, etc.
          • Each of the other air handler components can be explored as well: example 2: cooling device. The cooling device could be a refrigerant coil (or multiple) attached to a complex system of multi-stage condenser towers, or it could be attached to a chiller with water piping and refrigeration and etc.
          • Each of those air handler components could conceivably be replaced with other components that operate completely differently as long as they fit the same abstract specification.
      • I may be belaboring this, but hopefully I am getting across the idea that we can talk about a very simple device called an air handler, when in reality that device is built up of many complex layers. We hide that complexity inside the air handler abstraction so that when we want to understand how air flows through our building we don’t have to think about specific details like the pitch angle on the impeller or the stroke volume of the compressor for R410a.
      • May seem overly pedantic for air handler but… take literally a huge pile of sand and rocks and metal and arrange it just so and you have a high speed global data infrastructure and it is completely ridiculous and it wouldn’t be possible if we had to think about the semiconductor properties of gallium arsenide every time we type www.google.com into the web browser.
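The air handler example above can be written down as a small Python sketch. Everything here is made up for illustration (the class names, the `condition` method, the numbers); it is not a model of any real equipment, just a way to show an interface hiding what is inside the box and letting parts be swapped.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple


class Blower(Protocol):
    """Interface only: anything that can report how much air it moves."""
    def airflow_cfm(self) -> float: ...


class CoolingDevice(Protocol):
    """Interface only: anything that can lower the air temperature."""
    def cool(self, temp_f: float) -> float: ...


@dataclass
class BeltDriveBlower:
    cfm: float  # hypothetical fixed airflow

    def airflow_cfm(self) -> float:
        return self.cfm


@dataclass
class ChilledWaterCoil:
    drop_f: float  # hypothetical fixed temperature drop across the coil

    def cool(self, temp_f: float) -> float:
        return temp_f - self.drop_f


@dataclass
class AirHandler:
    """The box: callers only see air in, air out."""
    blower: Blower
    cooling: CoolingDevice

    def condition(self, return_air_temp_f: float) -> Tuple[float, float]:
        # Returns (supply air temperature, airflow) without exposing the parts.
        return self.cooling.cool(return_air_temp_f), self.blower.airflow_cfm()


ahu = AirHandler(BeltDriveBlower(cfm=2000.0), ChilledWaterCoil(drop_f=20.0))
print(ahu.condition(75.0))  # (55.0, 2000.0)
```

Any other blower or cooling device with the same methods could be dropped in without changing the code that uses `AirHandler`, which is the point of the abstraction.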
Layers
  • Ethernet (defined by IEEE standard 802.3) – bottom layer for purposes of this topic, but really already built on top of many (many, many) simpler, more general layers.
    • Ethernet itself actually defines multiple abstract layers
      • Physical layer
        • cabling, most commonly twisted pair copper or optical fiber. Defines cabling requirements for specific data rates and maximum and minimum cable lengths.
        • signalling, meaning how to physically/electrically/optically transmit data on the wire. This deals with frequencies, waveforms, etc.
        • Hubs operate at the physical layer to connect devices together. A hub is a repeater that sends all data coming in on one port out to all the other ports. Since only one device can use the wire at a time (because otherwise their electrical signals collide and become nonsense), the non-selective repeating behavior isn’t as efficient as it could be, as devices not involved in the communication have their wire occupied by irrelevant communication. We can do better.
      • Link layer
        • Deals with access to the wire by resolving the case when multiple devices transmit and the signals collide.
        • Identifies individual devices on the network by an address called a MAC address (MAC = Media Access Control, has nothing to do with Apple Mac = Macintosh). The MAC address is a globally unique identifier in the hardware. Each device on the network will see all messages that come in on its wire and will ignore any that aren’t addressed with its MAC address.
        • Link layer doesn’t need to know the details of the physical layer (the physical layer is abstract). Doesn’t matter if the data is sent out as waveforms on a copper wire, pulses of light on an optical fiber or notes plucked on a guitar string.
        • Ethernet switch operates at the link layer. Switch understands MAC addresses and learns which devices are reachable from its different interfaces (ports) by reading the MAC addresses from messages coming in and keeping a table. Using the table, the switch doesn’t need to broadcast like a hub and can instead send data out only on the line connected to the intended recipient.
    • From the Ethernet point of view there is only one network, which is everything connected to the wire. There is no concept of another network and no provision within Ethernet to communicate with another network if one were to exist. This limitation has some notable consequences including:
      • Can only handle a limited number of devices before the physical layer is saturated with collisions, preventing any meaningful communication.
      • There is no way to have a private or secure network that can communicate with devices on a public or less secure network.
    • We can swap out ethernet for different technologies that define the physical and link layers, examples are:
      • Wireless Ethernet (WiFi) (defined by IEEE 802.11 standard). Very similar to 802.3 Ethernet but using radio as the physical layer and a conceptually similar link layer (MAC addresses, etc) but solving the additional problems that come with wireless communication. Circle back to WiFi later.
      • DSL, ISDN
      • RS232/RS485 serial port
      • Pretty much anything that allows 2 devices to physically communicate
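The switch behavior described under the link layer (read the source MAC address, keep a table, send only to the right port) can be sketched in a few lines of Python. This is a toy model, not how any particular switch is implemented: the class and method names are invented, and the "send everywhere when the destination is not yet in the table" fallback is a detail added here that the notes above do not cover.

```python
from typing import Dict, List


class LearningSwitch:
    """Toy model of an Ethernet switch's MAC address table."""

    def __init__(self, port_count: int) -> None:
        self.ports = list(range(port_count))
        self.mac_table: Dict[str, int] = {}  # MAC address -> port it was last seen on

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Return the list of ports the frame is sent out on."""
        # Learn: the sender is reachable through the port the frame arrived on.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: behave like a hub and repeat to all other ports.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the same port the frame came from
        return [out_port]


switch = LearningSwitch(port_count=4)
print(switch.receive(0, "aa:aa", "bb:bb"))  # [1, 2, 3]  bb:bb not known yet
print(switch.receive(1, "bb:bb", "aa:aa"))  # [0]        aa:aa was learned on port 0
print(switch.receive(0, "aa:aa", "bb:bb"))  # [1]        bb:bb is now known too
```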
  • IP (Internet Protocol), knows about this network and other networks and how to get stuff to another network. Called the network or internet layer.
    • IP device needs several pieces of information:
      • Address called an IP address. Basically just a number, but most commonly represented as 4 numbers separated by dots. The 4 numbers can each range from 0-255.
      • A network mask often called a subnet mask tells the device which portion of its IP address represents the network and which portion represents devices on this network. For example, if your device has an IP address of 192.168.1.10 and a subnet mask of 255.255.255.0, this indicates that the network is 192.168.1 and the device address on the network is 10. So any device with an address that starts with 192.168.1 is on the same network as your device and any device with something else (like 192.168.0 or 77.26.53) is on a different network. Subnet mask explained
      • The address of at least one (but maybe more) router if this network is connected to any other networks. A router is a device that is connected to more than one network and knows how to pass messages among the connected networks (note: there is a device you probably have in your home that you call a “router” that is actually much more than just a router. I’ll make that distinction later). Most devices will define a “default router” or “gateway” to which it will send anything that has a destination not on this network and for which no specific router is known. For the example above, your device with IP address 192.168.1.10 might be told that the gateway is at 192.168.1.1. If it wants to send something to IP address 8.8.8.8 it will notice that 8.8.8.8 isn’t on the same network since it doesn’t start with 192.168.1 so the only thing our device can do is send the message to 192.168.1.1 (but with the destination address of 8.8.8.8) and hope that the router at 192.168.1.1 knows how to deliver the message. Most likely the device at 192.168.1.1 won’t actually be connected to the same network as 8.8.8.8 but will instead have to forward the message to another router which may then forward it to another router until it finally reaches a router that is actually connected to the same network as 8.8.8.8.
      • IP divides the data to be sent into particularly sized chunks called packets. Each packet gets the destination address and source address and is then handled independently of the others. Since each packet of the message is handled by the routers independently, the individual packets of a particular message may take different routes to the destination (and some might not arrive at all). The process of “routing” packets by looking at each one and then forwarding to the router most likely to be closer to the destination is sometimes called “packet switching”
    • IPv4 vs IPv6: The IP addresses we have been talking about so far (the 4 numbers separated by dots) are IPv4 (Internet Protocol version 4) addresses. A major limitation of IPv4 is that there are only around 4 billion possible addresses, which causes some problems. 4 billion seems like a lot, but there are more than 7 billion people on the planet plus businesses and governments with their many devices. An estimated 20-30 billion devices are connected to the Internet (more later on how this is possible with only 4 billion addresses). IPv6 is the newer standard (“newer” meaning 1998, so not that new, but still not widely deployed) which (among other things) allows for many more addresses: 2^128 addresses, which is enough for billions of addresses for every person on the planet. Some people have said this is more addresses than there are grains of sand on earth or than atoms on the surface of 100 earths. I didn’t check their math, but it’s a lot of addresses.
    • IP doesn’t know how to actually physically reach another device. It needs Ethernet (plus a little bit extra glue to figure out which MAC address corresponds to the destination IP address) to do that. Really IP doesn’t need Ethernet specifically, just something that provides that link layer and physical layer (It can be Ethernet, DSL, RS232/485, carrier pigeon (see RFC 1149 https://tools.ietf.org/html/rfc1149)).
    • IP doesn’t actually verify the existence of the device on the other end, doesn’t make any effort to verify delivery of the message or have any means to retransmit undelivered data or reorder or reassemble fragmented data at the receiving end. We need another layer.
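The "is the destination on my network, or do I hand it to the gateway?" decision from the subnet mask and router bullets above can be sketched with Python's standard `ipaddress` module. The function name `next_hop` is made up for this example, and it only covers the single default-gateway case; real devices can hold a routing table with more than one router.

```python
import ipaddress


def next_hop(device_ip: str, subnet_mask: str, gateway: str, destination: str) -> str:
    """Return the IP address the packet should be handed to next."""
    # Combine the address and mask to get the network this device lives on.
    network = ipaddress.ip_network(f"{device_ip}/{subnet_mask}", strict=False)
    if ipaddress.ip_address(destination) in network:
        return destination  # same network: deliver directly
    return gateway          # different network: hand it to the default router


# The example from the notes: 192.168.1.10 with mask 255.255.255.0.
print(next_hop("192.168.1.10", "255.255.255.0", "192.168.1.1", "8.8.8.8"))       # 192.168.1.1
print(next_hop("192.168.1.10", "255.255.255.0", "192.168.1.1", "192.168.1.77"))  # 192.168.1.77
```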
  • TCP (Transmission Control Protocol) provides a connection-oriented transport mechanism for reliable error-corrected data transmission from the source endpoint to the destination endpoint.
    • TCP and IP are often talked about together as TCP/IP or TCP/IP stack (“Stack” because of the layers of protocols piled on top of each other).
    • TCP uses port numbers to identify each particular application connection, allowing it to sort the mixed up stream of data coming in over IP and get everything to the right application. If an application wants to connect to another application over the network using TCP, the application must specify the port number the application is using on the remote end. If an application wants to receive data over TCP, it must tell TCP which port it will be using (this is called “listening” on the port). “Ports” in TCP are just numbers to identify each connection or application and shouldn’t be confused with physical “ports” you would plug a cable into such as on a router or switch.
    • TCP is an example of a transport layer protocol. Another prominent protocol at this layer (so used instead of TCP) is UDP (User Datagram Protocol). UDP doesn’t do the error correction or reordering or reliable delivery of TCP. UDP is useful when the data transfer is time dependent such as for live streaming audio or video or real-time gaming when the extra overhead of TCP might cause data to arrive too late or for situations when the application will handle its own error correction or out of order delivery or whatever and the extra overhead of TCP would be redundant.
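To give a feel for what "reordering and reliable delivery" means on top of IP, here is a simplified Python sketch. It is not real TCP: real TCP numbers individual bytes rather than whole segments and has much more machinery (acknowledgements, timers, windows). The function name `reassemble` and the per-segment numbering are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def reassemble(segments: List[Tuple[int, bytes]], total: int) -> Tuple[bytes, List[int]]:
    """Put numbered segments back in order and report any that are missing.

    segments: (sequence_number, payload) pairs in arrival order, possibly
              duplicated or out of order.
    total:    how many segments the sender produced (numbered 0 .. total-1).
    Returns the data that can be delivered so far (everything before the
    first gap) and the sequence numbers that need to be sent again.
    """
    received: Dict[int, bytes] = {}
    for seq, payload in segments:
        received.setdefault(seq, payload)  # ignore duplicate deliveries
    missing = [seq for seq in range(total) if seq not in received]
    deliverable = bytearray()
    for seq in range(total):
        if seq not in received:
            break  # stop at the first gap; later data waits for the resend
        deliverable += received[seq]
    return bytes(deliverable), missing


arrived = [(1, b"lo "), (0, b"Hel"), (3, b"ld"), (1, b"lo ")]
print(reassemble(arrived, total=4))                   # (b'Hello ', [2])
print(reassemble(arrived + [(2, b"wor")], total=4))   # (b'Hello world', [])
```

UDP, by contrast, would hand the application whatever arrived in whatever order it arrived.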
  • Application layer – call this the top layer. Protocols in this layer implement application-specific functionality using the service provided by the transport layer (TCP or UDP). Some prominent examples (that you probably use hundreds of times a day) of application layer protocols are DNS and HTTP:
    • DNS (Domain Name System) is used to translate names to IP addresses. You might have noticed that we just went through all the stuff that makes the internet work and from TCP down to Ethernet there is no way to connect to a place called facilityscience.com. The names we use to refer to things like web sites are for humans. The computer needs an IP address to communicate with another computer across the internet. DNS provides the means to translate names to IP addresses. [maybe post more info links]
      • DNS is hierarchical. Control of the DNS for a particular domain is
    • HTTP (HyperText Transfer Protocol)
Ethernet up to Application layer defines “INTERNET PROTOCOL SUITE”
NAT (Network Address Translation) – Mentioned earlier that there are 20-30 billion internet-connected devices but only 4 billion IPv4 addresses. NAT is one of the primary ways we are able to get around IPv4 address space limitation.
  • There are some blocks of IP addresses that are reserved for use in private networks and are designated as “non-routable.” This means that anybody can use these addresses on their networks without consulting with anybody else but that they can’t be used to communicate across the Internet, as in Internet routers will drop them since there’s no way to know for sure where they are going to or coming from. These address blocks are 10.0.0.0-10.255.255.255 (10/8), 172.16.0.0-172.31.255.255 (172.16/12), and 192.168.0.0-192.168.255.255 (192.168/16). You might recognize some of these address blocks from your home or office networks. I’ll bet you have probably seen an address something like 192.168.1.1 (see RFC 1918 if you want the historical context). The point is that inside private networks, like the one at your home or office, you use an IP address like 192.168.1.100 and so does everyone else (at their house or office) so we can just keep reusing these over and over. The downside of this is that you can’t use this address to access anything on the Internet because anything with 192.168.1.100 as the return address will be dropped by any reasonable router on the Internet and if it weren’t dropped, there would be no way to get it back to you anyway since your address is used by everyone everywhere. Somehow you can still get things from outside your network (Internet) on your home PC. We need something else to make this work.
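Checking whether an address falls in one of the three private blocks listed above is easy to sketch with Python's standard `ipaddress` module. The helper name `is_rfc1918` is made up for this example.

```python
import ipaddress

# The three private ("non-routable") IPv4 blocks from RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """True if the address is inside one of the private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in RFC1918_BLOCKS)


print(is_rfc1918("192.168.1.100"))  # True
print(is_rfc1918("172.32.0.1"))     # False, just past the end of 172.16/12
print(is_rfc1918("8.8.8.8"))        # False
```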
  • NAT (see RFC 1631 if you want some historical context). You have this thing on the edge of your network called a router (sometimes a gateway, which is just a specific use of a router). When you send something over an IP network, you provide the destination address so the network knows where to send it and also your own return address so the device on the other end knows how to send a response back to you. Remember that a router is a device on 2 different networks. It has a “public” IP address on the other network and since it can’t route things to your internal network using the non-routable addresses it doesn’t look like a router to the other network, it just looks like another device. So when you send some data through the router to the Internet, your NAT-enabled router knows your IP address isn’t routable, so it replaces your return address with its own address and then keeps track of which device on the internal network the connection belongs to so it can do the opposite with any reply that comes back. The device on the other end of the connection doesn’t send its reply back to your address, it instead sends the reply back to your router’s address and the router forwards the response back to you. With this mechanism, everyone can have 10s or 100s or thousands of devices “on the internet” but each network only “uses up” 1 IP address.
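The bookkeeping a NAT router does can be sketched as a small translation table. This is a toy model under some assumptions: it tracks connections by giving each internal (address, port) pair its own port number on the public side, which is one common way of doing it but is not spelled out in the notes above. The class name, the starting port 40000, and the public address 203.0.113.5 (from a block reserved for documentation) are all made up for illustration.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class NatRouter:
    """Toy NAT: rewrites return addresses and remembers who they belong to."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.out_map: Dict[Endpoint, int] = {}  # internal endpoint -> public port
        self.in_map: Dict[int, Endpoint] = {}   # public port -> internal endpoint

    def outbound(self, src: Endpoint) -> Endpoint:
        """Return the replacement return address for a packet leaving the network."""
        if src not in self.out_map:
            self.out_map[src] = self.next_port
            self.in_map[self.next_port] = src
            self.next_port += 1
        return (self.public_ip, self.out_map[src])

    def inbound(self, public_port: int) -> Optional[Endpoint]:
        """Return the internal device a reply belongs to, or None to drop it."""
        return self.in_map.get(public_port)


nat = NatRouter("203.0.113.5")
print(nat.outbound(("192.168.1.100", 51000)))  # ('203.0.113.5', 40000)
print(nat.outbound(("192.168.1.101", 51000)))  # ('203.0.113.5', 40001)
print(nat.inbound(40001))                      # ('192.168.1.101', 51000)
print(nat.inbound(12345))                      # None, nobody asked for this
```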
DHCP: dynamic vs static IP address – We discussed all this stuff that a device needs in order to send messages across a network (IP address, net mask, IP address of router or routers, IP address of DNS servers). A new device on a network somehow has to get all of these things. We can type them in manually for what is called a static configuration. That requires every user to know how the network is set up or a call to the network administrator for every new device. There is a mechanism called DHCP (Dynamic Host Configuration Protocol) that allows the device to ask the network for the proper configuration (called dynamic configuration in contrast to the static configuration I mentioned before). The device that understands this request is called a DHCP server. The DHCP server knows all of those little bits of information and can tell the new device what it needs. A DHCP server usually has to manage a pool of IP addresses and make sure it doesn’t give out the same IP address to multiple devices (this would cause problems with message delivery). If it uses all the addresses in its pool, it can’t give out any more.
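The pool management part of a DHCP server can be sketched like this. It is a simplification: a real DHCP server also hands out the net mask, router, and DNS server addresses, and its leases expire after a while. The class name `DhcpPool` and the address range are made up for this example.

```python
import ipaddress
from typing import Dict, List, Optional


class DhcpPool:
    """Toy address pool: never gives the same IP address to two devices."""

    def __init__(self, first: str, last: str) -> None:
        start = int(ipaddress.ip_address(first))
        end = int(ipaddress.ip_address(last))
        self.free: List[str] = [str(ipaddress.ip_address(n)) for n in range(start, end + 1)]
        self.leases: Dict[str, str] = {}  # MAC address -> IP address handed out

    def request(self, mac: str) -> Optional[str]:
        """Return an IP address for this device, or None if the pool is used up."""
        if mac in self.leases:
            return self.leases[mac]  # a returning device keeps its address
        if not self.free:
            return None              # pool exhausted
        ip = self.free.pop(0)
        self.leases[mac] = ip
        return ip


pool = DhcpPool("192.168.1.100", "192.168.1.101")
print(pool.request("aa:aa"))  # 192.168.1.100
print(pool.request("bb:bb"))  # 192.168.1.101
print(pool.request("aa:aa"))  # 192.168.1.100 again
print(pool.request("cc:cc"))  # None, no addresses left
```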
Consumer “router”: Circle back to the term “router” and that device you have in your home that you might have gotten from your ISP that you probably call a “router.” There is a router inside (a device that exists on multiple networks, in this case your home network and the “Internet” or more accurately your ISP’s network), but it is also a convenient place to put all the other infrastructure pieces, so consumer “routers” are actually router, switch, wireless access point, firewall, NAT, and DHCP server in one box.