Web application from scratch, part I (defn.io)
370 points by Bogdanp on Feb 25, 2018 | hide | past | favorite | 65 comments


I love seeing projects/tutorials like this. Too many developers treat too many layers of the stack as "magic", and this is great to show it's not so.

We should change the definition of "full-stack developer" to "given a computer, an opcode reference sheet, a way to enter bytes into a computer, and enough time, is capable of making an operating system and a bunch of applications" :)


I came up through sysadmin, networking, infrastructure type work. I always assumed that programmers knew everything I knew PLUS how to write programs.

First project where I spent any appreciable amount of time with programmers was eye opening. I was perplexed that not a single one of them had ever run a network sniffer or strace or had the faintest clue about HTTP or how browsers maintained state with their application. I squatted in a room with them for 4-5 weeks and acted a bit like an xray machine for any problems they were encountering. I didn't know anything about programming, so they would tell me what the code was supposed to do and I would show them what was actually happening on the wire or filesystem or whatever. It was a lot of fun and we all left the exercise having learned a lot more than expected. This was nearly 20 years ago, tooling has improved a good bit since then but it still helps to be able to get into the weeds.


Well, yeah. I was never able to figure out what a netmask did, so I went into programming instead.


This mirrors my experience and assumptions. I always assumed developers knew more than me; it's not so, and I'm glad that we can complement each other's skills and learn from each other.

I do wish that the skills of sysadmins weren't so downplayed though. (Companies hiring SWE for Ops positions and the notion of 'NoOps'[0] being examples)

[0] https://go.forrester.com/blogs/11-02-07-i_dont_want_devops_i...


Had the opposite experience. Did QA for an Android app which set up a local VPN and selectively pushed traffic to a SPDY proxy doing some magic.

Had to do a fair bit of work with Wireshark to troubleshoot protocol issues. It was eye-opening too, especially the amount of traffic generated in the background by Facebook.

Sadly, developers rarely go any further than the network tab in a browser debug tools.


I have kind of the same experience being a database administrator. In so many cases programmers have no clue what is actually happening behind their queries. The worst thing is that they think they know or just don't pay attention! :D


> given a computer, an opcode reference sheet, a way to enter bytes into a computer, and enough time, is capable of making an operating system and a bunch of applications

Yeah, as long as “enough time” is measured in years I could still be a fullstack developer under that definition.

There's still a good reason to treat layers as magic though. Mental working space is limited and abstractions with minimal leakage are the only way to get things done at ever increasing speeds.


I'd say there's good reasons to treat some layers as black boxes, but not as magic; you use abstractions precisely because mental space is limited, not because the underlying object is intrinsically non-understandable.

As I replied to another comment below, I wrote "is capable of doing", not "does".


This made me think of GEB:

> ... we all know that we human beings are composed of an enormous number of cells (around twenty-five trillion), and therefore that everything we do could in principle be described in terms of cells. Or it could even be described on the level of molecules. Most of us accept this in a rather matter-of-fact way; we go to the doctor, who looks at us on lower levels than we think of ourselves. We read about DNA and "genetic engineering" and sip our coffee. We seem to have reconciled these two inconceivably different pictures of ourselves simply by disconnecting them from each other. We have almost no way to relate a microscopic description of ourselves to that which we feel ourselves to be, and hence it is possible to store separate representations of ourselves in quite separate "compartments" of our minds. Seldom do we have to flip back and forth between these two concepts of ourselves, wondering "How can these two totally different things be the same me?"

-Douglas Hofstadter, "Gödel, Escher, Bach", Chapter 10


As I replied to another comment below:

I understand the distinction you're making but in my experience the words are interchangeable. Just because I treat a system as magic one moment does not mean that I can't later dive under the hood and fix it the next. It's just a matter of context switching.


Many thanks to Reificator, avyfain, and ggambetta for the string of comments which synthesize so much of how I think about cyberphysical systems, but in a way that has crystallized it and given me a breakthrough in writing for my research. I'll ping you the result of this if you are open to a DM. Either way, you've given me a gift in your comments. And Bogdanp, awesome post! Quite pure and actionable insights.


This comment reminds me of Feynman describing how he out-computed an abacus expert just by extrapolating from a few facts he had memorized. I think mental working space is limited and should be optimized (as in, don't load your mind up with too much low-level stuff), sure, but it's the ability to rapidly synthesize known tool sets and their capabilities that helps more. Learning "another way of thinking about it" is generally the value of doing these first-principles style tutorials, although I think one can take it too far.


Yep. And what happens far too often is developers concentrate on one layer and framework, then spend all their time learning tiny details about the abstraction system they're using instead of the job they're doing.

New version comes out, something breaks in a non-obvious way, the industry moves on? They're left with a head full of nothing.


That's less about layers of abstraction and more to do with intellectual laziness in general.

Abstraction is a tool, and like all others it can be abused.

> All problems in computer science can be solved by another level of indirection - except for the problem of too many layers of indirection.


I agree it is about intellectual laziness.

I was a developer for a decade or two before I finally realized something critical to my career: other developers are actively selling me on things that may or may not be a net benefit to me as the years pass.

Developers are a market just like any other, so people make abstraction layers and then become "evangelists" that go out and sell people on stuff. What's to sell? The idea that the framework takes away all the hard work. (I was a big sucker for frameworks with slick UIs back in the day.) We want to believe with a minimum amount of thought, something super cool that will come out that users and other developers will fawn over.

Once I started teaching developers, I ran into people who called themselves professionals but were unable to think through simple activities. Yes, these people were lazy, but they were also products of an environment where they were just moving from shiny object to shiny object. They knew a little bit about various abstraction layers here or there, but nothing about how anything worked. Most of the time, once they went off "happy path", they just banged away at the compiler until things looked like they were working. And these were people being paid to program!

I completely agree this is because of laziness. I don't think it's the entire picture, however.


I think it's a different method of working, and I'm not sure either is absolutely superior. They have different strengths in different environments and time horizons.

I personally have a hard time working on something when I can't see through the abstractions to how it works, at least to the CPU and network level. Without knowing how something is working, I don't have a mental model of failure modes or performance characteristics, and I don't know how to compose a solution from the bottom up that will scale well and work reliably.

Thus when I come into something new, I'm frequently less productive than I've observed other people to be, who just take the abstraction at face value, and start coding in terms of it. I catch up in the medium term and overtake in the long term though, because I can debug implementation issues and don't fall into traps where the abstraction is a poor fit.

I consider these two different styles of working as top-down (abstraction first) vs bottom-up (needing to understand the mechanism).

The two styles use very different methods for evaluating third-party libraries and frameworks. Top-down tends to evaluate for popularity and social proof above all else. Even if it only covers 5% of the solution and needs workarounds and add-ons for core solution use cases, it gets used because it's popular. Whereas bottom-up tends to evaluate for conceptual and implementation simplicity, possibly at the cost of under-abstracting the solution, and running the risk of NIH causing redundant work.

Top-down tends to get something up and running and demoable / sellable faster. If you're ambitious in a big organization, or early in a startup, it's the way to go, because it looks like you're moving really quickly, and for MVP purposes you are. Bottom-up tends to build something that will last and can be maintained in the long term, with a smaller semantic gap between the problem domain and the chosen abstractions for the solution, usually because there's an extra abstraction layer that's missing in the top-down solution.

That layer that bottom-up tends to build, and top-down doesn't, is very similar to what Paul Graham talks about in programming bottom up in Lisp: http://www.paulgraham.com/progbot.html - changing the language to suit the problem. You build a library that lets you compose solution primitives at a higher level with just the right chunking for your problem.


It’s worth pointing out, I think, that Feynman concludes by saying that a bit of luck was involved:

“Furthermore, the whole idea of an approximate method was beyond him, even though a cubic root often cannot be computed exactly by any method. So I never could teach him how I did cube roots or explain how lucky I was that he happened to choose 1729.03.”


You shouldn't treat anything as magic. Whenever you look under the hood of things that look like magic you always find that they have been built by people like you and me and are perfectly understandable if you put in the time.

For example, I always thought that Windows was developed by people who are much more qualified than myself, but when I saw some parts of the source code I quickly realized that it was written by human beings without any magic.

Treat things as a black box but never as magic.


I understand the distinction you're making between black boxes and magic. However, in my vernacular they're the same thing. I can describe something as magic to one group and describe the internals to another group without issue.

If magic wasn't a black box it wouldn't be magic.


Why stop there? What about assembling the computer? In fact, he should be able to make the electronics if he truly knows about computers. Now that I think of it, given a mine, our superhero should be able to find ore needed, and given a field, he should be able to create electricity.

Then we have a true full stack developer. One that shouldn’t feel bad about himself.


Actually yes, I know how to assemble a computer as well as to do sysadmin stuff (or at least enough to be able to search for the relevant docs).

Knowing about the abstraction layers under your JavaScript helps a lot with efficiency. If more people knew that, maybe I wouldn't need 16 GB of RAM just for browser tabs...

Edit: that doesn't mean you should write your own web server every time you do a new web site, but it would help if you do that once to get an idea of what's going on.


> One that shouldn’t feel bad about himself.

Nobody is suggesting that people who don't know all this stuff should feel bad about themselves, just that striving to understand it is admirable.

Also, the idea that you can continue in the same thought process all the way to mining is a little bit weird. Understanding your CPU, its ISA, your OS, etc. are arguably all potentially relevant for software developers. Knowing how to mine is not.


Certain Minecraft and Factorio players have done things like this often.


I like this kind of tutorial as well but I have a hard time telling who this particular one is aimed at. It's definitely not any kind of beginner.

The socket API is nontrivial and is hardly explained at all. A lot of implementation magic is just coded right in without a word of explanation. The shiniest Python wizardry is used throughout without much benefit to clarity - you've got a generator, f-strings, type annotations, typed named tuple syntax, etc. Even if you can read and like type annotations as a documentation aid, the code as written says weird things like "an http request is a named tuple and depends on sockets".


I'd guess it's written for someone comfortable with Python or programming in general (so they understand the syntax quickly) but has mostly worked with web frameworks and libraries.


I thought I was comfortable with Python but I've never seen `-> typing.Generator[bytes, None, bytes]` nor that arrow syntax before - guess I have some catching up to do.
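For anyone else catching up, the annotation reads `Generator[YieldType, SendType, ReturnType]`. A toy illustration (the `split_chunks` function here is invented for the example, not taken from the article):

```python
import typing

# Generator[YieldType, SendType, ReturnType]: this generator yields
# bytes chunks, accepts nothing via send(), and returns leftover bytes.
def split_chunks(buf: bytes, sep: bytes) -> typing.Generator[bytes, None, bytes]:
    while sep in buf:
        chunk, _, buf = buf.partition(sep)
        yield chunk
    return buf          # surfaces to the caller as StopIteration.value

gen = split_chunks(b"a\r\nb\r\nrest", b"\r\n")
chunks = []
try:
    while True:
        chunks.append(next(gen))
except StopIteration as stop:
    leftover = stop.value

print(chunks, leftover)  # [b'a', b'b'] b'rest'
```

The third type parameter is the one most people haven't seen: a generator's `return` value rides along on the `StopIteration` exception.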


Heh, I jokingly refer to myself as a full-full-stack developer. I can do schematic design and board layout, baremetal or RTOS embedded software, Linux kernel hacking, etc all the way up to web apps with React (preferring Elixir on the backend but shrug).


I assume this is sarcasm.

Those who do not build upon the achievements of others are far less likely to reach any new heights.

It's good to remember what we take for granted, and it's good to understand what is happening in layers below our concern. But pragmatically speaking, it's more valuable to know how to study and understand what's beneath us when we need rather than spending time proving ourselves just for the sake of proving.

Do you still hunt, kill, and butcher your own meat?


There's a huge difference between using the prepackaged black box solution for the sake of efficiency (as everyone should), and not caring at all to understand even the basics of how it actually works. The latter usually leaves one too intimidated to change anything and vulnerable to any unexpected situation. And things break all the time; you need to debug problems with tools and servers and run into all kinds of exotic issues. I once worked with a front-end dev who would restart his machine when he needed to reset a local web server, because "it's not his job" to learn a bit about using the command prompt. For me that is a plain crazy attitude, and it's not that rare...


> Do you still hunt, kill, and butcher your own meat?

A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects.

- Robert A. Heinlein


It's not sarcasm. I don't hunt/kill/butcher my food. But note that I wrote "is capable of doing", not "does".


> Do you still hunt, kill, and butcher your own meat?

Um, yes? It is not that hard. You do need land for it.

I think reaching new heights requires a full understanding of how things work in general. Here is the thing with computers, it isn't that hard to get that knowledge. You just have to apply yourself and build a great deal of things over a career.


If you're buying software, you don't need to understand how it was made. It's similar to meat. If you're producing meat, you have to butcher animals at some layer.


Surely you mean "given a stack of microchips and lengths of wire"?


Or maybe "given an area of land rich in minerals… yada yada …grow a silicon ingot… yada yada …now you have a virtual DOM webapp. Congratulations, you now have the 'fullstack' merit badge!"


If you wish to make an apple pie from scratch...


In my view, full-stack is principally a business term, meant to communicate how many people you have to hire, or whether you can skimp on hiring.


> Too many developers treat too many layers of the stack as "magic", and this is great to show it's not so.

My high-school maths teacher referred to it as "just plumbing".


This makes me think that Chuck Moore would be the prototypical full-stack dev... just give everyone a Forth image with core vocabulary and turn them loose.


Couldn’t help but think of this quote...

“If you wish to make an apple pie from scratch, you must first invent the universe.” -Carl Sagan


Now there's an article I'd like to see.


> we need to create a socket, bind it to an address and then start listening for connections.

This is the most important part. Is there a tutorial on sockets, preferably not written from a C perspective and in a language-agnostic way? How OS-specific is it? Is there a common basis across most OSes?
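To make the quoted steps concrete, here's a minimal sketch using Python's socket module (the loopback address, port-0 trick, and buffer size are arbitrary choices for the example, not anything the article prescribes):

```python
import socket
import threading

# create -> bind -> listen -> accept, the lifecycle quoted above.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

def handle_one():
    conn, _addr = server.accept()      # blocks until a client connects
    with conn:
        conn.sendall(conn.recv(1024))  # echo one message back

t = threading.Thread(target=handle_one)
t.start()

# A throwaway client on the same machine, to exercise the server.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect(("127.0.0.1", port))
    client.sendall(b"hello")
    reply = client.recv(1024)

t.join()
server.close()
print(reply)  # b'hello'
```

The same create/bind/listen/accept sequence exists in essentially every language, because they are all thin wrappers over the same C calls.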


Since the socket/networking code is in the OS libraries, it's in C. Other languages usually just wrap it. The API is pretty similar across OSes (I used to do sockets on Linux, HP-UX, and Solaris).

There was a book I borrowed that was useful, UNIX Network Programming, Volume 1, Networking APIs: Sockets and XTI

I found beej's guide useful as well. http://beej.us/guide/bgnet/html/multi/syscalls.html


Sockets are a C language API because of their Unix origin. Other languages provide wrappers around the underlying C API. If you want to understand something like select() vs. epoll(), there's no way around C.


You could do worse than start with the FreeBSD Handbook section on sockets, as it covers a lot of the how and why things are glued together. It is C-based, however, as *nix and its data structures were built in C.

[0] https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers...


There's a quick overview for python here:

https://docs.python.org/3/howto/sockets.html

Which should go well with:

http://aosabook.org/en/500L/a-simple-web-server.html

And a follow-up to that might be:

http://aosabook.org/en/twisted.html

Now, this is all python - but lessons from the first two should be helpful in understanding most socket client/servers.


Sockets are mostly implemented outside of userspace. It's actually very easy to make a TCP socket in C: https://stackoverflow.com/questions/204169/how-to-create-a-t...

UDP is pretty similar but you have to change the code to be connectionless.
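In Python the same contrast shows up: a SOCK_DGRAM socket skips listen()/accept() entirely and exchanges datagrams with sendto()/recvfrom(). A rough sketch (loopback addresses chosen just for illustration):

```python
import socket

# UDP: no listen()/accept(); each datagram carries the peer address.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"ping", addr)           # no connection setup at all

data, peer = receiver.recvfrom(1024)
print(data)  # b'ping'
sender.close()
receiver.close()
```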


Everything is either BSD sockets or Windows. And Windows provides a BSD socket API for you anyway because nobody, having written select() based code and got it working, wants to rewrite it. C perspective really is the One True Way; other languages map directly to it.

Some languages provide their own useful abstractions (e.g. Python's Twisted/Tornado).


I don't think you can teach it well except through starting with C.

Berkeley sockets is a weak API that became a standard. And you need to know what is happening at the C layer in order to reason about what is going on.

People who know a lot about a thing get Stockholm syndrome. So, a couple of quick notes for why I think it is a bad settlement.

It is riddled with special cases that you need to know about. For example, there is a key function called select(2). When you call it, it fills in lists that tell you which sockets are currently readable, writeable, or in error. But this knowledge is contextual. If select(2) tells you that a socket is writeable, you need to know what that socket was doing to understand what that means. (Is it an outbound client socket, which is currently making a connection? Or is it a server socket that is listening? Or is it an established client socket, for which you are the server?)

The API does not give you a convenient place to do this tracking. So you, the developer, need to build structures to one side to track it yourself. This is a form of engineered coincidence that the API foists on your system.

Berkeley sockets has notions of Server and Client. In practice, this causes problems because the handling you do for an inbound Client when you are the server is completely different to the handling you do when making an outbound Client connection. (It would have been better if they had created separate ideas of Server, Client and Accept.)

Windows has some high-performance APIs that take a quite different approach to Berkeley sockets. They have their own problems, but they demonstrate that the design choices of Berkeley sockets are arbitrary rather than inevitable.

The Berkeley sockets API is good at one thing: once you have learnt all of its dumb tricks, you can reason about what the OS is doing. For this reason, there is a danger to putting convenience APIs on top of it. Stuff will go wrong, and your layer now acts as a barrier between the developer and their problem. (This does not stop people from trying, my own convenience effort is at github.com/solent-eng/solent.)
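As a sketch of the side bookkeeping described above (the role names and dict layout are my own invention, not a standard pattern): you track what each socket is for yourself, and interpret select's results through that table.

```python
import select
import socket

# The "structures to one side" the API never gives you: a dict mapping
# each socket to its role, so select results can be read in context.
roles = {}

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(8)
server.setblocking(False)
roles[server] = "server"

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.setblocking(False)
try:
    client.connect(server.getsockname())
except BlockingIOError:
    pass                               # non-blocking connect: in progress
roles[client] = "connecting"

while roles[client] == "connecting":
    socks = list(roles)
    rlst, wlst, _ = select.select(socks, socks, socks, 1.0)
    for s in wlst:
        if roles[s] == "connecting":
            # writeable during a connect means it finished; check for errors
            err = s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            roles[s] = "established" if err == 0 else "error"
    for s in rlst:
        if roles[s] == "server":
            conn, _ = s.accept()       # readable listening socket: new peer
            roles[conn] = "established"

print(roles[client])  # established

for s in list(roles):
    s.close()
```

Note how "writeable" and "readable" mean entirely different things depending on the role, which is exactly the contextual knowledge the parent is complaining about.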

Step back. I was discussing why you need to know C. When you do socket programming in python, you might be tempted to do this,

    lst = [my_socket]
    (rlst, wlst, xlst) = select.select(lst, lst, lst, 0)
Underneath, the C select() modifies the fd_set arguments in place, so in C the sets you pass in come back overwritten and must be rebuilt before every call. Python's wrapper papers over this by returning three fresh lists instead, which is convenient but hides what the OS is actually doing to your file descriptor sets. Approaching the domain from C, this would be more obvious.
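For the record, Python's select.select actually returns three fresh lists rather than mutating the list you pass in; the in-place mutation happens at the C fd_set level, which the wrapper hides. A quick check (socketpair is used only as a convenient way to get a connected pair):

```python
import select
import socket

# socketpair: a connected pair with no network setup needed.
a, b = socket.socketpair()
a.sendall(b"x")                        # make b readable

lst = [a, b]
rlst, wlst, xlst = select.select(lst, lst, lst, 0)

print(lst == [a, b])                   # True: the input list is untouched
print(rlst == [b])                     # True: only b has data waiting
a.close()
b.close()
```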

"How OS specific is it?"

The low-performing stuff has been standard in unix for 30 years. Windows NT has a version of select that is similar to unix but has quirks: you can't select on a file in NT, you can't supply empty lists to select.

But here is a critical thing. You set out on a journey, thinking you want to learn about sockets. But over time that changes. You realise: in order to be effective with sockets, you must construct your system in a way that is different to your familiar procedural or functional approaches.

There are several approaches to choose from, each with their own quirks and substantial learning curves: pure async, threading with blocking sockets, event-sourced threading, forking child processes. The decisions you make here will affect how you do other types of IO. The platforms vary a great deal on these other APIs.

I like the elegance of the pure-async approach, but it is only effective for general-purpose systems if you are on BSD, and you need coroutines to do non-trivial work. Event-sourced threading is the best multi-platform approach, but makes it harder to reason about what your hardware is doing.

This is the real issue. Socket programming is a nasty subtopic of a larger issue: how does your program coordinate IO? You will be operating against many systems APIs for that study, all built by people who think in C.


Tried similar stuff a few months back. Start a simple socket listening at port 80. Use any browser to make a connection. After receiving a connection on the server, send a properly formatted HTTP response (preceded with 200 OK, and it must include Content-Length; everything else I found was optional). Now scale this to include custom root locations. Extending this to handle all the specifications of the thousands of common protocols a server should handle starts to feel like a headache, so here I didn't continue any further.
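The response described here can be sketched like this (the port-0 trick and urllib client are my additions so the example is self-contained; the real exercise uses port 80 and a browser as described):

```python
import socket
import threading
import urllib.request

body = b"hello, world\n"
# The minimal response described above: status line, Content-Length,
# a blank line, then the body, all separated by CRLF.
response = (
    b"HTTP/1.1 200 OK\r\n"
    + b"Content-Length: " + str(len(body)).encode() + b"\r\n"
    + b"\r\n"
    + body
)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # port 0 instead of 80: no root needed
server.listen(1)
port = server.getsockname()[1]

def serve_one():
    conn, _ = server.accept()
    with conn:
        conn.recv(65536)               # read (and ignore) the request
        conn.sendall(response)

t = threading.Thread(target=serve_one)
t.start()

with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as resp:
    status, payload = resp.status, resp.read()

t.join()
server.close()
print(status, payload)  # 200 b'hello, world\n'
```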

Next I wanted to replace the browser with my own client. Soon realized how huge an obstacle I was heading towards. Got bogged down by the complexity of even a simple browser and still haven't got started on this. Maybe someday :D



A blog post similar to this one (same content) that I wrote last year. I use it as the basis for a small project on how to implement an HTTP server from scratch in my distributed computing classes: http://joaoventura.net/blog/2017/python-webserver/


Here is a not quite from scratch (uses Apache for CGI dispatch) application:

http://www.kylheku.com/cgit/tamarind/tree/

Tamarind allows users to manage throw-away e-mail aliases.

I've been using this almost daily for a few years.

It's self-contained; no framework or external libraries are required other than what is in that directory. It just requires an installation of the TXR language in which it is written.

Though it doesn't include its own HTTPD server, it's in a language that I made myself. The arbitrary boundary defining "from scratch" could easily be moved to encompass "own TCP/IP stack and ethernet driver" or "own filesystem", and so on down to the hardware.


I think it's useful to know what your abstractions (i.e. the web framework and the web server here) abstract away, so this is really nice material. But for anything you're actually going to publish, using a well-founded framework or library is the way to go, because so many things you can get wrong is fixed by many smart people in such projects (i.e. Django, Rails, Nginx, Apache, &c).


Not to mention the difficulty that would be trying to roll your own https.


Here is the archive.org mirror, since the server is responding with error 503 (over capacity at the moment).

https://web.archive.org/web/20180226040031/https://defn.io/2...


Thanks! I woke up to realize the site was still on the free AppEngine tier and had used up all the bandwidth. It should be back up now. :D


I get over quota error.


I think you should use .replace(os.linesep, "\r\n"), in case the system's newlines coincide with CRLF.


If I'm not mistaken CRLF is part of the HTTP spec as a field separator.

Edit: I've read the article after this comment, now I see what the parent comment meant. I guess in this case something like "\r\n".join(["line1", "line2", "line3"]) is best.
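A quick sketch of that approach (the header lines are made up for illustration):

```python
# Joining with an explicit "\r\n" sidesteps os.linesep entirely; HTTP
# mandates CRLF separators regardless of the platform's native newline.
lines = ["HTTP/1.1 200 OK", "Content-Length: 2", "", "ok"]
response = "\r\n".join(lines)
print(repr(response))  # 'HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok'
```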


It's been a while since I've used Windows, but does that matter since I'm using string literals here? I would think that even on platforms like that, literal string lines would be terminated by `\n`.


This is interesting, but for an internet facing server I would rather use an existing server that would be far more secure than anything I could create in an afternoon.


Of course, but understanding what is going on is also of huge benefit to you.


Yes, agree with that - I've created a mini web server in C# before, but I wouldn't want to expose that to the internet.


Downloading nginx doesn’t teach you how a web server works.



