Hi! I'm Michael. I'm an undergraduate at the University of California, Berkeley, and I'm currently pursuing a degree in Computer Science.
Feel free to view my resume or contact me via:
CS61A: The Structure and Interpretation of Computer Programs
This course focused mainly on Python, covering core techniques like recursion and object-oriented programming, data structures like linked lists and trees, and more advanced topics, including iterators and streams. It also covered functional programming and related topics, such as tail calls, by introducing us to Scheme, a dialect of Lisp. Finally, it exposed us to declarative programming through SQL tables, joins, and recursive queries in SQLite.
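As a small example in the course's style, here is a Python sketch of recursion over a linked list (my own simplified Link class, not the official course code):

    # A minimal linked-list class and a recursive length function,
    # in the spirit of CS61A's Python material (simplified sketch).

    class Link:
        """A singly linked list node."""
        empty = ()

        def __init__(self, first, rest=empty):
            self.first = first
            self.rest = rest

    def length(link):
        """Recursively count the elements of a linked list."""
        if link is Link.empty:
            return 0
        return 1 + length(link.rest)

    lst = Link(1, Link(2, Link(3)))
    print(length(lst))  # 3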
MATH 54: Linear Algebra and Differential Equations
The first half of MATH 54 introduced us to fundamental linear algebra topics, including linear systems, linear combinations, independence, and transformations; vector spaces and bases; and eigenvectors and eigenvalues. In the second half, we moved on to ordinary differential equations of second order and higher and to partial differential equations, which were much more complex and involved Fourier sine and cosine series.
CS61B: Data Structures and Advanced Programming
In this course, with Java as our tool, we intensively analyzed numerous data structures, including singly and doubly linked lists, array lists, hash tables, binary search trees (both plain BSTs and left-leaning red-black trees), queues and stacks, heaps, and tries. For each data structure, we studied its advantages and disadvantages in terms of runtime and the ways it could be implemented. We also went over crucial algorithms that use these structures, most notably sorting methods like mergesort, quicksort, and radix sort. We finished the semester by focusing on graph algorithms, like depth-first search, breadth-first search, Dijkstra's algorithm, and Prim's algorithm. Projects we completed included our own local version control system and an autocomplete program that built and manipulated a trie from an input dictionary.
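The course was taught in Java, but as a quick illustration of one of those sorting methods, here is a hedged Python sketch of mergesort:

    def merge_sort(items):
        """Sort a list by recursively splitting and merging (O(n log n))."""
        if len(items) <= 1:
            return items
        mid = len(items) // 2
        left = merge_sort(items[:mid])
        right = merge_sort(items[mid:])
        # Merge the two sorted halves.
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged

    print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]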
CS70: Discrete Mathematics and Probability Theory
This was my hardest but most interesting class this semester! In the first half, we focused on logic and proofs, studying propositions, quantifiers, and proof techniques, particularly weak and strong induction, as well as graph theory, including Eulerian tours and hypercubes. We also covered numerous topics relevant to computing, such as RSA encryption and its reliance on a bijection built from modular exponentiation, polynomials and their roles in secret sharing and error correction, and uncomputable problems, like the famous halting problem. We moved on to discrete probability in the second half of the course, starting with basic counting techniques and conditional probability and progressing to more advanced topics, like random variables, Bayes' rule, Chebyshev's inequality, and several important distributions, including the binomial, geometric (exponential in the continuous setting), Poisson, and normal distributions.
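To illustrate the modular-exponentiation idea behind RSA, here is a toy Python example with tiny primes (purely illustrative; real keys use primes hundreds of digits long):

    # Toy RSA: encryption and decryption are inverse bijections built
    # from modular exponentiation. Requires Python 3.8+ for pow(e, -1, phi).

    p, q = 61, 53
    n = p * q                     # public modulus
    phi = (p - 1) * (q - 1)       # Euler's totient of n
    e = 17                        # public exponent, coprime to phi
    d = pow(e, -1, phi)           # private exponent, e's inverse mod phi

    message = 42
    ciphertext = pow(message, e, n)    # encrypt: m^e mod n
    decrypted = pow(ciphertext, d, n)  # decrypt: c^d mod n
    assert decrypted == message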
CS61C: Machine Structures
In contrast to 61A and 61B, CS61C focused on hardware as well as software. The course began with an intro to C; we studied topics like pointers and memory allocation and wrote a version control system in C. We then dove one level deeper into assembly, using the MIPS assembly language as our learning tool. We not only programmed in MIPS assembly but also studied in depth the different instruction formats (R-type, I-type, J-type) of a 32-bit MIPS machine and the distinct roles of the compiler, assembler, linker, and loader.

Next, we focused on the actual computer hardware: the datapath and the control. Once again using the MIPS ISA, we learned the limitations of a single-cycle datapath and the benefits, but also the resulting hazards, of a pipelined datapath with greater throughput. The course also strongly emphasized caches; we went over different cache organizations, from direct-mapped to fully associative, and spent a great deal of time calculating hit/miss rates for sample code.

After caches, we focused on techniques for parallelism, from SIMD vectorized operations and SSE intrinsics to MIMD, where we learned how to use multiple threads effectively with OpenMP in C to obtain unparalleled (haha) speedups, all the while staying aware of complications from data races and cache coherence. We also went over MapReduce and warehouse-scale computers (WSCs) as prime examples of parallelism on a much larger scale. Finally, we touched on a few other topics, such as virtual memory and TLBs, Hamming single-error-correcting codes, and the RAID levels as efficient organizational schemes for secondary storage. Projects over the semester included building our own assembler and linker in C and MIPS, building a datapath and control in Logisim, and running a version of PageRank using MapReduce on Amazon's EC2 servers.
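As an illustration of the cache bookkeeping mentioned above, here is a toy Python simulation of a direct-mapped cache (the course itself worked in C, MIPS, and Logisim; the geometry and addresses below are made up):

    # Toy direct-mapped cache simulator: classify each byte address as a
    # hit or a miss, the same bookkeeping we did by hand in 61C.
    # The cache geometry here (8 lines of 16 bytes) is arbitrary.

    NUM_LINES = 8
    BLOCK_SIZE = 16  # bytes per cache line

    cache_tags = [None] * NUM_LINES  # tag stored in each line, or None if empty

    def access(address):
        """Return 'hit' or 'miss' and update the cache for one access."""
        block_number = address // BLOCK_SIZE
        index = block_number % NUM_LINES   # which line the block maps to
        tag = block_number // NUM_LINES    # identifies the block within that line
        if cache_tags[index] == tag:
            return "hit"
        cache_tags[index] = tag            # evict whatever was there
        return "miss"

    # Addresses 0 and 128 map to the same line, so the final access to 0
    # is a conflict miss even though 0 was loaded earlier.
    for addr in [0, 4, 16, 20, 128, 132, 0]:
        print(addr, access(addr))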
CS170: Efficient Algorithms and Intractable Problems
Another very tough but very interesting and useful class. We went over numerous problem-solving techniques throughout the semester, and my ability to analyze a problem and devise an algorithm definitely improved as a result of the extremely challenging homework problems. We began with divide-and-conquer techniques, then moved on to graph theory, covering depth-first search, breadth-first search, and DAG linearization. From there, we went over shortest-path algorithms like Dijkstra's and Bellman-Ford (which handles negative edge weights). After extensive graph theory, we focused on greedy algorithms like Kruskal's, Prim's, and Huffman encoding.
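As an example of one of those shortest-path algorithms, here is a Python sketch of Dijkstra's algorithm using a binary heap (the graph below is my own toy example, not a course assignment):

    import heapq

    def dijkstra(graph, source):
        """Single-source shortest paths on a graph with non-negative
        edge weights. graph: {u: [(v, weight), ...]}."""
        dist = {source: 0}
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue  # stale heap entry
            for v, w in graph.get(u, []):
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        return dist

    graph = {"a": [("b", 4), ("c", 1)], "c": [("b", 2)], "b": []}
    print(dijkstra(graph, "a"))  # {'a': 0, 'b': 3, 'c': 1}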
Next came two heavyweights of algorithmic design: dynamic programming and linear programming. I absolutely enjoyed dynamic programming because it can solve, in polynomial time, problems that seem tremendously difficult at first glance. I loved discovering the recurrence relation that told me the order in which to solve a problem's subproblems, because these recurrences were often elegantly simple yet very powerful. Particular problems discussed in class included edit distance and the Floyd-Warshall algorithm, which finds the shortest paths between all pairs of vertices in a graph. Linear programming also came across as extremely powerful. We discussed flow networks, the Ford-Fulkerson algorithm for finding a maximum flow, and the reduction of any circuit, and thus any polynomial-time algorithm, to a linear program.
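Here is a Python sketch of the edit distance recurrence mentioned above, a textbook example of that kind of dynamic program:

    def edit_distance(a, b):
        """Minimum number of insertions, deletions, and substitutions
        needed to turn string a into string b (classic DP recurrence)."""
        n, m = len(a), len(b)
        # dp[i][j] = edit distance between a[:i] and b[:j]
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            dp[i][0] = i          # delete all of a[:i]
        for j in range(m + 1):
            dp[0][j] = j          # insert all of b[:j]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                substitution_cost = 0 if a[i - 1] == b[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,        # delete a[i-1]
                               dp[i][j - 1] + 1,        # insert b[j-1]
                               dp[i - 1][j - 1] + substitution_cost)
        return dp[n][m]

    print(edit_distance("snowy", "sunny"))  # 3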
For our last topic, we covered intractable, NP-complete problems, such as the infamous Traveling Salesman Problem, the satisfiability problem, and the set cover problem. We studied reductions and tackled many questions where we had to reduce a known NP-complete problem A to a problem B to prove that B was NP-hard. We also went over a couple of approximation algorithms for certain NP-complete problems. In fact, one of our final assignments was a coding project in which we had to devise approximation algorithms for a given NP-hard problem, Maximum Acyclic Subgraph (given a directed graph, find the smallest set of edges whose removal leaves no cycles). Overall, a very useful class and a great introduction to theoretical computer science.
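For context, here is the classic 1/2-approximation for Maximum Acyclic Subgraph, sketched in Python; this is not the algorithm my group actually submitted, just a simple baseline for the problem:

    def half_approx_acyclic_subgraph(vertices, edges):
        """Classic 1/2-approximation for Maximum Acyclic Subgraph: fix an
        arbitrary vertex ordering and split the edges into 'forward' and
        'backward' sets. Both sets are acyclic, and the larger one
        contains at least half of all edges."""
        position = {v: i for i, v in enumerate(vertices)}
        forward = [(u, v) for u, v in edges if position[u] < position[v]]
        backward = [(u, v) for u, v in edges if position[u] > position[v]]
        return forward if len(forward) >= len(backward) else backward

    vertices = ["a", "b", "c"]
    edges = [("a", "b"), ("b", "c"), ("c", "a")]
    print(half_approx_acyclic_subgraph(vertices, edges))  # keeps 2 of the 3 edges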
CS186: Introduction to Database Systems
This course was very comprehensive in its coverage of database systems; we moved from the top layers of conceptual schema design, SQL as a declarative language, and query optimization all the way down to the physical design choices and buffer management strategies behind the scenes. We began the semester by discussing external sorting and external hashing as methods of sorting or hashing data too large to fit in memory while minimizing I/O cost. After a subsequent stretch spent on SQL itself, we tackled physical database design. We first covered buffer management and page replacement policies, like LRU, MRU, and clock replacement. Finally, we went over indexes as solutions for fast lookup by key. We discussed the tradeoffs between ISAM and B+-trees, insertion and bulk loading on B+-trees, and the important differences between clustered and unclustered indexes.
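As a sketch of one of those page replacement policies, here is a minimal LRU buffer pool in Python (ignoring pins and dirty pages, which a real buffer manager must track):

    from collections import OrderedDict

    class LRUBufferPool:
        """Minimal sketch of an LRU page-replacement policy for a buffer
        pool with a fixed number of frames."""

        def __init__(self, num_frames):
            self.num_frames = num_frames
            self.frames = OrderedDict()  # page_id -> page data, oldest first

        def get_page(self, page_id):
            if page_id in self.frames:
                self.frames.move_to_end(page_id)      # mark as most recently used
                return self.frames[page_id]
            if len(self.frames) >= self.num_frames:
                self.frames.popitem(last=False)       # evict least recently used
            page = "<contents of page %d>" % page_id  # stand-in for a disk read
            self.frames[page_id] = page
            return page

    pool = LRUBufferPool(num_frames=2)
    for pid in [1, 2, 1, 3, 2]:   # accessing 3 evicts page 2, so 2 faults again
        pool.get_page(pid)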
After physical design, we moved up a level to query optimization. We learned how to analyze the cost of different join methods, such as page-oriented and block-oriented nested loop joins, index nested loop joins, sort-merge joins, and hash joins. We then optimized query plans through the System R model of cost estimation and dynamic programming, focusing on left-deep plans to minimize the need for materialization of our joins.
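To make the join idea concrete, here is a simplified in-memory hash join in Python (a real DBMS partitions to disk when the build side doesn't fit in memory, as in Grace hash join); the tables are invented:

    def hash_join(left_rows, right_rows, left_key, right_key):
        """Simple in-memory hash join: build a hash table on one relation
        (conventionally the smaller one), then probe it with each row of
        the other relation."""
        # Build phase: index the build-side rows by their join key.
        table = {}
        for row in left_rows:
            table.setdefault(row[left_key], []).append(row)
        # Probe phase: emit one joined row per match.
        for row in right_rows:
            for match in table.get(row[right_key], []):
                yield {**match, **row}

    departments = [{"dept_id": 1, "dept": "eng"}, {"dept_id": 2, "dept": "sales"}]
    employees = [{"name": "ada", "dept_id": 1}, {"name": "bob", "dept_id": 2}]
    print(list(hash_join(departments, employees, "dept_id", "dept_id")))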
Next, we moved on to concurrency control and recovery. We discussed the meaning of ACID transactions and learned to associate atomic and isolated transactions with serializable scheduling of transactions. We used 2-phase locking to guarantee conflict serializability and learned to detect and avoid deadlock situations among transactions. For recovery after a crash, we utilized the steal, no-force ARIES protocol with write-ahead logging, redoing all updates up until the time of the crash and then undoing the updates of all non-committed transactions.
In the last part of the course, we moved to an even higher level and studied conceptual database design with entity-relationship models. With the help of entity-relationship diagrams, we designed systems with many-to-many, one-to-one, and many-to-one relationships. We then learned how to refine database designs and minimize redundancy by taking advantage of functional dependencies and normalization.
Projects in this class often involved using PySpark for big data processing on RDDs. Interesting assignments included coding our own strict two-phase locking system and implementing k-means clustering in PySpark to group the committee locations of the 2016 presidential candidates.
CS188: Introduction to Artificial Intelligence
I'm glad I took this course because it introduced me to many stimulating and unfamiliar topics in AI. We started off with basic concepts, such as state spaces, search algorithms, constraint satisfaction problems (solved with arc consistency techniques), and minimax/expectimax game trees (optimized with alpha-beta pruning). We then moved on to Markov Decision Processes and, more importantly, reinforcement learning, where the agent learns about its environment online rather than offline. We practiced techniques like Q-learning to extract the optimal policy for an agent in a particular environment; reinforcement learning was easily my favorite part of the course.
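To illustrate the Q-learning update, here is a small Python sketch on an invented two-state toy problem (the course's actual projects used much richer Pacman environments):

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
    ACTIONS = ["left", "right"]
    Q = defaultdict(float)  # maps (state, action) -> estimated value

    def step(state, action):
        """Toy transition/reward model: 'right' from state 0 reaches the
        goal state 1 with reward +1; everything else gives 0 reward."""
        if state == 0 and action == "right":
            return 1, 1.0
        return 0, 0.0

    def choose_action(state):
        """Epsilon-greedy exploration."""
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    state = 0
    for _ in range(1000):
        action = choose_action(state)
        next_state, reward = step(state, action)
        # Q-learning update: move Q(s, a) toward the sample
        # reward + gamma * max_a' Q(s', a').
        sample = reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (sample - Q[(state, action)])
        state = next_state if next_state != 1 else 0  # reset after the goal

    print(Q[(0, "right")], Q[(0, "left")])  # 'right' should look better from state 0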
In the second half of the course, we moved on to probability. We learned about conditional probability and Bayes nets, and how these directed acyclic graphs can compactly represent the joint probability distribution of numerous random variables. We also practiced inference (querying the probability of certain variables given evidence, with the remaining hidden variables marginalized out) through variable elimination as an exact method and sampling techniques as approximate methods. Next, we covered Markov chains and hidden Markov models as representations of how the hidden state of an environment can be tracked with a transition model and an observation at each time step. We ended the course with an introduction to machine learning and some of its fundamental components, including perceptrons to classify training data, multi-layer neural networks for deep learning of features, and gradient descent on the parameters of our model to minimize training loss.
CS162: Operating Systems
An incredibly important class for me. It broke down numerous abstractions that I had been taking for granted until then, like how a machine can run many applications simultaneously or how a file system works. We started off by covering OS fundamentals like kernel syscalls, threads, processes, thread scheduling, and synchronization tools like semaphores, locks, and monitors. We then dove into virtual memory, caching, and demand paging as ways to give processes separate address spaces and the illusion of infinite memory. After virtual memory, we transitioned to file systems, looking at several case studies, like the simple FAT file system, the UNIX FFS, and Windows NTFS, and the unique implementation details of each. Finally, we ended the semester with a focus on distributed file systems, networking (with an emphasis on the TCP transport layer), and security (symmetric and asymmetric encryption, integrity hashes, and certificates).
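As a sketch of those synchronization tools, here is the classic bounded-buffer producer/consumer written with Python's threading primitives rather than the C primitives we actually used:

    import threading
    from collections import deque

    BUFFER_SIZE = 4
    buffer = deque()
    mutex = threading.Lock()                        # protects the buffer itself
    empty_slots = threading.Semaphore(BUFFER_SIZE)  # counts free slots
    full_slots = threading.Semaphore(0)             # counts filled slots

    def producer():
        for item in range(10):
            empty_slots.acquire()        # wait for a free slot
            with mutex:
                buffer.append(item)
            full_slots.release()         # signal one more filled slot

    def consumer():
        for _ in range(10):
            full_slots.acquire()         # wait for an item
            with mutex:
                item = buffer.popleft()
            empty_slots.release()        # signal one more free slot
            print("consumed", item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()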
The homework assignments were generally more interesting than typical CS assignments; they included writing your own shell, a basic HTTP server, and your own implementations of malloc and free. The bulk of the course's work, however, was in its three projects. We worked in groups of four to design our own (simplified) operating system, Pintos. In the first project, we implemented kernel threads, thread scheduling, and synchronization variables. The second project focused on supporting user programs and the corresponding syscalls, like fork, exec, wait, and exit. Finally, the third and most challenging project had us create our own file system, complete with a buffer cache, extensible files, and directory support. All projects and homework were written in C.
CS176: Algorithms for Computational Biology
I took this course after liking CS170 so much and enjoyed another stimulating semester of problem-solving. It was very interesting to learn about algorithms that had applications beyond the scope of pure CS and the tech industry, some of which were actually used daily in the professor's computational biology lab.
The course's first major section was on exact and inexact string matching. We covered algorithms like the Z-algorithm and crucial data structures like suffix trees and suffix arrays, which can be used to find all occurrences of a pattern in a given reference string in linear time. In my opinion, these applications had the most overlap between computational biology and software engineering in the tech industry. We then went over the Burrows-Wheeler Transform as a way to store the reference string more compactly (with the help of Huffman encoding) while still allowing numerous efficient lookups. Finally, we ended the section with inexact matching, where dynamic programming algorithms like Needleman-Wunsch (similar to the edit distance algorithm) become essential.
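Here is a Python sketch of the Z-algorithm and how it finds all exact occurrences of a pattern in linear time (my own toy strings, not course data):

    def z_array(s):
        """Z-algorithm: z[i] is the length of the longest substring starting
        at i that matches a prefix of s. Runs in O(len(s)) time."""
        n = len(s)
        z = [0] * n
        z[0] = n
        left, right = 0, 0                          # rightmost Z-box so far
        for i in range(1, n):
            if i < right:
                z[i] = min(right - i, z[i - left])  # reuse earlier comparisons
            while i + z[i] < n and s[z[i]] == s[i + z[i]]:
                z[i] += 1
            if i + z[i] > right:
                left, right = i, i + z[i]
        return z

    def find_occurrences(pattern, text):
        """All starting indices of pattern in text, via the Z-array of
        pattern + sentinel + text."""
        combined = pattern + "\x00" + text
        z = z_array(combined)
        m = len(pattern)
        return [i - m - 1 for i in range(m + 1, len(combined)) if z[i] == m]

    print(find_occurrences("aba", "ababa"))  # [0, 2]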
Our next major section covered phylogenetic trees: binary trees used to depict, as accurately as possible, similarities and potential ancestor chains among a set of taxa (different species, different genes/proteins, etc.). We covered various topics, like the difference between distance-based and parsimony-based trees and how to determine whether we could construct a tree that represents the differences between the taxa with the maximum possible accuracy. In the last section of the class, we dealt with probability. We used homogeneous Markov chains to calculate the likelihood of a phylogenetic tree, using transition probabilities for parent nodes transitioning to the states of their children. Lastly, we went in depth with HMMs, looking at how to calculate the likelihood of the observations under given model parameters with the forward algorithm, determine the most likely sequence of hidden states given observations and parameters with the Viterbi algorithm, and find the most likely parameters for a set of observations with the expectation-maximization Baum-Welch algorithm.
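To make the Viterbi algorithm concrete, here is a Python sketch on the standard toy weather/activity HMM (the parameters are the usual textbook example, not anything from the course):

    def viterbi(observations, states, start_prob, trans_prob, emit_prob):
        """Most likely hidden-state sequence for an HMM given the
        observations and the model parameters."""
        # best[t][s] = probability of the best path ending in state s at time t
        best = [{s: start_prob[s] * emit_prob[s][observations[0]] for s in states}]
        backpointer = [{}]
        for t in range(1, len(observations)):
            best.append({})
            backpointer.append({})
            for s in states:
                prev, prob = max(
                    ((p, best[t - 1][p] * trans_prob[p][s]) for p in states),
                    key=lambda pair: pair[1])
                best[t][s] = prob * emit_prob[s][observations[t]]
                backpointer[t][s] = prev
        # Trace back from the best final state.
        state = max(states, key=lambda s: best[-1][s])
        path = [state]
        for t in range(len(observations) - 1, 0, -1):
            state = backpointer[t][state]
            path.append(state)
        return list(reversed(path))

    states = ["rainy", "sunny"]
    start = {"rainy": 0.6, "sunny": 0.4}
    trans = {"rainy": {"rainy": 0.7, "sunny": 0.3},
             "sunny": {"rainy": 0.4, "sunny": 0.6}}
    emit = {"rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
            "sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
    print(viterbi(["walk", "shop", "clean"], states, start, trans, emit))
    # ['sunny', 'rainy', 'rainy']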
CS189: Machine Learning
A great introduction to the field of machine learning, which is becoming so ubiquitously used in today's technology. It was easily the most math-heavy CS course I have taken, and it illustrated that linear algebra and statistics are extremely useful for understanding machine learning models in depth.
We first covered linear classifiers, starting with perceptrons and support vector machines and moving on to a generative approach to classification with Gaussian discriminant analysis. Next, we moved on to regression, tackling linear and logistic regression as well as ridge regression and Lasso (which use regularization to reduce overfitting). We also went over statistical derivations of these regression models and discussed the important bias-variance tradeoff of any model. Along the way, we discussed optimization methods, such as gradient descent and maximum likelihood estimation, and other topics like feature lifting and kernels, which let us train on a greatly expanded set of features without explicitly computing them or paying the corresponding performance cost.
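As an example of the simplest of those linear classifiers, here is a minimal perceptron training loop in Python on made-up, linearly separable data:

    def train_perceptron(samples, labels, epochs=20):
        """Learn weights w and bias b so that sign(w . x + b) matches the
        +1/-1 labels, using the classic mistake-driven update."""
        dim = len(samples[0])
        w = [0.0] * dim
        b = 0.0
        for _ in range(epochs):
            for x, y in zip(samples, labels):
                activation = sum(wi * xi for wi, xi in zip(w, x)) + b
                if y * activation <= 0:          # misclassified (or on the boundary)
                    w = [wi + y * xi for wi, xi in zip(w, x)]
                    b += y
        return w, b

    samples = [(2.0, 1.0), (3.0, 4.0), (-1.0, -2.0), (-3.0, -1.0)]
    labels = [1, 1, -1, -1]
    w, b = train_perceptron(samples, labels)
    print(w, b)  # a separating hyperplane for this toy data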
In the second half of the course, we covered decision trees, which provide a very intuitive method of classification. And of course, we extensively studied neural networks, going over both their fundamental structure and how it can be tweaked and extended to improve training speed (avoiding the vanishing gradient problem), reduce overfitting, and find better local minima. In the course's last segment, we studied unsupervised learning. Topics included using principal component analysis to reduce dimensionality without a large loss of information, employing k-means and spectral graph clustering to group points in space or vertices in a graph, and using latent factor analysis and the singular value decomposition of a matrix to build recommender systems.
Homework assignments were generally very math- or coding-intensive. The most interesting ones required us to build certain machine learning models from scratch. For example, we built our own neural nets to classify the MNIST handwritten digit data set, and we used SVD to build a basic joke recommender system. We submitted our code to Kaggle competitions, where our models would run on test data sets and needed to get above a baseline accuracy to get full credit.
CS161: Computer Security
A fascinating course that enlightened me about the capabilities of cyberattackers, the types of vulnerabilities that they exploit, and the defenses that are available.
In the first part of the course, we focused on buffer overflow attacks that can lead to malicious code injection, as well as several common defenses that mitigate these vulnerabilities, like stack canaries and address space layout randomization. We also heavily covered web attacks, particularly SQL injection, CSRF, and cross-site scripting (both stored/persistent and reflected). Some important defenses and web policies we considered were the same-origin policy (and how the aforementioned attacks circumvent it), the OWASP guidelines on properly sanitizing user input, Content Security Policy headers for preventing the injection of third-party JS, and prepared statements for protecting against SQL injection.
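To show why prepared statements matter, here is a small Python example using the built-in sqlite3 module (the table, credentials, and input are invented):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

    attacker_input = "' OR '1'='1"

    # Vulnerable: string concatenation lets the input rewrite the query.
    unsafe_query = ("SELECT * FROM users WHERE username = '" +
                    attacker_input + "'")
    print(conn.execute(unsafe_query).fetchall())   # returns every row!

    # Safe: the placeholder binds the input as data, never as SQL syntax.
    safe_rows = conn.execute(
        "SELECT * FROM users WHERE username = ?", (attacker_input,)).fetchall()
    print(safe_rows)                               # returns no rows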
Our next module was cryptography. We focused on AES-128 as a good symmetric-key block cipher and studied several modes, like CBC and CTR, for encrypting arbitrarily long messages. We also discussed asymmetric-key cryptography, especially RSA. Furthermore, we went over techniques to guarantee integrity and authenticity, such as cryptographically secure hash functions, message authentication codes (MACs), and digital signatures, which could be used by trusted certificate authorities to vouch for a party's public key.
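As a small illustration of the integrity and authenticity tools described above, here is message authentication with HMAC-SHA256 from Python's standard library (the key and messages are placeholders):

    import hmac
    import hashlib

    key = b"shared-secret-key"
    message = b"transfer $100 to account 42"

    tag = hmac.new(key, message, hashlib.sha256).hexdigest()

    # The recipient, who shares the key, recomputes the tag over the
    # message it received and compares with a constant-time check; any
    # tampering changes the tag.
    received_message = b"transfer $900 to account 42"
    expected_tag = hmac.new(key, received_message, hashlib.sha256).hexdigest()
    print(hmac.compare_digest(tag, expected_tag))  # False: message was altered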
Finally, we went over internet security. The topics we covered were numerous and diverse, ranging from attacks on the DHCP protocol that let an attacker impersonate the victim's primary DNS server to an in-depth look at how DNSSEC can prevent dangerous DNS cache poisoning attacks. Probably one of the most important topics we studied was SSL/TLS. We learned not only how the TLS handshake works and resists man-in-the-middle attacks (provided certificates are validated properly) but also the implications of trusting seemingly obscure certificate authorities to vet certificate requests and protect their private keys from compromise. Our last topics were intrusion detection techniques (signatures, behavioral detection, etc.) and malware, viruses, and worms.
Projects for this course were fun and challenging. I got the chance both to build secure systems and to attack mock vulnerable systems, which made my project partner and me much more conscious about looking for exploits in our own code and in other people's. Some project tasks included finding holes in sample code that would allow you to inject code through a buffer overflow, conducting MITM attacks on a TLS connection by taking advantage of a given certificate authority key, and building a file system that could protect the contents and integrity of a client's files from a malicious file server. I coded primarily in C and Python for these projects.
Contact
Independent Project
Technologies: Node.js/Express.js, Socket.IO, Mongoose for MongoDB
Contact is a real-time chatting web application. It has an authentication system: users of the site can log in and chat with other users in chatrooms. Chatrooms can be customized in several aspects, like privacy and maximum capacity. Also, users can invite other users to their chatrooms and request to join different public chatrooms. They receive notifications about invites and requests in real time.
This project was a way for me to become familiar with not only Node.js and its asynchronous style of programming but also other components of the MEAN stack. Indeed, I ended up using MongoDB (employing Mongoose.js as an object modeler) for my application's database and Express.js for its backend. I was pretty amazed to see that I could develop an entire web application, both front-end and back-end, with only JavaScript. All the components of the app, from database queries to browser scripts, ended up fitting so smoothly that I would sometimes forget which layer I was working on. I was also impressed with Socket.IO, the engine that allowed me to incorporate real-time messaging into my application. Before I found Socket.IO, I was weighing the implementation complexities of other real-time options, like long polling and WebSockets. Fortunately, Socket.IO's simple API allowed me to focus on the business logic behind real-time messaging and notifications and provided me with useful capabilities like broadcasting to multiple sockets. I didn't have to worry about when I should use long polling, when I should use WebSockets, etc.
Probably the most challenging aspect of this project was designing schemas for the Mongoose models that correspond to MongoDB collections. Many of my collections had relationships that might have been easier to implement with a relational database and a framework like, say, Django. For example, Users and Chatrooms had a many-to-many relationship. I spent much of my time thinking about how I could maintain consistency between related users and chatrooms (which became more complicated once requests and invites came into the picture) with as few queries as possible. Finally, for better performance, I wanted the browser to store only a limited number of chat messages and let the user load previous messages on demand. To accomplish this, I had to design a schema for saving chat messages to the database in equally sized chunks.
This project isn't completely finished. Some of my next steps include writing tests for my middleware and improving the application's UI.
Pollsite
Independent Project
Technologies: HTML/CSS, JavaScript & jQuery, and Bootstrap for front-end display, Django-powered backend
Pollsite is a Django-powered project I undertook to get familiar with the Django framework and the functionality it offers, as well as to get experience designing user interfaces with Bootstrap and JavaScript/jQuery. It's a simple website, built off the Django tutorial, that lets registered members make their own polls and vote on other members' polls. Pollsite is composed of two applications: a basic user authentication application and a voting application that processes users' votes, as well as their creation and removal of polls, through a dynamic UI. This project definitely accustomed me to Django's model-view-template system, as well as several other Django and general web development habits, like always remembering to persist changes to a model's attributes to the database and returning an HttpResponseRedirect after changing server data. It also made me appreciate how easy database-backed web development is with Django, which did all the SQL work for me.
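For context, here is a rough sketch of what tutorial-style poll models and a vote view look like in a recent Django version. The field names, the 'results' URL name, and the view body are illustrative stand-ins rather than the actual Pollsite code, and the snippet needs to live inside a Django app to run:

    from django.db import models
    from django.http import HttpResponseRedirect
    from django.shortcuts import get_object_or_404
    from django.urls import reverse

    class Poll(models.Model):
        question_text = models.CharField(max_length=200)
        created_by = models.ForeignKey('auth.User', on_delete=models.CASCADE)
        pub_date = models.DateTimeField('date published')

    class Choice(models.Model):
        poll = models.ForeignKey(Poll, on_delete=models.CASCADE)
        choice_text = models.CharField(max_length=200)
        votes = models.IntegerField(default=0)

    def vote(request, poll_id):
        poll = get_object_or_404(Poll, pk=poll_id)
        choice = poll.choice_set.get(pk=request.POST['choice'])
        choice.votes += 1
        choice.save()                       # persist the change to the database
        # Redirect after a successful POST so a reload doesn't double-vote.
        return HttpResponseRedirect(reverse('results', args=(poll.id,)))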
Shell
Classroom Project (CS162, Fall 2016)
Technologies: C
Our first assignment in CS162 was to write our own shell in C. The shell supports basic built-in commands like cd and pwd. It can also run programs by forking a new process that executes a program provided by the user; as part of this feature, the shell performs path resolution by searching the PATH environment variable. Finally, the shell allows processes to execute in the background and implements signal handling so that signals are delivered only to the foreground process group.
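The assignment itself was in C, but the core loop is easy to sketch; here is a rough Python approximation of the same read/built-in/execute cycle (backgrounding and signal handling omitted):

    import os
    import shlex
    import subprocess

    def shell_loop():
        while True:
            try:
                line = input("$ ")
            except EOFError:
                break
            tokens = shlex.split(line)
            if not tokens:
                continue
            if tokens[0] == "cd":          # built-in: change directory
                os.chdir(tokens[1] if len(tokens) > 1 else os.path.expanduser("~"))
            elif tokens[0] == "pwd":       # built-in: print working directory
                print(os.getcwd())
            elif tokens[0] == "exit":
                break
            else:
                # subprocess handles PATH resolution and waits for the child,
                # standing in for the fork/execv/wait calls of the C version.
                subprocess.run(tokens)

    if __name__ == "__main__":
        shell_loop()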
Gitlet
Classroom Project (CS61B, Spring 2015)
Technologies: Java
Gitlet was one of the main projects students completed in CS61B. Using only Java and its standard libraries, we were asked to design and implement our own local version control system. The system supports many of Git's commands (hence the name Gitlet), including add, remove, commit, checkout, branch, merge, and rebase. This project was significant because, unlike earlier assignments, it came with no starter code from the course instructors. I ended up learning and implementing some invaluable functionality, such as copying and comparing files using byte streams and saving the state of Java objects with the Serializable interface. In addition, since each operation came with time and space requirements, and since 61B was all about data structures, my Gitlet's performance depended heavily on how I designed my objects. For example, I modeled my branches after singly linked lists, with the nodes being Commit objects, so that I could perform operations like log, checkout, and merge as efficiently as possible.
Glassdoor
May 2016 - August 2016
Skills: React.js, jQuery, JavaServer Pages, Java
I interned for Glassdoor's core and acquisitions teams, completing projects under the company's lead web developers. My main stories involved front-end refactoring: transforming common UI features of the Glassdoor website into React.js components that are modular and reusable anywhere on the site. I coded and tested two major components that are currently in production: Glassdoor's ubiquitous dropdown menus and its similarly widespread employer photo carousels.
In addition to my React work, I also completed an interesting infrastructure project. To streamline web app builds and patches, I moved static files out of the normal build and into their own repository. I created a Gulp task that lets web developers upload updated static files to an Amazon S3 bucket, which then serves the web app's front end, and another task that generates a manifest of all static files, mapping each file's URI to a hash string used for cache busting. In addition to JavaScript, this project involved a fair amount of Java; I needed to rewrite the back end to retrieve static resources from the new S3 bucket and use the manifest's cache-busting strings. This infrastructural move was crucial for productivity: it allowed back-end engineers to iterate without worrying about unstable static files affecting their development, and front-end engineers to patch static files without rebuilding everything.
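The real task was written in JavaScript as a Gulp task, but the manifest idea is easy to sketch in Python; the "static" directory below is just a placeholder:

    import hashlib
    import json
    from pathlib import Path

    def build_manifest(static_root):
        """Map each static file's URI to a short content hash that can be
        appended to its URL for cache busting."""
        manifest = {}
        for path in Path(static_root).rglob("*"):
            if path.is_file():
                digest = hashlib.md5(path.read_bytes()).hexdigest()[:12]
                uri = "/" + path.relative_to(static_root).as_posix()
                manifest[uri] = digest
        return manifest

    if __name__ == "__main__":
        print(json.dumps(build_manifest("static"), indent=2))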
Finally, I worked on various bug fixes and smaller stories for teams that needed an extra developer for a sprint. One mini-project that I found especially significant was a redesign of the site's login/locked-content modules (i.e., "Sign up or sign in now to get access to all of Glassdoor"). The assignment itself was not difficult; I simply needed to modify several JSPs and Sass files so the site would use the new version of the modules. However, I found out after the changes went to production that overall user registration increased by about 5%.
Seeing my changes go live and actually impact the outside world was definitely the most rewarding aspect of my internship. It was something that I didn't experience with Coverity, whose products were used by other businesses and not publicly available. I also learned a great deal more about developing and testing in an agile environment, including working in 2-week sprints, using semantic versioning when updating dependencies, submitting pull requests to merge feature branches into release branches, and testing in a separate QA environment. And of course, I learned a tremendous amount of JavaScript, like coding in ES6 syntax and using Babel to compile it, using powerful tools like React and Redux, and testing with frameworks and libraries like Mocha and expect.js. Overall another valuable summer.
Coverity
May 2015 - August 2015
Skills: HTML/CSS, JavaScript & jQuery, Bootstrap, Java, some SQL
As a technical intern on Coverity's Test Advisor dev team (Test Advisor, or TA, is a static analysis tool for Java, .NET, and C++ that determines how well a code project's classes and methods are covered by its tests), I worked primarily on the Business Modeler component. The Business Modeler is a web application that lets users create a visual representation of their code project by mapping classes to physical containers, or modules, on the screen. These modules can be renamed, added, removed, and even nested, and the overall layout of a model can be geometrically readjusted by dragging and dropping modules. In addition, the user can access metrics like the percentage of classes mapped for a package in the project's code tree and the percentage of classes covered by tests for a particular module. With a visually organized model, users get a clearer picture of which parts of the project are covered, modified, etc.
Coverity had an older Business Modeler, but its UI was written in Flex and considered outdated. I worked on revamping the front end, primarily with JavaScript and jQuery and with little dependence on third-party frameworks. In addition, I implemented API services in Java for the back-end server, which was written using the Dropwizard framework. One of my greatest challenges on the front end was designing algorithms for geometrically restructuring a model's layout after the user moves a module to another location; this ultimately involved determining which modules fell within the moved module's width or height span and scaling their physical dimensions by an appropriate ratio. Another challenge was constantly weighing the trade-offs between performance and space; for example, I didn't want to store all of a project's classes on the client side, because we could be dealing with a project with thousands of classes. This meant I had to create smart back-end services and front-end data structures that would still let me do things like keep track of the percentage of classes mapped.
Overall, I had a fantastic time at Coverity. I learned a great deal about JavaScript and the full chain of a web application, from connecting the front-end via AJAX to the server to making useful API services by persisting and retrieving data to and from the database. And I definitely couldn't have made as much out of this internship without the help and advice I received from my dev team mentors.