📁 Machine Coding: File System Low-Level Design
📝 Overview
An in-memory file system mimics standard operating system folder navigation, file creation, and movement, minus the disk. Because everything lives in RAM, this challenge allows you to focus purely on data structures, tree traversals, and object-oriented operations without worrying about persistence, caching, or I/O performance.
Why This Challenge?
- Tree Data Structures: Tests your ability to model and traverse hierarchical structures efficiently.
- Design Patterns: Evaluates your understanding of the Composite design pattern, uniting disparate objects under a shared interface.
- String Parsing: Requires careful path resolution algorithms to navigate Unix-style string paths.
🏭 The Scenario & Requirements
😡 The Problem (The Villain)
Navigating a file system hierarchy typically involves messy conditional logic if files and folders are treated as completely separate concepts. When a system needs to scale to tens of thousands of deeply nested entries, poorly optimized path resolution and rigid data models lead to massive performance bottlenecks and unmaintainable code.
🦸 The System (The Hero)
A clean, object-oriented in-memory file system utilizing the Composite pattern. By establishing a shared abstraction for both files and folders, the system can gracefully execute recursive path resolution, seamless content retrieval, and modular file operations.
📜 Requirements & Constraints
- (Functional): Must use a single root Unix-style hierarchy (e.g.,
/home/user/file.txt). - (Functional): Support core operations: create, delete, list contents, move, rename, and navigate paths to get contents.
- (Functional): Files must store actual string content.
- (Technical): Scale to support tens of thousands of entries while remaining responsive with deep folder hierarchies.
- (Technical): Must throw specific exceptions so callers can handle different failure modes appropriately (e.g., missing parent folder, attempting to delete the root).
- (Out of Scope): Permissions, timestamps, and symbolic links are explicitly excluded.
🏗️ Design & Architecture
🧠 Thinking Process
To structure this, we need to map the nouns to our core entities.
* File and Folder: Both share fundamental traits (they have a name and live inside a path), but they differ in behavior (files hold string content; folders hold collections of other entities).
* FileSystemEntry: A shared abstraction (interface/base class) that unites files and folders so a parent folder can hold a collection of generic entries.
* FileSystem: The orchestrator. It holds the root folder and manages path resolution helpers to route commands (like create, move, delete) to the correct deep node.
🧩 Class Diagram
(The Object-Oriented Blueprint. Who owns what?)
classDiagram
direction TB
class FileSystem {
-Folder root
+createFile(path, content)
+createFolder(path)
+delete(path)
+listContents(path)
-resolvePath(path)
}
class FileSystemEntry {
<<abstract>>
-String name
+getName()
+isDirectory() boolean
}
class File {
-String content
+getContent()
+setContent()
}
class Folder {
-Map~String, FileSystemEntry~ children
+addEntry(FileSystemEntry)
+removeEntry(String name)
+getChild(String name)
}
FileSystem o-- Folder : root
FileSystemEntry <|-- File
FileSystemEntry <|-- Folder
Folder o-- FileSystemEntry : contains
⚙️ Design Patterns Applied
- Composite Pattern:
FolderandFileboth inherit fromFileSystemEntry. This allows aFolderto maintain a map ofFileSystemEntryobjects, treating individual files and nested sub-folders uniformly when calculating sizes, traversing paths, or deleting nodes. - Facade Pattern:
FileSystemacts as the single entry point. The client passes Unix paths as strings, and the orchestrator handles string splitting and path resolution without exposing the raw node traversal logic to the caller.
💻 Solution Implementation
The Code Outline
from typing import Dict, List
class FileSystemEntry:
def __init__(self, name: str):
self.name = name
def is_directory(self) -> bool:
pass
class File(FileSystemEntry):
def __init__(self, name: str, content: str = ""):
super().__init__(name)
self.content = content
def is_directory(self) -> bool:
return False
class Folder(FileSystemEntry):
def __init__(self, name: str):
super().__init__(name)
self.children: Dict[str, FileSystemEntry] = {}
def is_directory(self) -> bool:
return True
def add_entry(self, entry: FileSystemEntry):
if entry.name in self.children:
raise ValueError(f"Entry {entry.name} already exists.")
self.children[entry.name] = entry
def get_child(self, name: str) -> FileSystemEntry:
return self.children.get(name)
class FileSystem:
def __init__(self):
self.root = Folder("")
def _resolve_parent_folder(self, path: str) -> Folder:
# Helper to navigate down the tree
if path == "/" or not path:
return self.root
parts = [p for p in path.split("/") if p]
current = self.root
for part in parts[:-1]:
current = current.get_child(part)
if not current or not current.is_directory():
raise ValueError(f"Path not found or not a directory.")
return current
def create_file(self, path: str, content: str):
parts = [p for p in path.split("/") if p]
if not parts:
raise ValueError("Invalid file path")
parent_folder = self._resolve_parent_folder(path)
file_name = parts[-1]
parent_folder.add_entry(File(file_name, content))
def get_content(self, path: str) -> str:
parts = [p for p in path.split("/") if p]
parent_folder = self._resolve_parent_folder(path)
file_node = parent_folder.get_child(parts[-1])
if not file_node or file_node.is_directory():
raise ValueError("File not found.")
return file_node.content
🔬 Why This Works (Evaluation)
The true complexity of an in-memory file system lies in path resolution. Instead of duplicating traversal logic across create, delete, and move operations, the orchestrator abstracts tree traversal into a private _resolve_parent_folder helper. It elegantly utilizes string splitting on the / delimiter to drill down the Folder hierarchy in \(O(D)\) time (where \(D\) is the directory depth). Explicit type-checking (is_directory()) prevents files from being treated as folders, cleanly throwing exceptions as requested by the requirements.
⚖️ Trade-offs & Limitations
| Decision | Pros | Cons / Limitations |
|---|---|---|
| In-Memory Hash Maps for Children | \(O(1)\) lookups for entries within a specific directory layer. Highly responsive. | Memory footprint scales rapidly. Storing large string contents exclusively in RAM could trigger Out Of Memory (OOM) errors at high scale. |
| String Path Parsing per Call | Keeps the API strictly stateless; callers don't need to track directory handles. | \(O(D)\) traversal overhead on every single operation. |
🎤 Interview Toolkit
- Concurrency Probe: How would you make this file system thread-safe?
Implementing a global lock on the
FileSystemensures correctness but severely limits throughput. A better approach utilizes Read-Write Locks at theFolderlevel (fine-grained locking). Moving a file requires write-locking the source and destination directories, while readers can simultaneously traverse unrelated folders. - Extensibility: How would you add search functionality?
To search for files by name, you would implement a Depth-First Search (DFS) or Breadth-First Search (BFS) algorithm starting from the root folder (or a specific path), iterating over all
childrenrecursively and returning paths that match the target string.
🔗 Related Challenges
- Amazon Locker LLD — Another assignment focusing heavily on boundaries between an orchestrator and physical (or in this case, logical) compartments/folders.
- Connect Four LLD — Shares a focus on maintaining strict internal state boundaries and separating validation logic from core entity representations.