Chapter 14: Dictionaries & String Handling

In Chapter 13, a variable held one value. That is fine for a single piece of information, but real data rarely comes as a single piece. A user has a name, an email, a plan, and a status. A server has a hostname, a port, a region, and an uptime. When those pieces belong together, you want to store them together, not scattered across four separate variables with no connection between them.

That is the problem a dictionary solves.


14.1 What Is a Dictionary?

A dictionary stores data as key-value pairs. Instead of accessing a value by position, as you do with a list, you access it by name: its key.

user = {
    "name": "Sarah",
    "plan": "Pro",
    "active": True,
    "storage_gb": 250
}

How a Dictionary Is Structured

  Key              Value
┌─────────────── : ─────────┐
│ "name"         : "Sarah"  │
│ "plan"         : "Pro"    │
│ "active"       : True     │
│ "storage_gb"   : 250      │
└───────────────────────────┘
       Wrapped in { }

Keys are almost always strings, although other immutable values such as numbers also work. Values can be any type: string, integer, float, boolean, list, or even another dictionary.

Dictionary vs List

              List                  Dictionary
Syntax        [1, 2, 3]             {"key": value}
Access by     Position index        Key name
Good for      Ordered sequences     Named, related fields
Example       scores[0]             user["name"]

Accessing Values

Use the key in square brackets:

print(user["name"])
print(user["plan"])
print(user["storage_gb"])

Output:
Sarah
Pro
250

⚠ Common Mistake: KeyError

Accessing a key that does not exist raises a KeyError, which stops the program unless the error is handled:

print(user["email"])

Output:
KeyError: 'email'

Use .get() when you are not sure a key exists. It returns None instead of crashing, or a default value if you provide one:

print(user.get("email"))         # Returns None
print(user.get("email", "n/a"))  # Returns "n/a"

Output:
None
n/a

In any pipeline that reads incoming data, a KeyError on a missing field can stop the entire run. .get() with a sensible default is the safer habit from day one.
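As a minimal sketch of that habit, suppose an incoming record is missing some optional fields. The record and its field names below are made up for illustration:

```python
# Hypothetical incoming record: "email" and "plan" are missing.
incoming = {"name": "Sarah", "active": True}

# .get() supplies a fallback instead of raising KeyError.
name = incoming.get("name", "unknown")
email = incoming.get("email", "n/a")
plan = incoming.get("plan", "Free")

print(name)   # Sarah
print(email)  # n/a
print(plan)   # Free

# .get() only reads; it does not add the missing key.
print("email" in incoming)  # False
```

Note that .get() never changes the dictionary. The default is returned to you, but the key stays absent.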

Try It 14.1 — Create a dictionary with at least five keys describing a product: name, price, category, in_stock, and rating. Print three values using their keys. Then use .get() to safely access a key that does not exist.


14.2 Modifying Dictionaries

Dictionaries are not fixed. You can add new keys, update existing ones, and remove keys you no longer need.

Adding and Updating

Assigning to a key that already exists updates it. Assigning to a key that does not exist creates it.

server = {
    "host": "db-01",
    "port": 5432,
    "status": "offline"
}

# Update an existing key
server["status"] = "online"

# Add a new key
server["region"] = "us-east-1"

print(server["status"])
print(server["region"])

Output:
online
us-east-1

Removing Keys

del removes a key and its value. .pop() also removes the key, but returns its value so you can use it. Both raise a KeyError if the key does not exist.

server["backup_enabled"] = True

# Remove and capture the value
removed = server.pop("backup_enabled")
print(removed)

# Remove without capturing
del server["region"]

print(server)

Output:
True
{'host': 'db-01', 'port': 5432, 'status': 'online'}
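Just as .get() accepts a default, .pop() takes an optional second argument that is returned when the key is missing. A small sketch, using a server dictionary that never had a "backup_enabled" key:

```python
server = {"host": "db-01", "port": 5432}

# With a default, .pop() returns it instead of raising KeyError.
removed = server.pop("backup_enabled", None)
print(removed)  # None

# The dictionary is unchanged because there was nothing to remove.
print(server)   # {'host': 'db-01', 'port': 5432}
```

This is useful when a cleanup step should succeed whether or not the key was ever set.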

Try It 14.2 — Start with a dictionary of four keys. Add two new keys, update one existing key, then remove one key using .pop() and print the value that was removed. Print the final dictionary.


14.3 Dictionary Methods

Python gives you three methods to see inside a dictionary:

config = {
    "host": "localhost",
    "port": 5432,
    "db": "analytics"
}

print(config.keys())    # all keys
print(config.values())  # all values
print(config.items())   # all key-value pairs

Output:
dict_keys(['host', 'port', 'db'])
dict_values(['localhost', 5432, 'analytics'])
dict_items([('host', 'localhost'), ('port', 5432), ('db', 'analytics')])

These methods return view objects: live windows onto the dictionary that reflect later changes, rather than fixed copies. You will use them with loops in Chapter 15, where iterating over .items() becomes one of the most common patterns in data work.
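To see what "view" means in practice, the sketch below changes a dictionary after calling .keys() and compares the view with a copy made by list():

```python
config = {"host": "localhost", "port": 5432}

keys = config.keys()        # a view, not a copy
print(len(keys))            # 2

config["db"] = "analytics"  # change the dictionary afterwards
print(len(keys))            # 3: the view follows the dictionary

# list() takes a fixed copy that no longer follows the dictionary.
frozen = list(config.keys())
config["ssl"] = True
print(frozen)               # ['host', 'port', 'db']
print(len(keys))            # 4
```

Wrap a view in list() whenever you need the keys as they were at one moment.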

You can also check whether a key exists using in:

print("host" in config)      # True
print("password" in config)  # False

Output:
True
False

14.4 Nested Dictionaries

A dictionary's value can itself be a dictionary. This is how you represent records that have structure within structure.

product = {
    "name": "Wireless Mouse",
    "price": 29.99,
    "specs": {
        "weight_g": 85,
        "dpi": 1600,
        "wireless": True
    }
}

Nested Dictionary Structure

product
├── "name"  → "Wireless Mouse"
├── "price" → 29.99
└── "specs" → {
        "weight_g" → 85
        "dpi"      → 1600
        "wireless" → True
    }

Access nested values by chaining keys:

print(product["name"])
print(product["specs"]["dpi"])
print(product["specs"]["wireless"])

Output:
Wireless Mouse
1600
True

You can update a nested value the same way:

product["specs"]["dpi"] = 3200
print(product["specs"]["dpi"])

Output:
3200
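Chained square brackets raise a KeyError if any level is missing. One common way around this, sketched here with a hypothetical "warranty" key that the product does not have, is to chain .get() and pass an empty dictionary as the default for the outer level:

```python
product = {
    "name": "Wireless Mouse",
    "specs": {"dpi": 1600},
}

# "specs" exists, so the inner .get() looks up "dpi" as usual.
dpi = product.get("specs", {}).get("dpi")

# "warranty" is missing, so the outer .get() returns {} and the
# inner .get() falls back to its own default of 0.
warranty_years = product.get("warranty", {}).get("years", 0)

print(dpi)             # 1600
print(warranty_years)  # 0
```

The empty dictionary matters: without it the outer .get() would return None, and None has no .get() method.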

Try It 14.3 — Create a nested dictionary for a server. The top-level keys are name, location, and hardware. The hardware key holds another dictionary with cpu_cores, ram_gb, and disk_tb. Print the server name and each hardware value separately.


14.5 String Methods

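Raw string data almost never arrives clean. It comes in with extra spaces, inconsistent casing, and fields jammed together. Python's built-in string methods handle all of these. Each method returns a new string and leaves the original unchanged, so assign the result to a variable if you want to keep it.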

String Methods Reference

Method            What it does                             Example                        Result
.strip()          Removes leading/trailing whitespace      " hi ".strip()                 "hi"
.lstrip()         Removes left whitespace only             " hi ".lstrip()                "hi "
.rstrip()         Removes right whitespace only            " hi ".rstrip()                " hi"
.lower()          All lowercase                            "HELLO".lower()                "hello"
.upper()          All uppercase                            "hello".upper()                "HELLO"
.title()          First letter of each word capitalised    "john doe".title()             "John Doe"
.replace(a, b)    Replaces all occurrences of a with b     "cat".replace("c", "b")        "bat"
.split(x)         Splits into a list on delimiter x        "a,b,c".split(",")             ['a', 'b', 'c']
.startswith(x)    True if starts with x                    "hello".startswith("he")       True
.endswith(x)      True if ends with x                      "file.csv".endswith(".csv")    True

Cleaning Whitespace

raw = " John Smith "
clean = raw.strip()
print(clean)

Output:
John Smith

Changing Case

status = "IN PROGRESS"
print(status.lower())

tag = "data pipeline"
print(tag.upper())

name = "alice johnson"
print(name.title())

Output:
in progress
DATA PIPELINE
Alice Johnson

.lower() is especially useful when comparing strings from different sources. "Karachi" and "karachi" are not equal in Python — but "Karachi".lower() == "karachi" is True.
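In practice, stray whitespace tends to come along with inconsistent casing, so .strip() and .lower() are often chained before a comparison. A short sketch with two made-up inputs:

```python
city_a = "  Karachi"
city_b = "KARACHI "

# Compared as they arrived, the two strings differ.
print(city_a == city_b)  # False

# Normalise both sides the same way, then compare.
normalized_a = city_a.strip().lower()
normalized_b = city_b.strip().lower()
print(normalized_a == normalized_b)  # True
```

The originals are untouched. city_a still has its leading spaces, because string methods return new strings.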

Replacing and Splitting

log = "ERROR|2024-11-15|module:loader|msg:timeout"

# Replace a substring
updated = log.replace("ERROR", "WARNING")
print(updated)

# Split on a delimiter
parts = log.split("|")
print(parts)
print(parts[1])  # the date

Output:
WARNING|2024-11-15|module:loader|msg:timeout
['ERROR', '2024-11-15', 'module:loader', 'msg:timeout']
2024-11-15

.split() turns a flat string into a list — essential when reading CSV rows, parsing log files, or breaking apart structured codes.
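Once a line is split, the pieces can go straight into a dictionary. The sketch below reuses the log line from above; the key names "level", "date", "module", and "message" are chosen for illustration:

```python
log = "ERROR|2024-11-15|module:loader|msg:timeout"
parts = log.split("|")

# The second argument to .split() limits the number of splits,
# so a value that itself contains ":" would stay in one piece.
entry = {
    "level": parts[0],
    "date": parts[1],
    "module": parts[2].split(":", 1)[1],
    "message": parts[3].split(":", 1)[1],
}

print(entry["level"])    # ERROR
print(entry["module"])   # loader
print(entry["message"])  # timeout
```

This assumes every line has exactly four fields in this order. A line with fewer fields would raise an IndexError.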

Checking Content

filename = "sales_report_2024.csv"

print(filename.endswith(".csv"))
print(filename.startswith("sales"))
print("2024" in filename)

Output:
True
True
True

Try It 14.4 — You receive the string " ERROR | 2024-01-15 | disk full ". Strip the whitespace, split it on " | ", and print each part on its own line.


14.6 f-Strings

In Chapter 13 you built strings with + and str(). It worked, but it is hard to read:

name = "Alice"
score = 94
print("User: " + name + " | Score: " + str(score))

f-strings are cleaner. Prefix the string with f and embed any variable or expression inside {}:

name = "Alice"
score = 94
print(f"User: {name} | Score: {score}")

Output:
User: Alice | Score: 94

Any valid Python expression works inside {}:

done = 18
total = 24
print(f"Progress: {done}/{total} ({done/total*100:.1f}%)")

Output:
Progress: 18/24 (75.0%)

The :.1f is a format specifier — it rounds the float to one decimal place. The general pattern is :.Nf where N is the number of decimal places.
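Dictionary values can be embedded too, with one thing to watch: the quotes around the key. A small sketch, using a made-up server dictionary:

```python
server = {"host": "db-01", "uptime_hours": 1234.5678}

# Single quotes inside, double quotes outside. Before Python 3.12,
# reusing the same quote type inside the braces is a syntax error.
line = f"Host: {server['host']} | Uptime: {server['uptime_hours']:.1f} h"
print(line)  # Host: db-01 | Uptime: 1234.6 h
```

The format specifier goes after the closing square bracket, just as it would after a plain variable name.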

Format Specifier Reference

Specifier   Meaning                   Example             Result
:.2f        2 decimal places          f"{3.14159:.2f}"    3.14
:.0f        No decimal places         f"{99.7:.0f}"       100
:,          Thousands separator       f"{1000000:,}"      1,000,000
:>10        Right-align in 10 chars   f"{'hi':>10}"       "        hi"
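Specifiers can also be combined, and there are a few close relatives of the ones in the table. The sketch below shows a thousands separator together with decimal places, left alignment with <, and the % type, which multiplies by 100 and appends a percent sign. The variable names are illustrative:

```python
revenue = 1234567.891
label = "Total"

with_commas = f"{revenue:,.2f}"  # separator and 2 decimals together
left = f"{label:<10}|"           # left-align in 10 characters
right = f"{label:>10}|"          # right-align in 10 characters
percent = f"{0.756:.1%}"         # fraction shown as a percentage

print(with_commas)  # 1,234,567.89
print(left)         # Total     |
print(right)        #      Total|
print(percent)      # 75.6%
```

The | at the end of the aligned strings is only there to make the padding visible.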

⚠ Common Mistake: Forgetting the f

Without the f prefix, Python prints the braces literally:

name = "Alice"
print("Hello, {name}")   # wrong: missing f
print(f"Hello, {name}")  # correct

Output:
Hello, {name}
Hello, Alice

Try It 14.5 — Create a dictionary with a product name, price, and discount rate. Use an f-string to print: "Product: Widget | Price: $29.99 | After discount: $26.99" — calculate the discounted price inside the f-string.


14.7 String Slicing

Every character in a string has an index, starting at 0. You extract a portion using [start:end].

Index:      0    1    2    3    4    5    6    7    8
          ┌────┬────┬────┬────┬────┬────┬────┬────┬────┐
          │ P  │ R  │ J  │ -  │ 2  │ 0  │ 2  │ 4  │ -  │ ...continues
          └────┴────┴────┴────┴────┴────┴────┴────┴────┘
Negative: -12  -11  -10   -9   -8   -7   -6   -5   -4

code = "PRJ-2024-007"

print(code[0])    # First character
print(code[0:3])  # Characters 0, 1, 2 (stop before 3)
print(code[4:8])  # Year
print(code[-3:])  # Last 3 characters
print(code[:3])   # From the beginning up to, but not including, index 3
print(code[4:])   # From index 4 to end

Output:
P
PRJ
2024
007
PRJ
2024-007

The rule: [start:end] — start is included, end is excluded. Leaving a side blank means "from the start" or "to the end."

Negative indices count from the right — [-1] is always the last character, [-3:] is always the last three.
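When a code has a fixed layout, slices can feed a dictionary directly. This sketch assumes every code follows the same 12-character pattern as "PRJ-2024-007". The key names are illustrative, and int() converts the numeric pieces:

```python
code = "PRJ-2024-007"

parsed = {
    "prefix": code[:3],          # "PRJ"
    "year": int(code[4:8]),      # 2024
    "sequence": int(code[-3:]),  # "007" becomes 7
}
print(parsed)  # {'prefix': 'PRJ', 'year': 2024, 'sequence': 7}

# A slice that runs past the end is trimmed rather than failing,
# whereas a single index past the end (code[100]) raises IndexError.
tail = code[4:100]
print(tail)    # 2024-007
```

Slicing by position only works while the layout stays fixed. If the prefix could be two or four letters, splitting on "-" would be the safer choice.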


14.8 Putting It Together

A record arrives with messy values — trailing spaces, inconsistent casing, a raw log line to parse. Here is how dictionary access and string methods work together to clean it:

# Raw record as it arrived
record = {
    "username": " ayan.hussain ",
    "email": "Ayan.Hussain@EXAMPLE.COM",
    "plan": " PRO ",
    "log": "2024-11-15|INFO|login_success"
}

# Clean each field
name = record["username"].strip().title()
email = record["email"].strip().lower()
plan = record["plan"].strip().title()

# Parse the log
log_parts = record["log"].split("|")
log_date = log_parts[0]
log_event = log_parts[2]

# Build output
print(f"User: {name}")
print(f"Email: {email}")
print(f"Plan: {plan}")
print(f"Last login: {log_date}")
print(f"Event: {log_event}")

Output:
User: Ayan.Hussain
Email: ayan.hussain@example.com
Plan: Pro
Last login: 2024-11-15
Event: login_success

The record came in dirty. It left clean. No new syntax was used here — just dictionary access and string methods, chained together.
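The example above assumes every field is present. A hedged variation, using a made-up record with no "email" or "plan" key, combines .get() with the same string methods. The default must be a string, because .strip() cannot be called on None:

```python
# Hypothetical record with "email" and "plan" missing.
record = {
    "username": "  ayan.hussain  ",
    "log": "2024-11-15|INFO|login_success",
}

name = record.get("username", "").strip().title()
email = record.get("email", "n/a").strip().lower()
plan = record.get("plan", "free").strip().title()
log_level = record.get("log", "||").split("|")[1]

print(f"User: {name}")        # User: Ayan.Hussain
print(f"Email: {email}")      # Email: n/a
print(f"Plan: {plan}")        # Plan: Free
print(f"Level: {log_level}")  # Level: INFO
```

The "||" default for the log keeps the split producing three parts, so index 1 is still valid when the log is absent.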


Summary

A dictionary stores related data as key-value pairs accessed by name, not position. You create one with {}, access values with [key], and protect against missing keys with .get(). You can add, update, or remove keys at any time. Nested dictionaries hold a dictionary as a value, letting you represent structured records. .keys(), .values(), and .items() show you what is inside — you will use these with loops in the next chapter. On the string side, methods like .strip(), .lower(), .split(), and .replace() clean raw text. f-strings replace clunky concatenation with embedded expressions and format specifiers. Slicing extracts any portion of a string by index position.


Exercises

14.1 — Create a dictionary representing a user account with keys for username, email, plan, and storage_used_gb. Print each value using its key. Then use .get() to safely access "phone_number" with a default of "not provided".

14.2 — Start with this dictionary: {"status": "inactive", "region": "us-west"}. Add three new keys, update "status" to "active", and use .pop() to remove "region". Print the value that was popped, then print the final dictionary.

14.3 — The following string is a raw configuration line: "host=db-01 | port=5432 | db=analytics | ssl=true". Use .split() to break it apart, then use slicing on each part to extract only the values (everything after the =). Print each value on its own line.

14.4 — Create a nested dictionary for a cloud server. The top level has "name" and "specs". Inside "specs", store "cpu", "ram_gb", and "disk_tb". Use f-strings to print a formatted summary that includes all five values.

14.5 — You receive the string " daily_sales_report_2024_Q4.CSV ". Without calling .lower() more than once, write code that: strips whitespace, converts to lowercase, checks whether it ends with ".csv", and prints True or False.

14.6 — Given code = "US-LAX-2024-0047", use slicing only (no .split()) to extract and print: the country code ("US"), the city code ("LAX"), the year ("2024"), and the record number ("0047"). What does each slice tell you about the structure of the string?