Module 2: APK Structure, Reverse Engineering & Code Review

📦 Analysis Module

APK Structure, Reverse Engineering Awareness & Code Review

Understanding APK Format, Static Analysis Concepts & Secure Code Review Principles

This module covers the Android Package (APK) file structure, AndroidManifest.xml analysis, reverse engineering awareness, static code analysis methodology, and secure code review principles. It examines critical vulnerabilities such as hardcoded secrets, API key exposure, and insecure implementations, and closes with enterprise application security governance and the secure SDLC.

APK File Structure Overview

Understanding Android Package format and components

📦 APK Fundamentals

An APK (Android Package) is a ZIP archive containing an application's code, resources, assets, and configuration. Understanding APK structure is critical for security analysis: it identifies exposed data, reveals the application architecture, and exposes configuration vulnerabilities.

APK Structure Breakdown

MyApp.apk
├── AndroidManifest.xml (app configuration)
├── classes.dex (compiled app code)
├── classes2.dex (additional code, if present)
├── resources.arsc (compiled resources)
├── res/ (resource files)
│ ├── drawable/ (images, icons)
│ ├── layout/ (UI layouts)
│ ├── values/ (strings, colors)
│ └── ...
├── assets/ (raw assets)
├── lib/ (native libraries .so files)
├── META-INF/ (signatures, certificates)
│ ├── MANIFEST.MF
│ ├── CERT.SF
│ └── CERT.RSA
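└── unknown (not part of the APK itself: a folder that decoding tools such as apktool create for files they do not recognize; review its contents)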
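
Because an APK is a ZIP archive, the layout above can be inspected with a standard ZIP library. The following is a minimal sketch, not a complete tool: `summarize_apk` and `build_sample_apk` are hypothetical helper names, and the sample archive only mimics an APK layout with placeholder contents.

```python
import io
import zipfile
from collections import Counter


def summarize_apk(apk_file):
    """Count archive entries per top-level component (directory or root file)."""
    counts = Counter()
    with zipfile.ZipFile(apk_file) as apk:
        for name in apk.namelist():
            top = name.split("/", 1)[0]
            # Root-level files keep their own name; nested files roll up to their directory.
            counts[top + "/" if "/" in name else top] += 1
    return dict(counts)


def build_sample_apk():
    """Build a tiny in-memory archive that mimics an APK layout (illustrative only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in ("AndroidManifest.xml", "classes.dex", "res/layout/main.xml",
                     "res/drawable/icon.png", "META-INF/MANIFEST.MF"):
            archive.writestr(name, b"placeholder")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for component, count in sorted(summarize_apk(build_sample_apk()).items()):
        print(component, count)
```

Passing a path to a real APK instead of the in-memory sample should work the same way, since `zipfile.ZipFile` accepts either.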

AndroidManifest.xml - Configuration Core

The most critical file in an APK: an XML configuration declaring the application's structure, permissions, and components. Inside the APK it is stored in a binary XML format that standard tools decode back to text. The manifest defines:

Package Name: Unique identifier (com.example.app). Identifies the app on the device and in the Google Play Store.
Permissions: Permissions the application requests. Security-critical: they declare the app's intent to access sensitive resources.
Activities: User-visible screens. Each activity can declare intent filters defining which intents it receives.
Services: Background processes. Exported services allow components in other apps to communicate with them.
Broadcast Receivers: Listen for system and application broadcasts. Exported receivers are accessible to other apps.
Content Providers: Share data with other apps. Can expose sensitive data if not properly protected.
Intent Filters: Define how components respond to intents. Misconfigured filters enable intent hijacking attacks.
Uses-Feature: Hardware features required. Optional vs required distinction affects app availability.
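
As a rough illustration of manifest review, the sketch below lists components that other apps can reach. It assumes the manifest has already been decoded to text XML (the copy inside the APK is binary XML), and it treats a component with an intent filter but no explicit `android:exported` value as exported, which is a simplifying heuristic based on the historical default. `find_exported_components` and the sample manifest are hypothetical.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
COMPONENT_TAGS = ("activity", "activity-alias", "service", "receiver", "provider")


def find_exported_components(manifest_xml):
    """Return (tag, name, permission) for components reachable by other apps."""
    root = ET.fromstring(manifest_xml)
    exported = []
    for tag in COMPONENT_TAGS:
        for component in root.iter(tag):
            flag = component.get(ANDROID_NS + "exported")
            has_filter = component.find("intent-filter") is not None
            # Heuristic: without an explicit flag, an intent filter implied exported=true.
            if flag == "true" or (flag is None and has_filter):
                exported.append((tag,
                                 component.get(ANDROID_NS + "name"),
                                 component.get(ANDROID_NS + "permission")))
    return exported


SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <application>
    <activity android:name=".MainActivity" android:exported="true">
      <intent-filter><action android:name="android.intent.action.MAIN"/></intent-filter>
    </activity>
    <service android:name=".SyncService" android:exported="false"/>
    <provider android:name=".NotesProvider" android:exported="true"
              android:permission="com.example.app.READ_NOTES"/>
  </application>
</manifest>"""

if __name__ == "__main__":
    for tag, name, permission in find_exported_components(SAMPLE_MANIFEST):
        print(tag, name, "permission:", permission)
```

An exported component with no permission (here the activity) is not automatically a flaw, but it is where a reviewer would start asking whether the export is needed.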

🔑 Manifest Insight: The manifest is unencrypted and readable within the APK, so developers must avoid storing sensitive information in it. Exported components are accessible to other apps: verify that every exported component has a legitimate use case.

💾 DEX File & Compiled Code

DEX (Dalvik Executable) files contain the compiled application code. The Android Runtime (ART) executes DEX bytecode, compiling it to native machine code. DEX files are decompilable: an approximation of the source code can be recovered.

DEX File Characteristics

Compiled Bytecode: Source code is compiled to bytecode rather than native machine code. Decompilers convert bytecode back to readable code.
Multiple DEX: Large apps split code across multiple DEX files (classes.dex, classes2.dex, etc.) because a single DEX file can reference at most 65,536 methods.
Obfuscation: Developers use obfuscation tools that rename classes, methods, and fields to meaningless names, complicating reverse engineering. Obfuscation is not encryption: determined attackers can reverse it.
String Constants: Hardcoded strings are often visible: API keys, URLs, error messages. Hardcoding sensitive strings is a major vulnerability.
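
The method limit mentioned above can be checked directly from a DEX header. This is a minimal sketch that assumes a standard little-endian DEX file and reads the `method_ids_size` field at offset 0x58 of the 0x70-byte header, as described in the published DEX format; `read_method_ref_count` and `make_fake_header` are hypothetical helper names, and the fake header exists only so the example runs without a real file.

```python
import struct

DEX_HEADER_SIZE = 0x70
METHOD_IDS_SIZE_OFFSET = 0x58  # position of method_ids_size in the header
MAX_METHOD_REFS = 65_536


def read_method_ref_count(dex_bytes):
    """Read the number of method references recorded in a DEX header."""
    if len(dex_bytes) < DEX_HEADER_SIZE or dex_bytes[:4] != b"dex\n":
        raise ValueError("not a DEX file")
    # Assumes the standard little-endian layout.
    (count,) = struct.unpack_from("<I", dex_bytes, METHOD_IDS_SIZE_OFFSET)
    return count


def make_fake_header(method_count):
    """Build a zero-filled header with only the magic and method count set (test data)."""
    header = bytearray(DEX_HEADER_SIZE)
    header[:8] = b"dex\n035\x00"
    struct.pack_into("<I", header, METHOD_IDS_SIZE_OFFSET, method_count)
    return bytes(header)


if __name__ == "__main__":
    count = read_method_ref_count(make_fake_header(60_000))
    print(count, "method references;", MAX_METHOD_REFS - count, "left before the limit")
```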

⚠️ Code Exposure Risk: DEX files decompile to an approximation of the source code. Assume all code is exposed. Never hardcode secrets (API keys, tokens, passwords). Use secure storage for sensitive data.

🎨 Resources & Assets

An APK includes application resources: images, layouts, strings, colors, configurations. The compiled resource table in resources.arsc holds values such as strings and colors, while the res/ directory holds resource files such as layouts (compiled to binary XML) and images.

Resource Types & Security

Layout Files: UI definitions. Can reveal application structure, intended functionality.
String Resources: Text content, which sometimes includes credentials, API endpoints, and error messages. Analyzing strings can reveal sensitive information.
Configuration Files: Application configuration stored as XML. May contain API endpoints, feature flags, sensitive URLs.
Images/Icons: Usually non-sensitive but can reveal branding, screenshots, sensitive visuals.
Assets Directory: Raw assets (databases, JSON config, certificates). Often contain configuration data, encryption keys, hardcoded secrets.
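
As one example of resource inspection, the sketch below pulls URLs out of a decoded strings.xml file. It assumes the string table has already been decoded to text XML by a decoding tool; `find_urls_in_strings` and the sample data are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def find_urls_in_strings(strings_xml):
    """Return {resource name: [urls]} for <string> entries that contain URLs."""
    root = ET.fromstring(strings_xml)
    hits = {}
    for entry in root.iter("string"):
        urls = URL_PATTERN.findall(entry.text or "")
        if urls:
            hits[entry.get("name")] = urls
    return hits


SAMPLE_STRINGS = """<resources>
  <string name="app_name">Example</string>
  <string name="api_base">http://api.example.com/v1</string>
  <string name="help_text">See https://example.com/help for details</string>
</resources>"""

if __name__ == "__main__":
    for name, urls in find_urls_in_strings(SAMPLE_STRINGS).items():
        print(name, urls)
```

A plain `http://` endpoint such as the sample `api_base` is worth a second look, since it suggests unencrypted traffic.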

📋 Resource Review: Never store sensitive data in resources. Assume all resources are readable. Configuration should be externalized from the APK. API endpoints, certificates, and encryption keys must not be hardcoded.

🔒 Digital Signatures & Certificates

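An APK is signed with the developer's private key. With the original JAR-based (v1) scheme, the signature files live in the META-INF directory and the certificate is in META-INF/CERT.RSA; newer signature schemes (v2 and later) store the signature in a separate APK Signing Block inside the file. Signature verification ensures app authenticity and integrity.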

Signature Components

Certificate: Contains the developer's public key and identity information (developer name, organization).
Signature Verification: Android verifies the APK signature before installation and normally requires updates to be signed with the same certificate. This prevents tampering with or replacing an installed app.
Certificate Pinning: A related but separate control that does not involve the APK signing certificate: the app verifies server certificates against a pinned certificate, preventing man-in-the-middle attacks.
Key Management: A compromised private key lets an attacker sign a malicious APK that impersonates the legitimate app.
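
To make the integrity idea concrete, the sketch below checks one link of the JAR-style (v1) chain: the per-file digests recorded in META-INF/MANIFEST.MF. It is a simplified illustration under several assumptions: it only handles `SHA-256-Digest` entries, it does not verify CERT.SF or the signature in CERT.RSA, and it ignores the newer signature schemes entirely. The function names are hypothetical.

```python
import base64
import hashlib
import zipfile

MANIFEST_PATH = "META-INF/MANIFEST.MF"


def parse_manifest_digests(manifest_text):
    """Map entry name -> base64 SHA-256 digest from JAR-style manifest text."""
    # Continuation lines start with a single space; join them to the previous line.
    unfolded = manifest_text.replace("\r\n", "\n").replace("\n ", "")
    digests = {}
    for section in unfolded.split("\n\n"):
        fields = dict(line.split(": ", 1) for line in section.splitlines() if ": " in line)
        if "Name" in fields and "SHA-256-Digest" in fields:
            digests[fields["Name"]] = fields["SHA-256-Digest"]
    return digests


def find_modified_entries(apk_file):
    """Return names whose content no longer matches the digest recorded in the manifest."""
    mismatched = []
    with zipfile.ZipFile(apk_file) as apk:
        digests = parse_manifest_digests(apk.read(MANIFEST_PATH).decode("utf-8"))
        for name, expected in digests.items():
            actual = base64.b64encode(hashlib.sha256(apk.read(name)).digest()).decode("ascii")
            if actual != expected:
                mismatched.append(name)
    return mismatched
```

A mismatch here only shows that a file changed after the manifest was written. An attacker who re-signs the APK with their own key produces a consistent manifest, which is why the platform also checks who signed it.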

🔐 Security Consideration: APK signatures detect modification of the app but do not prevent decompilation. A signature provides authenticity and integrity, not confidentiality. Keep private keys secure: compromise enables impersonation.

Reverse Engineering Awareness

Understanding static analysis and code recovery concepts

🔍 Static Analysis Fundamentals

Static analysis examines application code without executing it. Analysts extract the APK, examine resources, analyze code structure, and identify vulnerabilities. Understanding static analysis concepts is critical for identifying attack vectors.

Static Analysis Process

APK Extraction: An APK is a ZIP file, so extract it to access the contents. Examine the directory structure and identify the included files.
Manifest Analysis: Parse AndroidManifest.xml to understand app structure, permissions, components, intent filters.
Resource Inspection: Examine resources, layouts, strings, configuration files for sensitive data, hardcoded secrets.
Code Review: Analyze DEX files examining application logic, identifying vulnerabilities, cryptographic implementations, data handling.
Library Analysis: Identify third-party libraries, check for known vulnerabilities, assess data collection practices.
Vulnerability Identification: Identify security flaws: insecure data storage, hardcoded secrets, improper cryptography, SQL injection, insecure deserialization.

🎯 Analysis Mindset: Approach code review as an adversary would: assume malicious use, identify misconfigurations, find data exposure, and test security controls. Think like an attacker: which vulnerabilities enable compromise?

🔐 Code Recovery & Readability

Decompilation converts DEX bytecode back into an approximation of the source code. While not identical to the original source, the recovered code is highly readable and reveals application logic and vulnerabilities.

Decompiled Code Characteristics

Variable Names Lost: Local variable names are usually dropped at compile time, and obfuscation renames classes, methods, and fields to meaningless names (a, b, c). This makes code harder to understand but not impossible.
Comments Removed: Comments stripped during compilation. No developer explanations in recovered code.
Flow Reconstruction: Control flow reconstructed from bytecode. Complex logic may be harder to follow.
Constant Strings: String literals visible in code. Hardcoded API keys, URLs, secrets completely visible.
Logic Exposed: Application logic completely exposed. Authentication checks, encryption implementation, security mechanisms visible.

⚠️ Code Transparency: Assume all application code is readable by a determined analyst. Security cannot depend on code obscurity. Implement security through proper design: never hardcode secrets, externalize configuration, use secure storage.

🚀 Obfuscation & Protection Awareness

Obfuscation makes code harder to understand but doesn't prevent reverse engineering. Developers use obfuscation to complicate analysis, not to secure application.

Obfuscation Techniques

Name Mangling: Rename classes, methods, variables to meaningless single letters (a(), b, c). Increases cognitive load but determined analyst can reverse.
Control Flow: Insert unnecessary branches, loops making code harder to follow. Add dead code paths irrelevant to functionality.
String Encryption: Encrypt hardcoded strings decrypting at runtime. Analyst must trace encryption/decryption to recover strings.
Code Injection: Add irrelevant code paths confusing analysis. Inject defensive code detecting debugging/tampering.
Reflection Usage: Use reflection to invoke methods dynamically. Makes static analysis harder since method names not visible in code.

Obfuscation Limitations

Obfuscation provides time-delay security: it slows a determined attacker but does not prevent compromise. Obfuscated code is still decompilable with effort. Runtime behavior remains observable despite obfuscation. Cryptographic algorithms remain identifiable regardless of obfuscation.
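
The limits of string encryption can be shown with a toy example. The sketch below uses a repeating-key XOR as a stand-in for whatever scheme an obfuscator applies; the key, the URL, and the names are hypothetical. The point is that the app must ship both the scrambled bytes and the means to unscramble them, so an analyst who finds both recovers the string.

```python
def xor_bytes(data, key):
    """XOR data with a repeating key; the same call obfuscates and de-obfuscates."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Hypothetical values: what a build step might embed instead of the plain string.
EMBEDDED_KEY = b"k3y"
EMBEDDED_BLOB = xor_bytes(b"https://api.example.com", EMBEDDED_KEY)

# What an analyst does after locating both values in the decompiled code.
recovered = xor_bytes(EMBEDDED_BLOB, EMBEDDED_KEY).decode("ascii")

if __name__ == "__main__":
    print("embedded:", EMBEDDED_BLOB.hex())
    print("recovered:", recovered)
```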

💡 Obfuscation Reality: Obfuscation tools (ProGuard, R8) are standard practice for legitimate app protection but should not be relied upon as the sole security mechanism. Security must be layered: secure design + obfuscation + runtime protection.

Secure Code Review Principles

Identifying vulnerabilities and security weaknesses

🔑 Hardcoded Secrets Detection

Common critical vulnerability: sensitive data hardcoded in application code. Developers accidentally include passwords, API keys, tokens, certificates directly in source or configuration files.

Types of Hardcoded Secrets

🔐 API Keys & Tokens: Third-party service credentials are hardcoded. An attacker with stolen keys can impersonate the legitimate app, access services, and incur costs.
🔑 Passwords & Credentials: Database passwords or service credentials are hardcoded. An attacker can access backend systems using the exposed credentials.
🛡️ Encryption Keys: Cryptographic keys for data encryption are hardcoded. The encryption is then only security theater, because the attacker has the key.
📜 Certificates: SSL/TLS certificates with their private keys, or other private certificates, are hardcoded. Compromise enables impersonation or man-in-the-middle attacks.
🌐 Backend URLs: Backend API endpoints are hardcoded. This reveals infrastructure and enables direct backend attacks.
👤 Test Accounts: Test credentials are forgotten in production. An attacker can use test accounts with elevated privileges.

Identification Approach

Code reviewers search for strings such as "password", "key", "secret", "token", "api_key", and "credential". They examine configuration files, strings.xml, and JSON/XML resources for sensitive data, and check for base64-encoded data, which can indicate an encoded secret. In source-level review, they also look for comments that reveal credentials.
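
The search described above can be approximated with two regular expressions. This is a heuristic sketch, not a replacement for a dedicated secret scanner: the keyword list mirrors the terms in the text, the base64 check simply tests whether a long token decodes cleanly, and both will produce false positives. `scan_for_secrets` and the sample text are hypothetical.

```python
import base64
import re

KEYWORD_PATTERN = re.compile(
    r"(password|passwd|secret|token|api[_-]?key|credential)", re.IGNORECASE)
BASE64_PATTERN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def looks_like_base64(candidate):
    """True if the candidate decodes cleanly as base64."""
    try:
        base64.b64decode(candidate, validate=True)
        return True
    except ValueError:
        return False


def scan_for_secrets(text):
    """Return (line number, reason) pairs for lines worth a manual look."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if KEYWORD_PATTERN.search(line):
            findings.append((number, "keyword"))
        if any(looks_like_base64(match) for match in BASE64_PATTERN.findall(line)):
            findings.append((number, "base64-like"))
    return findings


SAMPLE_SOURCE = '''String endpoint = "https://example.com";
String apiKey = "not-a-real-key";
String blob = "c2VjcmV0LXZhbHVlLWZvci1kZW1vLW9ubHk=";
'''

if __name__ == "__main__":
    for line_number, reason in scan_for_secrets(SAMPLE_SOURCE):
        print(line_number, reason)
```

Every hit still needs a human decision; the scan only narrows down where to look.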

🚨 Hardcoded Secret Risk: Treated as a critical vulnerability that results in automatic app rejection in security review and requires immediate remediation. Use secure storage (Android Keystore, encrypted SharedPreferences) for all sensitive data. Never hardcode secrets: externalize configuration.

⚠️ API Key Exposure Risks

API keys enable access to third-party services. Hardcoded API keys in APK represent critical vulnerability: attacker extracts key, makes unauthorized API calls, incurs costs, exfiltrates data.

API Key Attack Scenarios

Service Abuse: Attacker uses stolen API key making unlimited requests, exhausting quota, incurring charges on app owner.
Data Access: API key enabling access to user data in backend service. Attacker retrieving customer information, personal data.
Service Disruption: Attacker using key deleting/modifying data, disrupting legitimate app functionality.
Rate Limit Bypass: Per-app rate limits bypassed if attacker uses legitimate app's API key making unrestricted requests.
Geographic Spoofing: Some services depend on API key origin for geographic validation. Hardcoded key enables spoofing origin restrictions.

Secure API Key Management

Backend Proxy: Never expose API keys in APK. App calls backend endpoint, backend uses API key server-side. Backend rate-limited preventing key exhaustion.
Rotation: API keys rotatable server-side without app update. Compromised key revoked immediately.
Scope Limiting: Use API keys with restricted scopes/permissions. Even if compromised, attacker limited to specific operations.
Monitoring: Monitor API usage to detect suspicious patterns. An unusually high request volume from a single IP address may indicate compromise.
Certificate Pinning: Pin API service certificates preventing man-in-the-middle attacks intercepting API key usage.
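
The backend proxy idea can be sketched as a server-side function that adds the key only after the request has left the device. Everything specific here is an assumption for illustration: the environment variable name, the upstream URL, the allowed parameter names, and the `build_upstream_request` helper itself.

```python
import os


def build_upstream_request(client_params, environ=os.environ):
    """Server-side: attach the third-party API key that the mobile client never sees."""
    api_key = environ.get("THIRD_PARTY_API_KEY")  # hypothetical variable name
    if not api_key:
        raise RuntimeError("API key is not configured on the server")
    # Forward only an allow-list of parameters so the client cannot smuggle extra fields.
    allowed = {k: v for k, v in client_params.items() if k in ("query", "limit")}
    return {
        "url": "https://thirdparty.example.com/search",  # placeholder endpoint
        "headers": {"Authorization": "Bearer " + api_key},
        "params": allowed,
    }


if __name__ == "__main__":
    request = build_upstream_request(
        {"query": "coffee", "api_key": "client-supplied"},
        environ={"THIRD_PARTY_API_KEY": "server-side-value"},
    )
    print(request["params"])
```

Because the key lives in server configuration, it can be rotated or revoked without shipping a new APK.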

🔒 API Key Protection: Fundamental principle: never expose credentials in client apps. All sensitive service calls must proxy through the backend. The backend validates requests, adds rate limiting, and implements monitoring. The client app never possesses API credentials.

🛡️ Common Vulnerability Patterns

Security reviewers identify repeating vulnerability patterns indicating systemic security weaknesses. Understanding patterns helps proactive identification and remediation.

Critical Vulnerabilities to Identify

Insecure Data Storage: Sensitive data stored plaintext in SharedPreferences, files, database without encryption. Assume device may be physically compromised.
SQL Injection: User input directly concatenated into SQL queries. Attacker injecting SQL commands executing arbitrary database operations.
Insecure Deserialization: App deserializing untrusted data without validation. Attacker crafting malicious serialized objects executing arbitrary code.
Weak Cryptography: Using deprecated algorithms (DES), hardcoded keys, poor randomness, or incorrect IV usage. The resulting encryption is cryptographically weak or broken.
Insecure Communication: Sensitive data transmitted without encryption (HTTP instead of HTTPS). Network traffic intercepted revealing data.
Intent Injection: App receiving intents without validation of sender/data. Attacker sending crafted intents causing unexpected behavior.
Exported Components: Components exported unnecessarily (Activities, Services, Receivers). Other apps interacting with exported components causing security issues.
Path Traversal: App accessing files with user-supplied path without validation. Attacker accessing arbitrary files on device.
Missing Validation: Input/output not validated/sanitized. Malicious input causing crashes, data corruption, security issues.
Overpermissioning: App requesting excessive permissions. If the app is compromised, the unnecessary permissions give the attacker access to sensitive resources.
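
The SQL injection pattern from the list above is easy to reproduce. The sketch below uses Python's built-in `sqlite3` module purely for illustration; on Android the same contrast applies between building a query string by concatenation and passing values as bound arguments. The table, data, and function names are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two sample rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    db.executemany("INSERT INTO users VALUES (?, ?)",
                   [("alice", "a-secret"), ("bob", "b-secret")])
    return db


def find_user_unsafe(db, name):
    # Vulnerable: attacker-controlled text becomes part of the SQL statement.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return db.execute(query).fetchall()


def find_user_safe(db, name):
    # Parameter binding keeps the input as data, never as SQL syntax.
    return db.execute("SELECT name, secret FROM users WHERE name = ?", (name,)).fetchall()


if __name__ == "__main__":
    db = make_demo_db()
    payload = "x' OR '1'='1"
    print("unsafe:", find_user_unsafe(db, payload))
    print("safe:", find_user_safe(db, payload))
```

With the payload shown, the concatenated query returns every row, while the bound query returns none because no user has that literal name.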

🔍 Vulnerability Identification: Secure code review is a systematic process: (1) Understand intended functionality, (2) Analyze data flows - where data enters/exits, (3) Identify trust boundaries, (4) Find vulnerabilities at boundaries, (5) Assess impact/severity.

📋 Code Review Checklist Approach

Effective secure code review follows structured checklist ensuring consistent evaluation. Checklist covers OWASP top vulnerabilities and common Android security issues.

Code Review Focus Areas

Authentication: How does app authenticate users? Credentials stored securely? Session tokens managed properly? Password policies enforced?
Authorization: How does app control access to resources? Are access controls properly implemented? Can users access unauthorized data?
Data Protection: How is sensitive data protected? Encrypted at rest? Transmitted securely? Data classification evident?
Cryptography: What cryptographic algorithms used? Keys managed securely? Randomness properly seeded? Encryption/decryption correct?
Input Validation: Are user inputs validated? Any injection vulnerabilities? File paths, URLs validated? SQL injection possible?
Error Handling: Do error messages leak sensitive information? Stack traces exposed? Error handling consistent?
Logging: What data is logged? Sensitive information logged? Logs accessible by other apps? Log retention policies?
Dependencies: Third-party libraries identified? Known vulnerabilities checked? Dependency versions current?
Configuration: Hardcoded configuration values? Debug flags enabled in production? Sensitive configuration exposed?
Components: Are components properly secured? Are only necessary components exported? Intent filters secure? Deep links validated?

✅ Review Process: Systematic review using a checklist ensures thoroughness. Document findings with severity levels: critical (immediate fix required), high (high impact, timely fix), medium (moderate risk), low (nice to fix). Provide remediation guidance with code examples.

Enterprise Application Security

Secure SDLC and code governance

🏢 Secure Software Development Lifecycle (SDLC)

Enterprise organizations implement Secure SDLC: development process integrating security from inception through deployment. Security not added as afterthought but embedded in development lifecycle.

Secure SDLC Phases

Planning & Requirements: Security requirements defined at project start. Threat modeling identifying potential attacks. Security acceptance criteria established.
Architecture & Design: Secure architecture designed avoiding common pitfalls. Security patterns applied (secure authentication, authorization, data protection). Threat modeling guides design decisions.
Development: Developers trained on secure coding practices. Code follows security guidelines, design patterns. Security code review gates prevent vulnerable code merging.
Testing & QA: Security testing integral to QA: static analysis (automated code scanning), dynamic analysis (testing running app), penetration testing (simulated attacks).
Deployment: Security verification before production deployment. Code signing, configuration review, secrets management verified.
Maintenance & Monitoring: Security monitoring post-deployment. Vulnerability patches applied promptly. Security incidents investigated, remediated.

🛡️ Lifecycle Benefit: Secure SDLC significantly reduces vulnerabilities: it catches issues early (requirements/design), reducing expensive later fixes, builds a security mindset among developers, and enables proactive vulnerability identification.

📊 Code Review Governance

Enterprise enforces code review governance: mandatory peer review before code deployment, security review gates, documentation requirements, approval workflows.

Governance Components

Peer Code Review: All code changes reviewed by qualified developers before merge. Reviewers check logic, security, code quality, adherence to standards.
Security Review: Dedicated security reviewers check security-sensitive changes. High-risk changes (authentication, data handling, cryptography) require explicit security approval.
Automated Scanning: Static analysis tools automatically scan code identifying common vulnerabilities. Build pipeline fails if critical vulnerabilities detected.
SCA (Software Composition Analysis): Tools scan dependencies identifying vulnerable libraries. Prevents inclusion of known-vulnerable components.
Documentation: Code changes documented: rationale, security implications, testing performed. Enables effective review and future maintenance.
Approval Workflow: Multi-level approvals for sensitive changes: code reviewer + security reviewer + tech lead. Clear approval requirements prevent unauthorized changes.
Audit Trail: All changes tracked: who changed what, when, why, who approved. Enables auditing, incident investigation, compliance verification.
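
The "build pipeline fails if critical vulnerabilities detected" rule can be expressed as a small gate script. This sketch assumes scanner findings have already been loaded as dictionaries with `severity` and `title` keys, which is a hypothetical format; real scanners each have their own report schema.

```python
import sys

SEVERITY_ORDER = ("low", "medium", "high", "critical")


def gate_exit_code(findings, fail_at="critical"):
    """Return 1 if any finding is at or above the threshold severity, else 0."""
    threshold = SEVERITY_ORDER.index(fail_at)
    blocking = [f for f in findings if SEVERITY_ORDER.index(f["severity"]) >= threshold]
    for finding in blocking:
        print("BLOCKING: [{severity}] {title}".format(**finding))
    return 1 if blocking else 0


if __name__ == "__main__":
    sample_findings = [
        {"severity": "medium", "title": "Verbose logging in release build"},
        {"severity": "critical", "title": "Hardcoded API key in strings.xml"},
    ]
    # A non-zero exit status is what makes a CI job fail.
    sys.exit(gate_exit_code(sample_findings))
```

Lowering `fail_at` to "high" tightens the gate without changing the script.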

✔️ Governance Benefit: A formal code review process distributes security knowledge (all developers learn from reviews), catches individual lapses (multiple eyes catch mistakes), builds security culture, and provides an audit trail for compliance.

🔐 Security Controls in Enterprise Apps

Enterprise applications implement defense-in-depth: multiple overlapping security controls preventing single vulnerability causing complete compromise.

Enterprise Security Control Examples

API Rate Limiting: Limits requests per user/IP/API key preventing abuse, DDoS attacks, enumeration attacks.
Request Signing: App signs requests proving authenticity preventing impersonation of legitimate app.
Certificate Pinning: App pins backend certificates preventing MITM attacks even if device's CA store compromised.
Request Validation: Backend validates all requests: correct format, authenticated user, authorized access, legitimate values.
Encryption: All sensitive data encrypted at rest (in storage) and in transit (HTTPS/TLS) preventing data exposure.
Audit Logging: All significant operations logged: authentication attempts, authorization decisions, data access. Logs enable incident investigation.
Incident Response: Procedures for security incidents: detection, containment, investigation, remediation, communication.
Vulnerability Management: Vulnerability scanning tools, penetration testing, bug bounty programs identify issues. Systematic patching process fixes vulnerabilities.
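
Of the controls above, API rate limiting is the simplest to sketch. The class below is one possible sliding-window limiter, keyed by whatever identifies the client (user, IP address, or API key); the class name and the explicit `now` argument are choices made for this example so that it runs deterministically without a clock.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)

    def allow(self, key, now):
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=2, window=10)
    for second in (0, 1, 2, 10):
        print(second, limiter.allow("203.0.113.7", now=second))
```

In a real service `now` would come from a monotonic clock and the counters would live in shared storage, since several backend instances usually serve the same clients.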

🎯 Defense-in-Depth Approach: A single security control is often insufficient. Layered controls: (1) Prevent vulnerabilities through secure design, (2) Detect exploitation attempts through monitoring, (3) Contain incidents through controls, (4) Recover through incident response. Even if one control fails, others still apply.

External Learning References

Official Android security and code review documentation

📚 Official Android Developer Documentation

Android App Bundle & APK Distribution - https://developer.android.com/guide/app-bundle - Detailed APK structure, app bundle format, distribution best practices.
Manifest File Overview - https://developer.android.com/guide/topics/manifest - Comprehensive AndroidManifest.xml documentation, all manifest elements, configuration options.
App Signing & Security - https://developer.android.com/about/versions/pie/android-9.0-changes - APK signing requirements, signature verification, security best practices.
Security Best Practices - https://developer.android.com/training/best-practices/security - Secure coding practices, data protection, cryptography guidelines, secure storage.
Code Obfuscation with R8 - https://developer.android.com/studio/build/shrink-code - R8 configuration, obfuscation techniques, code shrinking best practices.
Secure Storage & Encryption - https://developer.android.com/training/articles/keystore - Android Keystore system, encryption implementation, secure key management.
OWASP Mobile Top 10 - https://owasp.org/www-project-mobile-top-10/ - Industry-recognized mobile app risks, testing methodology, remediation guidance.
CWE: Common Weakness Enumeration - https://cwe.mitre.org/ - Official vulnerability definitions, weakness descriptions, applicable to mobile development.