When 200GB Disappears Overnight
Running an institutional repository on DSpace 9.1 with Ubuntu Server 24.04 should be a piece of cake, until your monitoring alerts scream that disk usage has hit 98% on a 200GB partition. The server becomes sluggish, Tomcat stops responding properly, and users can't access their research. This wasn't a data influx or an attack; it was a silent killer that many DSpace administrators overlook: unconstrained log file growth.
Our DSpace installation, configured with default logging settings, had accumulated 182GB of log files in the /dspace/log/ directory. Individual log files had ballooned to 1.5GB each, with dozens of such files consuming the entire disk allocation. Simultaneously, systemd journal files in /var/log/journal/ were creating 128MB files unchecked. The combination brought our production repository to its knees.
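When a partition fills up this fast, the first question is which files are responsible. The sketch below shows the kind of triage that pinpoints oversized logs, using a scratch directory and small size thresholds as stand-ins (on the real server the target was /dspace/log with a threshold like +500M):

```shell
# Demo only: a scratch directory standing in for /dspace/log.
logdir=$(mktemp -d)
yes | head -c 1048576 > "$logdir/dspace.log"           # ~1 MB stand-in for a bloated log
yes | head -c 2048 > "$logdir/dspace.log.2024-12-01"   # small stand-in

# On the real server: sudo find /dspace/log -type f -size +500M -exec ls -lh {} \;
big=$(find "$logdir" -type f -size +512k)
ls -lh $big

rm -rf "$logdir"
```

Running `find` with a size filter before reaching for `du` on every directory narrows the search to the actual offenders in one command.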
Your Application’s Default Logging is a Ticking Time Bomb
It’s natural to assume that an application’s default settings are safe for production. The surprising truth is that they often are not, especially when it comes to logging. Many applications, such as the DSpace repository software, are configured out-of-the-box with verbose logging levels like INFO. This setting diligently records every successful operation, from searches and submissions to downloads, generating a substantial amount of output even during normal activity. Vendors often ship products this way to simplify initial support and troubleshooting, sacrificing long-term operational stability for out-of-the-box diagnostics.
The critical oversight, however, is that these applications often lack any form of built-in, automated log rotation. The logging configuration focuses on capturing application events but completely ignores the operational reality of managing the files it creates.
“This wasn’t a data influx or attack; it was a silent killer that many DSpace administrators overlook: unconstrained log file growth.”
This combination of verbose logging and absent rotation creates a dangerous “disk space time-bomb.” The log files grow indefinitely, day after day, until they inevitably consume all available storage and cause a catastrophic service outage.
Root Cause: DSpace’s Verbose Logging Meets Absent Rotation
DSpace 9.1 uses Apache Log4j2 for logging, with a default configuration that prioritizes comprehensive debugging information over disk space management. The log4j2.xml configuration file (located in [dspace]/config/) ships with INFO level logging for all DSpace operations, which generates substantial output during normal repository activity—every submission, every search, every OAI-PMH harvest, and every bitstream download gets logged.
DSpace does not include built-in log rotation by default. Unlike system services that integrate with Linux’s logrotate utility, DSpace’s logging configuration focuses on application-level concerns without addressing operational log management. When left unattended, the dspace.log file grows indefinitely until it consumes available disk space.
Compounding the issue, systemd’s journal service was creating persistent journal files without size constraints, adding another layer of disk pressure. The default journal configuration on Ubuntu 24.04 allows unlimited growth, with each journal file reaching 128MB before rotation—except rotation doesn’t delete old files automatically.
Tracking the Space Hog
The investigation followed a systematic approach that every system administrator should have in their toolkit:
Step 1: Verify Disk Usage
df -h
Output: /dev/sda1 200G 196G 4G 98% /
Step 2: Identify Directory-Level Consumption
sudo du -sh /* | sort -hr
Key finding: /dspace 182G
Step 3: Drill Down into DSpace Directory
cd /dspace
sudo du -sh ./* | sort -hr
This revealed the log folder as the culprit: ./log 182G
Step 4: Examine Log File Details
ls -lh /dspace/log/
Discovered multiple files:
- dspace.log (1.5GB)
- dspace.log.2024-12-01 (1.5GB)
- dspace.log.2024-11-30 (1.5GB)
- …and 30+ similar files
Step 5: Check System Journal Usage
sudo journalctl --disk-usage
The output showed 4.2GB of journal files.
Emergency Cleanup
Clearing DSpace Logs
# Navigate to DSpace log directory
cd /dspace/log/
# Keep only the last 7 days of logs manually
find . -name "dspace.log.*" -mtime +7 -delete
# Truncate current log file without restarting DSpace
sudo truncate -s 0 dspace.log
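A note on why the cleanup uses `truncate` rather than `rm`: when you delete a file that Tomcat still holds open, the kernel keeps the blocks allocated until the process closes the handle, so no space comes back; truncating to zero length frees it immediately. A self-contained demonstration (scratch file, not the real dspace.log):

```shell
f=$(mktemp)
yes | head -c 102400 > "$f"     # ~100 KB stand-in for a busy dspace.log
truncate -s 0 "$f"              # file stays in place; its blocks are freed at once
size=$(stat -c %s "$f")
echo "size after truncate: $size bytes"
rm -f "$f"
```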
Cleaning Systemd Journal
# Remove journals older than 7 days
sudo journalctl --vacuum-time=7d
# Limit total journal size to 500MB
sudo journalctl --vacuum-size=500M
# Rotate active journals
sudo journalctl --rotate
This freed 178GB immediately, restoring server responsiveness.
Long-Term Solution: Implementing Logrotate for DSpace
Manual cleanup solves the immediate crisis but doesn’t prevent recurrence. The robust solution integrates DSpace logs with Linux’s logrotate service.
Creating DSpace Logrotate Configuration
Create a dedicated configuration file:
sudo nano /etc/logrotate.d/dspace
Add this configuration:
/dspace/log/*.log {
daily
rotate 7
compress
delaycompress
missingok
notifempty
create 0640 dspace dspace
size 100M
sharedscripts
postrotate
# Log4j2 keeps the log file open after rotation, so new entries may
# continue landing in the rotated file; if that happens, replace the
# create directive above with copytruncate rather than signalling Tomcat.
endscript
}
Configuration Parameters Explained
| Parameter | Purpose | Value Rationale |
|---|---|---|
daily | Rotation frequency | Balances log file size with retention granularity |
rotate 7 | Keep 7 archives | Provides one week of historical data for troubleshooting |
compress | Gzip old logs | Reduces disk usage by 80-90% for text logs |
delaycompress | Delay compression | Keeps most recent rotated file uncompressed for immediate access |
size 100M | Size threshold | Prevents individual logs from growing too large before rotation |
create 0640 dspace dspace | File permissions | Maintains proper ownership for DSpace’s Tomcat user |
Testing the Configuration
# Test configuration without executing rotations
sudo logrotate -d /etc/logrotate.d/dspace
# Force immediate rotation for testing
sudo logrotate -f /etc/logrotate.d/dspace
# Verify rotation occurred
ls -lh /dspace/log/
Systemd Journal Configuration: Preventing Future Accumulation
Edit the journald configuration to enforce permanent limits:
sudo nano /etc/systemd/journald.conf
Uncomment and modify these lines:
SystemMaxUse=500M
SystemMaxFileSize=50M
MaxFileSec=7day
Apply changes:
sudo systemctl restart systemd-journald
DSpace-Specific Logging Optimizations
Beyond logrotate, optimize DSpace’s own logging behavior:
Adjust Log Levels
Edit [dspace]/config/log4j2.xml:
<!-- Change from INFO to WARN for stable production systems -->
<Logger name="org.dspace" level="WARN" additivity="false">
<AppenderRef ref="A1"/>
</Logger>
<!-- Keep INFO for critical subsystems if needed -->
<Logger name="org.dspace.content" level="INFO" additivity="false">
<AppenderRef ref="A1"/>
</Logger>
Production systems with stable operations don’t need INFO level logging, which records every successful operation. WARN level captures only warnings and errors, reducing log volume by 70-90%.
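To estimate how much a level change would save on your own installation, count lines per level in an existing log. A sketch against a generated sample (the log line format here is illustrative; on a real server point grep at /dspace/log/dspace.log):

```shell
sample=$(mktemp)
printf '%s\n' \
  '2024-12-01 10:00:01 INFO  org.dspace.app - item viewed' \
  '2024-12-01 10:00:02 INFO  org.dspace.app - search executed' \
  '2024-12-01 10:00:03 WARN  org.dspace.app - slow query' \
  '2024-12-01 10:00:04 ERROR org.dspace.app - harvest failed' > "$sample"

total=$(wc -l < "$sample")
info=$(grep -c ' INFO ' "$sample")   # the share you would drop at WARN level
echo "INFO lines: $info of $total"
rm -f "$sample"
```

If INFO lines dominate the count, as they typically do on a stable repository, the volume reduction from switching to WARN follows directly.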
Implement Log4j2 RollingFileAppender
For more sophisticated control, modify log4j2.xml to use built-in rotation:
<RollingFile name="A1" fileName="${log.dir}/dspace.log"
filePattern="${log.dir}/dspace.log.%d{yyyy-MM-dd}-%i.log.gz">
<PatternLayout pattern="%d %-5p [%c{1}] %m%n"/>
<Policies>
<SizeBasedTriggeringPolicy size="100 MB"/>
<TimeBasedTriggeringPolicy/>
</Policies>
<DefaultRolloverStrategy max="7"/>
</RollingFile>
This configuration provides application-level rotation that works independently of system logrotate.
Monitoring and Alerting: Early Warning System
Implement proactive monitoring to prevent future incidents:
Disk Usage Monitoring Script
Create /usr/local/bin/check_dspace_logs.sh:
#!/bin/bash
LOG_SIZE=$(du -s /dspace/log/ | cut -f1)
MAX_SIZE=10485760 # 10GB in KB
if [ "$LOG_SIZE" -gt "$MAX_SIZE" ]; then
echo "WARNING: DSpace logs at $(du -sh /dspace/log/) on $(hostname)" | \
mail -s "DSpace Log Alert" [email protected]
fi
Add to crontab:
0 9 * * * /usr/local/bin/check_dspace_logs.sh
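The threshold logic in that script can be exercised in isolation before wiring up cron and mail. A sketch with a scratch directory and a deliberately low threshold, where the mail step is replaced by a status variable:

```shell
logdir=$(mktemp -d)                           # stand-in for /dspace/log
yes | head -c 204800 > "$logdir/dspace.log"   # ~200 KB of log data

LOG_SIZE=$(du -s "$logdir" | cut -f1)         # usage in KB, as in the real script
MAX_SIZE=100                                  # demo threshold: 100 KB
if [ "$LOG_SIZE" -gt "$MAX_SIZE" ]; then
  status=ALERT                                # the real script sends mail here
else
  status=OK
fi
echo "$status ($LOG_SIZE KB)"
rm -rf "$logdir"
```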
Logrotate Status Verification
# Check last logrotate execution
cat /var/lib/logrotate/status | grep dspace
# Monitor logrotate errors (logrotate runs from a systemd timer on Ubuntu)
sudo journalctl -u logrotate.service
Lessons Learned and Best Practices
1. Log Management Is Infrastructure, Not Afterthought
DSpace administrators must treat log management as a core infrastructure component, not an optional optimization. The default DSpace configuration assumes manual log oversight, which is impractical for production systems.
2. Size-Based Rotation Trumps Time-Based Rotation
While our solution uses daily rotation, the size 100M parameter ensures that unexpected activity spikes can’t create multi-gigabyte files before the next scheduled rotation. This hybrid approach balances predictability with safety.
3. Separate Application and System Log Policies
Systemd journals and application logs serve different purposes. Journals capture system-level events; DSpace logs capture repository activity. Maintain separate retention policies—7 days for DSpace logs provides adequate troubleshooting history without excessive disk usage.
4. Compression Is Non-Negotiable
Text logs compress at ratios exceeding 80%. The delaycompress parameter balances immediate access to recent logs with long-term storage efficiency. For a busy repository, this can mean the difference between 180GB and 18GB of log storage.
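The 80%+ figure is easy to verify yourself: repository logs are extremely repetitive, which is exactly what gzip exploits. A self-contained check using synthetic log lines (real dspace.log files compress similarly):

```shell
sample=$(mktemp)
for i in $(seq 1 5000); do
  echo "2024-12-01 10:00:00 INFO org.dspace.app - bitstream download id=$i"
done > "$sample"

orig=$(stat -c %s "$sample")
gzip -k "$sample"                  # -k keeps the original, like delaycompress
comp=$(stat -c %s "$sample.gz")
echo "original=${orig} bytes, compressed=${comp} bytes"
rm -f "$sample" "$sample.gz"
```

On lines this repetitive the compressed file comes out far below a fifth of the original size, which is why compress is non-negotiable for log retention.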
5. Test Configuration Changes
Always use logrotate -d (debug mode) to validate configuration syntax before deployment. A syntax error in logrotate configuration can prevent all system log rotation, creating a cascading failure.
6. Monitor Log Levels in Production
After debugging sessions, always revert log4j2.xml from DEBUG to INFO or WARN. DEBUG level can generate gigabytes of data per hour on active repositories.
For quick implementation, here’s the complete command sequence:
# Emergency cleanup
sudo find /dspace/log/ -name "dspace.log.*" -mtime +7 -delete
sudo truncate -s 0 /dspace/log/dspace.log
sudo journalctl --vacuum-time=7d --vacuum-size=500M
# Install logrotate (usually pre-installed on Ubuntu)
sudo apt update && sudo apt install -y logrotate
# Create DSpace logrotate configuration
sudo tee /etc/logrotate.d/dspace > /dev/null <<'EOF'
/dspace/log/*.log {
daily
rotate 7
compress
delaycompress
missingok
notifempty
create 0640 dspace dspace
size 100M
}
EOF
# Configure systemd journal limits
sudo sed -i 's/#SystemMaxUse=/SystemMaxUse=500M/' /etc/systemd/journald.conf
sudo sed -i 's/#SystemMaxFileSize=/SystemMaxFileSize=50M/' /etc/systemd/journald.conf
sudo systemctl restart systemd-journald
# Test logrotate configuration
sudo logrotate -d /etc/logrotate.d/dspace
From Crisis to Resilience
What began as a production outage became an opportunity to implement enterprise-grade log management. The solution transformed our DSpace installation from a disk space time-bomb into a maintainable system with predictable storage requirements. By integrating DSpace with Linux’s native logrotate utility and configuring sensible systemd journal limits, we’ve established a sustainable logging strategy that protects repository availability while preserving necessary diagnostic information.
For DSpace administrators, the key takeaway is proactive log management. Don’t wait for a crisis—implement these controls during initial deployment. The 30 minutes spent configuring logrotate will save countless hours of emergency cleanup and prevent service disruptions that undermine trust in your institutional repository.
Final disk usage after implementation: /dspace/log reduced from 182GB to 2.3GB—a 98.7% reduction, with automated rotation ensuring it never exceeds 5GB again.