SSD Linux Tweaks
2012-12-05 (updated: 2015-02-23) by Philip
Tags: Linux, ssd
Solid State Drives have already become one of the best computer upgrades money can buy. Most current OSes add some functionality to accommodate SSDs (like TRIM); however, much is left for the user to enable and tweak manually. This article explores the important aspects of tuning Linux for the best possible SSD performance: enabling TRIM support, choosing the right filesystem type, aligning partitions, reducing unnecessary small disk writes for longer SSD endurance, and more.
Proper alignment of partitions is very important for SSDs, as it avoids excessive read-modify-write cycles. Older partitioning tools were optimized for hard disk drives (with cylinders, heads and sectors) rather than SSDs with their different NVM page and block sizes. Newer partitioning tools typically align partitions to start at 1MB marks. This covers all common SSD page and block size scenarios, as it is divisible by 1MB, 512KB, 128KB, 4KB and 512 bytes. Most recent Linux distributions already take SSDs into account and align partitions correctly; however, you may want to check to make sure. We will assume that your SSD is at /dev/sda in the following examples.
Check your partitions on /dev/sda with the command:
fdisk -lu /dev/sda
Note that there are 2048 sectors of 512 bytes in a Megabyte. For each partition in the output of this command, look at the "Start" column and divide the value by 2048. If the result is an integer, your partition is aligned properly to 1MB.
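The divisibility check can also be scripted. The sketch below is only an illustration: is_aligned is a hypothetical helper name, and it assumes start sectors are counted in 512-byte units, which is how the kernel reports them in /sys/block/sda/sdaN/start.

```shell
# Hypothetical helper: report whether a partition start sector (counted in
# 512-byte sectors) lies on a 1 MB boundary.
is_aligned() {
    start=$1
    # 1 MB = 2048 sectors of 512 bytes each
    if [ $((start % 2048)) -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned"
    fi
}

# Check every partition of sda using the start sectors exposed in sysfs
for f in /sys/block/sda/sda*/start; do
    [ -r "$f" ] || continue          # skip if the glob matched nothing
    part=$(basename "$(dirname "$f")")
    echo "$part: $(is_aligned "$(cat "$f")")"
done
```

A start sector of 2048 is reported as aligned, while the old default of 63 is reported as misaligned.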
To see your SSD's physical and logical block size, look at the following pseudo files:
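On current kernels these values are exposed as logical_block_size and physical_block_size under /sys/block/sda/queue/. A minimal sketch for reading both follows; block_sizes is a hypothetical helper, and SYSFS_ROOT is an assumed override that exists only so the function can be tried against a scratch directory.

```shell
# Print the logical and physical block size the kernel reports for a disk.
# SYSFS_ROOT defaults to /sys; override it only for experiments.
block_sizes() {
    dev=$1
    q="${SYSFS_ROOT:-/sys}/block/$dev/queue"
    echo "logical=$(cat "$q/logical_block_size") physical=$(cat "$q/physical_block_size")"
}

# Example, assuming the SSD is sda
if [ -d /sys/block/sda/queue ]; then
    block_sizes sda
fi
```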
With SSDs, whenever you write data to the disk, it must first erase any data in the sectors it's writing to, slowing the drive. Modern operating systems and SSDs support TRIM to keep the SSD from slowing down over time.
To verify TRIM support on your drive, use the following (where /dev/sda is your particular SSD; the "df" command shows mount points):
# hdparm -I /dev/sda | grep TRIM
* Data Set Management TRIM supported (limit 4 blocks)
There are generally three different ways to implement TRIM under modern Linux distributions:
1) discard (/etc/fstab mount option) - this performs TRIM in real-time after each file delete. Many distros don't enable this by default in fstab for SSDs. There seems to be a general consensus the "discard" option is not a very good method, as it is resource intensive and introduces some unnecessary performance issues.
2) fstrim via cron job - this seems to be the preferred method currently, by adding the fstrim command to a weekly, or even monthly cron job. The only disadvantage of this method is that after a reboot, the fstrim command would perform a trim on the whole filesystem. It has something to do with the fact that the record of what has been trimmed is kept in kernel memory and volatile. Below is a sample script that can be put in the /etc/cron.weekly directory to do a trim on all drives that support it (set permissions to be executable):
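A minimal sketch of such a script, assuming it is saved as /etc/cron.weekly/fstrim (the file name is an assumption) and that the installed fstrim supports -a. The FSTRIM variable is a hypothetical override that allows a dry run with FSTRIM=echo.

```shell
#!/bin/sh
# Weekly trim of all mounted filesystems that support discard.
# FSTRIM may be overridden for a dry run, e.g. FSTRIM=echo.
FSTRIM="${FSTRIM:-fstrim}"

trim_all() {
    # -a: trim all mounted filesystems on devices that support discard
    # -v: report how many bytes were trimmed
    "$FSTRIM" -a -v
}

# Run only if the tool is installed; report failures instead of aborting
if command -v "$FSTRIM" >/dev/null 2>&1; then
    trim_all || echo "fstrim reported an error" >&2
fi
```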
Some older versions of fstrim may not support the -a parameter. In such cases you will have to specify the mount points to trim manually ("/" and "/boot", in this example):
3) systemctl enable fstrim.timer - this is a relatively new systemd option, similar to fstrim via a cron job. It performs a weekly fstrim on the system, and can be checked with systemctl status fstrim.timer
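For fstrim versions without -a, a sketch along these lines trims the listed mount points one at a time. trim_mounts is a hypothetical helper name, and FSTRIM is again an assumed override for dry runs.

```shell
#!/bin/sh
# Trim an explicit list of mount points, for fstrim versions without -a.
# FSTRIM may be overridden for a dry run, e.g. FSTRIM=echo.
FSTRIM="${FSTRIM:-fstrim}"

trim_mounts() {
    for mp in "$@"; do
        "$FSTRIM" -v "$mp" || echo "could not trim $mp" >&2
    done
}

# "/" and "/boot" are the mount points named in the text; adjust to your layout
if command -v "$FSTRIM" >/dev/null 2>&1; then
    trim_mounts / /boot
fi
```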
Partitions to keep away from the SSD
You can keep most of your partitions on the SSD to take advantage of its speed, including /, /boot, /boot/efi, etc. However, you should try to keep swap partitions, temporary files and constantly written log files away from SSDs to increase their lifespan.
When creating partitions, simply put your swap partition on a HDD instead of the SSD. If you have plenty of memory, you can usually skip creating a swap partition when installing Linux; however, you should then also reduce "swappiness" to zero so the OS avoids swapping to disk. Alternatively, it may be a better idea to keep a swap partition and simply reduce swappiness.
To create a bash script that adjusts swappiness at boot time:
1) navigate to (create, if necessary): /etc/rc.d/rc.local
2) add the following line to it
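A minimal sketch of what that rc.local entry could look like, using the standard /proc/sys/vm/swappiness pseudo file. set_swappiness is a hypothetical helper, and SWAPPINESS_FILE is an assumed override so the function can be tried on a scratch file.

```shell
# Set swappiness at boot (run as root from rc.local).
# SWAPPINESS_FILE defaults to the kernel's pseudo file.
set_swappiness() {
    value=$1
    target="${SWAPPINESS_FILE:-/proc/sys/vm/swappiness}"
    echo "$value" > "$target"
}

# 0 tells the kernel to avoid swapping for as long as possible
if [ -w /proc/sys/vm/swappiness ]; then
    set_swappiness 0
fi
```

Without the helper, the same effect is the single line echo 0 > /proc/sys/vm/swappiness.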
The above ensures that the OS will only try to use a swap file when all RAM is exhausted. The default value for swappiness is 60, and lower numbers mean the OS will use the swap file less.
A more sensible solution may be to simply create the swap partition on a HDD, and reduce swappiness somewhat, i.e. use a value of 30.
Directories to move away from the SSD
It is also a good idea to keep temporary files away from the SSD to reduce writes. To move temporary files to RAM, edit /etc/fstab and add the following line to it:
tmpfs /tmp tmpfs nodev,nosuid,noexec,mode=1777 0 0
Other good candidates for moving from the SSD to RAM (or another HDD) may be browser cache files (for desktop systems), and /var/log, if you can live with all system logs being volatile and not surviving reboots. This is most useful for NAS or embedded applications. There are some subdirectories and files in /var/log that may have to be recreated each time you reboot, however. This can be accomplished with a couple of lines in the /etc/rc.d/rc.local file we already created/edited. Here is an example:
### directories to recreate at boot time
Alternatively, you can move the /var/log directory (and possibly /var/cache, /var/spool) to another drive to reduce writes to the SSD. To accomplish this, the easiest way is using a symbolic link to the new location. Assuming that you have a HDD mounted at /mnt/hdd1, do:
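Which directories are needed depends on the services installed, so the following is only a sketch. recreate_log_dirs is a hypothetical helper, and the directory names in the usage comment are placeholders rather than a list from the article.

```shell
# Recreate subdirectories that services expect to find under a volatile
# /var/log. First argument is the base directory, the rest are names.
recreate_log_dirs() {
    base=$1
    shift
    for d in "$@"; do
        mkdir -p "$base/$d"
    done
}

# Hypothetical usage in rc.local (replace the names with what your services need):
# recreate_log_dirs /var/log httpd samba
```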
# stop the syslog.service
# create a directory on /mnt/hdd1
# move the directory structure and contents of /var/log over to the new directory
# create a symbolic link from /var/log to the new location
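The four steps above can be sketched as a small helper. relocate_dir is a hypothetical name, and the paths in the usage comment follow the /mnt/hdd1 assumption made in the text.

```shell
# Move a directory to another drive and leave a symbolic link in its place.
relocate_dir() {
    src=$1          # e.g. /var/log
    dest_parent=$2  # e.g. /mnt/hdd1/var
    mkdir -p "$dest_parent" || return 1
    mv "$src" "$dest_parent/" || return 1
    ln -s "$dest_parent/$(basename "$src")" "$src"
}

# Usage as root, following the steps in the text:
# systemctl stop syslog.service
# relocate_dir /var/log /mnt/hdd1/var
```

Stopping the logging service first matters because a running daemon keeps its log files open while they are being moved.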
It may be a good idea to reboot at this point, so that other programs that may be writing to /var/log work properly. What we did here is, instead of reconfiguring every service that may possibly want to write to /var/log, we just redirect them to a new location using a symbolic link. That way any new programs that try to log to directories under /var/log will still work as expected. You can move /var/cache and /var/spool using the same method:
mv /var/cache /mnt/hdd1/var/
Notes: SELinux may have some issues with these symbolic links. There is an alternative of using "mount --bind /mnt/hdd1/var/log /var/log" to create new mount points instead of symbolic links, but that is usually reserved
Tune Fstab Filesystem Options
Most mount options, filesystem types, /tmp directory and other boot options in modern Linux systems are configured in the /etc/fstab file.
Back up fstab before making any changes, so you can easily recover from errors:
# cp /etc/fstab /etc/fstab.bak
Modify /etc/fstab mount parameters for your SSD partition by adding the following options separated by commas:
noatime - do not update access time for each accessed file/directory to reduce disk writes (automatically implies nodiratime). This works well for both HDDs and SSDs. If you need to keep access time functionality, using relatime is a good compromise (it only causes an atime write if the file has been modified since it was last accessed).
discard - enables TRIM support with ext4 and kernels 2.6.33 or later
commit=30 - delays/buffers writes to disk for up to 30 seconds. It may be a problem if power interruption is likely. The default value is 5 seconds; you can use a number up to a couple of minutes depending on the likelihood of power interruption (you may lose up to N seconds of work, though most of the time this won't happen as software can still sync the data to disk, overriding the commit setting). This reduces disk writes and increases performance by combining writes into one single larger write, and cancelling updates to previous writes within the commit time frame.
data=writeback - no data journaling (but metadata is journaled). A crash/recovery cycle can cause incorrect data to be in files updated shortly before the crash. Best performance of all journaling modes, and fewer writes increase SSD longevity.
data=ordered - journals metadata, but orders metadata changes with the data blocks into "transactions". When a write is done, the associated data blocks are written first. This journaling method is a good compromise between performance and data safety.
data=journal - all data and metadata is journaled, slow.
Example /etc/fstab :
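As an illustration only, entries using the options above might look like the following. The device names and partition layout are assumptions, and the tmpfs line is the one from the section on temporary files.

```
# <device>   <mount point>  <type>  <options>                      <dump>  <pass>
/dev/sda1    /              ext4    noatime,commit=30              0       1
/dev/sda2    /home          ext4    noatime,commit=30              0       2
tmpfs        /tmp           tmpfs   nodev,nosuid,noexec,mode=1777  0       0
```

The discard option is left out here on the assumption that TRIM is handled by a periodic fstrim instead.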
Currently mounted filesystems, along with their corresponding mount options, can be viewed using simply: # mount
Change the I/O Scheduler
Linux has three main kernel I/O schedulers: CFQ, DEADLINE, and NOOP. The default is CFQ, but SSDs can benefit from using either DEADLINE or NOOP instead, as outlined below. The task of an I/O scheduler is mainly to group, reorder, and merge I/O operations when possible. One of the main intents is to decrease disk seeks. Flash devices have negligible seek times, and therefore can benefit from switching to a different I/O scheduler.
CFQ (Completely Fair Queuing) - this is the default scheduler since kernel 2.6.18 and has been designed to deal with the rotational latencies of spinning platter drives. It places synchronous requests submitted by processes into a number of per-process queues and then allocates timeslices for each of the queues to access the disk. It prioritizes the number of requests and the length of the timeslice depending on the process priority.
DEADLINE - this scheduler does some sorting to guarantee read requests take priority over write, which is useful to guarantee read responsiveness under heavy writes. The DEADLINE scheduler imposes a deadline on every I/O operation. It uses multiple queues to store operations and sorts them according to their deadline. By committing to these deadlines, it can guarantee no I/O starvation during normal CPU loads. It is a good fit if you are worried about I/O starvation, and if you'd like to keep sorted read/write queues. It is better than CFQ for SSD drives.
NOOP - the NOOP scheduler uses the least CPU cycles, as it is a simple FIFO (first-in first-out) queue and implements request merging, without any reordering. It is very battery-friendly. It is a great match for laptops, SSDs, USB flash drives, and any flash media where there is negligible seek penalty.
To view the currently used scheduler for "sda", for example, execute:
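The kernel lists the available schedulers in /sys/block/sda/queue/scheduler and marks the active one with square brackets, for example noop deadline [cfq]. A small sketch for reading it follows; active_scheduler is a hypothetical helper that extracts the bracketed name.

```shell
# Extract the active scheduler (the bracketed name) from the kernel's listing,
# which is read from standard input and looks like: noop deadline [cfq]
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Show the full listing and the active entry for sda, if present
if [ -r /sys/block/sda/queue/scheduler ]; then
    cat /sys/block/sda/queue/scheduler
    active_scheduler < /sys/block/sda/queue/scheduler
fi
```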
To change the scheduler for a specific drive ("sda" and "sdb" in the example below), add the following to /etc/rc.d/rc.local (to be applied at boot time):
# changes the I/O scheduler for sda to NOOP
# changes the I/O scheduler for sdb to DEADLINE
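A sketch of the corresponding rc.local entries, writing the scheduler name into the same sysfs file used to view it. set_scheduler is a hypothetical helper and SYSFS_ROOT an assumed override for experiments. The names noop and deadline are the ones used in this article; newer multi-queue kernels may list different names (such as none and mq-deadline), so check the listing first.

```shell
# Select an I/O scheduler for one drive (run as root from rc.local).
# SYSFS_ROOT defaults to /sys; override it only for experiments.
set_scheduler() {
    dev=$1
    sched=$2
    target="${SYSFS_ROOT:-/sys}/block/$dev/queue/scheduler"
    if [ -w "$target" ]; then
        echo "$sched" > "$target"
    else
        echo "cannot write $target" >&2
        return 1
    fi
}

# Usage in rc.local, matching the two comments above:
# set_scheduler sda noop
# set_scheduler sdb deadline
```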
Memory Disk Write Buffers
The Linux kernel has a number of tunable memory write buffers that define how the system uses memory to delay disk writes. You can control how often the OS writes old "dirty" data to disk, how aggressively to use the swap file, etc. A number of pseudo files under
/proc/sys/vm/laptop_mode - determines how many seconds after a read should a writeout of changed files start (this is based on the assumption that a read will cause an otherwise spun down disk to spin up again). This delays writes to disk (initially intended to allow laptop disks to spin down while not in use, hence the name)
Increase the time it takes memory to write to disk by adding the following to the bottom of the /etc/sysctl.conf file (to be applied at boot time):
# dirty_ratio is the max percent of memory to use (default is 20)
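A sketch of what such /etc/sysctl.conf entries could look like. The keys are standard kernel tunables, but the values are illustrative assumptions and not the article's original figures; larger values keep more unwritten data in RAM, which increases what can be lost on power failure.

```
# max percent of memory that may hold dirty pages before writers are blocked (default 20)
vm.dirty_ratio = 40
# percent of memory at which background writeback starts (default 10)
vm.dirty_background_ratio = 10
# interval between writeback wakeups, in hundredths of a second (default 500)
vm.dirty_writeback_centisecs = 1500
```

The settings can be loaded without a reboot by running sysctl -p as root.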
Check total disk writes with S.M.A.R.T.
Newer SSDs have a long lifespan, except maybe smaller TLC drives. Still, it is a good idea to know your average daily writes and the life expectancy of the drive. You may want to check the total bytes written to the SSD in Linux by using the following shell command that displays S.M.A.R.T. data (assuming your SSD is at /dev/sda):
smartctl -A /dev/sda
Look for the line with attribute ID 241, which says something like:
Samsung: "Total LBAs Written" (value is in LBAs) - multiply by 512 to get total bytes written: RAW_VALUE * 512 / 1024 / 1024 / 1024 = total Gigabytes written.
You can use this number to estimate your daily writes, and total life expectancy of the drive.
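The conversion and the daily estimate can be done with shell arithmetic. In this sketch lbas_to_gib and gib_per_day are hypothetical helpers; it assumes 512-byte LBAs and takes the drive's age from the power-on hours that smartctl reports as attribute 9. The arithmetic is integer only, so results are rounded down.

```shell
# Convert a "Total LBAs Written" raw value (512-byte LBAs) to whole Gigabytes.
lbas_to_gib() {
    echo $(( $1 * 512 / 1024 / 1024 / 1024 ))
}

# Average Gigabytes written per day: total Gigabytes and power-on hours as input.
gib_per_day() {
    echo $(( $1 * 24 / $2 ))
}

lbas_to_gib 2097152      # 2097152 LBAs * 512 bytes is exactly 1 Gigabyte
```

For example, a raw value of 20971520000 corresponds to 10000 Gigabytes written; over 24000 power-on hours that averages 10 Gigabytes per day.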
For desktop systems, it may be beneficial to move the browser cache to /tmp (if /tmp is mounted in RAM; the only notable drawbacks are that it is volatile and shared between user accounts). You may also want to take a look at iotop to see which processes write to disk the most.
Here is another good candidate to be added to the /etc/rc.d/rc.local file as well (to be executed at each boot):
# don't do kernel crashdumps - reduces disk writes and wakeups
Windows users may want to check out our Windows SSD Speed Tweaks article as well.