When searching through large files or directories with grep, performance can sometimes be slow. One way to speed up grep searches is to set the LC_ALL environment variable. This article explains how LC_ALL affects grep performance and how you can use it to optimize search speed.
Understanding Locale and Internationalization Variables
In a shell execution environment, system behavior is influenced by environment variables. A special subset of these variables, known as internationalization variables, determines how support for internationalized applications operates. Since grep is an internationalized application, its performance is affected by these settings.
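You can check your server's current locale settings by running the locale command. It prints one line per variable (LANG, LC_CTYPE, LC_COLLATE and so on, ending with LC_ALL), and the values differ from system to system.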
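A minimal sketch of that check, assuming the standard locale utility is available on the system. The variable name grep_related is only illustrative, and no sample values are shown because they depend on how the machine is configured.

```shell
# Show every locale-related variable in effect for this shell.
locale

# Keep only the variables most relevant to grep (grep_related is an illustrative name).
grep_related=$(locale | grep -E '^(LANG|LC_CTYPE|LC_COLLATE|LC_ALL)=')
echo "$grep_related"
```

If LC_ALL is empty and LANG or LC_CTYPE names a UTF-8 locale, grep is using locale-aware processing by default.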
Why Does LC_ALL Affect grep Speed?
The LC_ALL variable controls locale settings, including character encoding and collation order. By default, grep processes text based on locale-specific rules, which can slow down searches. Setting LC_ALL=C forces grep to use a more straightforward, faster byte-based comparison instead of complex locale-aware processing.
LC_ALL Variable Explained
The LC_ALL variable overrides LANG and all other LC_* settings, allowing you to set the locale globally for a command or session. For instance, prefixing a command with LC_ALL=C runs it in the C locale, the minimal POSIX locale, which handles text as single bytes and sorts by byte value.
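The override can be observed directly. The sketch below assumes the standard locale utility; the en_US.UTF-8 values are only examples and do not need to be installed, because LC_ALL=C takes precedence over them. The variable name effective is illustrative.

```shell
# LANG and LC_COLLATE request a UTF-8 locale, but LC_ALL=C wins for every category.
effective=$(LANG=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_ALL=C locale | grep '^LC_COLLATE=')
echo "$effective"
```

The locale utility prints overridden values in double quotes, so the collation line reports C rather than the value that was assigned to LC_COLLATE.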
How to Use LC_ALL to Speed Up grep
Temporary Use in a Single Command
If you want to apply LC_ALL=C to a single grep command, prefix the command with the assignment, for example LC_ALL=C grep pattern file.txt (pattern and file.txt are placeholders).
This tells grep to use the C locale for that specific command, improving performance.
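A small sketch showing that the prefix applies to that one command only and leaves the rest of the shell session untouched. The sample file and its contents are hypothetical, created just for the demonstration.

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf 'alpha\nerror: disk full\nbeta\nerror: timeout\n' > "$sample"

before=${LC_ALL-unset}                          # LC_ALL as the shell sees it now
matches=$(LC_ALL=C grep -c 'error' "$sample")   # C locale for this grep only
after=${LC_ALL-unset}                           # unchanged by the prefixed command

echo "matching lines: $matches"
echo "LC_ALL before: $before, after: $after"
rm -f "$sample"
```

Because the assignment is part of the command line, nothing needs to be undone afterwards.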
Setting LC_ALL Permanently
To make this optimization permanent, you can export LC_ALL in your shell profile file.
For Bash Users:
Add the line export LC_ALL=C to your ~/.bashrc or ~/.bash_profile file.
Then, apply the changes by running source ~/.bashrc (or source ~/.bash_profile, whichever file you edited).
For Zsh Users:
If you use Zsh, add the same line to ~/.zshrc and apply the changes by running source ~/.zshrc.
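Exporting LC_ALL=C in a profile changes the locale for every program started from that shell, not only grep, which can alter sort order and the handling of non-ASCII text. One alternative, which is not from the original article, is a small wrapper function that applies the C locale to grep alone. The name cgrep is hypothetical.

```shell
# Hypothetical helper: place the definition in ~/.bashrc or ~/.zshrc.
# It runs grep in the C locale without changing the locale of anything else.
cgrep() {
    LC_ALL=C grep "$@"
}

# Usage: same arguments as grep.
printf 'one\ntwo\nsix\n' | cgrep -c 'o'
```

With this approach the session keeps its normal locale, and the faster byte-based matching is opt-in per search.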
UTF-8 vs ASCII: Why Does it Matter?
By default, most modern systems use UTF-8 encoding. UTF-8 can represent well over 110,000 unique characters, using one to four bytes for each, and supports writing systems worldwide. However, grep is often used to search through files that contain only ASCII, which consists of just 128 unique characters.
Because UTF-8 requires more complex multibyte processing, searches using the default locale settings may be slower. By switching to the C locale (which treats every byte as one character), grep can operate more efficiently, reducing processing overhead and improving performance.
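The difference between byte-based and character-based processing can be made visible with a two-byte UTF-8 character. This is a sketch that assumes a grep supporting the -o option (GNU and BSD grep do) and, for the second command, that an en_US.UTF-8 locale is installed; if it is not, that command falls back to byte behavior. The variable names are illustrative.

```shell
# The two bytes 0xC3 0xA9 are the UTF-8 encoding of the single character "é".
# In the C locale, grep treats each byte as its own character.
c_units=$(printf '\303\251\n' | LC_ALL=C grep -o '.' | wc -l | tr -d ' ')
echo "C locale: grep matched $c_units separate units"

# Same input under a UTF-8 locale (assumes en_US.UTF-8 is installed; check with: locale -a).
utf8_units=$(printf '\303\251\n' | LC_ALL=en_US.UTF-8 grep -o '.' | wc -l | tr -d ' ')
echo "UTF-8 locale: grep matched $utf8_units unit(s)"
```

Decoding multibyte sequences like this one is the extra work that the C locale lets grep skip. It also means patterns involving non-ASCII characters can behave differently under LC_ALL=C.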
Performance Comparison
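To compare performance with and without LC_ALL=C, run the same search twice under the time command, once as-is and once prefixed with LC_ALL=C.
If your default locale is a UTF-8 one, you should notice a decrease in execution time when using LC_ALL=C. If your default locale is already C, expect no difference.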
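A self-contained sketch of such a comparison, assuming bash or zsh, where time is a shell keyword and accepts a command with a variable prefix. The test file, its contents and the search pattern are hypothetical, and the timings will vary by machine and grep version, so none are shown.

```shell
# Throwaway test file with hypothetical data: 200,000 ASCII lines, every 7th marked "error".
testfile=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 200000; i++) print "line", i, (i % 7 == 0 ? "status=error" : "status=ok") }' > "$testfile"

# Time the same case-insensitive search under the default locale and under the C locale.
time grep -ci 'ERROR' "$testfile"
time LC_ALL=C grep -ci 'ERROR' "$testfile"

# Both runs should report the same count; only the elapsed time should differ.
count_default=$(grep -ci 'ERROR' "$testfile")
count_c=$(LC_ALL=C grep -ci 'ERROR' "$testfile")
rm -f "$testfile"
```

Run each timing several times: the first pass also pays for reading the file from disk, while later passes are served from the page cache.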
Test Results
Several tests were conducted using different file sizes to measure the impact of LC_ALL=C:
Test 1: Small File (~10MB)
Results:
- Standard grep: ~0.3s
- LC_ALL=C grep: ~0.2s
Test 2: Medium File (~500MB)
Results:
- Standard grep: ~5.2s
- LC_ALL=C grep: ~3.1s
Test 3: Large File (~5GB)
Results:
- Standard grep: ~50.4s
- LC_ALL=C grep: ~28.7s
The tests confirmed that using LC_ALL=C provides a noticeable performance improvement, especially for large files.
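The relative gain implied by these figures can be worked out from the timings themselves. The sketch below only does arithmetic on the numbers reported above; the speedups variable name and the output wording are illustrative.

```shell
# Columns: file size, standard grep (seconds), LC_ALL=C grep (seconds), copied from the results above.
# LC_ALL=C on awk keeps the decimal point a "." regardless of the session locale.
speedups=$(printf '%s\n' '10MB 0.3 0.2' '500MB 5.2 3.1' '5GB 50.4 28.7' |
  LC_ALL=C awk '{ printf "%s: %.2fx faster, %.0f%% less time\n", $1, $2 / $3, ($2 - $3) / $2 * 100 }')
echo "$speedups"
```

This works out to roughly 1.5x for the small file, 1.7x for the medium file and 1.8x for the large file, so the relative gain grows modestly with file size.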
Conclusion
By setting LC_ALL=C, you can enhance grep search performance, especially when dealing with large files. This simple optimization reduces processing overhead and speeds up search operations, making it an effective tweak for power users and system administrators.
I imagine the reason it did not work for many people is that their default locale was already C (mine is). If you are unsure of what your default locale is, you should set LANG to en_US.UTF-8 or whatever before running the tests.
Does not have any effect on Ubuntu 14.04 and 16.04. It did work for me for sure the last time I tried it, back in 2008.
Zaar
Thank you for sharing this great finding! This trick made my process go from 1.20 hours to a matter of seconds!
Thanks again for making my life better =)
No errors, just no improvement in time to run grep commands.
Tried this on a Centos 5 system with no luck. Is it OS or distro specific?
Hello MadMan,
This tutorial was made either on a Centos 5.6 or 6.0 server. This should work. Are you getting any errors?
Best Regards,
TJ Edens
Hey great article Jacob,
This does affect more than meets the eye. For instance, download a file with UTF-8 characters in it, like many web pages, and then use strace to see how grep's regex is affected:
As you wrote, this may not give you what you were expecting:
LC_ALL=C sort moop.txt
But, this might:
LC_ALL=C sort -f moop.txt
You could also keep the Linux buffer cache from skewing your testing by dropping the caches before each test.
echo 3 > /proc/sys/vm/drop_caches
Hello Noah, and thanks for the comment!
You are correct! That is another great way to make sure the page cache isn't skewing results. However, be careful: your system could seem a bit sluggish while it rebuilds the page cache after clearing it out completely.
Thanks again!
– Jacob