Regex is the ultimate Swiss Army knife: the kind of tool you can use for years and still not exercise all of it. That's how I felt years ago when I first stumbled on using it to solve capitalization problems. I needed a refresher on some of those techniques today and found myself sifting through a lot of incomplete answers online. Using one of them led to disaster because it overlooked some important details. So I decided to do a quick write-up on how to solve a couple of real-world uppercase and lowercase situations for the next time I need a refresher.
A Good Regex Editor
First, not all regex editors are created equal. You know you have a decent one when it supports upper and lower case conversion. This is one reason I chose Sublime years ago over other popular choices. Also, it was something I missed when I switched to Android Studio from Eclipse because this feature was missing until IntelliJ 15 (fixed in build 142.2822 but barely mentioned). Now that it's baked into Android Studio we no longer have to hop over to Sublime to do heavy lifting in Regex. Test your favorite editor or online regex tool for the techniques listed below and if it doesn't support them, maybe consider a switch.
Real World Example
String Constants into String Resources
We were working on tech debt today, and I was busy cleaning up an ugly file of String constants. I won't belabor why it existed for this client; I'll only point out that I wanted to convert it into string resources instead.
The goal was to convert this:
public static final String EXTERNAL_SHOP_URL = "http://buy.our.stuff.pretty/please";
public static final String EXTERNAL_PRIVACY_POLICY = "http://we.no.steal.you.info/promise";
public static final String EXTERNAL_TERMS_AND_CONDITIONS = "http://you.give.us.arm.and.leg/promptly";
public static final String EXTERNAL_HELP = "https://why.you.has.minor/problem?key=ABC123";
to this (dozens of times):
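<string name="external_shop_url">http://buy.our.stuff.pretty/please</string>
<string name="external_privacy_policy">http://we.no.steal.you.info/promise</string>
<string name="external_terms_and_conditions">http://you.give.us.arm.and.leg/promptly</string>
<string name="external_help">https://why.you.has.minor/problem?key=ABC123</string>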
The key to accomplishing this is the special \L and \E case-modifying escape sequences in the replacement text. Like the related \U, \u, and \l sequences, these control the case of the characters that come after them. \L and \U make everything that follows lowercase or uppercase, respectively. Similarly, \l and \u make just the next character lowercase or uppercase. Using that knowledge, press CMD+R to open find and replace and enter a regex similar to the following to pull out the capture groups and convert them to the desired case:
Find:
public static final String ([A-Z_0-9]+)\s*=\s*"([^"]+)";
Replace:
<string name="\L$1\E">$2</string>
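For anyone who wants to script the same conversion outside an editor, here is a minimal Python sketch. Python's `re` module has no \L or \E in replacement strings, so this assumes a replacement function that lowercases only the captured name. The names `FIND`, `to_string_resource`, and `java_source` are made up for illustration.

```python
import re

# Same "find" pattern as above: group 1 is the constant name, group 2 the value.
FIND = re.compile(r'public static final String ([A-Z_0-9]+)\s*=\s*"([^"]+)";')

def to_string_resource(match: re.Match) -> str:
    name = match.group(1).lower()  # lowercase only the name
    value = match.group(2)         # leave the value's case untouched
    return f'<string name="{name}">{value}</string>'

java_source = '''public static final String EXTERNAL_SHOP_URL = "http://buy.our.stuff.pretty/please";
public static final String EXTERNAL_HELP = "https://why.you.has.minor/problem?key=ABC123";'''

# Every matching declaration in the text is rewritten in one pass.
resources = FIND.sub(to_string_resource, java_source)
print(resources)
```

Because the lowercasing is applied to group 1 alone, the case-sensitive key at the end of the help URL survives the bulk change.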
Caution: Don't forget \E
When using \L and \U, all subsequent characters are converted until the next \U, \L, or \E is encountered. This is the detail that most online answers omit. It nearly caused a disaster while I converted URLs because this line:
public static final String EXTERNAL_HELP = "https://why.you.has.minor/problem?key=ABC123";
has a case-sensitive key on the end. Changing the case of the entire line also changes the URL keys, and that is very easy to overlook, introducing subtle bugs. Since I wanted the lowercase change to stop after the "name" attribute, I had to change the "replace" regex from:
<string name="\L$1">$2</string>
to:
<string name="\L$1\E">$2</string>
So don't forget \E when appropriate!
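The difference is easy to see side by side. This is only a Python imitation of the two replacement strings, since Python's `re` has no \L or \E: the first function lowercases everything that would follow \L, and the second stops where \E would. The function names are hypothetical.

```python
import re

FIND = re.compile(r'public static final String ([A-Z_0-9]+)\s*=\s*"([^"]+)";')
LINE = 'public static final String EXTERNAL_HELP = "https://why.you.has.minor/problem?key=ABC123";'

def without_end_marker(m: re.Match) -> str:
    # Imitates <string name="\L$1">$2</string>:
    # everything after \L is lowercased, including the URL.
    return '<string name="' + f'{m.group(1)}">{m.group(2)}</string>'.lower()

def with_end_marker(m: re.Match) -> str:
    # Imitates <string name="\L$1\E">$2</string>:
    # only the name is lowercased.
    return f'<string name="{m.group(1).lower()}">{m.group(2)}</string>'

broken = FIND.sub(without_end_marker, LINE)
fixed = FIND.sub(with_end_marker, LINE)
print(broken)
print(fixed)
```

The two results differ only in the case of the URL key, which is exactly why the mistake is so easy to miss in a large diff.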
Real World Example
camelCaseVariables to/from CONSTANTS
As another quick example, when we want to change constants like VARIABLE_THAT_SHOULD_BE_CAMEL_CASE, it is easiest to do so in two steps. A fancy-pants regex like ([A-Z]+)(_[A-Z]+)* would group all the words together and be useful in a program, where you can iterate over them. Unfortunately, in the simplified world of find/replace, a repeated capture group ($2 here) only retains its last match, so the earlier words are lost. There's no easy way to do this in one step. Instead, use two:
- change the case
- and then convert the underscores.
find:[A-Z_]+\b
replace:\L$0
find:_([a-z])
replace:\u$1
Note: using \b in step 1 helps avoid converting things like the first letter in "String", because it only matches runs of capitals that end at a word boundary. Also, because step 1 can be completed with the IDE shortcut CMD+SHIFT+U, it is sometimes handy to use that instead and then skip to step 2 to replace the underscores.
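Both points can be checked with a short Python sketch. It first shows that a repeated group keeps only its final repetition, then performs the two steps with replacement functions standing in for \L$0 and \u$1 (Python's `re` does not support those sequences). `constant_to_camel` is a made-up helper name.

```python
import re

CONSTANT = 'VARIABLE_THAT_SHOULD_BE_CAMEL_CASE'

# A repeated capture group keeps only its last repetition.
whole = re.fullmatch(r'([A-Z]+)(_[A-Z]+)*', CONSTANT)
last_word = whole.group(2)

def constant_to_camel(text: str) -> str:
    # Step 1: lowercase all-caps runs that end at a word boundary (\L$0).
    lowered = re.sub(r'[A-Z_]+\b', lambda m: m.group(0).lower(), text)
    # Step 2: drop each underscore and uppercase the letter after it (\u$1).
    return re.sub(r'_([a-z])', lambda m: m.group(1).upper(), lowered)

print(last_word)
print(constant_to_camel(CONSTANT))
print(constant_to_camel('String FOO_BAR'))
```

In the last call the "S" in "String" is left alone, because no word boundary follows it.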
To do this in reverse and create constants from variables:
- convert the camel humps to underscores
- and then convert to uppercase
find:([a-z])([A-Z])
replace:$1_\l$2
find:[a-z]+(_[a-z]+)+
replace:\U$0
Here we make step 2 a little more specific, requiring that the text we find must contain at least one word prefixed with an underscore, so that it only matches the underscored variables that we just created.
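The reverse direction can be sketched the same way in Python, again with replacement functions standing in for $1_\l$2 and \U$0. `camel_to_constant` is a hypothetical helper, and the sketch assumes simple camelCase names like the ones in this post.

```python
import re

def camel_to_constant(text: str) -> str:
    # Step 1: put an underscore at each camel hump and lowercase the hump ($1_\l$2).
    underscored = re.sub(
        r'([a-z])([A-Z])',
        lambda m: f'{m.group(1)}_{m.group(2).lower()}',
        text,
    )
    # Step 2: uppercase only words that contain at least one underscore (\U$0).
    return re.sub(r'[a-z]+(_[a-z]+)+', lambda m: m.group(0).upper(), underscored)

print(camel_to_constant('variableThatShouldBeCamelCase'))
print(camel_to_constant('String shopUrl'))
```

Because step 2 requires an underscore, the lowercase letters in "String" are not touched.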
Summary (TL;DR)
Regex can be used to change the case of matched groups, which is powerful when done in bulk. Many tools now support this by default, including recent versions of IntelliJ, and that support should be a factor when choosing an editor or online regex tool. The key is to use \U, \L, \u, \l, and \E, but most online posts forget to mention \E.