Remove Special Characters from Column in PySpark DataFrame

The Spark SQL function regexp_replace can be used to remove special characters from a string column in a Spark DataFrame. The regular expression to use depends on how special characters are defined. For instance, [^0-9a-zA-Z_\-]+ matches any run of characters that are not alphanumeric, hyphen (-), or underscore (_), while the regular expression '[@\+\#\$\%\^\!]+' matches only the special characters it lists.
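To make the difference between the two patterns concrete, here is a small sketch using Python's built-in re module rather than Spark. Spark evaluates regexp_replace with Java regular expressions, but for simple character classes like these the two engines should behave the same. The names ALLOW_LIST, DENY_LIST, and sample are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical names for the two patterns described above.
ALLOW_LIST = r"[^0-9a-zA-Z_\-]+"   # matches everything EXCEPT letters, digits, _ and -
DENY_LIST = r"[@\+\#\$\%\^\!]+"    # matches ONLY the listed characters @ + # $ % ^ !

# Illustrative input containing a space, which only the first pattern removes.
sample = "a-b_c d!e"

kept_allowed = re.sub(ALLOW_LIST, "", sample)
removed_listed = re.sub(DENY_LIST, "", sample)

print(kept_allowed)    # a-b_cde
print(removed_listed)  # a-b_c de
```

The first pattern is stricter: it also strips whitespace and any other character not explicitly allowed, whereas the second leaves everything outside its list untouched.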

This code snippet replaces special characters with an empty string.


+---+--------------------------+
|id |str                       |
+---+--------------------------+
|1  |ABCDEDF!@#$%%^123456qwerty|
|2  |ABCDE!!!                  |
+---+--------------------------+

+---+-------------------+
| id|       replaced_str|
+---+-------------------+
|  1|ABCDEDF123456qwerty|
|  2|              ABCDE|
+---+-------------------+
