
Remove Special Characters from Column in PySpark DataFrame

Kontext, 2 years ago

Code description

The Spark SQL function regexp_replace can be used to remove special characters from a string column in a Spark DataFrame. The regular expression varies depending on how special characters are defined. For instance, [^0-9a-zA-Z_\-]+ matches any run of characters that are not alphanumeric, a hyphen (-) or an underscore (_), while '[@\+\#\$\%\^\!]+' matches only the listed special characters (@, +, #, $, %, ^ and !).
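The two patterns behave differently on text such as spaces or parentheses. Spark evaluates regexp_replace patterns with Java's regex engine; for these two simple character classes, Python's re module gives the same matches, so the sketch below uses re.sub as a quick local way to compare them before running anything in Spark. The helper name strip_chars and the sample string are hypothetical, chosen only for illustration.

```python
import re

# Pattern 1 (from the text): any run of characters that is NOT a letter,
# digit, underscore or hyphen.
NOT_ALNUM_UNDERSCORE_HYPHEN = r"[^0-9a-zA-Z_\-]+"
# Pattern 2 (from the text): only the explicitly listed special characters.
LISTED_SPECIALS = r"[@\+\#\$\%\^\!]+"


def strip_chars(text: str, pattern: str) -> str:
    """Hypothetical helper: replace every match of pattern with an empty string."""
    return re.sub(pattern, "", text)


sample = "my-file_name (v2)! @home+"
print(strip_chars(sample, NOT_ALNUM_UNDERSCORE_HYPHEN))  # my-file_namev2home
print(strip_chars(sample, LISTED_SPECIALS))  # my-file_name (v2) home
```

The first pattern also strips spaces and parentheses, while the second leaves everything except the seven listed characters untouched, so the choice depends on which characters the data should keep.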

This code snippet replaces special characters with an empty string.


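Input DataFrame:

+---+--------------------------+
|id |str                       |
+---+--------------------------+
|1  |ABCDEDF!@#$%%^123456qwerty|
|2  |ABCDE!!!                  |
+---+--------------------------+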

Output:

+---+-------------------+
| id|       replaced_str|
+---+-------------------+
|  1|ABCDEDF123456qwerty|
|  2|              ABCDE|
+---+-------------------+