File: //proc/thread-self/root/opt/gsutil/third_party/charset_normalizer/docs/user/getstarted.rst
Installation
============
This installs a package that can be used from Python (``import charset_normalizer``).
To install for all users on the system, administrator rights (root) may be required.
Using PIP
---------
Charset Normalizer can be installed from pip::
pip install charset-normalizer
You may retrieve the latest unicodedata backport as follow::
pip install charset-normalizer[unicode_backport]
From git via master
-----------------------
You can install from dev-master branch using git::
git clone https://github.com/Ousret/charset_normalizer.git
cd charset_normalizer/
python setup.py install
Basic Usage
===========
The new way
-----------
You may want to get right to it. ::
from charset_normalizer import from_bytes, from_path
# This is going to print out your sequence once properly decoded
print(
str(
from_bytes(
my_byte_str
).best()
)
)
# You could also want the same from a file
print(
str(
from_path(
'./data/sample.1.ar.srt'
).best()
)
)
Backward compatibility
----------------------
If you were used to python chardet, we are providing the very same ``detect()`` method as chardet.
This function is mostly backward-compatible with Chardet. The migration should be painless.
::
from charset_normalizer import detect
# This will behave exactly the same as python chardet
result = detect(my_byte_str)
if result['encoding'] is not None:
print('got', result['encoding'], 'as detected encoding')
You may upgrade your code with ease.
CTRL + R ``from chardet import detect`` to ``from charset_normalizer import detect``.