summaryrefslogtreecommitdiff
path: root/ext/pybind11/docs/advanced/cast
diff options
context:
space:
mode:
authorAndreas Sandberg <andreas.sandberg@arm.com>2017-05-09 19:22:53 +0100
committerAndreas Sandberg <andreas.sandberg@arm.com>2017-05-22 17:15:09 +0000
commit6914a229a038206341ae1fea46393965a555ca9a (patch)
tree4a11cfaed46dabc827c5ee17cd976f42b5f53d49 /ext/pybind11/docs/advanced/cast
parentca1d18d599dcc620bf526fb22042af95b1b60b68 (diff)
downloadgem5-6914a229a038206341ae1fea46393965a555ca9a.tar.xz
ext: Upgrade PyBind11 to version 2.1.1
Change-Id: I16870dec402d661295f9d013dc23e362b2b2c169 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Curtis Dunham <curtis.dunham@arm.com> Reviewed-on: https://gem5-review.googlesource.com/3225 Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Diffstat (limited to 'ext/pybind11/docs/advanced/cast')
-rw-r--r--ext/pybind11/docs/advanced/cast/chrono.rst4
-rw-r--r--ext/pybind11/docs/advanced/cast/eigen.rst316
-rw-r--r--ext/pybind11/docs/advanced/cast/index.rst1
-rw-r--r--ext/pybind11/docs/advanced/cast/overview.rst12
-rw-r--r--ext/pybind11/docs/advanced/cast/stl.rst12
-rw-r--r--ext/pybind11/docs/advanced/cast/strings.rst243
6 files changed, 553 insertions, 35 deletions
diff --git a/ext/pybind11/docs/advanced/cast/chrono.rst b/ext/pybind11/docs/advanced/cast/chrono.rst
index 6d4a5ee55..8c6b3d7e5 100644
--- a/ext/pybind11/docs/advanced/cast/chrono.rst
+++ b/ext/pybind11/docs/advanced/cast/chrono.rst
@@ -4,8 +4,8 @@ Chrono
When including the additional header file :file:`pybind11/chrono.h` conversions
from C++11 chrono datatypes to python datetime objects are automatically enabled.
This header also enables conversions of python floats (often from sources such
-as `time.monotonic()`, `time.perf_counter()` and `time.process_time()`) into
-durations.
+as ``time.monotonic()``, ``time.perf_counter()`` and ``time.process_time()``)
+into durations.
An overview of clocks in C++11
------------------------------
diff --git a/ext/pybind11/docs/advanced/cast/eigen.rst b/ext/pybind11/docs/advanced/cast/eigen.rst
index b83ca9af9..5b0b08ca6 100644
--- a/ext/pybind11/docs/advanced/cast/eigen.rst
+++ b/ext/pybind11/docs/advanced/cast/eigen.rst
@@ -1,48 +1,308 @@
Eigen
-=====
+#####
`Eigen <http://eigen.tuxfamily.org>`_ is C++ header-based library for dense and
sparse linear algebra. Due to its popularity and widespread adoption, pybind11
-provides transparent conversion support between Eigen and Scientific Python linear
-algebra data types.
+provides transparent conversion and limited mapping support between Eigen and
+Scientific Python linear algebra data types.
-Specifically, when including the optional header file :file:`pybind11/eigen.h`,
-pybind11 will automatically and transparently convert
+To enable the built-in Eigen support you must include the optional header file
+:file:`pybind11/eigen.h`.
-1. Static and dynamic Eigen dense vectors and matrices to instances of
- ``numpy.ndarray`` (and vice versa).
+Pass-by-value
+=============
-2. Returned matrix expressions such as blocks (including columns or rows) and
- diagonals will be converted to ``numpy.ndarray`` of the expression
- values.
+When binding a function with ordinary Eigen dense object arguments (for
+example, ``Eigen::MatrixXd``), pybind11 will accept any input value that is
+already (or convertible to) a ``numpy.ndarray`` with dimensions compatible with
+the Eigen type, copy its values into a temporary Eigen variable of the
+appropriate type, then call the function with this temporary variable.
-3. Returned matrix-like objects such as Eigen::DiagonalMatrix or
- Eigen::SelfAdjointView will be converted to ``numpy.ndarray`` containing the
- expressed value.
+Sparse matrices are similarly copied to or from
+``scipy.sparse.csr_matrix``/``scipy.sparse.csc_matrix`` objects.
-4. Eigen sparse vectors and matrices to instances of
- ``scipy.sparse.csr_matrix``/``scipy.sparse.csc_matrix`` (and vice versa).
+Pass-by-reference
+=================
-This makes it possible to bind most kinds of functions that rely on these types.
-One major caveat are functions that take Eigen matrices *by reference* and modify
-them somehow, in which case the information won't be propagated to the caller.
+One major limitation of the above is that every data conversion implicitly
+involves a copy, which can be both expensive (for large matrices) and disallows
+binding functions that change their (Matrix) arguments. Pybind11 allows you to
+work around this by using Eigen's ``Eigen::Ref<MatrixType>`` class much as you
+would when writing a function taking a generic type in Eigen itself (subject to
+some limitations discussed below).
+
+When calling a bound function accepting a ``Eigen::Ref<const MatrixType>``
+type, pybind11 will attempt to avoid copying by using an ``Eigen::Map`` object
+that maps into the source ``numpy.ndarray`` data: this requires both that the
+data types are the same (e.g. ``dtype='float64'`` and ``MatrixType::Scalar`` is
+``double``); and that the storage is layout compatible. The latter limitation
+is discussed in detail in the section below, and requires careful
+consideration: by default, numpy matrices and eigen matrices are *not* storage
+compatible.
+
+If the numpy matrix cannot be used as is (either because its types differ, e.g.
+passing an array of integers to an Eigen paramater requiring doubles, or
+because the storage is incompatible), pybind11 makes a temporary copy and
+passes the copy instead.
+
+When a bound function parameter is instead ``Eigen::Ref<MatrixType>`` (note the
+lack of ``const``), pybind11 will only allow the function to be called if it
+can be mapped *and* if the numpy array is writeable (that is
+``a.flags.writeable`` is true). Any access (including modification) made to
+the passed variable will be transparently carried out directly on the
+``numpy.ndarray``.
+
+This means you can can write code such as the following and have it work as
+expected:
.. code-block:: cpp
- /* The Python bindings of these functions won't replicate
- the intended effect of modifying the function arguments */
- void scale_by_2(Eigen::Vector3f &v) {
+ void scale_by_2(Eigen::Ref<Eigen::VectorXd> m) {
v *= 2;
}
- void scale_by_2(Eigen::Ref<Eigen::MatrixXd> &v) {
- v *= 2;
+
+Note, however, that you will likely run into limitations due to numpy and
+Eigen's difference default storage order for data; see the below section on
+:ref:`storage_orders` for details on how to bind code that won't run into such
+limitations.
+
+.. note::
+
+ Passing by reference is not supported for sparse types.
+
+Returning values to Python
+==========================
+
+When returning an ordinary dense Eigen matrix type to numpy (e.g.
+``Eigen::MatrixXd`` or ``Eigen::RowVectorXf``) pybind11 keeps the matrix and
+returns a numpy array that directly references the Eigen matrix: no copy of the
+data is performed. The numpy array will have ``array.flags.owndata`` set to
+``False`` to indicate that it does not own the data, and the lifetime of the
+stored Eigen matrix will be tied to the returned ``array``.
+
+If you bind a function with a non-reference, ``const`` return type (e.g.
+``const Eigen::MatrixXd``), the same thing happens except that pybind11 also
+sets the numpy array's ``writeable`` flag to false.
+
+If you return an lvalue reference or pointer, the usual pybind11 rules apply,
+as dictated by the binding function's return value policy (see the
+documentation on :ref:`return_value_policies` for full details). That means,
+without an explicit return value policy, lvalue references will be copied and
+pointers will be managed by pybind11. In order to avoid copying, you should
+explictly specify an appropriate return value policy, as in the following
+example:
+
+.. code-block:: cpp
+
+ class MyClass {
+ Eigen::MatrixXd big_mat = Eigen::MatrixXd::Zero(10000, 10000);
+ public:
+ Eigen::MatrixXd &getMatrix() { return big_mat; }
+ const Eigen::MatrixXd &viewMatrix() { return big_mat; }
+ };
+
+ // Later, in binding code:
+ py::class_<MyClass>(m, "MyClass")
+ .def(py::init<>())
+ .def("copy_matrix", &MyClass::getMatrix) // Makes a copy!
+ .def("get_matrix", &MyClass::getMatrix, py::return_value_policy::reference_internal)
+ .def("view_matrix", &MyClass::viewMatrix, py::return_value_policy::reference_internal)
+ ;
+
+.. code-block:: python
+
+ a = MyClass()
+ m = a.get_matrix() # flags.writeable = True, flags.owndata = False
+ v = a.view_matrix() # flags.writeable = False, flags.owndata = False
+ c = a.copy_matrix() # flags.writeable = True, flags.owndata = True
+ # m[5,6] and v[5,6] refer to the same element, c[5,6] does not.
+
+Note in this example that ``py::return_value_policy::reference_internal`` is
+used to tie the life of the MyClass object to the life of the returned arrays.
+
+You may also return an ``Eigen::Ref``, ``Eigen::Map`` or other map-like Eigen
+object (for example, the return value of ``matrix.block()`` and related
+methods) that map into a dense Eigen type. When doing so, the default
+behaviour of pybind11 is to simply reference the returned data: you must take
+care to ensure that this data remains valid! You may ask pybind11 to
+explicitly *copy* such a return value by using the
+``py::return_value_policy::copy`` policy when binding the function. You may
+also use ``py::return_value_policy::reference_internal`` or a
+``py::keep_alive`` to ensure the data stays valid as long as the returned numpy
+array does.
+
+When returning such a reference of map, pybind11 additionally respects the
+readonly-status of the returned value, marking the numpy array as non-writeable
+if the reference or map was itself read-only.
+
+.. note::
+
+ Sparse types are always copied when returned.
+
+.. _storage_orders:
+
+Storage orders
+==============
+
+Passing arguments via ``Eigen::Ref`` has some limitations that you must be
+aware of in order to effectively pass matrices by reference. First and
+foremost is that the default ``Eigen::Ref<MatrixType>`` class requires
+contiguous storage along columns (for column-major types, the default in Eigen)
+or rows if ``MatrixType`` is specifically an ``Eigen::RowMajor`` storage type.
+The former, Eigen's default, is incompatible with ``numpy``'s default row-major
+storage, and so you will not be able to pass numpy arrays to Eigen by reference
+without making one of two changes.
+
+(Note that this does not apply to vectors (or column or row matrices): for such
+types the "row-major" and "column-major" distinction is meaningless).
+
+The first approach is to change the use of ``Eigen::Ref<MatrixType>`` to the
+more general ``Eigen::Ref<MatrixType, 0, Eigen::Stride<Eigen::Dynamic,
+Eigen::Dynamic>>`` (or similar type with a fully dynamic stride type in the
+third template argument). Since this is a rather cumbersome type, pybind11
+provides a ``py::EigenDRef<MatrixType>`` type alias for your convenience (along
+with EigenDMap for the equivalent Map, and EigenDStride for just the stride
+type).
+
+This type allows Eigen to map into any arbitrary storage order. This is not
+the default in Eigen for performance reasons: contiguous storage allows
+vectorization that cannot be done when storage is not known to be contiguous at
+compile time. The default ``Eigen::Ref`` stride type allows non-contiguous
+storage along the outer dimension (that is, the rows of a column-major matrix
+or columns of a row-major matrix), but not along the inner dimension.
+
+This type, however, has the added benefit of also being able to map numpy array
+slices. For example, the following (contrived) example uses Eigen with a numpy
+slice to multiply by 2 all coefficients that are both on even rows (0, 2, 4,
+...) and in columns 2, 5, or 8:
+
+.. code-block:: cpp
+
+ m.def("scale", [](py::EigenDRef<Eigen::MatrixXd> m, double c) { m *= c; });
+
+.. code-block:: python
+
+ # a = np.array(...)
+ scale_by_2(myarray[0::2, 2:9:3])
+
+The second approach to avoid copying is more intrusive: rearranging the
+underlying data types to not run into the non-contiguous storage problem in the
+first place. In particular, that means using matrices with ``Eigen::RowMajor``
+storage, where appropriate, such as:
+
+.. code-block:: cpp
+
+ using RowMatrixXd = Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;
+ // Use RowMatrixXd instead of MatrixXd
+
+Now bound functions accepting ``Eigen::Ref<RowMatrixXd>`` arguments will be
+callable with numpy's (default) arrays without involving a copying.
+
+You can, alternatively, change the storage order that numpy arrays use by
+adding the ``order='F'`` option when creating an array:
+
+.. code-block:: python
+
+ myarray = np.array(source, order='F')
+
+Such an object will be passable to a bound function accepting an
+``Eigen::Ref<MatrixXd>`` (or similar column-major Eigen type).
+
+One major caveat with this approach, however, is that it is not entirely as
+easy as simply flipping all Eigen or numpy usage from one to the other: some
+operations may alter the storage order of a numpy array. For example, ``a2 =
+array.transpose()`` results in ``a2`` being a view of ``array`` that references
+the same data, but in the opposite storage order!
+
+While this approach allows fully optimized vectorized calculations in Eigen, it
+cannot be used with array slices, unlike the first approach.
+
+When *returning* a matrix to Python (either a regular matrix, a reference via
+``Eigen::Ref<>``, or a map/block into a matrix), no special storage
+consideration is required: the created numpy array will have the required
+stride that allows numpy to properly interpret the array, whatever its storage
+order.
+
+Failing rather than copying
+===========================
+
+The default behaviour when binding ``Eigen::Ref<const MatrixType>`` eigen
+references is to copy matrix values when passed a numpy array that does not
+conform to the element type of ``MatrixType`` or does not have a compatible
+stride layout. If you want to explicitly avoid copying in such a case, you
+should bind arguments using the ``py::arg().noconvert()`` annotation (as
+described in the :ref:`nonconverting_arguments` documentation).
+
+The following example shows an example of arguments that don't allow data
+copying to take place:
+
+.. code-block:: cpp
+
+ // The method and function to be bound:
+ class MyClass {
+ // ...
+ double some_method(const Eigen::Ref<const MatrixXd> &matrix) { /* ... */ }
+ };
+ float some_function(const Eigen::Ref<const MatrixXf> &big,
+ const Eigen::Ref<const MatrixXf> &small) {
+ // ...
}
-To see why this is, refer to the section on :ref:`opaque` (although that
-section specifically covers STL data types, the underlying issue is the same).
-The :ref:`numpy` sections discuss an efficient alternative for exposing the
-underlying native Eigen types as opaque objects in a way that still integrates
-with NumPy and SciPy.
+ // The associated binding code:
+ using namespace pybind11::literals; // for "arg"_a
+ py::class_<MyClass>(m, "MyClass")
+ // ... other class definitions
+ .def("some_method", &MyClass::some_method, py::arg().nocopy());
+
+ m.def("some_function", &some_function,
+ "big"_a.nocopy(), // <- Don't allow copying for this arg
+ "small"_a // <- This one can be copied if needed
+ );
+
+With the above binding code, attempting to call the the ``some_method(m)``
+method on a ``MyClass`` object, or attempting to call ``some_function(m, m2)``
+will raise a ``RuntimeError`` rather than making a temporary copy of the array.
+It will, however, allow the ``m2`` argument to be copied into a temporary if
+necessary.
+
+Note that explicitly specifying ``.noconvert()`` is not required for *mutable*
+Eigen references (e.g. ``Eigen::Ref<MatrixXd>`` without ``const`` on the
+``MatrixXd``): mutable references will never be called with a temporary copy.
+
+Vectors versus column/row matrices
+==================================
+
+Eigen and numpy have fundamentally different notions of a vector. In Eigen, a
+vector is simply a matrix with the number of columns or rows set to 1 at
+compile time (for a column vector or row vector, respectively). Numpy, in
+contast, has comparable 2-dimensional 1xN and Nx1 arrays, but *also* has
+1-dimensional arrays of size N.
+
+When passing a 2-dimensional 1xN or Nx1 array to Eigen, the Eigen type must
+have matching dimensions: That is, you cannot pass a 2-dimensional Nx1 numpy
+array to an Eigen value expecting a row vector, or a 1xN numpy array as a
+column vector argument.
+
+On the other hand, pybind11 allows you to pass 1-dimensional arrays of length N
+as Eigen parameters. If the Eigen type can hold a column vector of length N it
+will be passed as such a column vector. If not, but the Eigen type constraints
+will accept a row vector, it will be passed as a row vector. (The column
+vector takes precendence when both are supported, for example, when passing a
+1D numpy array to a MatrixXd argument). Note that the type need not be
+expicitly a vector: it is permitted to pass a 1D numpy array of size 5 to an
+Eigen ``Matrix<double, Dynamic, 5>``: you would end up with a 1x5 Eigen matrix.
+Passing the same to an ``Eigen::MatrixXd`` would result in a 5x1 Eigen matrix.
+
+When returning an eigen vector to numpy, the conversion is ambiguous: a row
+vector of length 4 could be returned as either a 1D array of length 4, or as a
+2D array of size 1x4. When encoutering such a situation, pybind11 compromises
+by considering the returned Eigen type: if it is a compile-time vector--that
+is, the type has either the number of rows or columns set to 1 at compile
+time--pybind11 converts to a 1D numpy array when returning the value. For
+instances that are a vector only at run-time (e.g. ``MatrixXd``,
+``Matrix<float, Dynamic, 4>``), pybind11 returns the vector as a 2D array to
+numpy. If this isn't want you want, you can use ``array.reshape(...)`` to get
+a view of the same data in the desired dimensions.
.. seealso::
diff --git a/ext/pybind11/docs/advanced/cast/index.rst b/ext/pybind11/docs/advanced/cast/index.rst
index 36586af5c..54c10570b 100644
--- a/ext/pybind11/docs/advanced/cast/index.rst
+++ b/ext/pybind11/docs/advanced/cast/index.rst
@@ -33,6 +33,7 @@ the last case of the above list.
:maxdepth: 1
overview
+ strings
stl
functional
chrono
diff --git a/ext/pybind11/docs/advanced/cast/overview.rst b/ext/pybind11/docs/advanced/cast/overview.rst
index ab37b90be..54c11a90a 100644
--- a/ext/pybind11/docs/advanced/cast/overview.rst
+++ b/ext/pybind11/docs/advanced/cast/overview.rst
@@ -94,14 +94,26 @@ as arguments and return values, refer to the section on binding :ref:`classes`.
+------------------------------------+---------------------------+-------------------------------+
| ``char`` | Character literal | :file:`pybind11/pybind11.h` |
+------------------------------------+---------------------------+-------------------------------+
+| ``char16_t`` | UTF-16 character literal | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-------------------------------+
+| ``char32_t`` | UTF-32 character literal | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-------------------------------+
| ``wchar_t`` | Wide character literal | :file:`pybind11/pybind11.h` |
+------------------------------------+---------------------------+-------------------------------+
| ``const char *`` | UTF-8 string literal | :file:`pybind11/pybind11.h` |
+------------------------------------+---------------------------+-------------------------------+
+| ``const char16_t *`` | UTF-16 string literal | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-------------------------------+
+| ``const char32_t *`` | UTF-32 string literal | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-------------------------------+
| ``const wchar_t *`` | Wide string literal | :file:`pybind11/pybind11.h` |
+------------------------------------+---------------------------+-------------------------------+
| ``std::string`` | STL dynamic UTF-8 string | :file:`pybind11/pybind11.h` |
+------------------------------------+---------------------------+-------------------------------+
+| ``std::u16string`` | STL dynamic UTF-16 string | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-------------------------------+
+| ``std::u32string`` | STL dynamic UTF-32 string | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-------------------------------+
| ``std::wstring`` | STL dynamic wide string | :file:`pybind11/pybind11.h` |
+------------------------------------+---------------------------+-------------------------------+
| ``std::pair<T1, T2>`` | Pair of two custom types | :file:`pybind11/pybind11.h` |
diff --git a/ext/pybind11/docs/advanced/cast/stl.rst b/ext/pybind11/docs/advanced/cast/stl.rst
index bbd23732b..c76da5ca1 100644
--- a/ext/pybind11/docs/advanced/cast/stl.rst
+++ b/ext/pybind11/docs/advanced/cast/stl.rst
@@ -5,10 +5,12 @@ Automatic conversion
====================
When including the additional header file :file:`pybind11/stl.h`, conversions
-between ``std::vector<>``, ``std::list<>``, ``std::set<>``, and ``std::map<>``
-and the Python ``list``, ``set`` and ``dict`` data structures are automatically
-enabled. The types ``std::pair<>`` and ``std::tuple<>`` are already supported
-out of the box with just the core :file:`pybind11/pybind11.h` header.
+between ``std::vector<>``/``std::list<>``/``std::array<>``,
+``std::set<>``/``std::unordered_set<>``, and
+``std::map<>``/``std::unordered_map<>`` and the Python ``list``, ``set`` and
+``dict`` data structures are automatically enabled. The types ``std::pair<>``
+and ``std::tuple<>`` are already supported out of the box with just the core
+:file:`pybind11/pybind11.h` header.
The major downside of these implicit conversions is that containers must be
converted (i.e. copied) on every Python->C++ and C++->Python transition, which
@@ -72,7 +74,7 @@ functions:
/* ... binding code ... */
py::class_<MyClass>(m, "MyClass")
- .def(py::init<>)
+ .def(py::init<>())
.def_readwrite("contents", &MyClass::contents);
In this case, properties can be read and written in their entirety. However, an
diff --git a/ext/pybind11/docs/advanced/cast/strings.rst b/ext/pybind11/docs/advanced/cast/strings.rst
new file mode 100644
index 000000000..c70fb0bec
--- /dev/null
+++ b/ext/pybind11/docs/advanced/cast/strings.rst
@@ -0,0 +1,243 @@
+Strings, bytes and Unicode conversions
+######################################
+
+.. note::
+
+ This section discusses string handling in terms of Python 3 strings. For Python 2.7, replace all occurrences of ``str`` with ``unicode`` and ``bytes`` with ``str``. Python 2.7 users may find it best to use ``from __future__ import unicode_literals`` to avoid unintentionally using ``str`` instead of ``unicode``.
+
+Passing Python strings to C++
+=============================
+
+When a Python ``str`` is passed from Python to a C++ function that accepts ``std::string`` or ``char *`` as arguments, pybind11 will encode the Python string to UTF-8. All Python ``str`` can be encoded in UTF-8, so this operation does not fail.
+
+The C++ language is encoding agnostic. It is the responsibility of the programmer to track encodings. It's often easiest to simply `use UTF-8 everywhere <http://utf8everywhere.org/>`_.
+
+.. code-block:: c++
+
+ m.def("utf8_test",
+ [](const std::string &s) {
+ cout << "utf-8 is icing on the cake.\n";
+ cout << s;
+ }
+ );
+ m.def("utf8_charptr",
+ [](const char *s) {
+ cout << "My favorite food is\n";
+ cout << s;
+ }
+ );
+
+.. code-block:: python
+
+ >>> utf8_test('🎂')
+ utf-8 is icing on the cake.
+ 🎂
+
+ >>> utf8_charptr('🍕')
+ My favorite food is
+ 🍕
+
+.. note::
+
+ Some terminal emulators do not support UTF-8 or emoji fonts and may not display the example above correctly.
+
+The results are the same whether the C++ function accepts arguments by value or reference, and whether or not ``const`` is used.
+
+Passing bytes to C++
+--------------------
+
+A Python ``bytes`` object will be passed to C++ functions that accept ``std::string`` or ``char*`` *without* conversion.
+
+
+Returning C++ strings to Python
+===============================
+
+When a C++ function returns a ``std::string`` or ``char*`` to a Python caller, **pybind11 will assume that the string is valid UTF-8** and will decode it to a native Python ``str``, using the same API as Python uses to perform ``bytes.decode('utf-8')``. If this implicit conversion fails, pybind11 will raise a ``UnicodeDecodeError``.
+
+.. code-block:: c++
+
+ m.def("std_string_return",
+ []() {
+ return std::string("This string needs to be UTF-8 encoded");
+ }
+ );
+
+.. code-block:: python
+
+ >>> isinstance(example.std_string_return(), str)
+ True
+
+
+Because UTF-8 is inclusive of pure ASCII, there is never any issue with returning a pure ASCII string to Python. If there is any possibility that the string is not pure ASCII, it is necessary to ensure the encoding is valid UTF-8.
+
+.. warning::
+
+ Implicit conversion assumes that a returned ``char *`` is null-terminated. If there is no null terminator a buffer overrun will occur.
+
+Explicit conversions
+--------------------
+
+If some C++ code constructs a ``std::string`` that is not a UTF-8 string, one can perform a explicit conversion and return a ``py::str`` object. Explicit conversion has the same overhead as implicit conversion.
+
+.. code-block:: c++
+
+ // This uses the Python C API to convert Latin-1 to Unicode
+ m.def("str_output",
+ []() {
+ std::string s = "Send your r\xe9sum\xe9 to Alice in HR"; // Latin-1
+ py::str py_s = PyUnicode_DecodeLatin1(s.data(), s.length());
+ return py_s;
+ }
+ );
+
+.. code-block:: python
+
+ >>> str_output()
+ 'Send your résumé to Alice in HR'
+
+The `Python C API <https://docs.python.org/3/c-api/unicode.html#built-in-codecs>`_ provides several built-in codecs.
+
+
+One could also use a third party encoding library such as libiconv to transcode to UTF-8.
+
+Return C++ strings without conversion
+-------------------------------------
+
+If the data in a C++ ``std::string`` does not represent text and should be returned to Python as ``bytes``, then one can return the data as a ``py::bytes`` object.
+
+.. code-block:: c++
+
+ m.def("return_bytes",
+ []() {
+ std::string s("\xba\xd0\xba\xd0"); // Not valid UTF-8
+ return py::bytes(s); // Return the data without transcoding
+ }
+ );
+
+.. code-block:: python
+
+ >>> example.return_bytes()
+ b'\xba\xd0\xba\xd0'
+
+
+Note the asymmetry: pybind11 will convert ``bytes`` to ``std::string`` without encoding, but cannot convert ``std::string`` back to ``bytes`` implicitly.
+
+.. code-block:: c++
+
+ m.def("asymmetry",
+ [](std::string s) { // Accepts str or bytes from Python
+ return s; // Looks harmless, but implicitly converts to str
+ }
+ );
+
+.. code-block:: python
+
+ >>> isinstance(example.asymmetry(b"have some bytes"), str)
+ True
+
+ >>> example.asymmetry(b"\xba\xd0\xba\xd0") # invalid utf-8 as bytes
+ UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 0: invalid start byte
+
+
+Wide character strings
+======================
+
+When a Python ``str`` is passed to a C++ function expecting ``std::wstring``, ``wchar_t*``, ``std::u16string`` or ``std::u32string``, the ``str`` will be encoded to UTF-16 or UTF-32 depending on how the C++ compiler implements each type, in the platform's endian. When strings of these types are returned, they are assumed to contain valid UTF-16 or UTF-32, and will be decoded to Python ``str``.
+
+.. code-block:: c++
+
+ #define UNICODE
+ #include <windows.h>
+
+ m.def("set_window_text",
+ [](HWND hwnd, std::wstring s) {
+ // Call SetWindowText with null-terminated UTF-16 string
+ ::SetWindowText(hwnd, s.c_str());
+ }
+ );
+ m.def("get_window_text",
+ [](HWND hwnd) {
+ const int buffer_size = ::GetWindowTextLength(hwnd) + 1;
+ auto buffer = std::make_unique< wchar_t[] >(buffer_size);
+
+ ::GetWindowText(hwnd, buffer.data(), buffer_size);
+
+ std::wstring text(buffer.get());
+
+ // wstring will be converted to Python str
+ return text;
+ }
+ );
+
+.. warning::
+
+ Wide character strings may not work as described on Python 2.7 or Python 3.3 compiled with ``--enable-unicode=ucs2``.
+
+Strings in multibyte encodings such as Shift-JIS must transcoded to a UTF-8/16/32 before being returned to Python.
+
+
+Character literals
+==================
+
+C++ functions that accept character literals as input will receive the first character of a Python ``str`` as their input. If the string is longer than one Unicode character, trailing characters will be ignored.
+
+When a character literal is returned from C++ (such as a ``char`` or a ``wchar_t``), it will be converted to a ``str`` that represents the single character.
+
+.. code-block:: c++
+
+ m.def("pass_char", [](char c) { return c; });
+ m.def("pass_wchar", [](wchar_t w) { return w; });
+
+.. code-block:: python
+
+ >>> example.pass_char('A')
+ 'A'
+
+While C++ will cast integers to character types (``char c = 0x65;``), pybind11 does not convert Python integers to characters implicitly. The Python function ``chr()`` can be used to convert integers to characters.
+
+.. code-block:: python
+
+ >>> example.pass_char(0x65)
+ TypeError
+
+ >>> example.pass_char(chr(0x65))
+ 'A'
+
+If the desire is to work with an 8-bit integer, use ``int8_t`` or ``uint8_t`` as the argument type.
+
+Grapheme clusters
+-----------------
+
+A single grapheme may be represented by two or more Unicode characters. For example 'é' is usually represented as U+00E9 but can also be expressed as the combining character sequence U+0065 U+0301 (that is, the letter 'e' followed by a combining acute accent). The combining character will be lost if the two-character sequence is passed as an argument, even though it renders as a single grapheme.
+
+.. code-block:: python
+
+ >>> example.pass_wchar('é')
+ 'é'
+
+ >>> combining_e_acute = 'e' + '\u0301'
+
+ >>> combining_e_acute
+ 'é'
+
+ >>> combining_e_acute == 'é'
+ False
+
+ >>> example.pass_wchar(combining_e_acute)
+ 'e'
+
+Normalizing combining characters before passing the character literal to C++ may resolve *some* of these issues:
+
+.. code-block:: python
+
+ >>> example.pass_wchar(unicodedata.normalize('NFC', combining_e_acute))
+ 'é'
+
+In some languages (Thai for example), there are `graphemes that cannot be expressed as a single Unicode code point <http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries>`_, so there is no way to capture them in a C++ character type.
+
+
+References
+==========
+
+* `The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) <https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/>`_
+* `C++ - Using STL Strings at Win32 API Boundaries <https://msdn.microsoft.com/en-ca/magazine/mt238407.aspx>`_ \ No newline at end of file