oracle
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 3 additions & 0 deletions
diff --git a/doc/IMPLEMENTATION_DETAILS.md b/doc/IMPLEMENTATION_DETAILS.md
Lines changed: 117 additions & 0 deletions
diff --git a/graalpython/com.oracle.graal.python.cext/src/unicodeobject.c b/graalpython/com.oracle.graal.python.cext/src/unicodeobject.c
Lines changed: 169 additions & 3 deletions
diff --git a/graalpython/com.oracle.graal.python/src/com/oracle/graal/python/nodes/string/StringLenNode.java renamed to graalpython/com.oracle.graal.python.test/src/com/oracle/graal/python/test/basic/ComplexTexts.java
Lines changed: 8 additions & 39 deletions
@@ -7,6 +7,9 @@ language runtime. The main focus is on user-observable behavior of the engine.
 
 * Jython Compatibility: Implement `from JavaType import *` to import all static members of a Java class
 * Jython Compatibility: Implement importing Python code from inside JAR files by adding `path/to/jarfile.jar!path/inside/jar` to `sys.path`
+* Added support for date and time interop.
+* Added support for setting the time zone via `Context.Builder.timeZone`.
+* Implemented PEP 570 (Python positional-only parameters).
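As a quick illustration of the PEP 570 entry, the following sketch shows what positional-only parameters look like from the user's side. The function names `clamp` and `call_with_keyword` are hypothetical and exist only for this example; it assumes a Python 3.8-compatible language level.

```python
def clamp(value, low, high, /):
    """Clamp value into [low, high]; the '/' makes all three parameters positional-only."""
    return max(low, min(value, high))


def call_with_keyword():
    """Show that passing a positional-only parameter by keyword is rejected."""
    try:
        clamp(value=5, low=0, high=3)
    except TypeError:
        return "TypeError"
    return "accepted"
```

Calling `clamp(5, 0, 3)` works as usual, while the keyword form raises `TypeError` because the names before `/` are not part of the call signature.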
 
 ## Version 19.3.0
 
 
@@ -0,0 +1,117 @@
+## Python's global thread state
+
+In CPython, each stack frame is allocated on the heap, and there's a global
+thread state holding on to the chain of currently handled exceptions (e.g. if
+you're nested inside `except:` blocks) as well as the currently flying exception
+(e.g. we're just unwinding the stack).
+
+In PyPy, this is done via their virtualizable frames and a global reference to
+the current top frame. Each frame also has a "virtual reference" to its parent
+frame, so code can just "force" these references to make the stack reachable if
+necessary.
+
+Unfortunately, the elegant solution of "virtual references" doesn't work for us,
+mostly because we're not a tracing JIT: we want the reference to be "virtual"
+even when there are multiple compilation units. With PyPy's solution, this also
+isn't the case, but it only hurts them for nested loops when large stacks must
+be forced to the heap.
+
+In Graal Python, the implementation is thus a bit more involved. Here's how it
+works.
+
+#### The PFrame.Reference
+
+A `PFrame.Reference` is created when entering a Python function. By default it
+only holds on to another reference, that of the Python caller. If there are
+non-Python frames between the newly entered frame and the last Python frame,
+those are ignored - our linked list only connects Python frames. The entry point
+into the interpreter has a `PFrame.Reference` with no caller.
+
+###### ExecutionContext.CallContext and ExecutionContext.CalleeContext
+
+If we're only calling between Python functions, we pass our `PFrame.Reference`
+as an implicit argument to any callees. On entry, they create their own
+`PFrame.Reference` as the next link in this backwards-connected linked list. As
+an optimization, we use assumptions both on the calling node and on the callee
+root node to avoid passing the reference (in the caller) and linking it (on the
+callee side). These assumptions are invalidated the first time the reference is
+actually needed. But even then, the `PFrame.Reference` often doesn't hold on to
+anything else, because it was only used for traversal, so this is pretty cheap
+even in the non-inlined case.
+
+When an event forces the frame to materialize on the heap, the reference is
+filled. This is usually only the case when someone uses `sys._getframe` or
+accesses the traceback of an exception. If the stack is still live, we walk the
+stack and insert the "calling node" and create a "PyFrame" object that mirrors
+the locals in the Truffle frame. But we also need to be able to do this for
+frames that are no longer live, e.g. when an exception was raised a few frames
+up. To ensure this, we set a boolean flag on `PFrame.Reference` to mark it as
+"escaped" when it is attached to an exception (or anything else) but has not
+been accessed yet. Whenever a Python call returns and its `PFrame.Reference` is
+marked as escaped,
+the "PyFrame" is also filled in by copying from the VirtualFrame. This way, the
+stack is lazily forced to the heap as we return from functions. If we're lucky
+and it is never actually accessed *and* the calls are all inlined, those fill-in
+operations can be escape-analyzed away.
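The two triggers named in this paragraph can be seen from plain Python. The sketch below is only an illustration of the user-visible behavior that the runtime has to support, not of the GraalPython internals; all function names are made up for the example.

```python
import sys


def caller_name():
    # sys._getframe(1) asks for the frame object of whoever called us.
    return sys._getframe(1).f_code.co_name


def outer():
    return caller_name()


def failing():
    secret = 42  # a local that must stay reachable through the traceback
    raise RuntimeError("boom")


def locals_of_returned_frame():
    try:
        failing()
    except RuntimeError as exc:
        # tb_frame is this function's frame; tb_next belongs to failing(),
        # which has already been unwound when we get here.
        tb = exc.__traceback__
        return tb.tb_next.tb_frame.f_locals["secret"]
```

The second function is the interesting case: the frame of `failing()` is no longer on the stack when its locals are read, which is why frame state has to be saved as calls return.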
+
+To implement all this, we use the ExecutionContext.CallContext and
+ExecutionContext.CalleeContext classes. These also use profiling information to
+eagerly fill in frame information if the callees actually access the stack, for
+example, so that no further stack walks need to take place.
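To make the "escaped" flag easier to picture, here is a deliberately simplified toy model in Python. It is an assumption-laden sketch, not the actual `PFrame.Reference` or `ExecutionContext` code: the class `FrameRef`, the helper `call`, and the dictionary standing in for the Truffle frame are all hypothetical.

```python
class FrameRef:
    """Toy stand-in for PFrame.Reference: links only to the calling Python frame."""

    def __init__(self, caller):
        self.caller = caller   # previous link in the backwards-connected list
        self.escaped = False   # set when something (e.g. an exception) holds on to us
        self.pyframe = None    # heap copy of the locals, filled in lazily


def call(caller_ref, body):
    """Run body with a fresh reference; copy locals to the heap on return if escaped."""
    ref = FrameRef(caller_ref)
    local_vars = {}            # stands in for the stack-allocated frame
    try:
        return body(ref, local_vars)
    finally:
        if ref.escaped and ref.pyframe is None:
            ref.pyframe = dict(local_vars)


def demo():
    captured = []

    def escaping(ref, local_vars):
        local_vars["x"] = 1
        ref.escaped = True     # pretend the reference was attached to an exception
        captured.append(ref)

    def quiet(ref, local_vars):
        local_vars["y"] = 2
        captured.append(ref)

    root = FrameRef(None)      # the interpreter entry point has no caller
    call(root, escaping)
    call(root, quiet)
    return captured
```

Only the reference that was marked as escaped ends up with a heap copy of its locals; the other one stays empty, which mirrors the idea that unescaped frames cost nothing beyond the link itself.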
+
+###### ExecutionContext.IndirectCallContext and ExecutionContext.IndirectCalleeContext
+
+If we're mixing Python frames with non-Python frames, or if we are making calls
+to methods and cannot pass the Truffle frame, we need to store the last
+`PFrame.Reference` on the context so that, if we ever return back into a Python
+function, it can properly link to the last frame. However, this is potentially
+expensive, because it means storing a linked list of frames on the context. So
+instead, we do it only lazily. When an "indirect" Python callee needs its
+caller, it initially walks the stack to find it. But it will also tell the last
+Python node that made a call to a "foreign" callee that it will have to store
+its `PFrame.Reference` globally in the future for it to be available later.
+
+#### The current PException
+
+Now that we have a mechanism to lazily make available only as much frame state
+as needed, we use the same mechanism to also pass the currently handled
+exception. Unlike CPython, we do not use a stack of currently handled
+exceptions. Instead, we utilize the Java call stack by always passing the current
+exception and holding on to the previous one (if any) in a local variable.
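The behavior that this mechanism has to reproduce can be observed with `sys.exc_info()`. The following sketch (the function name is hypothetical) records which exception counts as "currently handled" at three points: inside an outer handler, inside a nested handler, and back in the outer handler after the nested one has finished.

```python
import sys


def observe_handled_exceptions():
    seen = []
    try:
        raise ValueError("outer")
    except ValueError:
        seen.append(type(sys.exc_info()[1]).__name__)
        try:
            raise KeyError("inner")
        except KeyError:
            seen.append(type(sys.exc_info()[1]).__name__)
        # The nested handler is done, so the outer exception is current again.
        seen.append(type(sys.exc_info()[1]).__name__)
    return seen
```

The restore step after the nested handler is what "holding on to the last (if any) in a local variable" corresponds to: the previous exception is kept around and put back when the inner handler exits.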
+
+## Abstract operations on Python objects
+
+Many generic operations on Python objects in CPython are defined in the files
+`abstract.c` and `abstract.h`. These operations are widely used, and their
+interplay and intricacies cause conversion, error message, and control flow
+bugs when not mimicked correctly. Our current approach is to
+provide many of these abstract operations as part of the
+`PythonObjectLibrary`. Usually, this means there are at least two messages for
+each operation - one that takes a `ThreadState` argument, and one that
+doesn't. The intent is to allow passing of exception state and caller
+information similar to how we do it with the `PFrame` argument even across
+library messages, which cannot take a VirtualFrame.
+
+All nodes that are used in message implementations must allow uncached
+usage. Often (e.g. in the case of the generic `CallNode`) they offer execute
+methods with and without frames. If a `ThreadState` was passed to the message, a
+frame to pass to the node can be reconstructed using
+`PArguments.frameForCall(threadState)`. Here's an example:
+
+```java
+@ExportMessage
+long messageWithState(ThreadState state,
+        @Cached CallNode callNode) {
+    Object callable = ...
+
+    if (state != null) {
+        return callNode.execute(PArguments.frameForCall(state), callable, arguments);
+    } else {
+        return callNode.execute(callable, arguments);
+    }
+}
+```
+
+*Note*: It is **always** preferable to call an `execute` method with a
+`VirtualFrame` when both one with and without exist! The reason is that this
+avoids materialization of the frame state in more cases, as described in the
+section on Python's global thread state above.
@@ -109,6 +109,109 @@ static Py_ssize_t unicode_aswidechar(PyObject *unicode, wchar_t *w, Py_ssize_t s
     }
 }
 
+#define _PyUnicode_UTF8(op)                             \
+    (((PyCompactUnicodeObject*)(op))->utf8)
+#define _PyUnicode_UTF8_LENGTH(op)                      \
+    (((PyCompactUnicodeObject*)(op))->utf8_length)
+#define _PyUnicode_WSTR(op)                             \
+    (((PyASCIIObject*)(op))->wstr)
+#define _PyUnicode_WSTR_LENGTH(op)                      \
+    (((PyCompactUnicodeObject*)(op))->wstr_length)
+#define _PyUnicode_LENGTH(op)                           \
+    (((PyASCIIObject *)(op))->length)
+#define _PyUnicode_STATE(op)                            \
+    (((PyASCIIObject *)(op))->state)
+#define _PyUnicode_DATA_ANY(op)                         \
+    (((PyUnicodeObject*)(op))->data.any)
+
+POLYGLOT_DECLARE_TYPE(PyUnicodeObject);
+
+PyUnicodeObject* unicode_subtype_new(PyTypeObject *type, PyObject *unicode) {
+    PyObject *self;
+    Py_ssize_t length, char_size;
+    int share_wstr, share_utf8;
+    unsigned int kind;
+    void *data;
+
+    if (unicode == NULL)
+        return NULL;
+    assert(_PyUnicode_CHECK(unicode));
+    if (PyUnicode_READY(unicode) == -1) {
+        Py_DECREF(unicode);
+        return NULL;
+    }
+
+    self = type->tp_alloc(type, 0);
+    if (self == NULL) {
+        Py_DECREF(unicode);
+        return NULL;
+    }
+    kind = PyUnicode_KIND(unicode);
+    length = PyUnicode_GET_LENGTH(unicode);
+
+    _PyUnicode_LENGTH(self) = length;
+    _PyUnicode_STATE(self).interned = 0;
+    _PyUnicode_STATE(self).kind = kind;
+    _PyUnicode_STATE(self).compact = 0;
+    _PyUnicode_STATE(self).ascii = _PyUnicode_STATE(unicode).ascii;
+    _PyUnicode_STATE(self).ready = 1;
+    _PyUnicode_WSTR(self) = NULL;
+    _PyUnicode_UTF8_LENGTH(self) = 0;
+    _PyUnicode_UTF8(self) = NULL;
+    _PyUnicode_WSTR_LENGTH(self) = 0;
+    _PyUnicode_DATA_ANY(self) = NULL;
+
+    share_utf8 = 0;
+    share_wstr = 0;
+    if (kind == PyUnicode_1BYTE_KIND) {
+        char_size = 1;
+        if (PyUnicode_MAX_CHAR_VALUE(unicode) < 128)
+            share_utf8 = 1;
+    }
+    else if (kind == PyUnicode_2BYTE_KIND) {
+        char_size = 2;
+        if (sizeof(wchar_t) == 2)
+            share_wstr = 1;
+    }
+    else {
+        assert(kind == PyUnicode_4BYTE_KIND);
+        char_size = 4;
+        if (sizeof(wchar_t) == 4)
+            share_wstr = 1;
+    }
+
+    /* Ensure we won't overflow the length. */
+    if (length > (PY_SSIZE_T_MAX / char_size - 1)) {
+        PyErr_NoMemory();
+//        Py_DECREF(unicode);
+//        Py_DECREF(self);
+        return NULL;
+    }
+    data = malloc((length + 1) * char_size);
+    if (data == NULL) {
+        PyErr_NoMemory();
+//        Py_DECREF(unicode);
+//        Py_DECREF(self);
+        return NULL;
+    }
+
+    _PyUnicode_DATA_ANY(self) = data;
+    if (share_utf8) {
+        _PyUnicode_UTF8_LENGTH(self) = length;
+        _PyUnicode_UTF8(self) = data;
+    }
+    if (share_wstr) {
+        _PyUnicode_WSTR_LENGTH(self) = length;
+        _PyUnicode_WSTR(self) = (wchar_t *)data;
+    }
+
+    memcpy(data, PyUnicode_DATA(unicode),
+              kind * (length + 1));
+    assert(_PyUnicode_CheckConsistency(self, 1));
+    Py_DECREF(unicode);
+    return (PyUnicodeObject*) polyglot_from_PyUnicodeObject((PyUnicodeObject*)self);
+}
+
 PyObject* PyUnicode_FromString(const char* o) {
     return to_sulong(polyglot_from_string(o, SRC_CS));
 }
@@ -245,9 +348,8 @@ PyObject* PyUnicode_FromObject(PyObject* o) {
     return UPCALL_CEXT_O(_jls_PyUnicode_FromObject, native_to_java(o));
 }
 
-UPCALL_ID(PyUnicode_GetLength);
 Py_ssize_t PyUnicode_GetLength(PyObject *unicode) {
-    return UPCALL_CEXT_L(_jls_PyUnicode_GetLength, native_to_java(unicode));
+    return PyUnicode_GET_LENGTH(unicode);
 }
 
 UPCALL_ID(PyUnicode_Concat);
@@ -305,7 +407,7 @@ PyObject * PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *err
     PyObject *result;
     void *jerrors = errors != NULL ? polyglot_from_string(errors, SRC_CS) : NULL;
     int bo = byteorder != NULL ? *byteorder : 0;
-    return polyglot_invoke(PY_TRUFFLE_CEXT, "PyTruffle_Unicode_DecodeUTF32", s, size, native_to_java(jerrors), bo, NULL);
+    return polyglot_invoke(PY_TRUFFLE_CEXT, "PyTruffle_Unicode_DecodeUTF32", polyglot_from_i8_array(s, size), size, native_to_java(jerrors), bo, NULL);
 }
 
 Py_ssize_t PyUnicode_AsWideChar(PyObject *unicode, wchar_t *w, Py_ssize_t size) {
@@ -525,3 +627,67 @@ UPCALL_ID(PyUnicode_Replace);
 PyObject * PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount) {
 	return UPCALL_CEXT_O(_jls_PyUnicode_Replace, native_to_java(str), native_to_java(substr), native_to_java(replstr), maxcount);
 }
+
+/* Generic helper macro to convert characters of different types.
+   from_type and to_type have to be valid type names, begin and end
+   are pointers to the source characters which should be of type
+   "from_type *".  to is a pointer of type "to_type *" and points to the
+   buffer where the result characters are written to. */
+#define _PyUnicode_CONVERT_BYTES(from_type, to_type, begin, end, to) \
+    do {                                                \
+        to_type *_to = (to_type *)(to);                \
+        const from_type *_iter = (from_type *)(begin);  \
+        const from_type *_end = (from_type *)(end);     \
+        Py_ssize_t n = (_end) - (_iter);                \
+        const from_type *_unrolled_end =                \
+            _iter + _Py_SIZE_ROUND_DOWN(n, 4);          \
+        while (_iter < (_unrolled_end)) {               \
+            _to[0] = (to_type) _iter[0];                \
+            _to[1] = (to_type) _iter[1];                \
+            _to[2] = (to_type) _iter[2];                \
+            _to[3] = (to_type) _iter[3];                \
+            _iter += 4; _to += 4;                       \
+        }                                               \
+        while (_iter < (_end))                          \
+            *_to++ = (to_type) *_iter++;                \
+    } while (0)
+
+
+POLYGLOT_DECLARE_TYPE(Py_UCS4);
+
+/* used from Java only to decode a native unicode object */
+void* native_unicode_as_string(PyObject *string) {
+	Py_UCS4 *target = NULL;
+    int kind = 0;
+    void *data = NULL;
+    void *result = NULL;
+    Py_ssize_t len;
+    if (PyUnicode_READY(string) == -1) {
+    	PyErr_Format(PyExc_TypeError, "provided unicode object is not ready");
+        return NULL;
+    }
+    kind = PyUnicode_KIND(string);
+    data = PyUnicode_DATA(string);
+    len = PyUnicode_GET_LENGTH(string);
+    if (kind == PyUnicode_1BYTE_KIND) {
+        Py_UCS1 *start = (Py_UCS1 *) data;
+    	if (PyUnicode_IS_COMPACT_ASCII(string)) {
+            return polyglot_from_string_n((const char *)data, sizeof(Py_UCS1) * len, "ascii");
+    	}
+        return polyglot_from_string_n((const char *)data, sizeof(Py_UCS1) * len, "latin1");
+    }
+    else if (kind == PyUnicode_2BYTE_KIND) {
+        Py_UCS2 *start = (Py_UCS2 *) data;
+        target = PyMem_New(Py_UCS4, len);
+        if (!target) {
+            PyErr_NoMemory();
+            return NULL;
+        }
+        _PyUnicode_CONVERT_BYTES(Py_UCS2, Py_UCS4, start, start + len, target);
+        result = polyglot_from_string_n((const char *)target, sizeof(Py_UCS4) * len, "UTF-32");
+        PyMem_Free(target);
+        return result;
+    }
+    assert(kind == PyUnicode_4BYTE_KIND);
+    return polyglot_from_string_n((const char *)data, sizeof(Py_UCS4) * len, "UTF-32");
+}
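The 2-byte branch of `native_unicode_as_string` widens each UCS2 code unit to four bytes before handing the buffer over as UTF-32. The Python sketch below models that widening step only; `widen_ucs2_to_ucs4` is a hypothetical helper, and it fixes the byte order to little-endian for reproducibility, whereas the C code works on in-memory values in native order.

```python
import struct


def widen_ucs2_to_ucs4(ucs2_bytes):
    """Zero-extend each 16-bit code unit to 32 bits (little-endian on both sides)."""
    count = len(ucs2_bytes) // 2
    units = struct.unpack("<%dH" % count, ucs2_bytes)
    return struct.pack("<%dI" % count, *units)


def roundtrip(text):
    # Only valid for text without surrogate pairs, which is what the 2-byte kind holds.
    ucs2 = text.encode("utf-16-le")
    return widen_ucs2_to_ucs4(ucs2).decode("utf-32-le")
```

Because the 2-byte kind never contains surrogate pairs, zero-extension is enough and no real transcoding is required.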
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2020, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * The Universal Permissive License (UPL), Version 1.0
@@ -38,46 +38,15 @@
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
  */
-package com.oracle.graal.python.nodes.string;
 
-import com.oracle.graal.python.builtins.objects.str.LazyString;
-import com.oracle.graal.python.builtins.objects.str.PString;
-import com.oracle.truffle.api.dsl.Cached;
-import com.oracle.truffle.api.dsl.GenerateUncached;
-import com.oracle.truffle.api.dsl.Specialization;
-import com.oracle.truffle.api.nodes.Node;
-import com.oracle.truffle.api.profiles.BranchProfile;
-import com.oracle.truffle.api.profiles.ValueProfile;
+package com.oracle.graal.python.test.basic;
 
-@GenerateUncached
-public abstract class StringLenNode extends Node {
-    public abstract int execute(Object self);
+import static com.oracle.graal.python.test.PythonTests.assertPrints;
+import org.junit.Test;
 
-    @Specialization
-    public int len(String self) {
-        return self.length();
-    }
-
-    @Specialization
-    public int len(PString self,
-                    @Cached("createClassProfile()") ValueProfile classProfile,
-                    @Cached("create()") BranchProfile uncommonStringTypeProfile) {
-        Object profiled = classProfile.profile(self.getCharSequence());
-        if (profiled instanceof String) {
-            return ((String) profiled).length();
-        } else if (profiled instanceof LazyString) {
-            return ((LazyString) profiled).length();
-        } else {
-            uncommonStringTypeProfile.enter();
-            return ((CharSequence) profiled).length();
-        }
-    }
-
-    public static StringLenNode create() {
-        return StringLenNodeGen.create();
-    }
-
-    public static StringLenNode getUncached() {
-        return StringLenNodeGen.getUncached();
+public class ComplexTexts {
+    @Test
+    public void negativeZero() {
+        assertPrints("(1-0j)\n", "print(complex(1,-0.0))");
     }
 }
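The new `negativeZero` test checks that the sign of a negative-zero imaginary part survives formatting. The same expectation can be written directly in Python; `format_complex` is a hypothetical wrapper used only for this example.

```python
def format_complex(real, imag):
    """Return the text that print() would show for complex(real, imag)."""
    return str(complex(real, imag))
```

With a positive-zero imaginary part the result is `(1+0j)`, and with `-0.0` it is `(1-0j)`, so the two zeros must not be collapsed when the value is formatted.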