Content Table
- How to add an override
- How to get the test Id
- How to terminate an A/B test
- How to manage user eligibility
- How to manage group meta
- How to do a feature roll out
- How to run distributed group assignments
- How to run Bayesian Analysis
All methods in the API have corresponding endpoints in the HTTP service libraries (Play or http4s).
How do I fix a user to a certain group so that I can test the treatment
You can use the override feature to add overrides, which are pairs of user Id and group name. API for adding overrides
How to get the test ID
A/B tests are organized around "features". You can run multiple A/B tests for the same feature, as long as their schedules do not overlap. Each test has a test Id, which is generated when the test is created. If you need the test Id for a specific test, you can use this method, which responds with a list of all tests for the feature. In the list you can find more details of the tests, including the test Id.
How to terminate/delete an A/B test
You need the testId to use this method. If the test has already started, it will be expired immediately. If the test is scheduled to run in the future, it will be removed from the database.
How to manage user eligibility
When you query an A/B test assignment, you can pass a JSON object in the field meta as meta information for that user. This JSON object is expected to be flat, with string values only.
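The flatness requirement can be checked on the client side before the query is sent. The following is a minimal sketch, not part of Thomas: the helper name to_flat_meta is hypothetical, and converting numbers with str() is an assumption about how you want them represented.

```python
def to_flat_meta(meta):
    """Return a copy of `meta` in which every value is a string.

    Raises ValueError for nested values, since the meta object is
    expected to be flat. (Hypothetical helper, not a Thomas API.)
    """
    flat = {}
    for key, value in meta.items():
        if isinstance(value, (dict, list, tuple, set)):
            raise ValueError(f"meta field {key!r} is nested; expected a flat value")
        # Assumption: plain str() is an acceptable string form for numbers.
        flat[str(key)] = str(value)
    return flat


user_meta = to_flat_meta({"deviceId": "ax3263sdx11", "age": 33})
print(user_meta)
```

The resulting dictionary can then be passed as the meta field of an assignment query.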
In A/B tests you can set user meta criteria in the field userMetaCriteria. The user meta criteria is a JSON object similar to MongoDB's JSON query. Besides exact match, it supports %regex, %in, %versionStart and %versionRange, as well as the combinators %and and %or.
The following example lists most of the criteria.
{
"sex" : "female", //matches users whose "sex" field is exactly "female"
"description" : {
"%regex" : "(shiny|fancy)" //matches users whose "description" field matches the regex "(shiny|fancy)"
},
"device" : {
"%in": ["iphone","ipad"] //matches users whose "device" is one of the two strings "iphone" and "ipad"
},
"%or": { //matches users whose "city" is "LA" or "state" is "NY"
"city": "LA",
"state": "NY"
},
"age" : {
"%gt" : 32 //matches age greater than 32; other comparators include %ge, %lt and %le
},
"clientVer": {
"%versionStart" : "1.0.0" //special filter for version strings. Matches users whose "clientVer" is later than "1.0.0"
},
"androidVer": {
"%versionRange" : ["2.0", "3.1"] //special filter for version strings. Matches users whose "androidVer" is between "2.0" and "3.1"
}
}
The combinator %or here is written as an object, which is convenient but also means field names cannot be duplicated. When you need multiple criteria on the same field within an %or, you can use an array of objects instead.
For example:
{
"%or": [
{ "city": "LA" },
{ "city": "NY" }
]
}
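To make the semantics of these criteria concrete, here is a minimal Python sketch of how a criteria object could be evaluated against a user's meta. It is an illustration of the behavior described above, not Thomas's implementation: the function names are hypothetical, only exact match, %in, %regex and %or are covered, and treating %regex as a partial match (re.search) is an assumption.

```python
import re


def _matches_field(actual, expected):
    """Check one meta value against an exact string or an operator object."""
    if not isinstance(expected, dict):
        return actual == expected  # exact match
    for op, arg in expected.items():
        if op == "%in":
            ok = actual in arg
        elif op == "%regex":
            # Assumption: the regex may match anywhere in the value.
            ok = actual is not None and re.search(arg, actual) is not None
        else:
            raise ValueError(f"operator {op} is not covered by this sketch")
        if not ok:
            return False
    return True


def matches(criteria, meta):
    """Return True when `meta` satisfies every entry of `criteria`."""
    for key, expected in criteria.items():
        if key == "%or":
            # Object form: one alternative per field.
            # Array form: one alternative per element.
            if isinstance(expected, dict):
                alternatives = [{k: v} for k, v in expected.items()]
            else:
                alternatives = expected
            if not any(matches(alt, meta) for alt in alternatives):
                return False
        elif key.startswith("%"):
            raise ValueError(f"combinator {key} is not covered by this sketch")
        elif not _matches_field(meta.get(key), expected):
            return False
    return True


criteria = {
    "sex": "female",
    "device": {"%in": ["iphone", "ipad"]},
    "%or": [{"city": "LA"}, {"city": "NY"}],
}
print(matches(criteria, {"sex": "female", "device": "ipad", "city": "NY"}))
print(matches(criteria, {"sex": "female", "device": "android", "city": "NY"}))
```

The array form of %or is what allows two alternatives on the same field ("city") here; the object form cannot express that because a JSON object cannot repeat a key.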
You can manage the user meta criteria using the CLI, which can be found and downloaded here.
To show the current user meta criteria for a feature, run
./thomas-cli_XXX.jar userMetaCriteria show -f MY_FEATURE_NAME --host YOUR_HOST --rootPath YOUR_ROOT_PATH
To update it, write your criteria JSON in a file and run the following command
./thomas-cli_XXX.jar userMetaCriteria update --criteriaFile crit.json --new -f MY_FEATURE_NAME --host YOUR_HOST --rootPath YOUR_ROOT_PATH
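One way to produce the criteria file is to build the criteria as a Python dictionary and serialize it, which guarantees the file is valid JSON (the // comments in the example above are explanatory and would not parse as JSON). This is only a sketch: the criteria values are illustrative, and crit.json is simply the file name used in the command above.

```python
import json

# Illustrative criteria; replace with the criteria for your own feature.
criteria = {
    "device": {"%in": ["iphone", "ipad"]},
    "%or": [{"city": "LA"}, {"city": "NY"}],
}

# Write the file that is passed to --criteriaFile.
with open("crit.json", "w") as f:
    json.dump(criteria, f, indent=2)
```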
How to manage group meta
Optionally, when you get a group assignment, the service can return the associated group metadata you set for the group. The best way to manage group meta is to use the Thomas CLI, which can be found and downloaded here.
Download thomas-cli_XXX.jar and give it execute permission.
chmod +x thomas-cli_XXX.jar
To show the current group meta for a feature, run
./thomas-cli_XXX.jar groupMeta show -f MY_FEATURE_NAME --host YOUR_HOST --rootPath YOUR_ROOT_PATH
To edit or add group meta, run
./thomas-cli_XXX.jar groupMeta add --meta '{ "A" : {"newFeature": 2 }, "B" : { "newFeature": 1 }}' -f MY_FEATURE_NAME --host YOUR_HOST --rootPath YOUR_ROOT_PATH
If the test has already started, you will get an error message:
The latest test is already started, if you want to automatically create a new revision, please run the command again with "--new" flag
As the message suggests, if you want to create a new revision of the test that starts immediately, run the same command again with the --new flag.
./thomas-cli_XXX.jar groupMeta add --new --meta '{ "A" : {"newFeature": 2 }, "B" : { "newFeature": 1 }}' -f MY_FEATURE_NAME --host YOUR_HOST --rootPath YOUR_ROOT_PATH
How to do a feature roll out
You can use the A/B test service to gradually roll out a feature by increasing the experiment group size over a series of tests. Start the first test without an end date, which makes it run indefinitely. To gradually increase the experiment group size, use the continue method to create subsequent tests (all without an end date) with larger and larger experiment group sizes, until you reach 100%.
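It can help to plan the series of group sizes before creating the tests. The sketch below only computes the sizes for each successive test; it does not call the service. The function name and the group names "treatment" and "control" are hypothetical, and expressing sizes as fractions that sum to 1 is an assumption.

```python
def rollout_group_sizes(percentages):
    """Return one {group: size} dict per test in the rollout.

    `percentages` are the experiment group sizes in percent; they must
    strictly increase and end at 100.
    """
    if (
        not percentages
        or list(percentages) != sorted(set(percentages))
        or percentages[-1] != 100
    ):
        raise ValueError("percentages must strictly increase and end at 100")
    return [
        {"treatment": p / 100, "control": (100 - p) / 100}
        for p in percentages
    ]


steps = rollout_group_sizes([5, 25, 50, 100])
for step in steps:
    print(step)
```

Each entry corresponds to one test in the series: the first is the initial test, and every later one is created with the continue method.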
How to run distributed group assignments
Thomas provides a thomas-client that can pull down experiment metadata and calculate user assignments in parallel on local CPUs. It provides a Java API class so that it can be easily used in a Java application or in pyspark. To run it in pyspark, start pyspark with the package
pyspark --packages com.iheart:thomas-client_2.11:LATEST_VERSION
Check here for the latest version to use in place of LATEST_VERSION.
Then in pyspark you can run
client = sc._jvm.com.iheart.thomas.client.JavaAssignment.create("http://myhost/testsWithFeatures")
client.assignments("813579", ["Feature_forYouBanner"], {"deviceId": "ax3263sdx11"})
The first line creates the client using com.iheart.thomas.client.JavaAssignment.create. You need to pass in a URL string pointing to the endpoint corresponding to this API method (on Play, if you follow the example, it will be "yourHost/testsWithFeatures"; on http4s it will be "yourHost/tests/cache"). You can also pass in an optional second argument: a timestamp for the as-of time of your assignments. For example, if you want assignments as of tomorrow 12:00 PM, get the epoch-second timestamp of that time and pass it in. You should reuse this client in your session: during creation it makes an HTTP call to the Thomas A/B test HTTP service and downloads all the relevant tests and overrides, so avoid recreating it unnecessarily.
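The as-of timestamp for "tomorrow 12:00 PM" can be computed with the Python standard library. This is a small sketch with a hypothetical helper name; it interprets noon in the local time zone using the current UTC offset, so it does not account for a daylight saving change between now and tomorrow.

```python
from datetime import datetime, time, timedelta, timezone


def tomorrow_noon_epoch_seconds(now=None):
    """Epoch seconds for 12:00 PM tomorrow, in the time zone of `now`.

    When `now` is omitted, the current local time is used.
    """
    now = now or datetime.now().astimezone()
    tomorrow = now.date() + timedelta(days=1)
    noon = datetime.combine(tomorrow, time(12, 0), tzinfo=now.tzinfo)
    return int(noon.timestamp())


as_of = tomorrow_noon_epoch_seconds()
print(as_of)
# In a pyspark session, the value would be passed as the second argument:
# client = sc._jvm.com.iheart.thomas.client.JavaAssignment.create(url, as_of)
```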
client.assignments(userId, [tags], {user_meta}) returns a Map (or hashmap if you are in Python) of assignments. The keys of this Map are feature names and the values are group names. The second and third arguments, [tags] and {user_meta}, are optional; omit them if your tests don't require them.
This solution works fine for pyspark with a small amount of data. For large datasets, pyspark interop with the JVM is not efficient.
Thomas also provides a tighter Spark integration module, thomas-spark, which provides a UDF and a function that works directly with DataFrame. The assignment computation is distributed through the UDF.
Here is an example of how to use it in pyspark. Start Spark with the package
pyspark --packages com.iheart:thomas-spark_2.11:LATEST_VERSION
Inside pyspark shell, first create the instance of an A/B test Assigner
ta = sc._jvm.com.iheart.thomas.spark.Assigner.create("https://MY_ABTEST_SERVICE_HOST/abtest/testsWithFeatures")
Then you can use it to add a column to an existing DataFrame
from pyspark.mllib.common import _py2java
from pyspark.mllib.common import _java2py
mockUserIds = spark.createDataFrame([("232",), ("3609",), ("3423",)], ["uid"])
result = _java2py(sc, ta.assignments(_py2java(sc, mockUserIds), "My_Test_Feature", "uid"))
Note that some Python-to-Java conversion is needed since thomas-spark is written in Scala.
The Assigner also provides a Spark UDF, assignUdf. You can call it with a feature name to get a UDF that returns the assignment for that A/B test feature.
spark._jsparkSession.udf().register("assign", ta.assignUdf("My_TEST_FEATURES"))
sqlContext.registerDataFrameAsTable(mockUserIds, "userIds")
result = sql("select uid, assign(uid) as assignment from userIds")
Or, instead of registering the UDF, you can use it through a Python function
from pyspark.sql.column import Column
from pyspark.sql.column import _to_java_column
from pyspark.sql.column import _to_seq
from pyspark.sql.functions import col
def assign(column):
    # assignUdf returns a JVM-side UDF; apply it to the Java column and wrap the result
    _javaAssign = ta.assignUdf("My_TEST_FEATURES")
    return Column(_javaAssign.apply(_to_seq(sc, [column], _to_java_column)))
mockUserIds.withColumn('assignment', assign(col('uid'))).show()
In Scala, it's more straightforward.
import spark.implicits._
import org.apache.spark.sql.functions.col
val mockUserIds = (1 to 10000).map(_.toString).toDF("userId")
val assigner = com.iheart.thomas.spark.Assigner.create("https://MY_ABTEST_SERVICE_HOST/abtest/testsWithFeatures")
mockUserIds.withColumn("assignment", assigner.assignUdf("My_TEST_FEATURES")(col("userId")))
assigner.assignUdf assigns based on the test data retrieved when the assigner is created by Assigner.create.
If you have a long-running job, e.g. in Spark streaming, you might want a UDF that keeps the test data updated, so that over a longer period it keeps assigning based on the latest test data from the server.
In that case, you can use com.iheart.thomas.spark.AutoRefreshAssigner
import concurrent.duration._
val assigner = com.iheart.thomas.spark.AutoRefreshAssigner(
url = "https://MY_ABTEST_SERVICE_HOST/abtest/testsWithFeatures",
refreshPeriod = 10.minutes
)
mockUserIds.withColumn("assignment", assigner.assignUdf("My_TEST_FEATURES")(col("userId")))
The refreshPeriod dictates how often the test data is retrieved from the A/B test service per Spark partition.
How to run Bayesian Analysis
Since Thomas does not come with an analytics solution, to analyze A/B test results using Thomas's Bayesian utility you need to write an integration with your analytics solution. Please refer to the dedicated page for a detailed guide.